BPE vs WordPiece vs SentencePiece: A Beginner-Friendly Guide to Subword Tokenization
Introduction
Machines can’t directly understand words. It only know numbers. That’s why we need tokenization, a way to break text into smaller units (tokens) that can be mapped to numbers. There are three common levels of tokenization: word-level, ch...
dhiya-adli.hashnode.dev10 min read