How to Train BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face
If you've had some experience with NLP, you probably know that tokenization is at the heart of any NLP pipeline.
Tokenization is often regarded as a subfield of NLP, but it has its own story of evolution, and today it underpins many state-of-the-art NLP ...
freecodecamp.org · 10 min read