Tokenizers in NLP: Word, Character, and Sub-Word Models
In the world of Natural Language Processing (NLP), the first step in almost every pipeline is tokenization: breaking raw text into smaller units known as tokens. These tokens serve as the building blocks that machine learning models, especially Large Language Models (LLMs), operate on.
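To make the idea concrete, here is a minimal sketch of the two simplest tokenization strategies named in the title: word-level (splitting on whitespace) and character-level (one token per character). This is an illustrative toy, not a production tokenizer; real word tokenizers also handle punctuation, casing, and language-specific rules.

```python
text = "Tokenization breaks text into tokens."

# Word-level tokenization: a naive whitespace split.
word_tokens = text.split()
print(word_tokens)   # ['Tokenization', 'breaks', 'text', 'into', 'tokens.']

# Character-level tokenization: every character is a token.
char_tokens = list(text)
print(char_tokens[:6])  # ['T', 'o', 'k', 'e', 'n', 'i']
```

Note that the naive word split leaves the trailing period attached to `tokens.`, one reason practical systems use more careful rules or sub-word schemes.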