Most developers think LLM intelligence comes from billions of parameters.
But the real mechanics start much smaller — with tokens.
Tokens are converted into embeddings and processed through attention layers, which weigh how strongly each token relates to every other token in the sequence.
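That pipeline can be sketched in a few lines. This is a toy illustration with an invented three-word vocabulary and random (untrained) weights, not a real model:

```python
import numpy as np

# Toy pipeline: text -> token ids -> embeddings -> attention weights.
# Vocabulary, dimensions, and weights are invented for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2}
tokens = [vocab[w] for w in "the cat sat".split()]

rng = np.random.default_rng(0)
d_model = 4
embedding_table = rng.normal(size=(len(vocab), d_model))
x = embedding_table[tokens]              # (3, d_model) token embeddings

# Single-head self-attention with random projections (untrained weights).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)      # pairwise relevance between tokens
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                     # each token's context-mixed representation

print(weights.shape, output.shape)       # (3, 3) (3, 4)
```

Each row of `weights` sums to 1 and tells you how much that token attends to every other token, which is exactly the "relationships between pieces of text" the layers compute.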
Then systems like Retrieval-Augmented Generation (RAG) extend the model by retrieving external knowledge before generating answers.
This combination explains how modern AI systems actually work.
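The RAG step above is just "retrieve, then prepend to the prompt." Here is a minimal sketch where simple word overlap stands in for the embedding similarity a real system would use; the documents and function names are illustrative:

```python
import re

# Minimal RAG retrieval loop. Word-overlap scoring is a crude stand-in
# for the embedding-similarity search a real RAG system performs.
def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs):
    # Pick the document sharing the most words with the query.
    return max(docs, key=lambda d: len(words(query) & words(d)))

docs = [
    "Tokens are the atomic units an LLM reads.",
    "RAG retrieves external documents before the model generates an answer.",
    "Attention layers mix information across token positions.",
]

query = "How does RAG retrieve documents?"
context = retrieve(query, docs)

# The retrieved passage is prepended to the prompt before generation.
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

In production you would swap the overlap score for a trained embedding model plus a vector database, but the control flow is the same: retrieve external knowledge first, then generate.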
I wrote a deep breakdown covering:
• tokenization
• embeddings
• attention
• context windows
• RAG systems
Full article: buildwithclarity.hashnode.dev/tokens-intelligence…
Apurv Julaniya
Boost your skills and life
Thanks for reading!
I’m curious about how other developers are approaching LLM systems.
A few questions for discussion:
• Are you using RAG in your projects yet?
• How do you manage context window limits when dealing with long documents?
• Do you think prompt engineering will evolve into a more structured “prompt architecture” discipline?
I’d love to hear how others are building with LLMs and retrieval systems.
If you have examples, tools, or experiments, share them below.