blog.pragmaticbyharsh.com | Anatomy of a Prompt — System, User, and Assistant Explained | You've used ChatGPT. You've typed questions, gotten answers, maybe even had it write code for you. But here's something most people never think about: every conversation you have with an LLM isn't just you talking to a model. There's a hidden layer s... | Feb 15 · 8 min read
blog.pragmaticbyharsh.com | Choosing Embedding Models and Dimensions: Why 1536 Isn't Always Better Than 384 | You're building a RAG system and need to pick an embedding model. The options are overwhelming: OpenAI, Voyage, Google, Cohere, or self-hosted open-source. Prices range from free to $0.13 per million tokens. Dimensions range from 256 to 3072. How do ... | Feb 10 · 11 min read
blog.pragmaticbyharsh.com | What Are Embeddings and How Vector Similarity Actually Works | If you've ever wondered how AI "understands" that "king" is closer to "queen" than to "pizza," you're about to find out. And no, it's not magic, it's math. Specifically, it's embeddings and vector similarity. This is the foundation that powers semant... | Feb 8 · 14 min read
blog.pragmaticbyharsh.com | How Tokenization Works: BPE and the Algorithm Behind Your LLM | Every time you send a message to GPT-4 or Claude, an algorithm from 1994 decides how much you'll pay. That algorithm is Byte Pair Encoding (BPE for short). It's not glamorous, but it's running under the hood of nearly every modern LLM. Once you under... | Feb 3 · 9 min read
blog.pragmaticbyharsh.com | What Are Tokens and Why Your LLM Bill Depends on Them | "Hello" is 1 token. "你好" is 2 tokens. Same meaning. Double the cost. That little fact tripped me up when I first started working with LLMs. I assumed tokens were just... words. They're not. And that misunderstanding quietly inflates API bills everywh... | Feb 1 · 9 min read