Building LLMs From Scratch
Prashant Pandey in prashantpandey.hashnode.dev · Mar 29 · 1 min read
DeepSeekV3. You can find the full code here: https://github.com/prashantpandeygit/solvingpapers/tree/main/deepseekv3 An 8x2 MoE DeepSeekV3 model from scratch in PyTorch; this is a decoder-only transformer
Ray Tracer in 99 Lines of Python
Prashant Pandey in prashantpandey.hashnode.dev · Feb 28 · 3 min read
You can find the full code here: https://github.com/prashantpandeygit/raytracer I started this project as a weekend challenge to learn the math behind how ray tracers work. So, what I impleme…
Dynamic Resolution Vision Transformer: Faster Inference Without Extra Training
Prashant Pandey in prashantpandey.hashnode.dev · Jan 10 · 2 min read
Vision Transformers run at a fixed image resolution. Easy images still go through the full high resolution pipeline, wasting time and compute. In this post, I explain a Dynamic Vision Transformer that…
Implementing DeepSeek-OCR on Google Colab
Prashant Pandey in prashantpandey.hashnode.dev · Nov 20, 2025 · 3 min read
DeepSeek recently released DeepSeek-OCR. Its research paper focuses on vision-text compression: the model can decode thousands of text tokens from a few hundred vision tokens. I wanted to test thi…
How Do LLMs Decide the Next Token?
Prashant Pandey in prashantpandey.hashnode.dev · Sep 20, 2025 · 3 min read
Large Language Models (LLMs) like ChatGPT, Gemini, or Claude generate text one piece at a time. They don't write full sentences in one go. Instead, they decide the next token, add it to the text, then…