Feb 8 · 5 min read · Pretraining an LLM from scratch usually sounds like “big-lab-only” territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M...
Nov 26, 2025 · 5 min read · I've been working as a software engineer at a startup for quite some time, and now I'm excited to move into the AI field. There are so many topics to explore, and it can feel overwhelming. To make it easier, I started by learning the basics of Large ...
Oct 6, 2025 · 17 min read · 1. Introduction In 2017, Vaswani et al. dropped a paper titled “Attention Is All You Need,” and it quietly rewired the entire field of deep learning. Within a few years, its architecture — the Transformer — became the foundation for nearly every mode...
Sep 17, 2025 · 23 min read · You might have read my earlier blog, Explain GPT to a 5-Year-Old 👧🧒, where we kept things simple and magical ✨. But let's be honest: while a 5-year-old is happy to know that a friendly Chef Cupcake 🤖👨‍🍳 is cooking up sentences, a curious adult 🧠...
Aug 20, 2025 · 4 min read · Why These Concepts Matter? When you send a message to a chatbot, ask an AI to translate a sentence, or search the web, something fascinating happens behind the scenes. The AI isn’t “reading” words the way you do — it’s converting them into numbers a...
Aug 20, 2025 · 5 min read · Why Everyone’s Talking About GPT A few years ago, it sounded like science fiction: a computer you could talk to, ask questions, and get back answers that felt like they were written by a person. Now, tools like ChatGPT are being used to write emails,...
Jul 18, 2025 · 15 min read · In this article, I present an end-to-end implementation of the paper “Attention is All You Need”, along with selected quotes from the paper. This article focuses only on implementation. For a more explanatory and conceptual guide, I recommend the fol...
Jun 11, 2025 · 9 min read · This is the starting point of a new series where I introduce you to AI and show you how to apply it in your workflow so we can build great products together. So let's start with some terms that aren't necessary but help us understand what happens beh...
Apr 8, 2025 · 7 min read · Artificial Intelligence (AI) often feels like a complex web, full of technical jargon like transformers, self-attention, and tokenization. You may have wondered what the fuss is about and why it's so complicated. But what if we could simplify t...