blog.ifkash.dev

Teaching Llama 3 to be Polite · 2d ago · 8 min read
The Objective: The goal of this project was to take a powerful open-source Large Language Model (LLM) and instill a strict behavioral constraint: the model must politely decline to answer any request t…
I Pretrained a 360M LLaMA-Style Language Model from Scratch on 6B FineWeb Tokens (Single H100) · Feb 8 · 5 min read
Pretraining an LLM from scratch usually sounds like “big-lab-only” territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M…
From Karpathy's micrograd to smoltorch: Understanding Autograd from First Principles · Nov 18, 2025 · 7 min read
Why I Built My Own Deep Learning Framework: After watching Andrej Karpathy’s micrograd lecture, I had a realization: I’d been using PyTorch for months, but I didn’t really understand how autograd worked. Sure, I could call .backward() and get gradien…
I Built a Tiny Vector Database (and Pointed It at FAISS) · Aug 28, 2025 · 4 min read
Vector DBs are everywhere these days: Pinecone, Weaviate, Qdrant, Chroma, FAISS… you name it! Most of them are full-featured systems with servers, APIs, dashboards, the works. Sometimes the best way to demystify hype is to build it yourself. Here’s…
The Instability of the Softmax Function · Mar 5, 2025 · 2 min read
The softmax, as we know, is numerically unstable when applied to vectors containing very small or very large numbers because of the exponential function involved in its computation. The softmax formula is: \(\text{softmax}(x_{i}) = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}\)…
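The instability the excerpt describes, and the standard max-subtraction fix, can be sketched in a few lines. This example is not taken from the post itself; it is a minimal NumPy illustration of why the naive formula overflows and how shifting by the maximum avoids it without changing the result.

```python
import numpy as np

def softmax_naive(x):
    # Direct formula: e^{x_i} / sum_j e^{x_j}.
    # For large x_i, np.exp overflows to inf and the ratio becomes nan.
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Subtract max(x) before exponentiating: the largest exponent
    # becomes 0, so nothing overflows. The common factor e^{-max(x)}
    # cancels in the ratio, so the output is mathematically identical.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_stable(x))  # valid probabilities summing to 1
```

On this input, `softmax_naive` produces `nan` values (every `exp` overflows to `inf`), while the stable version returns the correct distribution.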