Kashif in blog.ifkash.dev · Feb 25 · 8 min read
Teaching Llama 3 to be Polite
The Objective
The goal of this project was to take a powerful open-source Large Language Model (LLM) and instill a strict behavioral constraint: the model must politely decline to answer any request t...

Kashif in blog.ifkash.dev · Feb 8 · 5 min read
I Pretrained a 360M LLaMA-Style Language Model from Scratch on 6B FineWeb Tokens (Single H100)
Pretraining an LLM from scratch usually sounds like “big-lab-only” territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M...

Kashif in blog.ifkash.dev · Nov 18, 2025 · 7 min read
From Karpathy's micrograd to smoltorch: Understanding Autograd from First Principles
Why I Built My Own Deep Learning Framework?
After watching Andrej Karpathy’s micrograd lecture, I had a realization: I’d been using PyTorch for months, but I didn’t really understand how autograd worked. Sure, I could call .backward() and get gradien...

Kashif in blog.ifkash.dev · Aug 28, 2025 · 4 min read
I Built a Tiny Vector Database (and Pointed It at FAISS)
Vector DBs are everywhere these days: Pinecone, Weaviate, Qdrant, Chroma, FAISS … you name it! Most of them are full-featured systems with servers, APIs, dashboards, the works. Sometimes the best way to demystify hype is to build it yourself. Here’s ...

Kashif in blog.ifkash.dev · Mar 5, 2025 · 2 min read
The instability of a softmax function
The softmax, as we know, is numerically unstable when applied to vectors containing very small or very large numbers because of the exponential function involved in its computation. The softmax formula is: \(\text{softmax}(x_{i}) = \frac{e^{x_{i}}}{\...