Kashif (@ifkash) | Hashnode

Kashif · blog.ifkash.dev · Feb 25 · 8 min read

Teaching Llama 3 to be Polite

The Objective: The goal of this project was to take a powerful open-source Large Language Model (LLM) and instill a strict behavioral constraint: the model must politely decline to answer any request t

Kashif · blog.ifkash.dev · Feb 8 · 5 min read

I Pretrained a 360M LLaMA-Style Language Model from Scratch on 6B FineWeb Tokens (Single H100)

Pretraining an LLM from scratch usually sounds like “big-lab-only” territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M...

Kashif · blog.ifkash.dev · Nov 18, 2025 · 7 min read

From Karpathy's micrograd to smoltorch: Understanding Autograd from First Principles

Why I Built My Own Deep Learning Framework: After watching Andrej Karpathy’s micrograd lecture, I had a realization: I’d been using PyTorch for months, but I didn’t really understand how autograd worked. Sure, I could call .backward() and get gradien...

Kashif · blog.ifkash.dev · Aug 28, 2025 · 4 min read

I Built a Tiny Vector Database (and Pointed It at FAISS)

Vector DBs are everywhere these days: Pinecone, Weaviate, Qdrant, Chroma, FAISS … you name it! Most of them are full-featured systems with servers, APIs, dashboards, the works. Sometimes the best way to demystify hype is to build it yourself. Here’s ...

Kashif · blog.ifkash.dev · Mar 5, 2025 · 2 min read

The instability of a softmax function

The softmax, as we know, is numerically unstable when applied to vectors containing very small or very large numbers because of the exponential function involved in its computation. The softmax formula is: \(\text{softmax}(x_{i}) = \frac{e^{x_{i}}}{\...
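To make the instability concrete, here is a minimal sketch in plain Python. The excerpt's formula is cut off, so this assumes the standard definition (each exponential divided by the sum of all exponentials). The function names `naive_softmax` and `stable_softmax` are hypothetical, and subtracting the maximum before exponentiating is the commonly used remedy, not necessarily the one the full post describes.

```python
import math


def naive_softmax(xs):
    """Direct translation of the definition; overflows for large inputs."""
    exps = [math.exp(x) for x in xs]  # math.exp(1000.0) raises OverflowError
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(xs):
    """Assumed remedy: shift by the max so the largest exponent is exp(0) = 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0, so no overflow
    total = sum(exps)                     # at least 1.0, so no division by zero
    return [e / total for e in exps]


if __name__ == "__main__":
    big = [1000.0, 1000.0]
    try:
        naive_softmax(big)
    except OverflowError as err:
        print("naive version failed:", err)
    print("stable version:", stable_softmax(big))
```

The shift leaves the result mathematically unchanged because the common factor cancels between numerator and denominator. It also handles very negative inputs, where the naive version underflows every term to zero and then divides by zero.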

Kashif

About

Available for

Kashif's blogs

Recently published

Teaching Llama 3 to be Polite

I Pretrained a 360M LLaMA-Style Language Model from Scratch on 6B FineWeb Tokens (Single H100)

From Karpathy's micrograd to smoltorch: Understanding Autograd from First Principles

I Built a Tiny Vector Database (and Pointed It at FAISS)

The instability of a softmax function
