Jay Gala · galacodes.hashnode.dev · 6d ago · 15 min read
I Wrote a GPU Matmul Kernel From Scratch in Triton. Here's Everything I Learned
I recently started learning Triton, OpenAI's Python-based language for writing GPU kernels. My project: build a matrix multiplication kernel from scratch, step by step, until it's competitive with PyT...

Jay Gala · galacodes.hashnode.dev · Nov 30, 2025 · 18 min read
GPUs: The Hardware That Power AI
You've probably used ChatGPT or Claude. Maybe you've even fine-tuned a small language model on your laptop. But have you ever wondered why training or even inferencing GPT or Claude requires tens of thousands of specialized chips aka GPUs instead of ...

Jay Gala · galacodes.hashnode.dev · Nov 5, 2025 · 15 min read
Speculative Decoding: From Theory to Implementation
Let's talk about speculative decoding, one of the most elegant optimization techniques in modern LLM inference. If you've ever wondered how to squeeze 2-3x more throughput from your language models without sacrificing output quality, you're in the ri...

Jay Gala · galacodes.hashnode.dev · May 30, 2025 · 6 min read
Bits Don't Lie: Datatypes in Modern LLMs
Let’s talk about the different datatypes used in modern LLMs like GPT, LLaMA and the like. The most common ones you might have heard of: FP32, BFloat16, Float16, INT8, etc. These are all standard data types and available in PyTorch a...

Jay Gala · galacodes.hashnode.dev · Oct 26, 2024 · 7 min read
Introduction to LLM inferencing
Unless you’re living under a rock, you’ve probably heard of Large Language Models (LLMs) and even used a few of the popular applications like ChatGPT, Claude, Perplexity, etc. powered by these LLMs. So without going too deep into what LLMs are, let’s...