vvnasantosh.hashnode.dev
Understanding CUDA GEMM: Foundations for Optimization
In our previous blog, we explored GPU computing fundamentals: memory hierarchies, thread organization, warps, memory coalescing, and kernel classification (memory-bound vs. compute-bound). In this blog, we apply these concepts to optimize GEMM (Gener...
Nov 27, 2025 · 43 min read

vvnasantosh.hashnode.dev
Optimizing GEMM: GPU Architecture Essentials
Every time you ask ChatGPT a question, get a movie recommendation on Netflix, or watch your phone recognize faces in photos, billions of matrix multiplications are happening behind the scenes. This fundamental mathematical operation has become the co...
Nov 27, 2025 · 22 min read