Understanding CUDA GEMM: Foundations for Optimization
In our previous blog, we explored GPU computing fundamentals: memory hierarchies, thread organization, warps, memory coalescing, and kernel classification (memory-bound vs. compute-bound).
In this blog, we apply these concepts to optimize GEMM (Gener...