I Wrote a GPU Matmul Kernel From Scratch in Triton. Here's Everything I Learned
I recently started learning Triton, OpenAI's Python-based language for writing GPU kernels. My project: build a matrix multiplication kernel from scratch, step by step, until it's competitive with PyT
galacodes.hashnode.dev · 15 min read