CUDA Kernel Execution Debugging Journey
Short version: we went from 8/70 passing CUDA tests to a stable, auditable path by fixing NVRTC name resolution, argument marshaling, and unified-memory sync in DotCompute. No mysticism—just careful pointers and fewer foot-guns.
TL;DR
NVRTC will ha...
mivertowski.hashnode.dev · 4 min read