How I Got a 60x Speedup with SoA Megakernels
Lessons learned from building GPU-accelerated RL environments with NVIDIA Warp
When I started building vectorized physics simulations for reinforcement learning, I made all the classic mistakes. Multiple kernel launches. Object-oriented data layouts...
proximal.hashnode.dev · 5 min read