proximal.hashnode.dev

Smaller is Better: Replacing GPT-4o-mini with a 7B Local Judge
Feb 5 · 4 min read
I expected the 30B model to be the better judge. It wasn't. When I set out to replace OpenAI's GPT-4o-mini as the judge for the Oolong benchmark, my plan was simple: use the biggest local model I had. Qwen3-coder at 30B parameters seemed like the obv...

How InfoNCE Creates Exploration: The Hidden Engine of Contrastive RL
Jan 18 · 11 min read
A personal exploration of the mechanisms behind emergent exploration in goal-conditioned reinforcement learning. Contrastive RL made a huge splash at NeurIPS 2025, with "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reach...

Contrastive RL: A Step-by-Step Guide to Learning Reachability
Jan 9 · 8 min read
The paper "1000 Layer Networks for Self-Supervised RL" won Best Paper at NeurIPS 2025, and for good reason. It demonstrates that goal-conditioned RL can scale to 1000-layer networks—something previously thought impractical. But the real insight isn't...

How wp.ScopedTimer Found My 12x Speedup
Jan 6 · 5 min read
I was benchmarking a gridworld RL environment built on NVIDIA Warp. The native Warp version hit 8.4 million world-steps per second on small grids - impressive. But when I wrapped it with JAX for compatibility with standard RL training pipelines, perf...

How I Got a 60x Speedup with SoA Megakernels
Jan 3 · 5 min read
Lessons learned from building GPU-accelerated RL environments with NVIDIA Warp. When I started building vectorized physics simulations for reinforcement learning, I made all the classic mistakes. Multiple kernel launches. Object-oriented data layouts...