proximal.hashnode.dev

Smaller is Better: Replacing GPT-4o-mini with a 7B Local Judge
Feb 5 · 4 min read
I expected the 30B model to be the better judge. It wasn't. When I set out to replace OpenAI's GPT-4o-mini as the judge for the Oolong benchmark, my plan was simple: use the biggest local model I had. Qwen3-coder at 30B parameters seemed like the obv...

How InfoNCE Creates Exploration: The Hidden Engine of Contrastive RL
Jan 18 · 11 min read
A personal exploration of the mechanisms behind emergent exploration in goal-conditioned reinforcement learning. Contrastive RL made a huge splash at NeurIPS 2025, with "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reach...

Contrastive RL: A Step-by-Step Guide to Learning Reachability
Jan 9 · 8 min read
The paper "1000 Layer Networks for Self-Supervised RL" won Best Paper at NeurIPS 2025, and for good reason. It demonstrates that goal-conditioned RL can scale to 1000-layer networks—something previously thought impractical. But the real insight isn't...

How wp.ScopedTimer Found My 12x Speedup
Jan 6 · 5 min read
I was benchmarking a gridworld RL environment built on NVIDIA Warp. The native Warp version hit 8.4 million world-steps per second on small grids - impressive. But when I wrapped it with JAX for compatibility with standard RL training pipelines, perf...

How I Got a 60x Speedup with SoA Megakernels
Jan 3 · 5 min read
Lessons learned from building GPU-accelerated RL environments with NVIDIA Warp. When I started building vectorized physics simulations for reinforcement learning, I made all the classic mistakes. Multiple kernel launches. Object-oriented data layouts...