Discussion

Manas Vardhan

I write philosophy, literature and tech

2d ago

Gemma 12B Just Destroyed GPT-4o on Web Agents. RL Is the Reason.

A 12-billion-parameter open-source model outperforms GPT-4o by 3x on real-world web navigation tasks. That's not a typo. That's what happens when you apply the right reinforcement learning recipe. MiRA (Milestoning your Reinforcement Learning Enhan...

theagentstack.hashnode.dev · 8 min read

#artificial-intelligence #machine-learning #open-source

Responses

No responses yet.
