Gemma 12B Just Destroyed GPT-4o on Web Agents. RL Is the Reason.
A 12-billion-parameter open-source model outperforms GPT-4o by 3x on real-world web navigation tasks. That's not a typo. That's what happens when you apply the right reinforcement learning recipe.
MiRA (Milestoning your Reinforcement Learning Enhan...
theagentstack.hashnode.dev · 8 min read