Really interesting comparison and a fun way to test these models. The Pokémon Crystal run makes the strengths and weaknesses very clear in a practical setting. Gemini 3 Pro seems more consistent and better at planning long-term moves. I liked how the website explains the differences in a simple and clear way. This kind of experiment helps people understand model progress beyond benchmarks.