Jason Agostoni in jason.agostoni.net · Can the Mid-Tier Models Stack Up Against the Bigger Siblings? · 2d ago · 13 min read · Can you really justify paying flagship prices when the mid-tier models may already be good enough? The original comparison started with Gemini 3 Flash vs. Claude Sonnet 4.6, then Gemini 3.5 Flash arri…
Jason Agostoni in jason.agostoni.net · Antigravity CLI First Impressions: Fast, Rough, and Not Ready · May 21 · 5 min read · Google has officially replaced Gemini CLI with the new Antigravity CLI and launched it alongside Gemini 3.5 Flash, which became the default model for the new CLI experience. That made the launch more…
Jason Agostoni in jason.agostoni.net · Do Open Frontier Models Have A Chance Against Closed Models? · May 13 · 12 min read · Which of the new open-ish frontier models has the best chance to stand up against closed-source models on both cost and quality? I ran Ship-Bench against Kimi K2.6, Qwen 3.6 Plus, and DeepSeek v4 Pro…
Jason Agostoni in jason.agostoni.net · Can Gemma 4 Beat Gemini 3.1 Pro at Coding? · Apr 27 · 11 min read · Is a $20/month Google AI Pro account worth it versus running Gemma 4 31B on OpenRouter pay-as-you-go? This Ship-Bench run was designed to answer that question across a realistic coding workflow rather…
Jason Agostoni in jason.agostoni.net · An AI Benchmark That Tests Real Coding Workflows · Apr 19 · 8 min read · Developers face a real choice: pick a coding model or agent based on synthetic benchmarks that look great but do not predict actual project work. The problem is no longer whether models can score well…