3d ago · 4 min read · In the last article I showed the SWE-Bench numbers. Open-weight models are basically tied with the proprietary ones now. Two models stood out to me: MiniMax M2.5 and Qwen3.5. Here's what I found out a
5d ago · 3 min read · I was looking at the SWE-Bench Verified leaderboard last week and the numbers surprised me. The gap between proprietary and open-weight models is almost gone. Not in some academic test. In actual bug