Kimi K2 Thinking Beat GPT-4 and No One's Talking About It
GPT-4.1 scores 54.6% on SWE-bench Verified, the benchmark where models actually fix real GitHub issues. Not toy problems. Real bugs from real repos.
Kimi K2 scored 65.8%.
An open-source model just beat OpenAI's flagship on the hardest coding t...
quickleap.hashnode.dev · 6 min read