AI Software Engineering benchmark just went from 80% to 23%
What is SWE-bench?
SWE-bench is a widely followed benchmark for testing AI coding assistants on real software engineering tasks.
AI coding assistant benchmarks are supposed to give us clarity. SWE-bench does the opposite.
SW...
nextgenrd.tech · 1 min read