@codesweep
Building an autopilot for enterprise software maintenance.
Nothing here yet.
Nothing here yet.
Goal Our goal in this study was to explore whether a mixture of open-weight models, combined through an iterative process, can outperform any single model on the SWE-bench Verified benchmark. Specifically, we wanted to evaluate if patches generated b...

Abstract This study presents a comprehensive analysis of SWE-agent trajectories comparing Kimi K2 Instruct and Claude Sonnet 4 performance on software engineering tasks from the SWE-bench dataset. Through detailed examination of action category distr...
