@codesweep
Building an autopilot for enterprise software maintenance.
Dec 9, 2025 · 5 min read · Goal: Our goal in this study was to explore whether a mixture of open-weight models, combined through an iterative process, can outperform any single model on the SWE-bench Verified benchmark. Specifically, we wanted to evaluate whether patches generated b...
Aug 5, 2025 · 6 min read · Abstract: This study analyzes SWE-agent trajectories, comparing the performance of Kimi K2 Instruct and Claude Sonnet 4 on software engineering tasks from the SWE-bench dataset. Through detailed examination of action category distr...