What it is: agent-harness-kit (ahk) is a thin CLI + library you drop into any repo to give your AI agents structure. It scaffolds a 4-role workflow (Lead → Explorer → Builder → Reviewer), a local SQLite task backlog with atomic claiming, lifecycle health gates, and a full audit trail — all without touching any cloud service or requiring API keys.
Why: Every time I handed a task to Claude Code or OpenCode, the agent would just… roam. It had no memory of what it already explored, no way to coordinate with a second agent, and no gate to check if the codebase was even in a healthy state before it started writing. I wanted something closer to how senior engineers actually work: a clear role per agent, a shared task list, and a mandatory verification step before anything is marked done.
The architecture is deliberately boring: one agent.harness.ts config file, a .harness/ directory with a SQLite DB (using Node ≥ 22's built-in node:sqlite — zero native deps), and markdown agent definitions your LLM reads at session start. It works with Claude Code, OpenCode, Cursor, or any tool that can read files.
npx ahk init # scaffold the harness into your repo
npx ahk health # verify the environment before agents start
npx ahk status # see task backlog and agent activity
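To make the health-gate idea concrete, here is a minimal TypeScript sketch of a gate that must pass before any agent starts writing. It is not ahk's actual implementation: `HealthCheck`, `runHealthGate`, and the stubbed checks are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of a pre-run health gate; none of these names come from ahk itself.

interface HealthCheck {
  name: string;
  // Returns true when the check passes.
  run: () => boolean;
}

interface GateResult {
  healthy: boolean;
  failures: string[];
}

// Runs every check (no short-circuit) so the report lists all failures at once.
function runHealthGate(checks: HealthCheck[]): GateResult {
  const failures: string[] = [];
  for (const check of checks) {
    let passed = false;
    try {
      passed = check.run();
    } catch {
      passed = false; // a check that throws counts as a failure
    }
    if (!passed) failures.push(check.name);
  }
  return { healthy: failures.length === 0, failures };
}

// Stubbed checks; real ones might shell out to a type checker or a test suite.
const gate = runHealthGate([
  { name: "typecheck", run: () => true },
  { name: "unit-tests", run: () => false },
  { name: "lint", run: () => { throw new Error("linter not installed"); } },
]);

console.log(gate.healthy);              // false
console.log(gate.failures.join(", "));  // unit-tests, lint
```

Collecting every failure instead of stopping at the first one is a design choice: the agent (or the human) sees the full state of the repo in one pass.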
Website: https://stack.cardor.dev/ahk
Link: https://github.com/enmanuelmag/agent-harness-kit
Would love to hear your thoughts, especially if you've tried other approaches to multi-agent coordination!
This is the part more teams are going to learn the hard way: the real problem is usually not model quality but control quality. Once an agent can roam, call tools, and retry, the expensive failures come from weak stop conditions and weak authority boundaries. The boring controls end up mattering most.
This is the kind of boring architecture that ends up saving teams real pain.
The extra layer I'd add is retry/failure dedupe. Once the run hits the same blocker shape again, it should have to prove what changed before it gets another chance.
That tends to matter more than making the agent sound smarter. It's one of the main lessons we've been building MartinLoop around.
This is the kind of boring architecture that usually wins.
The extra layer I'd add is retry dedupe. A lot of ugly agent runs are not dramatic; they just keep asking for one more chance without proving anything changed.
If the system can’t show:
then it probably shouldn’t try again.
That one rule has been more useful for us than adding more intelligence to the loop.
This is the kind of boring architecture I trust more.
The one extra layer I would add is loop dedupe. If the agent hits the same blocker, same file boundary, or same failed verification pattern again, it should not be allowed to politely keep trying forever.
The runtime should force one of three outcomes at that point: refresh state, ask for review, or stop with a receipt.
That is the part people miss when they focus only on roles and prompts. A lot of waste comes after the first bad step, not during it. We built MartinLoop around that exact control layer because the expensive failures kept looking more like repeated uncertainty than dramatic breakage.
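A rough TypeScript sketch of the loop-dedupe idea described above, under stated assumptions: `Failure`, `fingerprint`, and `LoopGuard` are hypothetical names, the fingerprint combines all three signals (blocker, file boundary, failed verifier) into one key, and the escalation order across the three outcomes is my own guess rather than anything MartinLoop or ahk is known to do.

```typescript
// Hypothetical sketch of loop dedupe; names and escalation order are assumptions.

type Outcome = "retry" | "refresh-state" | "ask-for-review" | "stop-with-receipt";

interface Failure {
  blocker: string;        // e.g. "type-error"
  files: string[];        // file boundary touched by the attempt
  failedVerifier: string; // e.g. "npm test"
}

// Normalizes a failure into a stable key so trivially reworded repeats still collide.
function fingerprint(f: Failure): string {
  const files = [...f.files].sort().join(",");
  return [
    f.blocker.trim().toLowerCase(),
    files,
    f.failedVerifier.trim().toLowerCase(),
  ].join("|");
}

class LoopGuard {
  // Counts how many times each failure shape has been seen in this run.
  private seen: Record<string, number> = {};

  // Records the failure and returns what the runtime should do next.
  next(f: Failure): Outcome {
    const key = fingerprint(f);
    const count = (this.seen[key] || 0) + 1;
    this.seen[key] = count;
    if (count === 1) return "retry";          // first occurrence: a normal retry
    if (count === 2) return "refresh-state";  // first repeat: re-read repo and backlog
    if (count === 3) return "ask-for-review"; // second repeat: hand off for review
    return "stop-with-receipt";               // beyond that: stop and record why
  }
}

const guard = new LoopGuard();
const failure: Failure = { blocker: "Type-Error", files: ["b.ts", "a.ts"], failedVerifier: "npm test" };
// Same shape with different casing, spacing, and file order.
const sameShape: Failure = { blocker: "type-error ", files: ["a.ts", "b.ts"], failedVerifier: "NPM test" };

const outcomes: Outcome[] = [
  guard.next(failure),
  guard.next(sameShape),
  guard.next(failure),
  guard.next(failure),
];
console.log(outcomes.join(" -> "));
// retry -> refresh-state -> ask-for-review -> stop-with-receipt
```

A real runtime would likely need a fuzzier fingerprint (for example, hashing the failing test names), but even an exact-match key catches the case where the agent keeps rewording the same failure.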
I like how boring this is. That is usually a good sign. A lot of agent pain is not that the model is weak; it is that the runtime gives it too much room to wander with too little proof before the next step. Role separation plus a real verification gate solves more than people expect because it makes fake progress visible earlier. The one thing I would add is a readable stop receipt after every run so you can tell whether the agent stopped because it finished, got blocked, or just ran out of confidence while looking busy.
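One possible shape for such a stop receipt, sketched in TypeScript. The `StopReceipt` type and its fields are hypothetical; the only thing taken from the comment above is the three-way split between finished, blocked, and ran out of confidence.

```typescript
// Hypothetical stop receipt shape; field names are illustrative assumptions.

type StopReceipt =
  | { reason: "finished"; verifiedBy: string } // the verifier that passed
  | { reason: "blocked"; blocker: string }     // what stopped progress
  | { reason: "gave-up"; attempts: number };   // retries spent without new evidence

// Renders a receipt as one human-readable line for the audit trail.
function describeStop(r: StopReceipt): string {
  switch (r.reason) {
    case "finished":
      return `finished: verified by ${r.verifiedBy}`;
    case "blocked":
      return `blocked: ${r.blocker}`;
    case "gave-up":
      return `gave up after ${r.attempts} attempts without new evidence`;
  }
}

const receipts: StopReceipt[] = [
  { reason: "finished", verifiedBy: "npm test" },
  { reason: "blocked", blocker: "missing database credentials" },
  { reason: "gave-up", attempts: 4 },
];

const lines = receipts.map(describeStop);
for (const line of lines) console.log(line);
// finished: verified by npm test
// blocked: missing database credentials
// gave up after 4 attempts without new evidence
```

Using a discriminated union means the compiler rejects a receipt that claims to be finished without naming the verifier that passed.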
This is the right instinct. The missing layer is usually a tighter repo contract: what commands are allowed, what paths are off-limits, what verifier must pass, and what counts as enough change before another retry is admitted. A lot of ugly loops are just the same failure wearing slightly different wording.
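As a sketch of what that repo contract might look like in code (TypeScript, with every name here hypothetical and not taken from ahk or any other tool): the contract lists allowed commands, off-limits path prefixes, the verifier, and a minimum amount of change required before another retry is admitted.

```typescript
// Hypothetical repo contract; none of these fields come from a real tool.

interface RepoContract {
  allowedCommands: string[];       // exact commands the agent may run
  forbiddenPaths: string[];        // path prefixes the agent must not touch
  verifier: string;                // command that must pass before "done" (not enforced in this sketch)
  minChangedLinesForRetry: number; // evidence required before another retry
}

interface Attempt {
  command: string;
  touchedPaths: string[];
  changedLinesSinceLastTry: number;
}

// Returns every contract violation for an attempt; an empty list means it is admitted.
function violations(contract: RepoContract, attempt: Attempt): string[] {
  const problems: string[] = [];
  if (contract.allowedCommands.indexOf(attempt.command) === -1) {
    problems.push(`command not allowed: ${attempt.command}`);
  }
  for (const path of attempt.touchedPaths) {
    // A path is off-limits when it begins with any forbidden prefix.
    if (contract.forbiddenPaths.some((prefix) => path.indexOf(prefix) === 0)) {
      problems.push(`path off-limits: ${path}`);
    }
  }
  if (attempt.changedLinesSinceLastTry < contract.minChangedLinesForRetry) {
    problems.push("retry rejected: not enough change since last attempt");
  }
  return problems;
}

const contract: RepoContract = {
  allowedCommands: ["npm test", "npm run lint"],
  forbiddenPaths: [".harness/", "migrations/"],
  verifier: "npm test",
  minChangedLinesForRetry: 1,
};

const problems = violations(contract, {
  command: "rm -rf dist",
  touchedPaths: ["src/app.ts", "migrations/001.sql"],
  changedLinesSinceLastTry: 0,
});
for (const p of problems) console.log(p);
// command not allowed: rm -rf dist
// path off-limits: migrations/001.sql
// retry rejected: not enough change since last attempt
```

Counting changed lines is a crude proxy for "enough change"; the point is only that the retry has to present some evidence that differs from the last attempt.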
This is the right kind of boring.
Most agent failures I see are not model problems first. They're workflow problems: no clear scope, no memory of what already failed, no health gate before writing, and no real stop condition once the run goes sideways.
The one extra layer I'd add is loop control at the runtime level:
That's the part we've been building around with MartinLoop. Not “make the agent smarter,” just make the run accountable.
The Lead → Explorer → Builder → Reviewer split here feels like a good base for that.