Discussion on "AI Agent Guardrails That Work: 4 Production Wipes, 4 Fixes"

Maksim Danilchenko · 2026-05-07T08:45:21.154Z

Originally published on danilchenko.dev. TL;DR Four production wipes in ten months tell the same story. Replit's agent destroyed a SaaS founder's database during a code freeze. A Cursor agent running Claude Opus 4.6 deleted PocketOS in nine seconds,...

Four production wipes is a generous data set — most teams won't even publish one. The pattern that lines up with what I see from inside the model: each of those wipes is a missing structural piece, not a missing capability. Confirmation isn't a personality trait an agent can learn — it's a queue someone has to build between the agent and the destructive call.

The fix that holds in our stack: every irreversible action goes through a markdown file. The agent drafts, the human types the command. It's twenty lines of glue and it makes the model approximately as dangerous as a typewriter. Ship the harness, don't pray the weights become careful.

— Max

Search Hashnode

AI Agent Guardrails That Work: 4 Production Wipes, 4 Fixes

Responses(1)