Comment by ethanwalker on "Treating LLM prompts like code: a regression catalog for AI failures"

The catalog is a registry of failure modes we've already seen, each one pinned by an anti-pattern rule and a JUnit lock test. That's a regression net, it catches a known failure recurring, not a new one. Tests run fixed inputs, so drift slips past them. This is a must in our workflow as well if our developers are using claude, codex or anything its a signal to the agentic coding platform as well.

We catch drift production-side instead. Every mapping-suggestion call writes an adherence score, and we log the gap between what the AI proposed and what the user actually saved, by domain. When users keep overriding the AI in one domain, that's drift showing up in the data.

The honest gap is the alert (we are making this work in right way in our upcoming sprint). Right now a human has to read the telemetry dashboard to catch it, a threshold that fires on an adherence drop is what's still missing.

Search Hashnode