Your accountability section resonates with a real pattern we've hit: when an agent chain makes a decision through 3-4 steps of reasoning, tracing which step introduced the error is genuinely hard. We ended up implementing per-step state checkpoints with deterministic policy gates — the overhead is real, but it's the only way we've found to maintain audit trails in multi-agent workflows.