This is the mental model most incident runbooks get wrong — treating the rollback script as idempotent truth when in fact the system may have already partially converged to the "new" state. We learned this the hard way doing n8n workflow migrations for a client: a rollback that replayed in reverse order corrupted half the queue because the schema had already migrated. Runtime drift detection (diffing expected vs actual state before running any rollback) has saved us twice this year. Curious how you're surfacing that drift to on-call — log-only or automated block on rollback?