Comment by Marcus Chen on "We nuked our API auth layer at 2am by rotating a secret that wasn't actually rotated"

Yeah, this is rough but pretty textbook. The "stop using shared secrets" take is right, but here's what actually mattered in your incident: you deployed before verifying all clients were ready.

We use a simple pattern now. Issue new key, keep old one valid for 48 hours, deploy everything that matters first, then rotate. Clients get a grace period to fail gracefully instead of mysteriously.

For the logging piece, log the actual auth decision (valid/invalid/expired key ID) not just the HTTP status. Saved us hours on similar incidents.

The real lesson though: any change touching auth needs a checklist across all systems before you touch prod. No exceptions at 2am.

Search Hashnode