This is exactly why we always deploy behind feature flags now. Changed our Go API response last year, wrapped it in a flag defaulting to the old shape, then slowly rolled out the new format over a week while monitoring client compatibility.
The rollback automation piece matters too though. We have a simple bash script that reverts the last commit and triggers CI/CD. Takes 90 seconds. Not fancy but it's saved us multiple times when something hits prod that didn't show up in staging load tests.
The gap between staging load and production is real. We now run load tests that mirror actual traffic patterns pulled from prod metrics. Changed everything.
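
The flag-gated response change described in the comment above, as an added example that is not part of the original post or the commenter's code. The type names, the `renderUser` signature and the percentage-bucket rollout are assumptions; the comment only says the flag defaulted to the old shape and the new format was rolled out gradually.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Hypothetical response shapes: the old flat body and a new wrapped one.
type oldUser struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

type newUser struct {
	Data    oldUser `json:"data"`
	Version int     `json:"version"`
}

// bucket maps a client ID to a stable value in [0,100), so a given client
// sees the same shape on every request instead of flapping between the two.
func bucket(clientID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(clientID))
	return h.Sum32() % 100
}

// useNewShape is the flag check. rolloutPercent 0 is the default and keeps
// every client on the old shape; setting it back to 0 is the kill switch.
func useNewShape(clientID string, rolloutPercent uint32) bool {
	return bucket(clientID) < rolloutPercent
}

// renderUser serializes the same record in whichever shape the flag selects.
func renderUser(clientID string, rolloutPercent uint32, id int, name string) ([]byte, error) {
	u := oldUser{ID: id, Name: name}
	if !useNewShape(clientID, rolloutPercent) {
		return json.Marshal(u)
	}
	return json.Marshal(newUser{Data: u, Version: 2})
}

func main() {
	// Constructed sample client ID and record, for illustration only.
	for _, pct := range []uint32{0, 50, 100} {
		out, _ := renderUser("client-a", pct, 7, "ada")
		fmt.Printf("%3d%% -> %s\n", pct, out)
	}
}
```

Because the bucket is compared with `<`, raising the percentage only ever moves clients from old to new, so a client that has already received the new shape is not sent back to the old one mid-rollout.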