The dip doesn’t surprise me.
As models get tuned for safety, latency, and cost, you often see tradeoffs in code quality and determinism. It’s not just “better model every time,” it’s shifting priorities.
For dev workflows, consistency matters more than peak output. A slightly weaker but predictable model is often easier to work with than a stronger but inconsistent one.