One exercise that's helped our team is replacing the question "Which model should we use?" with "What happens when the model is wrong?"
That single shift changes the architecture discussion completely. You start thinking about evaluation, fallback paths, confidence thresholds, human review, monitoring, and ownership instead of benchmark scores.
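To make that concrete, here is a minimal sketch of what "planning for the model being wrong" can look like in code. Everything in it is hypothetical: the `Prediction` type, the `route_prediction` function, and the two threshold values are illustrative assumptions rather than a recommended design, and real thresholds would come from your own evaluation data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ACCEPT = "accept"              # use the model output directly
    FALLBACK = "fallback"          # use a safer default path (e.g. rules, canned reply)
    HUMAN_REVIEW = "human_review"  # queue the case for a person


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def route_prediction(pred: Prediction,
                     accept_at: float = 0.90,
                     review_below: float = 0.60) -> Route:
    """Decide what to do with a prediction. Thresholds are placeholder values."""
    # A score outside [0, 1] (or NaN) means something upstream is broken,
    # so do not trust the output at all.
    if not 0.0 <= pred.confidence <= 1.0:
        return Route.HUMAN_REVIEW
    if pred.confidence >= accept_at:
        return Route.ACCEPT
    if pred.confidence < review_below:
        return Route.HUMAN_REVIEW
    return Route.FALLBACK


# Monitoring hook: count how often each route is taken so drift is visible.
route_counts: Counter = Counter()

for pred in [Prediction("refund", 0.95),
             Prediction("refund", 0.75),
             Prediction("refund", 0.30)]:
    route_counts[route_prediction(pred)] += 1
```

The point of the sketch is that the model call is only one branch. The fallback path, the review queue, and the counters are where most of the design and ownership questions end up.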
In production, users rarely care which model is behind the product. They care that the system behaves consistently, recovers gracefully, and earns their trust over time. That's what ultimately determines whether an AI product succeeds.