Exactly this. The order of operations matters: deduplication and caching first, then routing to cheaper models, then prompt compression. Most teams do it backwards and wonder why switching from GPT-4o to a smaller model only saved 20% when they expected 70%.
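A minimal sketch of that ordering, assuming a toy setup where the prices, routing heuristic, and `compress` step are all made up for illustration. The point is structural: the cache check runs before any model is chosen, and compression only applies to requests that actually reach a paid call.

```python
import hashlib

# Hypothetical per-1K-token prices; real provider pricing varies.
PRICE = {"large": 0.005, "small": 0.0005}

cache = {}

def normalize(prompt: str) -> str:
    # Dedup key: collapse whitespace and lowercase so near-identical
    # requests map to the same cache entry.
    return " ".join(prompt.lower().split())

def route(prompt: str) -> str:
    # Toy heuristic: short prompts go to the small model.
    return "small" if len(prompt.split()) < 50 else "large"

def compress(prompt: str) -> str:
    # Placeholder compression: strip filler words (illustrative only).
    filler = {"please", "kindly", "basically"}
    return " ".join(w for w in prompt.split() if w.lower() not in filler)

def handle(prompt: str) -> tuple[str, float]:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key in cache:
        return cache[key], 0.0        # 1. dedupe/cache: repeat requests cost nothing
    model = route(prompt)             # 2. route to the cheapest capable model
    final = compress(prompt)          # 3. compress only what actually gets sent
    tokens = len(final.split())      # crude token estimate
    cost = tokens / 1000 * PRICE[model]
    answer = f"<{model} answer>"     # stand-in for the real API call
    cache[key] = answer
    return answer, cost
```

Run in this order, every cache hit costs zero regardless of model or prompt length; reverse the order and you pay routing and compression overhead on traffic that should never have hit a model at all, which is where the missing savings go.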