Strong framing - most teams never instrument token usage until the invoice forces them to. One thing I've seen help in client automations: adding a small middleware that strips restated context and trims "As an AI..." preambles before the response goes to the user. It pairs well with prompt caching on Anthropic/OpenAI when your system prompts are stable. Curious whether you've measured how much this trimming saves when combined with structured output (JSON schema) versus free-form text?
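For anyone curious what that trimming middleware might look like: a minimal sketch below, assuming a simple regex pass over the response text. The patterns and the `trim_response` helper are illustrative, not from any particular framework, and a production version would need a broader pattern set.

```python
import re

# Illustrative preamble patterns; a real deployment would maintain
# a larger, tested list tuned to the models in use.
PREAMBLE_PATTERNS = [
    re.compile(r"^\s*As an AI(?: language model)?[^.\n]*\.\s*", re.IGNORECASE),
    re.compile(r"^\s*(?:Sure|Certainly|Of course)[,!.]\s*", re.IGNORECASE),
]

def trim_response(text: str) -> str:
    """Strip known boilerplate preambles from the start of a model response."""
    for pattern in PREAMBLE_PATTERNS:
        text = pattern.sub("", text, count=1)
    return text.strip()

print(trim_response("As an AI language model, I can't browse. Here is the answer."))
# → "Here is the answer."
```

The same hook is a natural place to drop restated context, e.g. by diffing the response against the prompt and removing verbatim echoes before returning it.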