Confirming this from production. We ship a 5-tool MCP server (financial-signals domain) — kept it deliberately tight after watching Claude pick the wrong tool when we tested 8. The real tax wasn't the tokens; it was descriptions getting parsed every turn into context-relevant slots that didn't exist. Curious whether you've measured the curve at 3 vs 5 vs 10 tools, or whether the inflection point is mostly description quality + naming overlap.