That 80% number lines up with what I've seen too. The deterministic checks are boring, but they catch the obvious stuff (missing fields, out-of-range coordinates, duplicate names) before you burn tokens on it. The LLM auditor is expensive and slow by comparison, so every bad record you filter out before it reaches that stage is money saved. The fire-on-failure pattern also keeps the logs clean: when the auditor does flag something, you know it's actually interesting and not just a missing zip code.
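To make the first tier concrete, here's a minimal sketch of the kind of deterministic checks mentioned above (missing fields, out-of-range coordinates, duplicate names). The field names and schema are illustrative assumptions, not the actual pipeline's:

```python
# Hypothetical record schema for illustration: each record should have a
# name plus lat/lon coordinates. None of this is the commenter's real code.
REQUIRED_FIELDS = {"name", "lat", "lon"}

def validate_record(record, seen_names):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
        return errors  # can't run the remaining checks without these fields
    # Reject coordinates outside the valid lat/lon ranges.
    if not (-90 <= record["lat"] <= 90 and -180 <= record["lon"] <= 180):
        errors.append("coordinates out of range")
    # Reject duplicate names across the batch.
    if record["name"] in seen_names:
        errors.append(f"duplicate name: {record['name']}")
    seen_names.add(record["name"])
    return errors
```

Every check here is a cheap set or comparison operation, so running it over the whole batch costs effectively nothing compared with a single LLM call.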
klement Gunndu
Agentic AI Wizard
Running rule-based gates as a free first layer before any LLM call is exactly the right order. We've built similar tiered validation where deterministic checks catch 80% of bad data before the expensive AI auditor even runs. Having the orchestrator LLM fire only on failures is a cost-control pattern more pipeline builders should adopt.
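The tiered flow described in this thread might look roughly like the sketch below: free deterministic gates reject obvious bad data, the expensive auditor sees only the survivors, and the orchestrator fires only when the auditor flags something. `gates`, `llm_audit`, and `llm_orchestrate` are placeholder callables for illustration, not a real API:

```python
def run_pipeline(records, gates, llm_audit, llm_orchestrate):
    """Tiered validation: cheap checks first, LLM calls only where needed."""
    clean = []
    for record in records:
        # Tier 1: deterministic gates are free, so run them on everything.
        # Any truthy return value means the record is rejected outright.
        if any(gate(record) for gate in gates):
            continue
        # Tier 2: the expensive LLM auditor sees only records that passed
        # tier 1; it returns None for clean records, or a verdict string.
        verdict = llm_audit(record)
        if verdict is None:
            clean.append(record)
        else:
            # Tier 3: the orchestrator LLM fires only on auditor failures,
            # so every orchestrator call is spent on a genuinely hard case.
            llm_orchestrate(record, verdict)
    return clean
```

The ordering is the whole point: each tier is strictly more expensive than the one before it, so each tier's job is to shrink the input to the next.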