David El Malih in saas-genai-starter.hashnode.dev · 4d ago · 6 min read

My eval harness paid for itself on the first run: 0.57 → 0.96, two bugs no unit test could catch

I almost shipped a RAG pipeline that, on certain questions, cited exactly the right document, and then told the user the answer wasn't in it. Every unit test was green. The retrieval returned the cor…