Evaluating Agents With an LLM-as-Judge Harness (Without Kidding Yourself About It)
Key Takeaways
You can't unit-test a coach agent the way you test a pure function — the output is non-deterministic and "good" is a judgment call, not an assertion.
An LLM-as-judge harness lets you g
virginiamwegahashnodedev.hashnode.dev, 7 min read