Golden Sets, LLM-as-Judge, Human Review: Which Grading Style to Reach For
Apr 13 · 18 min read

You have an eval set. You know your rubric. You want a pass rate. Now a new question appears: who decides whether each individual output is a pass? You, reading them one by one? A rule engine that checks for specific strings? Another model that grade...