Building an LLM Judge That Doesn't Lie to You
Our first LLM judge gave a 9/10 to a page where the hero text was completely invisible.
Dark grey text on a dark background image. The CSS was syntactically valid. The HTML was well-structured. Every tag was correct. The page was unusable. And our ju...
tebza.hashnode.dev · 9 min read