27m ago · 2 min read · AI agents are fundamentally different from traditional software. In a standard application, code executes deterministically. If there is an error, it throws an exception and halts. Autonomous agent fr
3h ago · 14 min read · I test software for a living. Nine-plus years of finding the cracks other people put in code — but rarely the one writing it. So when I found a June game jam asking for something inspired by the solst
4h ago · 3 min read · TL;DR: I compared the main LLM-as-judge tools (DeepEval's G-Eval, Confident AI, Evidently, Braintrust, Promptfoo, and MLflow) on the axis that actually decides whether the scores mean anything: how we