What model checking taught me about evaluating AI coding agents
A unit test asks one question: did this run pass? That works when code is deterministic. An LLM coding agent is not. The same prompt produces different code each time, so one passing run proves almost
voloshin.net3 min read