Right, and the bit that's easy to miss is the check actually has to be able to fail. I've had specs where the "check" was something like compiles-and-returns-200, which the agent passes whether the behaviour underneath is right or wrong, so it tells you nothing. The ones that pull their weight are the ones that would go red if the thing they describe stopped working. Otherwise you've just written more prose for the agent to interpret, which is the problem you were trying to get away from.