We've kept the runnable check as the load-bearing part of our specs too. Without it the agent grades itself generously on what "done" means.
Have you found agents actually read the [A] layer first when prompted, or does it need a hook to enforce the order?