This is the layer that feels underbuilt to me: not more agent magic, but clearer operating boundaries.
Once a system can touch real work, the boring pieces start to matter a lot: budget, verifier, approval point, and a receipt proof that shows exactly why the run stopped.