Coding agents are starting to feel real now.
Claude Code, Codex, and similar tools made it normal to let an agent read a repo, edit files, run commands, and fix errors.
I’m curious whether GUI agents are the next step: instead of operating on code, they would operate apps directly through their interfaces.
For mobile, this seems especially hard because the agent needs to keep understanding and verifying UI state over time:
- What screen am I on?
- Is this a search box, a tab, a modal, or a result card?
- Did the last tap actually work?
- Is the page loading or stuck?
- Should I retry, go back, scroll, or stop?
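The loop above can be sketched as a verify-and-recover routine. This is only a toy illustration, not any existing agent's implementation: `observe` and `dismiss` are hypothetical callables standing in for however the agent perceives the screen (VLM, accessibility tree, or both) and dismisses an unexpected overlay.

```python
import time
from dataclasses import dataclass
from enum import Enum, auto

@dataclass
class UIState:
    screen: str                 # agent's current belief about which screen it is on
    is_loading: bool = False
    is_modal: bool = False

class Outcome(Enum):
    OK = auto()
    GO_BACK = auto()
    STOP = auto()

def verify_and_recover(expected_screen, observe, dismiss, max_checks=3, wait_s=1.0):
    """Check whether the last action landed on the expected screen.

    `observe()` and `dismiss()` are hypothetical stand-ins for the
    agent's perception and action primitives.
    """
    for _ in range(max_checks):
        state = observe()
        if state.screen == expected_screen:
            return Outcome.OK          # the last tap actually worked
        if state.is_loading:
            time.sleep(wait_s)         # page still loading: wait, then re-check
            continue
        if state.is_modal:
            dismiss()                  # unexpected modal: dismiss, then re-check
            continue
        return Outcome.GO_BACK         # wrong screen and not loading: back off
    return Outcome.STOP                # repeated failures: halt and escalate
```

The interesting design question is hidden inside `observe()`: everything else is bookkeeping.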
This feels very different from browser automation: there is no DOM to query, and mobile UI is more visual, less structured, and full of app-specific patterns.
What do you think is the right technical path here?
VLM-first, accessibility-tree-first, or a hybrid?
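To make the hybrid option concrete, here is one possible shape for it, purely as a sketch: prefer the structured accessibility tree when the app populates it, and fall back to a vision model on the raw screenshot when it doesn't. All three callables here are hypothetical stand-ins, not a real API.

```python
def observe_hybrid(dump_a11y_tree, screenshot, vlm_describe, min_nodes=3):
    """Prefer structure, fall back to pixels.

    Hypothetical stand-ins:
    - dump_a11y_tree(): list of UI node dicts, possibly empty
      (e.g. what an Android accessibility dump would give you)
    - screenshot(): raw image bytes of the current screen
    - vlm_describe(img): a vision-language model labeling UI elements
    """
    nodes = dump_a11y_tree()
    # Games, WebViews, and custom-drawn canvases often expose few or no
    # accessibility nodes, so a near-empty tree is the fallback signal.
    if len(nodes) >= min_nodes:
        return {"source": "a11y", "elements": nodes}
    return {"source": "vlm", "elements": vlm_describe(screenshot())}
```

The threshold heuristic is crude on purpose; the real question is how to tell when the tree is lying about what is on screen.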