How many plain-English assertions did you write in total, and what was the pass/fail ratio on the first run? For the state-management failures, were they triggered by a specific sequence of user actions, or did they appear randomly? Trying to understand whether the AI is doing path-coverage testing or just linear flow execution.