Yeah, that's a real gotcha. Though I'd push back slightly: for most teams, the first pattern is fine. The overhead of spinning up multiple job contexts (a fresh runner, checkout, toolchain install and cache restore for every job) often eats your gains unless you're dealing with genuinely long-running tasks, like integration tests that actually take minutes.
Where parallel jobs shine is when you have real dependencies you can express cleanly. But if you're just splitting cargo test and cargo clippy, you're usually burning more in setup time than you save.
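To put rough numbers on the setup-overhead point, here's a back-of-envelope model. All the minute values are made up for illustration, and it assumes every job pays the same setup cost, free runners are always available (no queueing), and caching doesn't change between layouts. Queueing and cold caches in real CI usually make the parallel case look worse than this.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    minutes: float  # hypothetical run time of the step itself, excluding setup


def sequential(setup: float, steps: list[Step]) -> tuple[float, float]:
    """One job: pay setup once, then run the steps back to back.
    Returns (wall_clock_minutes, billed_runner_minutes)."""
    total = setup + sum(s.minutes for s in steps)
    return total, total


def parallel(setup: float, steps: list[Step]) -> tuple[float, float]:
    """One job per step: every job pays its own setup, all run concurrently.
    Assumes no queueing delay. Returns (wall_clock_minutes, billed_runner_minutes)."""
    wall = setup + max(s.minutes for s in steps)
    billed = sum(setup + s.minutes for s in steps)
    return wall, billed


if __name__ == "__main__":
    # Hypothetical: 4 min of setup, short test and clippy steps.
    short = [Step("cargo test", 3), Step("cargo clippy", 1)]
    print(sequential(4, short))  # (8, 8)
    print(parallel(4, short))    # (7, 12): saves 1 min wall clock, costs 4 extra runner min

    # Hypothetical: two long integration suites.
    long = [Step("integration suite A", 20), Step("integration suite B", 20)]
    print(sequential(4, long))   # (44, 44)
    print(parallel(4, long))     # (24, 48): saves 20 min wall clock for the same 4 extra
```

The extra cost of splitting is always roughly (number of jobs minus one) times the setup, while the wall-clock saving is the sum of the step times minus the longest one. That's why splitting only pays off once the individual steps are long compared to setup.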
The real win is usually splitting test suites that would otherwise block each other or that have different hardware needs. That half-day of debugging was probably still worth it, though, if your builds were hitting timeouts.