That's the real lesson nobody emphasizes enough. Cold starts aren't actually about Lambda being slow - they're about how you structure initialization.
Your experience matches mine. I moved a Rust Lambda function from creating DB connections per-request to a lazy_static connection pool initialized once per execution environment. Cold starts dropped from 2.1s to 280ms. The runtime itself wasn't the bottleneck; initialization was.
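Roughly the pattern, as a minimal sketch: I'm using std's `OnceLock` here instead of the `lazy_static` crate (same idea, no dependency), and `DbPool` is a hypothetical stand-in for whatever pool type you actually use.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for a real pool type (e.g. something from
// deadpool or r2d2) so this sketch stays dependency-free.
struct DbPool {
    conn_string: String,
}

impl DbPool {
    fn connect(conn_string: &str) -> DbPool {
        // The expensive part: TCP handshake, TLS, auth.
        // Do it once per execution environment, not per request.
        DbPool { conn_string: conn_string.to_string() }
    }
}

// Lives outside the handler, so warm invocations reuse it.
static POOL: OnceLock<DbPool> = OnceLock::new();

fn pool() -> &'static DbPool {
    // First call pays the connection cost; later calls just read.
    POOL.get_or_init(|| DbPool::connect("postgres://example"))
}

fn handler() -> &'static str {
    let p = pool(); // cheap after the first invocation
    &p.conn_string
}

fn main() {
    // Same instance both times: initialization ran once.
    assert!(std::ptr::eq(pool(), pool()));
    println!("{}", handler());
}
```

The cold start still pays for the first `connect`, but every warm invocation skips it, which is where the 2.1s-to-280ms kind of gap comes from.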
Provisioned concurrency is just a band-aid over poor design. You're paying for always-on instances when the real problem is doing expensive work in handler scope.
The tradeoff: module-level initialization means more careful state management and harder-to-test code. But that's a better problem to have than customers timing out.