Thank you, really appreciate that.
Our EC2 to ECS Fargate move was one of those migrations that looked simpler from the outside than it actually was. On EC2, we had a very familiar setup — a server, an app running on it, logs going to CloudWatch, and enough manual control that things felt straightforward. But that also meant we were still thinking in terms of servers, even if we weren’t SSH-ing in all the time.
Once we moved toward ECS Fargate, the whole mindset shifted. We had to think more in terms of containers, task definitions, networking, IAM roles, service stability, and how the app should behave in an orchestrated environment. The application code itself didn’t change much, but the surrounding platform absolutely did. That was the biggest lesson for me — the real complexity is usually not the app, it’s the operational model around it.
We also had to rework a few things that were easy on EC2 but needed a cleaner approach in Fargate, especially environment configuration, the deployment flow, and keeping the service healthy without manual intervention. After the initial learning curve, though, the setup started to feel much cleaner and more scalable.
What I liked most was that Fargate removed a lot of the server management overhead. We no longer had to patch instances, plan capacity the way we used to, or keep one machine alive just because the app lived there. That shift made the platform feel much more modern and easier to reason about long term.
So overall, I’d say the migration was definitely worth it. It was not just “move the app from one place to another” — it was more like moving from server thinking to platform thinking.
Really liked this breakdown. Most “we cut costs by X%” posts stay very high-level, but you actually walked through the trade-offs of moving from a VM to Azure Container Apps without rewriting the FastAPI backend, which is the part teams worry about the most.
The way you leveraged ACA’s scale-to-zero and per-second billing makes a lot of sense for APIs that aren’t constantly hammered with traffic. It’s a good reminder that infra cost is not just about instance size, but about how the platform scales your workload up and down over a full day.
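To make that point concrete for myself, here is a rough back-of-the-envelope sketch in Python. The rates are made-up placeholders, not Azure's actual prices or the numbers from your post, and the function names are my own. It also assumes the app really does scale to zero when idle and ignores per-request and storage charges.

```python
# Rough comparison of an always-on VM against per-second billing
# that drops to zero when the app is idle.
# All rates below are hypothetical placeholders, not real Azure pricing.

SECONDS_PER_DAY = 24 * 60 * 60


def always_on_daily_cost(hourly_rate: float) -> float:
    """Daily cost of a VM that is billed for all 24 hours regardless of traffic."""
    return hourly_rate * 24


def per_second_daily_cost(active_seconds: int, rate_per_second: float) -> float:
    """Daily cost when only the seconds with a running replica are billed."""
    if not 0 <= active_seconds <= SECONDS_PER_DAY:
        raise ValueError("active_seconds must fit within one day")
    return active_seconds * rate_per_second


def break_even_active_seconds(hourly_rate: float, rate_per_second: float) -> float:
    """Active seconds per day at which both pricing models cost the same."""
    return always_on_daily_cost(hourly_rate) / rate_per_second


if __name__ == "__main__":
    vm_hourly = 0.10          # hypothetical VM price per hour
    aca_per_second = 0.00005  # hypothetical container price per active second

    busy_seconds = 4 * 60 * 60  # assume the API is actually busy 4 hours a day
    print("VM per day:       ", always_on_daily_cost(vm_hourly))
    print("Per-second per day:", per_second_daily_cost(busy_seconds, aca_per_second))
    print("Break-even hours: ", break_even_active_seconds(vm_hourly, aca_per_second) / 3600)
```

With numbers like these, the per-second model only wins while the app is active for less than the break-even time each day, so a steadily busy API could end up cheaper on a fixed VM.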
I’m curious how the migration felt from an operational point of view:
Did observability/logging get simpler or more fragmented after moving off the VM?
And have you hit any limits or “gotchas” with ACA yet (cold starts, networking, private access, etc.) that teams should be aware of before jumping in?
Overall, this is exactly the kind of honest, numbers-backed story that helps engineers justify a platform change to their teams. Thanks for sharing the journey!