This breakdown is brutally honest and I love it. I run an AI agent system on a Mac Mini that orchestrates multiple background processes, and before scaling to anything cloud-based I ended up going with a poor man's load balancing approach: process-level health checks plus round-robin task distribution.
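For anyone curious what that kind of setup can look like, here is a rough Python sketch of the idea, not the actual code behind the system. The names `Worker`, `RoundRobinDispatcher`, and `spawn_worker` are made up for illustration, and the sleeping subprocess is just a stand-in for a real agent process. The health check is the simplest possible one: is the OS process still alive?

```python
import itertools
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical wrapper around one background process."""
    name: str
    proc: subprocess.Popen

    def is_healthy(self) -> bool:
        # Process-level check: poll() returns None while the process is running.
        return self.proc.poll() is None


class RoundRobinDispatcher:
    """Hands out workers in rotation, skipping any that have exited."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("need at least one worker")
        self._workers = list(workers)
        self._cycle = itertools.cycle(self._workers)

    def next_worker(self) -> Worker:
        # Look at each worker at most once per call so a fully dead pool
        # raises instead of looping forever.
        for _ in range(len(self._workers)):
            worker = next(self._cycle)
            if worker.is_healthy():
                return worker
        raise RuntimeError("no healthy workers available")


def spawn_worker(name: str) -> Worker:
    # Assumption: a sleeping Python process stands in for a real agent process.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    return Worker(name, proc)
```

A real version would probably want a richer health signal than "the process has not exited" (a heartbeat file or a ping over a socket, for example), since a hung process still passes this check.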
One thing I would add about NLB: if you are running WebSocket connections for real-time agent communication, NLB's TCP passthrough is a lifesaver. ALB works too, but its connection draining behavior can be unpredictable with long-lived WebSocket sessions.
Have you run into any gotchas with cross-zone load balancing costs? I have heard the inter-AZ data transfer charges can surprise people on the NLB side, since cross-zone traffic is not free there the way it is with ALB.