The RPC Parallel Is Exactly Right
This framing crystallizes what I've been seeing in production agent systems: teams building multi-agent architectures that feel like 1990s RPC before CORBA/HTTP/gRPC forced contracts onto distributed communication.
What struck me most is the tool boundary problem you surfaced. The dependency injection parallel is clean—workers declare what they need, callers provide the implementations—but the real insight is why this breaks in practice. Most agent frameworks conflate the orchestrator's context (auth, DB connections, user session) with the sub-agent's execution. When the sub-agent reaches for a tool, it's often working with ambient context that was never explicitly passed to it.
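To make that concrete, here's a minimal sketch of the injection pattern, assuming a hypothetical `WorkerSpec` / `spawn_worker` pair (names are mine, not from any framework): the worker declares tool names, the orchestrator resolves them at spawn time, and nothing ambient crosses the boundary.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class WorkerSpec:
    name: str
    required_tools: tuple[str, ...]  # explicit dependency declaration

def spawn_worker(spec: WorkerSpec, available: Dict[str, Callable]) -> Dict[str, Callable]:
    """Resolve the worker's declared tools; fail fast if any is missing."""
    missing = [t for t in spec.required_tools if t not in available]
    if missing:
        raise KeyError(f"{spec.name} requires tools not provided: {missing}")
    # The worker only sees what it asked for -- no ambient auth, DB handles, etc.
    return {t: available[t] for t in spec.required_tools}
```

The failure mode the prose describes is exactly what the `missing` check catches: if the sub-agent needs something the caller didn't explicitly provide, you find out at spawn time, not three steps into a run.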
Three observations from systems I've seen:
Failure semantics are where agent systems diverge most from traditional distributed systems. When a tool call fails on step 3 of 5, traditional systems have timeouts, retries, circuit breakers. Agent systems usually return... an error string? Or nothing? The declarative worker pattern with typed outputs at least gives you a contract for what "failure" looks like.
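A sketch of what that contract could look like, under my own assumed names (`Success`/`Failure` are illustrative, not any framework's API): the orchestrator branches on a typed result instead of parsing an error string.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

@dataclass
class Success:
    output: str

@dataclass
class Failure:
    step: int            # which step of the plan failed (e.g. 3 of 5)
    kind: Literal["timeout", "tool_error", "invalid_output"]
    retryable: bool
    detail: Optional[str] = None

WorkerResult = Union[Success, Failure]

def handle(result: WorkerResult) -> str:
    """The caller's policy can now be explicit instead of buried in string parsing."""
    if isinstance(result, Failure):
        return "retry" if result.retryable else "escalate"
    return "done"
```

Even this minimal shape recovers the timeout/retry/circuit-breaker vocabulary traditional systems take for granted.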
Observability across boundaries is where the complexity explodes. Single-agent debugging is tractable with structured logging. Multi-agent debugging with string-passing is archaeological—you're digging through context windows trying to reconstruct what happened.
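The usual fix from distributed tracing applies here too. A rough sketch, assuming hypothetical field names (`trace_id`, `span_id`, `parent_span`): propagate one trace id through every handoff and emit one structured line per boundary crossing, so reconstruction is a log query rather than archaeology.

```python
import json
import uuid

def new_span(trace_id=None, parent=None):
    """Start a span; reuse the trace_id when crossing an agent boundary."""
    return {"trace_id": trace_id or uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex,
            "parent_span": parent}

def log_event(span, agent, event, **fields):
    """One structured record per handoff or tool call."""
    record = {"agent": agent, "event": event, **span, **fields}
    print(json.dumps(record))
    return record
```

The point isn't the ten lines of code; it's that the trace id has to be part of the inter-agent message contract, or it gets lost exactly where you need it.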
The LLM-decided vs deterministic handoff distinction matters more than most frameworks admit. I've seen teams ship "autonomous orchestration" when what they actually needed was a deterministic pipeline with a few decision points.
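A toy illustration of that shape, with `classify` standing in for the single step where a model call is actually warranted (everything else is deterministic and testable):

```python
def classify(ticket: str) -> str:
    # Placeholder for the one LLM-decided step in the pipeline.
    return "refund" if "refund" in ticket.lower() else "general"

def run_pipeline(ticket: str) -> list[str]:
    steps = ["validate", "fetch_history"]   # always deterministic
    route = classify(ticket)                # the one decision point
    steps.append(f"route:{route}")
    steps.append("draft_reply")             # deterministic again
    return steps
```

Framed this way, "autonomous orchestration" often reduces to: which one or two lines genuinely need a model, and which were deterministic all along?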
Question: In your experience, where's the boundary between "this needs a worker contract" and "this is just a prompt in a larger context"? I'm curious if you've seen teams over-engineer contracts for work that could've been a single agent with better prompting.
The protocol vs function call framing cuts to the heart of why agent orchestration keeps breaking in production.
When you call a function, you control the boundary. When agents coordinate, you inherit every boundary they negotiated with every other system. The failure surface grows combinatorially.
Three things that don't survive the transition:
Message format assumptions. Function calls assume shared type systems. Agent protocols need to handle drift, version mismatches, and the fact that the "same" schema means something different to three different services.
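The standard defense is a version field plus a tolerant upgrade path. A minimal sketch with made-up field names (`text` renamed to `content` between versions): normalize whatever arrives to the latest shape instead of assuming a shared type system.

```python
def upgrade(raw: dict) -> dict:
    """Normalize any known version to the latest shape; drop unknown fields."""
    msg = dict(raw)
    if msg.get("version", 1) == 1:
        # v1 called this field "text"; v2 renamed it to "content"
        msg["content"] = msg.pop("text", "")
        msg["version"] = 2
    return {"version": msg["version"], "content": msg.get("content", "")}
```

The upgrade function is also where "the same schema means something different to three services" becomes visible: each divergence forces an explicit branch here instead of a silent misread downstream.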
Backpressure semantics. A blocked function call is a stack trace. A blocked agent coordination is a distributed system problem—timeouts cascade, retries compound, and the error context that would help debug it is split across multiple logs.
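One way to make backpressure explicit rather than emergent, sketched with Python's stdlib `queue`: a bounded queue with a timeout turns a stalled downstream agent into a typed signal the caller can act on, instead of a silent hang.

```python
import queue

def submit(q: queue.Queue, task, timeout: float = 0.01) -> str:
    """Offer work downstream; surface saturation as a value, not a hang."""
    try:
        q.put(task, timeout=timeout)
        return "accepted"
    except queue.Full:
        return "backpressure"  # caller decides: shed, retry, or escalate
```

This is the smallest version of the idea; in a real system the same signal would propagate back up the coordination chain so retries stop compounding.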
State machine ownership. Who owns the recovery path when an agent workflow stalls? The protocol layer needs explicit states for "pending," "retrying," "failed," and "abandoned"—and those states need to be queryable, not implicit in some buried catch block.
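Those states are cheap to make explicit. A sketch using the states named above plus a couple I've assumed (`running`, `done`), with a transition table that makes illegal moves fail loudly instead of disappearing into a catch block:

```python
from enum import Enum

class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    RETRYING = "retrying"
    FAILED = "failed"
    ABANDONED = "abandoned"
    DONE = "done"

# Legal transitions; anything else is a bug, not a silent recovery.
TRANSITIONS = {
    State.PENDING:  {State.RUNNING, State.ABANDONED},
    State.RUNNING:  {State.DONE, State.RETRYING, State.FAILED},
    State.RETRYING: {State.RUNNING, State.FAILED},
    State.FAILED:   {State.RETRYING, State.ABANDONED},
}

def advance(current: State, target: State) -> State:
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Because the states are values, "queryable" comes for free: persist the current `State` and you can ask the system where every workflow is, including the stalled ones.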
The parallel to HTTP/1.1 keepalive vs HTTP/2 streams is instructive. Function calls are keepalive—you assume the connection. Agent protocols need stream multiplexing, flow control, and cancellation signals that survive partial failures.
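Cancellation that survives partial failure can be sketched as a token that lives outside any single call, loosely analogous to HTTP/2's stream reset (the class and names here are illustrative, not any library's API):

```python
import threading

class CancelToken:
    """Cancellation state that outlives any one call or worker."""
    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()

def step(token: CancelToken, work):
    """Workers check the token cooperatively before doing work."""
    if token.cancelled:
        return "cancelled"
    return work()
```

The design point: because the signal is carried by the token rather than the call stack, a worker that crashed mid-step can't swallow the cancellation for its siblings.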
Your point about message formats, state machines, and backpressure isn't just about technical correctness. It's the difference between systems that fail gracefully and systems that fail opaquely.