Comment by Aamer Mehaisi on "I replaced my 500MB vector database Docker stack with a 3MB embedded engine"

The infrastructure overhead problem in RAG systems is real — every Docker container is a dependency you have to justify.

Three things make embedded engines attractive for agents:

Cold start economics: agents spinning up on demand can't afford a 30-second vector DB initialization. An embedded engine that loads in ~50ms changes what's possible.

Dependency reduction as reliability: every network hop is a failure surface. An embedded engine removes that hop, so there is one fewer thing to go wrong.

Context window economics: the math changes when you pay per token for everything retrieval puts into the prompt. An engine that returns only the chunks you need beats over-retrieving and paying for the excess.
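To make the context window point concrete, here is a minimal sketch of capping retrieved chunks at a token budget. It is not tied to any particular engine: `estimate_tokens`, `select_within_budget`, and the 4-characters-per-token heuristic are all assumptions for illustration, and a real setup would use the model's own tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def select_within_budget(
    scored_chunks: List[Tuple[float, str]], budget: int
) -> List[str]:
    """Greedily keep the highest-scoring chunks that fit in the token budget."""
    selected: List[str] = []
    used = 0
    # Visit chunks from most to least relevant.
    for _score, chunk in sorted(scored_chunks, key=lambda p: p[0], reverse=True):
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            continue  # Skip chunks that would exceed the budget.
        selected.append(chunk)
        used += cost
    return selected


# Hypothetical retrieval results as (similarity score, chunk text) pairs.
results = [(0.9, "a" * 40), (0.8, "b" * 400), (0.5, "c" * 40)]
kept = select_within_budget(results, budget=25)
```

With a budget of 25 estimated tokens, the 400-character chunk is skipped and the two short chunks are kept, so the prompt carries about 20 tokens instead of 120.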

Question: what's your strategy when embedded hits its scaling limits?
