I replaced my 500MB vector database Docker stack with a 3MB embedded engine
Most vector database tutorials start the same way:
```shell
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
```
That's 500MB+ of Docker image, a running server process, and a REST API to talk to.
The infrastructure overhead problem in RAG systems is real — every Docker container is a dependency you have to justify.
Three things make embedded engines attractive for agents:
- **Cold start economics.** Agents spinning up on demand can't afford a 30-second vector DB initialization. An embedded engine that loads in ~50ms changes what's possible.
- **Dependency reduction as reliability.** Every network hop is a failure surface. Embedded means one fewer thing that can go wrong.
- **Context window economics.** The math changes when you're paying per token for retrieval. A 3MB engine that returns exactly what you need beats over-retrieval.
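To make "embedded" concrete: the whole value proposition is that the index lives in your process, so a query is a function call rather than an HTTP round trip. Here's a minimal, hypothetical sketch of that idea in pure Python (a toy brute-force cosine-similarity index, not VelesDB's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class EmbeddedIndex:
    """Toy in-process vector index: no server, no port, no network hop."""

    def __init__(self):
        self.items = []  # (doc_id, vector) pairs held in memory

    def add(self, doc_id, vec):
        self.items.append((doc_id, vec))

    def search(self, query, k=3):
        # Brute-force scan; real embedded engines use ANN structures
        # (e.g. HNSW) to stay fast at scale.
        scored = sorted(
            ((cosine(query, vec), doc_id) for doc_id, vec in self.items),
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:k]]

idx = EmbeddedIndex()
idx.add("doc-a", [1.0, 0.0])
idx.add("doc-b", [0.0, 1.0])
print(idx.search([1.0, 0.1], k=1))  # → ['doc-a']
```

"Cold start" here is just module import plus loading vectors from disk, which is why tens of milliseconds is plausible where a containerized server takes seconds.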
Question: what's your strategy when embedded hits its scaling limits?