I replaced my 500MB vector database Docker stack with a 3MB embedded engine
Most vector database tutorials start the same way:
```shell
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
```
That's 500MB+ of Docker image, a running server process, and a REST API to talk to.
The infrastructure overhead problem in RAG systems is real — every Docker container is a dependency you have to justify.
Three things make embedded engines attractive for agents:
- **Cold start economics.** Agents spinning up on demand can't afford a 30-second vector DB initialization. An embedded engine that loads in ~50ms changes what's possible.
- **Dependency reduction as reliability.** Every network hop is a failure surface. Embedded means one fewer thing that can go wrong.
- **Context window economics.** The math changes when you're paying per token for retrieval. A 3MB engine that returns exactly what you need beats over-retrieval.
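To make "embedded" concrete: the whole value proposition is that the index lives in your process, so a query is a function call rather than an HTTP round trip. Here's a minimal, hypothetical sketch of that idea in pure Python (a toy brute-force cosine-similarity index, not VelesDB's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class EmbeddedIndex:
    """Toy in-process vector index: no server, no port, no network hop."""

    def __init__(self):
        self.items = []  # (doc_id, vector) pairs held in memory

    def add(self, doc_id, vec):
        self.items.append((doc_id, vec))

    def search(self, query, k=3):
        # Brute-force scan; real embedded engines use ANN structures
        # (e.g. HNSW) to stay fast at scale.
        scored = sorted(
            ((cosine(query, vec), doc_id) for doc_id, vec in self.items),
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:k]]

idx = EmbeddedIndex()
idx.add("doc-a", [1.0, 0.0])
idx.add("doc-b", [0.0, 1.0])
print(idx.search([1.0, 0.1], k=1))  # → ['doc-a']
```

"Cold start" here is just module import plus loading vectors from disk, which is why tens of milliseconds is plausible where a containerized server takes seconds.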
Question: what's your strategy when embedded hits its scaling limits?