spent the last month rewriting our embedding ingestion pipeline and finally went all-in on tokio instead of rayon thread pools. here's what we're running:
tokio runtime with ~100 concurrent tasks handling vector db writes. we were doing synchronous batch processing before and hitting context switch hell around 8k documents. switched to async and dropped p95 latency from 2.3s to 340ms. no threadpool tuning needed, just spawn tasks.
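use std::sync::Arc;
use tokio::task::JoinSet;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Arc::new(PgVector::connect(&db_url).await?); // Arc: shared by every task
    let mut tasks = JoinSet::new();
    for doc_batch in docs.chunks(100) {
        let client = Arc::clone(&client);
        let doc_batch = doc_batch.to_vec(); // spawned tasks need owned, 'static data
        tasks.spawn(async move { client.upsert_embeddings(&doc_batch).await });
    }
    // wait for every write; tasks still pending when main returns get cancelled
    while let Some(res) = tasks.join_next().await {
        res??; // first ? is the join error, second is the upsert error
    }
    Ok(())
}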
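biggest pain was dealing with lifetimes in async blocks: anything you hand to tokio::spawn has to be owned and 'static, so borrowed batches and a shared client need a clone or an Arc. on the sql side, sqlx's compile-time query checking caught most query mistakes before runtime. switched from diesel to sqlx specifically for this. also using dashmap for shared state instead of a single mutex-wrapped hashmap, because funneling 1000+ concurrent operations through one lock is just sad.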
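for anyone wondering why dashmap helps under contention: as far as i understand it, the map is split into shards that each have their own lock, so operations on different shards don't queue behind each other. here's a toy std-only sketch of that idea. `ShardedMap` is a made-up type for illustration and is not dashmap's actual implementation or api:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

// Toy sharded map: each shard has its own lock, so writers that land on
// different shards do not wait on each other.
struct ShardedMap {
    shards: Vec<RwLock<HashMap<String, u64>>>,
}

impl ShardedMap {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        ShardedMap { shards }
    }

    // Picks the shard by hashing the key, so a key always maps to one shard.
    fn shard_for(&self, key: &str) -> &RwLock<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    // Takes the write lock on one shard only.
    fn insert(&self, key: &str, value: u64) {
        self.shard_for(key)
            .write()
            .unwrap()
            .insert(key.to_string(), value);
    }

    // Takes the read lock on one shard only.
    fn get(&self, key: &str) -> Option<u64> {
        self.shard_for(key).read().unwrap().get(key).copied()
    }
}
```

so it's less "no locks" and more "many small locks". with a single `Mutex<HashMap>` every operation contends on the same lock no matter which key it touches.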
tokio's a bit heavier than it needs to be (the macro stuff is overkill) but the ecosystem is solid. embeddings api calls don't block worker threads anymore so we can actually run this on smaller hardware.
the learning curve is real though. threads make sense immediately. async requires actually understanding what you're doing.