@FaTGuY1
Yes, we did use load balancing. In our case, the biggest problems we faced were the LLM's context length and its rate limits. We solved the rate-limit problem by using multiple API keys and distributing the queries equally among them with multithreading. For the context length, we switched to a K-means summarization technique for long documents.
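The key-distribution idea can be sketched roughly as follows. This is a minimal illustration, not the setup described in the post. `call_llm(prompt, api_key)` is a hypothetical stand-in for whatever LLM client is used. Queries are assigned round-robin (query `i` goes to key `i % n`) so each key gets an almost equal share, and one thread per key works through its share.

```python
from concurrent.futures import ThreadPoolExecutor


def call_llm(prompt: str, api_key: str) -> str:
    # Hypothetical stand-in for the real LLM API call; replace with your client.
    return f"[{api_key}] answer to: {prompt}"


def split_round_robin(items, n):
    """Assign item i to bucket i % n so every key gets an (almost) equal share."""
    buckets = [[] for _ in range(n)]
    for i, item in enumerate(items):
        buckets[i % n].append((i, item))
    return buckets


def run_with_keys(prompts, api_keys):
    """Send each prompt through one of the keys, one worker thread per key."""
    buckets = split_round_robin(prompts, len(api_keys))
    results = [None] * len(prompts)

    def worker(key, bucket):
        # Each thread only uses its own key, so a key never carries
        # more than its share of the traffic.
        for idx, prompt in bucket:
            results[idx] = call_llm(prompt, key)

    with ThreadPoolExecutor(max_workers=len(api_keys)) as pool:
        futures = [pool.submit(worker, k, b) for k, b in zip(api_keys, buckets)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return results
```

Results come back in the original prompt order because each worker writes to the prompt's own index. A real deployment would probably still want retries with backoff for the case where a single key hits its limit.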
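The post doesn't spell out the K-means summarization step. One common form of the technique works like this: split the long document into chunks, embed each chunk, cluster the embeddings with K-means, keep only the chunk nearest each centroid, and send those representative chunks to the LLM for summarization. The sketch below assumes the chunk embeddings are already computed (the post doesn't name an embedding model). It uses a plain NumPy K-means with a deterministic farthest-point initialisation. `summarize` is a hypothetical callback that wraps the LLM call.

```python
import numpy as np


def farthest_point_init(x: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: start at x[0], then keep adding the farthest point."""
    idx = [0]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen seed.
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return x[idx].astype(float)


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd's K-means; returns the final centroids."""
    centroids = farthest_point_init(x, k)
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def representative_chunks(chunks, embeddings, k):
    """Pick the chunk closest to each K-means centroid, in original document order."""
    k = min(k, len(chunks))
    centroids = kmeans(embeddings, k)
    picked = {int(np.linalg.norm(embeddings - c, axis=1).argmin()) for c in centroids}
    return [chunks[i] for i in sorted(picked)]


def condense(chunks, embeddings, k, summarize):
    """Summarize only the k representative chunks so the prompt fits the context window."""
    # `summarize` is a hypothetical callback wrapping the actual LLM call.
    return summarize("\n\n".join(representative_chunks(chunks, embeddings, k)))
```

Here `k` controls how much text reaches the LLM. A reasonable starting point is the model's context budget divided by the typical chunk length.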