We are currently handling roughly 4k RPS, and it looks like that will increase 5x in the next quarter. So I have started planning a parallel backend module (2 or 3 parallel backend pipelines) to which we would divert the traffic. We are still discussing it, because scaling up the infra every time does not feel like a generic solution to me, and it will also be tough on our DevOps team. If the question is not clear, please let me know so I can explain it better.
Sébastien Portebois
Software architect at Ubisoft
It really depends on the state of your backend, and where your bottleneck is.
By state, I first think about stateful vs. stateless: the common best practice as you develop your backend is to build it stateless (as per the famous 12factor.net mantra). Your solution then becomes as simple as a load-balancing problem (think NGINX or HAProxy, or look at newer service meshes). And since you used the #kubernetes tag, it's even simpler: Kubernetes has load balancing built in, so it's just a matter of increasing your desired replicas. In fact, k8s comes with a pod autoscaler.
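To give a rough feel for what the pod autoscaler would do with a 5x jump, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas * current metric / target metric). The replica count and the per-pod RPS target are made-up numbers for illustration, not something from your setup, and the sketch ignores the tolerance band and the min/max replica bounds that the real autoscaler applies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count per the documented autoscaler rule:
    ceil(current_replicas * current_metric / target_metric)."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    return math.ceil(current_replicas * current_metric / target_metric)


# Figures from the question.
current_rps = 4_000
projected_rps = current_rps * 5  # the expected 5x increase

# Assumptions for illustration only.
replicas_today = 8          # hypothetical current replica count
target_rps_per_pod = 500    # hypothetical per-pod target metric

# Average load per pod if the replica count stayed the same.
per_pod_now = current_rps / replicas_today
per_pod_projected = projected_rps / replicas_today

print(desired_replicas(replicas_today, per_pod_now, target_rps_per_pod))
print(desired_replicas(replicas_today, per_pod_projected, target_rps_per_pod))
```

With these assumed numbers the replica count goes from 8 to 40, which is only realistic if each replica is truly stateless and nothing shared behind them saturates first.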
But the issue might come from database capacity, in which case it's a different problem with different solutions.