Considering the scale at which Algolia operates, how do you handle DevOps? How do you make sure you are deploying big changes without breaking anything?
This is a pretty big question and a daily concern of ours. We always try to think about the consequences of new code and deploy it gradually to avoid unforeseen impact. That said, mistakes can happen. You can count on us to be transparent, to react quickly to fix the issue, and to help you as if we were in the seat next to you.
Testing, testing, testing. We automate a lot of stuff, including unit & integration testing. Furthermore, if we need to make a breaking change, we deploy it incrementally and check our metrics to ensure everything is working well. For example, if we need to change the contract between two services, we deploy the new version of the producer, adapt the consumer to this new contract, then remove the old code from the producer.
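To make the three steps above concrete, here is a minimal sketch of that kind of incremental contract change. The message shape and the field names (`user_name` as the old contract, `username` as the new one) are made up for illustration and are not taken from Algolia's actual services.

```python
# Hypothetical contract change: the field "user_name" is renamed to "username".
# All names here are illustrative assumptions, not real service code.


def produce_old(name: str) -> dict:
    """Starting point: the producer emits the old contract only."""
    return {"user_name": name}


def produce_transitional(name: str) -> dict:
    """Step 1: the new producer emits both the old and the new field,
    so consumers that have not been updated yet keep working."""
    return {"user_name": name, "username": name}


def consume_old(message: dict) -> str:
    """Consumer before migration: reads the old field."""
    return message["user_name"]


def consume_new(message: dict) -> str:
    """Step 2: the consumer is adapted to read the new field."""
    return message["username"]


def produce_final(name: str) -> dict:
    """Step 3: once every consumer reads the new field,
    the old one is removed from the producer."""
    return {"username": name}
```

During the transition, both `consume_old(produce_transitional("ada"))` and `consume_new(produce_transitional("ada"))` succeed, which is what lets each service be deployed on its own schedule while metrics are checked between steps.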
Most of our software is tested automatically for regressions. That does not mean nothing will ever break, but we try our best to make our code rock solid.
For example, all our code on GitHub is tested automatically using Travis CI. For some projects we also use Sauce Labs to start real browsers and run unit or functional tests in them.
One very important part is the API. We strive to always make it backward compatible. If we introduce a new feature or new syntax, we make sure the previous syntax is still working.
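One common way to keep an older syntax working is to translate it into the new one at the edge of the API. The sketch below assumes a hypothetical parameter rename (`limit` to `page_size`); these names are invented for the example and do not describe the real Algolia API.

```python
# Hypothetical mapping from legacy parameter names to their current names.
LEGACY_ALIASES = {"limit": "page_size"}


def normalize_params(params: dict) -> dict:
    """Accept both the old and the new spelling of a parameter.

    If a request supplies both spellings, the new one wins.
    """
    normalized = {}
    for key, value in params.items():
        new_key = LEGACY_ALIASES.get(key, key)
        # A legacy alias never overrides a value given under the new name.
        if key in LEGACY_ALIASES and new_key in normalized:
            continue
        normalized[new_key] = value
    return normalized
```

With this in place, a client that still sends `{"limit": 10}` gets the same behavior as one sending `{"page_size": 10}`, and the rest of the code only ever has to deal with the new name.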
We’d rather update the documentation to stop mentioning an old feature than remove it from the engine. Sometimes, when we really need to change a behavior, we inspect our logs to get the list of potentially affected customers and proactively contact them to warn them about the upcoming change.
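As a rough illustration of the log inspection described above, the following sketch collects the customers whose requests still use a given parameter. The log entry format (a dict with `customer_id` and `params`) is an assumption made for the example; real API logs will look different.

```python
def affected_customers(log_entries: list, deprecated_param: str) -> list:
    """Return the sorted, de-duplicated IDs of customers whose logged
    requests used `deprecated_param`.

    Each entry is assumed to look like:
        {"customer_id": "acme", "params": {"limit": 10}}
    """
    return sorted(
        {
            entry["customer_id"]
            for entry in log_entries
            if deprecated_param in entry["params"]
        }
    )


# Example with made-up log data.
sample_logs = [
    {"customer_id": "acme", "params": {"limit": 10}},
    {"customer_id": "globex", "params": {"page_size": 20}},
    {"customer_id": "acme", "params": {"limit": 5, "query": "shoes"}},
]
to_contact = affected_customers(sample_logs, "limit")
```

The resulting list is the set of accounts to contact before the behavior changes.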
As for our website and dashboard, we deploy 5 to 20 times per day using Capistrano and Travis CI without any particular problems. It’s a very simple setup, but it works pretty well in practice: if something goes wrong, we just roll back to the previous working version. We rely a lot on automated tests, but when things go wrong we usually fix the issue as soon as possible.
I'm happy that our software engineers responded before operations kicked in ;) I think that illustrates it well :)