notes.drdroid.io
Dr. Patternson: How Meta reduced their MTTR by 50% using AIOps
Introduction: For Meta, reducing downtime has been crucial to ensuring millions (or should I say billions?) of users have a seamless experience. Recently, Meta shared details about one of its internal platforms that helped reduce MTTR by ~50% for critical a...
Oct 11, 2024 · 7 min read

notes.drdroid.io
RCACoPilot: A breakdown of how Microsoft built their Automated RCA Bot
Introduction: Big Tech companies often have enough scale to justify allocating resources to building internal tools. In this blog, we discuss RCACoPilot, an automated incident classification and investigation engine built by Microsoft to impro...
Sep 2, 2024 · 8 min read

notes.drdroid.io
What is a PlayBook and what is core components of a playbook?
A playbook is a set of instructions that a Doctor Droid bot or an on-call engineer follows during a production incident. https://www.youtube.com/watch?v=T9KfunP9juA A playbook consists of tasks. A task is an instruction that's executed through the ...
Aug 28, 2024 · 2 min read

notes.drdroid.io
How to do post-deployment monitoring with Doctor Droid?
Before starting, ensure that you have set up Doctor Droid Playbooks with at least one playbook and a Slack or MST integration. Check out these tutorials on how to get this done. https://www.youtube.com/watch?v=T9KfunP9juA Consider a scenario where a...
Aug 28, 2024 · 2 min read