Compliance Quest · compliancequest.hashnode.dev · Apr 30, 2024
How to use incident management software effectively in 2024
In 2024, the landscape of Incident Management Software is continuously evolving, with expectations of further integration with various IT tools such as monitoring systems, ticketing platforms, and security solutions. The emphasis on user-centric IT d...
Tags: Incident management solution

Maxat Akbanov · maxat-akbanov.com · Mar 3, 2024
How to safeguard yourself from notorious "rm -rf" command in production
This article was inspired by the original postmortem analysis made by Gitlab team during the database outage on January 31 2017. In fact, it is great that enterprise companies don't seal the incidents inside but rather tend to share their experience ...
127 reads · Tags: bash-and-linux, Devops

DurgaSaran · dsrnk.hashnode.dev · Jan 25, 2024
How ITIL Helps in DevOps Observability
ITIL (Information Technology Infrastructure Library) and DevOps can complement each other when it comes to observability in IT systems. Observability refers to the ability to understand and monitor the internal state of a system by examining its outp...
41 reads · Tags: ITIL

Sarat Motamarri · saratdevopsengg.hashnode.dev · Nov 17, 2023
Day 22 | Project Management Tools
Importance of Project Management for DevOps Engineers: Project management is a crucial aspect for DevOps engineers, serving as the guiding force that ensures seamless collaboration, efficiency, and successful outcomes. In the dynamic world of DevOps...
Tags: incident management

Connor Avery · cavery.dev · Jul 27, 2023
The Engineers Playbook: Handling Incidents
Depending on your working environment, you may experience incidents differently from someone else. Every organisation has their way of dealing with incidents, how they triage them and what paperwork is required (hopefully not literally). Despite the ...
29 reads · Tags: engineering

Nithin Chandran R · cloudfinops.hashnode.dev · Jun 20, 2023
How to stay watchful on important events affecting your AWS resources using AWS User Notifications and AWS Health Dashboard?
Introduction: As an AWS user, you're likely juggling multiple resources and services at once, each with its own unique performance requirements and potential risks of downtime. That's why it's essential to stay up-to-date on any events that could imp...
21 likes · 49 reads · Tags: AWS

Boomni Jonathan · boomni.hashnode.dev · Jun 11, 2023
Software Post-moterm: A Journey through a Web Stack Outage
Introduction: Before we embark on our postmortem adventure, let's take a moment to understand what a postmortem is all about. In the realm of software development and operations, a postmortem is a detailed analysis conducted after a system outage or i...
10 likes · Tags: postmortem

Ahmed El Taweel · ahmedeltaweel.hashnode.dev · Feb 5, 2023
Incident management, What, Why and How?
What: Incident management in software refers to the process of identifying, responding to, and resolving unexpected events or failures that occur within a software system. These incidents can range from minor issues, such as a slow page load, to major...
1 like · 221 reads · Tags: SRE

Jean-Mark Wright · jaywhy13.hashnode.dev · Dec 13, 2022
Can a good explanation really prevent a prod incident?
A missed opportunity to communicate: It was March 21, 2022. I remember the day like yesterday. I just started my On-Call Shift. Our team On-Call rotation is set up so each engineer goes on call for a week, once every 6 or so weeks. The On-Call Enginee...
1.1K reads · Tags: commit

Sriram K · 0xskay.hashnode.dev · Dec 9, 2022
"What does it take to keep your customers Happy in Software driven world?" - An SRE engineer Perspective
Site reliability engineering (SRE) is important for keeping customers happy in a software-driven world because it focuses on the availability, performance, and reliability of production systems. By implementing SRE practices and leveraging the right ...
47 reads · Tags: SRE