Zbigniew Matysek · falseinput.com · Feb 18, 2024 · Fix your StatusCake Uptime Monitoring configuration if you care about regional outages · Website uptime monitoring services periodically send requests to your domain, verifying if it responds with the expected status code. These checks aid in detecting and responding to outages and improving infrastructure over time. However, configuring... · Tags: statuscake
Gagandeep Singh · blog.gagan93.me · Oct 28, 2023 · Debugging Production Downtimes · Introduction: Downtime refers to a period when a system/service is partially or completely unavailable. Based on the criticality of service and the customers you're serving, this can cause a loss of millions of dollars. I've been on the frontline for ... · 112 reads · Tags: debugging
metaCluster for metaCluster · metacluster.com · Sep 27, 2023 · The Imperative for Cloud-Agnostic Kubernetes Management · The recent AWS outage impacting both us-west-2 and us-east-1 regions across all 3 Availability Zones (AZs) has sparked a renewed dialogue around the reliability of cloud services. This incident underscores the divergence between the theoretical resil... · 38 reads · Tags: Kubernetes
Zahiruddin Tavargere · zahere.com · Jul 24, 2023 · Cache Stampede: A Problem The Industry Fights Every Day · In 2010, Facebook, with 600+ Million users, was already one of the most popular and biggest websites in the world. On September 23, 2010, their scale and limits were put to the test as it faced one of its most severe outages to date. Facebook was dow... · 1 like · 119 reads · Tags: System Design
Shobayo Samuel · shobayosamuel.hashnode.dev · May 13, 2023 · Cloud based software system outage Postmortem · Summary: From 2: PM - 4:00 PM UTC, requests to our cloud-based software system returned 500 error response messages, resulting in an interruption of services for several hours for 70% of our clients. The outage was caused by a database query error du... · Tags: Outage
Khushnood Asif · khushnoodasif.hashnode.dev · Apr 3, 2023 · From Crisis to Recovery: Reddit's Pi-Day Outage and the Power of Effective Kubernetes Management · On Pi Day 2023, Reddit experienced an outage that lasted 314 minutes. The outage was caused by an upgrade from Kubernetes 1.23 to 1.24 on one of the most important clusters in the company. In this blog post, I will discuss the outage and the steps Re... · 145 reads · Tags: Tech Tips, Kubernetes
Didik Tri Susanto · blog.didiktrisusanto.dev · Feb 25, 2023 · My First Outage: The Bad of Storing Logs in Database · This outage was one of my most memorable events as a software engineer for several reasons: it was my first job at a product-based startup; I was a new joiner; I didn't have production access and had limited knowledge about how the current system was running ... · 2 likes · 322 reads · Tags: DevDebugDebuggingFeb
Abhishek Mishra · stalwartcoder.hashnode.dev · Oct 11, 2022 · How to Avoid Cloud Outages with YugabyteDB for Python Apps · Cloud environments provide benefits in terms of scalability and ease of use. They are easy to scale because you can add more resources when necessary. They’re easy to use and quick to build thanks to the ecosystem of cloud services and frameworks. Ho... · 1 like · 57 reads · Tags: Python
Asutosh Panda · 75asu.hashnode.dev · Jun 11, 2022 · Dependency on Cloud-Hosted Third Parties · Building on top of third-party cloud-hosted services, mostly API services, has become the norm. As the dependency grows, so do the pros and cons that come with it. No doubt the wide adoption of API-based interfaces works best for ... · 46 reads · Tags: Cloud Computing