Apr 2 · 7 min read · Key Takeaway: Resolve.ai is a $1B-valued AI SRE platform used by Coinbase, DoorDash, and Salesforce — but pricing requires contacting sales with no public pricing page. Aurora is an open source (Apach
Jan 10 · 5 min read · In my decade of managing infrastructure on AWS—from the early days of "Classic" EC2 to the modern serverless era—I’ve learned one universal truth: Everything fails, all the time. The difference between a junior admin and a senior engineer isn't preve...
Sep 22, 2025 · 3 min read · Imagine this: it’s barely past midnight, and suddenly production is in flames. Pods are crashing every minute. Terraform insists everything is fine, but the infrastructure looks nothing like what was planned. You’re on the hook for a complete inciden...
Sep 3, 2025 · 7 min read · In modern manufacturing, time isn't just money—it's a competitive advantage. The longer a problem lingers, the more it costs in downtime, defects, customer dissatisfaction, and lost trust. For plant managers, VPs of manufacturing, CI leaders, and ope...
Mar 17, 2025 · 5 min read · Telecom networks are complex, and when something breaks, finding the cause takes too long. Traditional root cause analysis (RCA) relies on manual log analysis and troubleshooting, which slows down resolution. Engineers spend hours sifting through dat...
Dec 30, 2024 · 4 min read · Root Cause Analysis (RCA) is critical to any effective quality management system. It helps organizations identify the underlying causes of problems, prevent future occurrences, and improve overall performance. While traditional root cause analysis me...
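Jul 7, 2023 · 1 min read · Root cause analysis is a common tool for looking back at problems that have already occurred in order to propose preventive measures or solutions for the future. It ranges from the simplest Five Whys and the Pareto chart to more complex event decomposition methods and quality management approaches such as 8D (Eight Disciplines Problem Solving). Project experience shows that RCA is also important and effective in consulting work and even in ordinary project rollouts. First, we must stress that examining a problem's root cause is meant to find a solution that treats the cause, not to pin responsibility for the problem on a particular person. All work is done by people; even a natural disaster may trace back to insufficient precautions, and AI errors also stem from train...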