Jun 2 · 8 min read · Have you ever experienced a product that passed functional tests but failed after thermal cycling, or developed intermittent failures after months in the field? Opening the enclosure reveals cracked s
May 31 · 10 min read · Reliability engineering used to be the exclusive domain of Site Reliability Engineers and infrastructure teams. But as backend developers take on more ownership of the services they build, from deploy
May 24 · 10 min read · The first fix lasted 90 seconds. We had corrected the Grafana datasource URL from prometheus:9999 back to prometheus:9090, watched the pod roll, refreshed the dashboard, and seen one panel come alive.
May 18 · 6 min read · We made it easier to use. Then it broke. I got pulled into an incident recently where one of our highest-value enterprise accounts couldn't export their survey data. Their analytics pipeline had gone
May 14 · 5 min read · Broker APIs are powerful. They are also the kind of powerful where one careless script can make your day very interesting. So I built trade-ops-cli, a terminal-based broker operations tool designed ar