May 5 · 4 min read · Early in my time on the Kubernetes team, a customer proposed something that was both brilliant and beyond what we were ready for: a global footprint of clusters, one per region, with a synchronized set of jobs. They were running a low-latency applica...
Mar 9 · 13 min read · TLDR: Consensus algorithms allow a cluster of computers to agree on a single value (e.g., "Who is the leader?"). Paxos is the academic standard: correct but notoriously hard to understand. Raft is the practical standard: designed for understa...
Jul 21, 2025 · 14 min read · It was a Tuesday, the kind of unremarkable day that precedes most production fires. The team, sharp and capable, had built a new distributed job scheduling service. To handle failover, they implemented what seemed like a clever, simple leader electio...
Dec 2, 2022 · 6 min read · Overview: Paxos, developed by Leslie Lamport, is an algorithm that solves the consensus problem in a network of faulty (unreliable) processors. In simple terms, Paxos focuses on making all the processors choose the same v...