Turning Latency into Throughput: Speculative Decoding for Decentralized Inference
Nov 24, 2025 · 6 min read · https://arxiv.org/abs/2511.11733

The Latency Wall

In centralized inference, speed is mostly a function of compute. You optimize by saturating HBM bandwidth, fusing kernels, and keeping GPUs close to their roofline. In decentralized inference, where...
