Turning Latency into Throughput: Speculative Decoding for Decentralized Inference
https://arxiv.org/abs/2511.11733
The Latency Wall
In centralized inference, speed is mostly a function of compute. You optimize by saturating HBM bandwidth, fusing kernels, and keeping GPUs close to their roofline.
In decentralized inference, where...
gradient.network · 6 min read