DFlash: How Block Diffusion Is Making LLM Inference 3–6x Faster on TPUs
Generating tokens from a large language model is fundamentally sequential: each token depends on all prior tokens, so the model must complete one full forward pass before starting the next. On modern AI accelerators designed for massive parallelism, ...
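The sequential dependency described above can be sketched in a few lines. This is a toy illustration, not the model's actual implementation: `forward` is a hypothetical stand-in for one full forward pass, and the point is simply that step *t+1* cannot begin until step *t* has produced its token.

```python
def forward(prefix):
    """Stand-in for one full model forward pass over the prefix.
    Toy rule: the next token is the sum of the prefix mod 10."""
    return sum(prefix) % 10

def generate(prompt, num_tokens):
    """Autoregressive decoding: one forward pass per generated token."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        # Each new token depends on ALL prior tokens, so this loop
        # cannot be parallelized across steps.
        next_tok = forward(tokens)
        tokens.append(next_tok)
    return tokens
```

Even though each individual forward pass is highly parallel internally, the loop itself runs one step at a time, which is what leaves accelerator throughput on the table.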
prabhakar-ai.hashnode.dev · 6 min read