Aakash Varma
aakashvarma.hashnode.dev · Jan 15, 2024

Speculative Sampling

This article gives an overview of DeepMind's paper "Accelerating Large Language Model Decoding with Speculative Sampling".

Introduction: In Transformer models, sampling is often constrained by memory bandwidth, resulting in the time to generate a tok...

AI