Aakash Varma · aakashvarma.hashnode.dev · Jan 15, 2024

Speculative Sampling

This article gives an overview of DeepMind's paper "Accelerating Large Language Model Decoding with Speculative Sampling".

Introduction

In Transformer models, sampling is often constrained by memory bandwidth, resulting in the time to generate a tok...

Tag: AI