FlashAttention: Making Transformers Faster and More Memory-Efficient
Large Language Models (LLMs) like GPT, BERT, and modern Transformers rely heavily on the self-attention mechanism. While powerful, self-attention is also the biggest performance bottleneck when working with long sequences: its time and memory cost grow quadratically with sequence length, because every token attends to every other token.
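To make the quadratic cost concrete, here is a minimal NumPy sketch of standard (non-Flash) attention. The function name `standard_attention` and the sizes are illustrative, not from the FlashAttention paper; the point is the intermediate (n, n) score matrix that dominates memory for long sequences.

```python
import numpy as np

def standard_attention(Q, K, V):
    # Scores form an (n, n) matrix: memory grows quadratically with sequence length n
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 1024, 64  # illustrative sizes
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = standard_attention(Q, K, V)
print(out.shape)  # the (1024, 1024) score matrix built along the way is the bottleneck
```

Doubling n quadruples the size of that score matrix, which is exactly the cost FlashAttention avoids materializing in full.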
In 2022, Tri Dao and collaborators introduced FlashAttention to tackle this bottleneck.
apurvak3.hashnode.dev