FlashInfer is a kernel library for LLMs that provides high-performance implementations of PagedAttention, FlashAttention, and a few others. Relative to the original implementations of these algorithms, FlashInfer promises “state-of-the-art performance...
fergus.hashnode.dev · 9 min read