Attention kernels for LLM inference
Jan 7, 2025 · 9 min read

FlashInfer is a kernel library for LLMs that provides high-performance implementations of PagedAttention, FlashAttention, and a few other attention algorithms. Relative to the original implementations of these algorithms, FlashInfer promises “state-of-the-art performance...