Why E8 lattice quantization beats scalar quantization for KV caches
Most KV cache quantization methods treat each number independently. This works, but wastes bits.
E8 lattice quantization handles 8 numbers at once, rounding each block of 8 values jointly to the nearest point of the E8 lattice. Result: 3x better compression under entropy coding at the same distortion.
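To make "8 numbers at once" concrete, here is a minimal sketch of nearest-point rounding to E8. It assumes the classic Conway-Sloane decoder, which treats E8 as the union of D8 (integer vectors with an even coordinate sum) and D8 shifted by 1/2 in every coordinate. The function names `nearest_d8` and `nearest_e8` are hypothetical, not taken from the post, and scaling and entropy coding are left out.

```python
import math


def nearest_d8(x):
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    rounded = [math.floor(v + 0.5) for v in x]
    if sum(rounded) % 2 == 0:
        return rounded
    # Odd sum: re-round the coordinate with the largest rounding error
    # in the opposite direction, which is the cheapest way to fix parity.
    k = max(range(len(x)), key=lambda i: abs(x[i] - rounded[i]))
    rounded[k] += 1 if x[k] > rounded[k] else -1
    return rounded


def nearest_e8(x):
    """Nearest point of E8 = D8 union (D8 + 1/2), for a block of 8 floats."""
    if len(x) != 8:
        raise ValueError("E8 quantizes blocks of exactly 8 values")
    # Candidate from the integer coset.
    a = [float(v) for v in nearest_d8(x)]
    # Candidate from the half-integer coset: shift, decode in D8, shift back.
    b = [v + 0.5 for v in nearest_d8([u - 0.5 for u in x])]

    def dist(p):
        return sum((u - v) ** 2 for u, v in zip(x, p))

    return a if dist(a) <= dist(b) else b


block = [0.9, 1.2, -0.1, 0.1, 0.0, 0.2, -0.3, 0.1]
print(nearest_e8(block))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A real KV cache quantizer would scale each block before rounding and then entropy-code the lattice indices. Those steps are assumptions about the surrounding pipeline and are not shown here.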
Scalar quantization
Scalar INT2 ro...
nexusquant.hashnode.dev · 1 min read