LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away
20m ago · 9 min read

Most people compressing LLM weights are fighting the same war: squeeze 7 billion floats into less memory without wrecking the model. The standard weapons are quantization schemes: map each float to a