LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away
Most people compressing LLM weights are fighting the same war: squeeze 7 billion floats into less memory without wrecking the model. The standard weapons are quantization schemes — map each float to a
blog.kerchum.dev · 9 min read