Alexander Kerchum in blog.kerchum.dev · Jun 5 · 12 min read
Lessons From the Bottom of the Stack: Shipping a Quant
The SCLP compression algorithm (palette the exponents, sidecar the outliers, pack the rest) was a week or so of prototyping. The two posts before this one covered it end to end. This post is about t...

Alexander Kerchum in blog.kerchum.dev · Jun 3 · 9 min read
From 8 Bits to 4: Sidecar, MoE, and the imatrix Trick That Worked
Last time we cut BF16 weights in half by treating the exponent as a 16-entry palette instead of an 8-bit field. SCLP8: 7.9 GB instead of 15.0, perplexity slightly better than the original, token gener...

Alexander Kerchum in blog.kerchum.dev · May 29 · 9 min read
LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away
Most people compressing LLM weights are fighting the same war: squeeze 7 billion floats into less memory without wrecking the model. The standard weapons are quantization schemes: map each float to a...

Alexander Kerchum in blog.kerchum.dev · Nov 24, 2021 · 1 min read
How to move a directory from one git repo to another (or new) without losing history
Make a copy of the repo: git clone dirtySourceRepo newSourceRepo. Or clone from the actual git repo and prevent pushes: git remote set-url --push origin no_push. Make sure to check out the correct branch before the next step. Cloning from another local directory al...