Discussion on "LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away" | Hashnode

Alexander Kerchum

Senior Software Engineer at Upstart

May 29

LLMs Use Just 16 of 256 Exponents — So We Compressed the Rest Away

Most people compressing LLM weights are fighting the same war: squeeze 7 billion floats into less memory without wrecking the model. The standard weapons are quantization schemes — map each float to a

blog.kerchum.dev · 9 min read

#llm #artificial-intelligence #gpu #quantization

Responses

No responses yet.