How to Actually Run an LLM on Almost No RAM
Someone on Reddit recently posted a photo of an LLM running on a 1998 iMac G3 with 32 MB of RAM. My first reaction was "no way." My second reaction was "okay, but how?"
That question sent me down a rabbit hole of model quantization, tiny architecture...