How to Train a 100B+ Parameter Model When You Can't Afford a GPU Cluster
So you've got a model architecture in mind, or maybe a fine-tuning job on a massive LLM, and you look at the memory requirements. A 100B parameter model in full FP32 precision needs roughly 400GB just for the parameters. Add optimizer states (Adam store...
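The arithmetic above is worth making explicit. A minimal back-of-envelope sketch, assuming FP32 parameters (4 bytes each) and standard Adam, which keeps two extra FP32 states (momentum and variance) per parameter:

```python
# Memory math for a 100B-parameter model, FP32 everywhere.
# Assumption: plain Adam with two FP32 states per parameter.
params = 100e9
bytes_per_fp32 = 4

param_mem_gb = params * bytes_per_fp32 / 1e9        # parameters alone
adam_state_gb = params * 2 * bytes_per_fp32 / 1e9   # momentum + variance
total_gb = param_mem_gb + adam_state_gb

print(f"parameters:      {param_mem_gb:.0f} GB")
print(f"Adam states:     {adam_state_gb:.0f} GB")
print(f"params + states: {total_gb:.0f} GB")
```

That is roughly 400 GB for the weights and another 800 GB for optimizer state before a single activation or gradient is stored, which is why the memory bill, not compute, is usually the first wall you hit.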
alan-west.hashnode.dev · 6 min read