GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
TL;DR: Training Large Language Models (LLMs) poses significant memory challenges due to their sheer size. Approaches like LoRA typically underperform full-rank weight training in both pre-training and fine-tuning, since they limit...