Denny Wang · denny.hashnode.dev · 17 hours ago
Understanding Memory and Throughput in LLMs Training: A Practical Example
Introduction: Large Language Models (LLMs) like GPT-3 and BERT are at the forefront of AI advancements, powering applications from natural language understanding to generative text. These models, however, bring significant challenges in terms of memor...
Tags: llmtraining
Denny Wang · denny.hashnode.dev · Jun 29, 2024
Understanding Fully Sharded Data Parallel (FSDP) in Distributed Training
Fully Sharded Data Parallel (FSDP) is a technique used in distributed training to improve the efficiency and scalability of training large models across multiple GPUs. Here's a detailed look at what FSDP is, its role in distributed training, and how ...
Tags: llmtraining
Denny Wang · denny.hashnode.dev · Jun 29, 2024
Understanding the Components of Distributed Training
In distributed training, several key components work together to enable efficient and scalable machine learning. These components include communication libraries, training frameworks, and hardware (GPUs). This blog post introduces these components, t...
Tags: distributed training
Denny Wang · denny.hashnode.dev · Jun 29, 2024
Understanding Reduce-Scatter, All-Gather, and All-Reduce in Distributed Computing for LLM Training
In the world of parallel computing, particularly in distributed machine learning and high-performance computing, collective communication operations play a crucial role. Among these operations, reduce-scatter, all-gather, and all-reduce are commonly ...
Tags: llmtraining