Understanding Reduce-Scatter, All-Gather, and All-Reduce in Distributed Computing for LLM Training
In parallel computing, particularly in distributed machine learning and high-performance computing, collective communication operations play a crucial role. Among these, reduce-scatter, all-gather, and all-reduce are among the most commonly used.
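To make the three operations concrete, here is a minimal single-process sketch of their semantics in plain Python. It simulates each rank's buffer as a list and uses summation as the reduction; it is not a real distributed implementation (real training would go through a library such as `torch.distributed` or NCCL), and the function names and the `grads` data are illustrative.

```python
# Single-process simulation of collective-communication semantics.
# Each rank's buffer is a plain list; the reduction operation is sum.

def reduce_scatter(rank_buffers):
    """Each rank ends up with the sum of one chunk of every rank's data."""
    n = len(rank_buffers)              # number of ranks
    chunk = len(rank_buffers[0]) // n  # elements per rank after scattering
    return [
        [sum(buf[r * chunk + i] for buf in rank_buffers) for i in range(chunk)]
        for r in range(n)
    ]

def all_gather(rank_chunks):
    """Each rank ends up with the concatenation of every rank's chunk."""
    full = [x for chunk in rank_chunks for x in chunk]
    return [list(full) for _ in rank_chunks]

def all_reduce(rank_buffers):
    """Each rank ends up with the element-wise sum over all ranks.

    Note that this is exactly a reduce-scatter followed by an all-gather,
    which is how ring all-reduce is typically decomposed in practice.
    """
    return all_gather(reduce_scatter(rank_buffers))

# Hypothetical gradients on 2 ranks, 4 elements each.
grads = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(reduce_scatter(grads))  # [[11, 22], [33, 44]]
print(all_reduce(grads))      # [[11, 22, 33, 44], [11, 22, 33, 44]]
```

The composition in `all_reduce` is the key identity the article builds on: an all-reduce over `n` ranks can be implemented as a reduce-scatter (each rank reduces one `1/n` shard) followed by an all-gather (each rank collects the reduced shards from everyone else).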
denny.hashnode.dev · 5 min read