One thing I learned: weight streaming probably works well on GPUs
Sarah Chieng from Cerebras joined paper club last week to present "Training Giant Neural Networks Using Weight Streaming on Cerebras Wafer-Scale Clusters". If you’re not familiar with Cerebras, they make dinner-plate-sized, wafer-scale chips designed for AI ...
learning-exhaust.hashnode.dev · 7 min read