Putting the Pieces Together: Building the Transformer Decoder Block in PyTorch
If you’ve been following my "learning in public" series, we’ve spent the last few posts in the trenches. We’ve wrestled with tensor shapes, built mathematical blindfolds (Masked Self-Attention), bridg
shalem-raju.hashnode.dev · 4 min read