Vision Yatra: Step 14 — Masked Multi-Head Self-Attention: How Transformers Generate Text Sequentially

Hey everyone! 👋 I’m Pankaj, and welcome to Vision Yatra: Step 14, where we unlock the most critical innovation in Transformer decoders: 🔒 Masked Multi-Head Self-Attention. In the last post, we saw how the Decoder generates text one token at a time ...