Deeper Transformers Are Forgetting What They Learned. MoDA Is the Fix.
Mar 25 · 18 min read · Papers I'm Reading — Issue #03

Paper: Mixture-of-Depths Attention (MoDA)
arXiv: 2603.15619 | cs.LG
Authors: Lianghui Zhu, Yuxin Fang, Bencheng Liao et al. — Huazhong University of Science & Technology



