Grokking of MLP on Modular Addition
Jan 24 · 4 min read

Background

Grokking is a phenomenon where a model quickly achieves near-perfect training accuracy (memorization), while validation accuracy remains near chance (or even worse than chance) for a long time, and then later transitions sharply to strong generalization.
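As a concrete setup for the task in the title, a minimal sketch of the modular addition dataset is shown below. The modulus `p = 113` and the train fraction are assumptions for illustration; the post's exact configuration is not specified here.

```python
import numpy as np

def make_modular_addition_dataset(p=113, train_frac=0.3, seed=0):
    """Enumerate all pairs (a, b) in Z_p with label (a + b) mod p,
    then split them randomly into train and validation sets.
    p and train_frac are illustrative choices, not the post's exact values."""
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))  # shuffle all p*p examples
    n_train = int(train_frac * len(pairs))
    train, val = idx[:n_train], idx[n_train:]
    return (pairs[train], labels[train]), (pairs[val], labels[val])

(train_X, train_y), (val_X, val_y) = make_modular_addition_dataset()
```

Training a small MLP to classify the label from the two inputs on a dataset like this is the standard setting in which the memorization-then-generalization transition is observed.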