Breaking the Bottleneck: Seq2Seq Models and the Invention of Attention
In our previous post, we used LSTMs to mitigate the vanishing gradient problem. By using memory gates, LSTMs could successfully read a 200-word movie review and predict whether it was po
rishiii2.hashnode.dev · 6 min read