Positional Encoding from Sinusoidal to RoPE
Apr 1, 2025 · 16 min read · Transformers process the tokens of an input text in parallel, but unlike sequential models they have no inherent notion of position: they see the input as an unordered set of tokens. Yet when we compute attention over a sentence, words that are identical but in differe...
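As a preview of the sinusoidal scheme the article starts from, here is a minimal NumPy sketch of the encoding from the original Transformer paper (the function name is illustrative): each position gets a vector of sines and cosines at geometrically spaced frequencies, which is added to the token embeddings so attention can distinguish positions.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(seq_len)[:, None]            # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dims (1, d_model/2)
    angles = positions / (10000.0 ** (dims / d_model)) # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even indices get sine
    pe[:, 1::2] = np.cos(angles)  # odd indices get cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=8, d_model=16)
print(pe.shape)  # (8, 16)
```

Because the frequencies are fixed rather than learned, the same function extends to sequence lengths not seen during training.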