Understanding Token and Positional Embeddings in Transformers
Transformers, the backbone of many state-of-the-art NLP models such as BERT and GPT, have revolutionized the way we approach natural language understanding tasks. One key innovation in transformers is their ability to handle entire sequences of tokens simultaneously...
rahullokurte.com · 4 min read
The section that presents the example vectors for "king" and "queen" is a bit unclear:
The word "king" might be represented as [0.5, 0.2, 0.8, ...]. The word "queen" might be represented as [0.6, 0.3, 0.7, ...].
Their closeness in the vector space reflects their semantic similarity.
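One way to make "closeness in the vector space" concrete is to compute the cosine similarity of the two illustrative vectors. The sketch below uses the article's example values truncated to three dimensions (real embeddings have hundreds of dimensions, and these numbers are purely illustrative):

```python
import math

# Hypothetical 3-dimensional embeddings using the article's
# illustrative values; real models use far higher dimensions.
king = [0.5, 0.2, 0.8]
queen = [0.6, 0.3, 0.7]

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A value near 1 indicates the vectors point in nearly the same
# direction, i.e. the words are semantically similar.
print(cosine_similarity(king, queen))  # ≈ 0.984
```

With these numbers the similarity comes out close to 1, which is what "their closeness reflects their semantic similarity" is meant to convey.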
The triple for "king" is printed three times, with every copy identical (the reason is not clear). Similarly, the triple for "queen" is repeated. Yet nothing in the text explains why the "king" triples should be seen as close to the "queen" triples; read literally, they could be considered nothing alike.
Could the illustrative numbers in this section be updated, or explained slightly differently?