Multi-Head Attention · Contextual Embedding

About This Visualization

This interactive diagram illustrates how multi-head attention transforms a polysemous word ("Amazon") based on its surrounding context. The visualization walks through each stage of the computation:

- query–key dot products (Q·K^T)
- softmax normalization into attention weights
- the weighted sum over value vectors
- concatenation of the per-head outputs
- the output projection W_O
- linear task heads on the resulting contextual embedding

Caveat: All positions, weights, and probabilities are illustrative; none are computed from a real model. The mechanism itself, however, reflects how transformers actually work.
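
For readers who want the mechanism in code, here is a minimal NumPy sketch of the pipeline the diagram depicts. Like the diagram, the weights are illustrative (random), and all names, shapes, and hyperparameters (`d_model`, `n_heads`, the tiny 5-token sequence) are assumptions chosen for the example, not values from any trained model:

```python
# Minimal sketch of multi-head attention + a linear task head.
# Weights are random: this shows the computation, not a trained model.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, n_heads, d_model):
    """X: (seq_len, d_model) token embeddings -> contextual embeddings."""
    d_head = d_model // n_heads
    # Per-head projection matrices (hypothetical random weights).
    W_Q = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)
    W_K = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)
    W_V = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)
    W_O = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    head_outputs = []
    for h in range(n_heads):
        Q, K, V = X @ W_Q[h], X @ W_K[h], X @ W_V[h]
        scores = Q @ K.T / np.sqrt(d_head)   # scaled Q·K^T
        weights = softmax(scores, axis=-1)   # each row sums to 1
        head_outputs.append(weights @ V)     # weighted sum of values
    concat = np.concatenate(head_outputs, axis=-1)  # multi-head concat
    return concat @ W_O                             # W_O projection

# Toy sequence: "Amazon" at position 0, four context tokens after it.
seq_len, d_model = 5, 32
X = rng.standard_normal((seq_len, d_model))
contextual = multi_head_attention(X, n_heads=4, d_model=d_model)
print(contextual.shape)  # (5, 32): one context-mixed vector per token

# A linear task head is just one more projection on the contextual vector.
n_classes = 3
W_task = rng.standard_normal((d_model, n_classes)) / np.sqrt(d_model)
print(softmax(contextual[0] @ W_task))  # class probabilities for "Amazon"
```

Note that after the attention step, the vector at position 0 is a mixture of every token's values, which is exactly how the surrounding context disambiguates a polysemous word like "Amazon".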