Traveling Words: A Geometric Interpretation of Transformers

09/13/2023
by Raul Molina, et al.

Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.
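As a rough illustration of the hyper-sphere claim, the sketch below (ours, not taken from the paper) checks that layer normalization, before the learned gain and bias are applied, maps every hidden vector onto a surface of norm roughly sqrt(d). The layer_norm helper is a hypothetical stand-in for the standard operation; d = 768 is the hidden size of the 124M-parameter GPT-2 probed in the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Pre-gain/bias layer normalization: zero mean, unit variance per vector.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

d = 768  # hidden size of the 124M-parameter GPT-2
rng = np.random.default_rng(0)
# Hidden states with wildly different scales.
x = rng.normal(size=(5, d)) * rng.uniform(0.1, 10.0, size=(5, 1))

h = layer_norm(x)
print(np.linalg.norm(h, axis=-1))  # each norm is ~sqrt(768) ≈ 27.7
print(np.sqrt(d))                  # radius of the hyper-sphere
```

The learned per-coordinate scale and shift deform this sphere, but the core observation, that normalization pins word representations to a fixed surface on which attention then moves them, is already visible in this toy check.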

Related research

- AttentionViz: A Global View of Transformer Attention (05/04/2023)
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (09/15/2021)
- Optimal inference of a generalised Potts model by single-layer transformers with factored attention (04/14/2023)
- Attention Hijacking in Trojan Transformers (08/09/2022)
- On Layer Normalization in the Transformer Architecture (02/12/2020)
- Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? (05/19/2021)
- The Brownian motion in the transformer model (07/12/2021)
