Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT

05/22/2023
by Shahar Katz, et al.

Recent advances in interpretability suggest we can project the weights and hidden states of transformer-based language models (LMs) onto their vocabulary, a transformation that makes them human-interpretable and lets us assign semantics to what were previously seen only as numerical vectors. In this paper, we interpret LM attention heads and memory values, the vectors the models dynamically create and recall while processing a given input. By analyzing the tokens they represent through this projection, we identify patterns in the information flow inside the attention mechanism. Based on these discoveries, we create a tool to visualize a forward pass of Generative Pre-trained Transformers (GPTs) as an interactive flow graph, with nodes representing neurons or hidden states and edges representing the interactions between them. Our visualization distills huge amounts of data into easy-to-read plots that reflect why models output their results. We demonstrate the utility of our modeling by identifying the effects LM components have on the intermediate processing in the model before it outputs a prediction. For instance, we discover that layer norms are used as semantic filters and find neurons that act as regularization vectors.
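
The core operation behind this line of work is projecting an internal vector through the model's unembedding matrix to read off the tokens it represents. Below is a minimal sketch of that vocabulary projection for GPT-2 via Hugging Face transformers, in the style of the "logit lens"; it illustrates the general technique rather than the authors' tool, and the prompt and top-k value are illustrative choices.

```python
# A minimal sketch of projecting hidden states onto the vocabulary
# ("logit lens" style) with GPT-2; illustrative only, not the paper's tool.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (n_layers + 1) tensors of shape
# [batch, seq, d_model], one per layer of the residual stream.
for layer, h in enumerate(out.hidden_states):
    # Apply the final LayerNorm, then the tied unembedding (lm_head),
    # turning the last position's hidden state into vocabulary logits.
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    top = torch.topk(logits, k=5).indices.tolist()
    print(f"layer {layer:2d}:", tokenizer.convert_ids_to_tokens(top))
```

Reading the top tokens layer by layer shows the eventual prediction forming across the residual stream; the same projection, applied to attention heads and memory values, yields the semantic information flow that the paper's interactive flow graph visualizes.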


