Visualizing Attention in Transformer-Based Language Representation Models

04/04/2019
by Jesse Vig

We present an open-source tool for visualizing multi-head self-attention in Transformer-based language representation models. The tool extends earlier work by visualizing attention at three levels of granularity: the attention-head level, the model level, and the neuron level. We describe how each of these views can help to interpret the model, and we demonstrate the tool on the BERT model and the OpenAI GPT-2 model. We also present three use cases for analyzing GPT-2: detecting model bias, identifying recurring patterns, and linking neurons to model behavior.
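As a rough illustration of the attention-head view, the sketch below loads a pretrained BERT model with HuggingFace transformers, extracts the per-layer, per-head attention weights, and hands them to a viewer. It assumes the open-source tool described here is the released bertviz package and follows its published head_view interface; the model and sentence choices are illustrative, not taken from the paper.

```python
# Minimal sketch of the attention-head view, assuming the tool is the
# released bertviz package: pip install bertviz transformers torch
from transformers import BertTokenizer, BertModel
from bertviz import head_view

MODEL_NAME = "bert-base-uncased"  # illustrative model choice
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
# output_attentions=True makes the forward pass return the per-layer,
# per-head attention weight tensors that the visualization consumes
model = BertModel.from_pretrained(MODEL_NAME, output_attentions=True)

sentence = "The cat sat on the mat because it was tired"
input_ids = tokenizer.encode(sentence, return_tensors="pt")
outputs = model(input_ids)

# outputs.attentions is a tuple with one tensor per layer, each shaped
# [batch, num_heads, seq_len, seq_len]
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
head_view(outputs.attentions, tokens)  # renders in a Jupyter notebook
```

In the released code, the model-level view is exposed the same way through model_view, while the neuron-level view ships with its own modified model classes, since it needs access to the query and key vectors rather than only the final attention weights.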

Related research

06/12/2019
A Multiscale Visualization of Attention in the Transformer Model
The Transformer is a sequence model that forgoes traditional recurrent a...

11/02/2020
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
Recent research on the multi-head attention mechanism, especially that i...

05/14/2022
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Transformer-based models have been achieving state-of-the-art results in...

04/23/2020
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
The great success of Transformer-based models benefits from the powerful...

04/26/2022
LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
The opaque nature and unexplained behavior of transformer-based language...

09/15/2023
Attention-Only Transformers and Implementing MLPs with Attention Heads
The transformer architecture is widely used in machine learning models a...
