Analyzing the Structure of Attention in a Transformer Language Model

06/07/2019
by Jesse Vig, et al.

The Transformer is a fully attention-based alternative to recurrent neural networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.
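To make the analysis concrete, here is a minimal sketch (an illustration, not the authors' code) of how one might extract per-layer attention weights from GPT-2 small with the Hugging Face transformers library and compute an attention-weighted mean distance for each layer. The example sentence and the distance metric are assumptions chosen for illustration; the finding above predicts that this kind of distance statistic tends to grow toward the deepest layers.

    # Minimal sketch (an assumption, not the authors' code): pull per-layer
    # attention weights out of the GPT-2 small pretrained model.
    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # GPT-2 small
    model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
    model.eval()

    # Example sentence chosen for illustration only.
    inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions is a tuple of 12 tensors (one per layer), each of
    # shape (batch, heads, seq_len, seq_len), with rows summing to 1 over keys.
    for layer_idx, attn in enumerate(outputs.attentions):
        attn = attn[0]                      # drop the batch dimension
        seq_len = attn.shape[-1]
        pos = torch.arange(seq_len)
        # Distance from each query token back to each key token
        # (GPT-2 is causal, so attention to later keys is already zero).
        dist = (pos.unsqueeze(1) - pos.unsqueeze(0)).clamp(min=0).float()
        # Attention-weighted mean distance, averaged over heads and queries.
        mean_dist = (attn * dist).sum(dim=-1).mean().item()
        print(f"layer {layer_idx:2d}: mean attention distance = {mean_dist:.2f}")

Each returned tensor holds the softmax-normalized attention of every head in that layer, so the same loop could be adapted to aggregate attention by part of speech or dependency relation over a larger corpus, in the spirit of the paper's analysis.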

Related research

Transformer-based Korean Pretrained Language Models: A Survey on Three Years of Progress (11/25/2021)
With the advent of Transformer, which was used in translation models in ...

Star-Transformer (02/25/2019)
Although the fully-connected attention-based model Transformer has achie...

SlovakBERT: Slovak Masked Language Model (09/30/2021)
We introduce a new Slovak masked language model called SlovakBERT in thi...

Multi-Head State Space Model for Speech Recognition (05/21/2023)
State space models (SSMs) have recently shown promising results on small...

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization? (07/01/2019)
Learning algorithms become more powerful, often at the cost of increased...

Retrofitting Structure-aware Transformer Language Model for End Tasks (09/16/2020)
We consider retrofitting structure-aware Transformer-based language mode...

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving (10/15/2019)
We incorporate Tensor-Product Representations within the Transformer in ...
