
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

by   Elena Voita, et al.

Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. We find that the most important and confident heads play consistent and often linguistically-interpretable roles. When pruning heads using a method based on stochastic gates and a differentiable relaxation of the L0 penalty, we observe that specialized heads are last to be pruned. Our novel pruning method removes the vast majority of heads without seriously affecting performance. For example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads results in a drop of only 0.15 BLEU.



1 Introduction

The Transformer Vaswani et al. (2017) has become the dominant modeling paradigm in neural machine translation. It follows the encoder-decoder framework using stacked multi-head self-attention and fully connected layers. Multi-head attention was shown to make more efficient use of the model’s capacity: performance of the model with 8 heads is almost 1 BLEU point higher than that of a model of the same size with single-head attention Vaswani et al. (2017). The Transformer achieved state-of-the-art results in recent shared translation tasks Bojar et al. (2018); Niehues et al. (2018).

Despite the model’s widespread adoption and recent attempts to investigate the kinds of information learned by the model’s encoder Raganato and Tiedemann (2018), the analysis of multi-head attention and its importance for translation is challenging. Previous analysis of multi-head attention either assumed that all heads are equally important by looking at the average of attention weights over all heads at a given position or focused only on the maximum attention weights Voita et al. (2018); Tang et al. (2018). We argue that this obscures the roles played by individual heads which, as we show, influence the generated translations to differing extents.

We attempt to answer the following questions:

  • To what extent does translation quality depend on individual encoder heads?

  • Do individual encoder heads play consistent and interpretable roles? If so, which are the most important ones for translation quality?

  • Which types of model attention (encoder self-attention, decoder self-attention or decoder-encoder attention) are most sensitive to the number of attention heads and on which layers?

  • Can we significantly reduce the number of attention heads while preserving translation quality?

We start by identifying the most important heads in each encoder layer using layer-wise relevance propagation Ding et al. (2017). For heads judged to be important, we then attempt to characterize the roles they perform. We observe the following types of role: positional (heads attending to an adjacent token), syntactic (heads attending to tokens in a specific syntactic dependency relation) and attention to rare words (heads pointing to the least frequent tokens in the sentence).

To understand whether the remaining heads perform vital but less easily defined roles, or are simply redundant to the performance of the model as measured by translation quality, we introduce a method for pruning heads based on Louizos et al. (2018). While we cannot easily incorporate the number of active heads as a penalty term in our learning objective (i.e. the regularizer), we can use a differentiable relaxation. We prune attention heads in a continuous learning scenario starting from the converged full model and identify the roles of those which remain in the model. These experiments corroborate the findings of layer-wise relevance propagation; in particular, heads with clearly identifiable positional and syntactic functions are pruned last and hence shown to be most important for the translation task.

Our key findings are as follows:

  • Only a small subset of heads are important for translation;

  • Important heads have one or more specialized and interpretable functions in the model;

  • The functions correspond to attention to neighbouring words and to tokens in specific syntactic dependency relations.

2 Transformer Architecture

In this section, we briefly describe the Transformer architecture Vaswani et al. (2017) introducing the terminology used in the rest of the paper.

The Transformer is an encoder-decoder model that uses stacked self-attention and fully connected layers for both the encoder and decoder. The encoder consists of N layers, each containing two sub-layers: (a) a multi-head attention mechanism, and (b) a feed-forward network. The multi-head attention mechanism relies on scaled dot-product attention, which operates on a query Q, a key K and a value V:

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V    (1)

where d_k is the key dimensionality.

The multi-head attention mechanism obtains h (i.e. one per head) different representations of (Q, K, V), computes scaled dot-product attention for each representation, concatenates the results, and projects the concatenation through a feed-forward layer. This can be expressed in the same notation as Equation (1):

    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)    (2)

    MultiHead(Q, K, V) = Concat_i(head_i) W^O    (3)

where the W_i^Q, W_i^K, W_i^V and W^O are parameter matrices.
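The scaled dot-product and multi-head attention computations described above can be sketched in a few lines of numpy. This is a minimal single-sentence sketch under stated assumptions: the toy dimensions and random weights are placeholders (far smaller than the real model), and batching, masking and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V

def multi_head(x, Wq, Wk, Wv, Wo):
    # one projection triple per head; concatenate the heads, project with Wo
    heads = [attention(x @ wq, x @ wk, x @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

# toy dimensions (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
x = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(size=(d_model, d_model))
out = multi_head(x, Wq, Wk, Wv, Wo)
print(out.shape)  # (5, 16)
```

Each head sees the full d_model-dimensional input but projects it down to d_model / h dimensions, so the total computation is roughly that of single-head attention.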

The second component of each layer of the Transformer network is a feed-forward network. The authors propose using a two-layer network with a ReLU activation.

Analogously, each layer of the decoder contains the two sub-layers mentioned above as well as an additional multi-head attention sub-layer. This additional sub-layer receives the output of the encoder as its input.

The Transformer uses multi-head attention in three different ways: encoder self-attention, decoder self-attention and decoder-encoder attention. In this work, we concentrate primarily on encoder self-attention.

3 Data and setting

We focus on English as a source language and consider three target languages: Russian, German and French. For each language pair, we use the same number of sentence pairs from WMT data to control for the amount of training data and train Transformer models with the same numbers of parameters. We use 2.5m sentence pairs, corresponding to the amount of English–Russian parallel training data (excluding UN and Paracrawl). In Section 5.2 we use the same held-out data for all language pairs; these are 50k English sentences taken from the WMT EN-FR data not used in training.

For English-Russian, we perform additional experiments using the publicly available OpenSubtitles2018 corpus Lison et al. (2018) to evaluate the impact of domain on our results.

In Section 6 we concentrate on English-Russian and two domains: WMT and OpenSubtitles.

Model hyperparameters, preprocessing and training details are provided in the supplementary materials.

Figure 1: Importance according to LRP (a), confidence (b), and function (c) of self-attention heads. In each layer, heads are sorted by their relevance according to LRP. Model trained on 6m OpenSubtitles EN-RU data.
Figure 2: Importance (according to LRP) and function of self-attention heads: (a) LRP (EN-DE), (b) head functions (EN-DE), (c) LRP (EN-FR), (d) head functions (EN-FR). In each layer, heads are sorted by their relevance according to LRP. Models trained on 2.5m WMT EN-DE (a, b) and EN-FR (c, d).

4 Identifying Important Heads

Previous work analyzing how representations are formed by the Transformer’s multi-head attention mechanism has implicitly assumed that all heads are equally important by taking either the average or the maximum attention weights over all heads Voita et al. (2018); Tang et al. (2018). We argue that this obscures the roles played by individual heads which, as we will show, influence the generated translations to differing extents.

We define the “confidence” of a head as the average of its maximum attention weight, excluding the end of sentence symbol (we exclude EOS on the grounds that it is not a real token). A confident head is one that usually assigns a high proportion of its attention to a single token. Intuitively, we might expect confident heads to be important to the translation task.
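The confidence statistic is easy to compute given per-head attention maps. Below is a hedged sketch assuming a hypothetical input format (a list of per-sentence attention weight matrices, with EOS as the last source position); the numbers are toy data, not the paper's.

```python
import numpy as np

def head_confidence(attn_maps, eos_index=-1):
    """Average max attention weight of one head, excluding the EOS column.

    attn_maps: list of (tgt_len, src_len) attention weight matrices,
    one per sentence (hypothetical input format).
    """
    maxima = []
    for a in attn_maps:
        a = np.delete(a, eos_index, axis=1)  # drop attention *to* EOS
        maxima.append(a.max(axis=1).mean())  # max over source positions
    return float(np.mean(maxima))

# toy example: a head that concentrates on one token is "confident",
# a head with uniform attention is not
sharp = [np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.90, 0.05]])]
flat = [np.full((2, 3), 1.0 / 3)]
print(head_confidence(sharp), head_confidence(flat))
```

A confidence near 1 means the head almost always puts nearly all of its mass on a single non-EOS token.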

Layer-wise relevance propagation (LRP) Ding et al. (2017) is a method for computing the relative contribution of neurons at one point in a network to neurons at another. Here we propose to use LRP to evaluate the degree to which different heads at each layer contribute to the top-1 logit predicted by the model. Heads whose outputs have a higher relevance value may be judged to be more important to the model’s predictions.
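LRP is defined by layer-specific redistribution rules. As an illustration only, the generic epsilon-rule for a small feed-forward network can be sketched as follows; this is not the exact set of rules Ding et al. (2017) use for attention layers, just the core idea of propagating a relevance budget backwards while (approximately) conserving its total.

```python
import numpy as np

def lrp_linear(a, w, r_out, eps=1e-6):
    """Epsilon-rule LRP for one linear layer z = a @ w.

    a: input activations (n_in,), w: weights (n_in, n_out),
    r_out: relevance of the outputs (n_out,). Returns input relevance.
    """
    z = a @ w                         # pre-activations
    denom = z + eps * np.sign(z)      # stabilised denominator
    s = r_out / denom                 # relevance per unit of activation
    return a * (w @ s)                # collect each input's contribution

rng = np.random.default_rng(1)
a0 = rng.normal(size=4)
w1 = rng.normal(size=(4, 3))
a1 = np.maximum(a0 @ w1, 0)           # ReLU hidden layer
w2 = rng.normal(size=(3, 2))
logits = a1 @ w2

# start from the top-1 logit and propagate relevance down the network
r2 = np.zeros(2)
r2[logits.argmax()] = logits.max()
r1 = lrp_linear(a1, w2, r2)
r0 = lrp_linear(a0, w1, r1)
print(r0)  # per-input relevance for the predicted logit
```

The key property used in the paper is conservation: the relevance assigned to the top logit is redistributed over lower layers (here, over inputs), so heads can be ranked by how much of that budget flows through them.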

The results of LRP are shown in Figures 1(a), 2(a) and 2(c). In each layer, LRP ranks a small number of heads as much more important than all others.

The confidence for each head is shown in Figure 1(b). We can observe that the relevance of a head as computed by LRP agrees to a reasonable extent with its confidence. The only clear exception to this pattern is the head judged by LRP to be the most important in the first layer: despite its high relevance, its average maximum attention weight is low. We will discuss this head further in Section 5.3.

5 Characterizing heads

We now turn to investigating whether heads play consistent and interpretable roles within the model.

We examined some attention matrices paying particular attention to heads ranked highly by LRP and identified three functions which heads might be playing:

  1. positional: the head points to an adjacent token,

  2. syntactic: the head points to tokens in a specific syntactic relation,

  3. rare words: the head points to the least frequent tokens in a sentence.

Now we discuss the criteria used to determine if a head is performing one of these functions and examine properties of the corresponding heads.

5.1 Positional heads

We refer to a head as “positional” if at least 90% of the time its maximum attention weight is assigned to a specific relative position (in practice either -1 or +1).

Such heads are shown in purple in Figure 1(c) for English-Russian, Figure 2(b) for English-German and Figure 2(d) for English-French, and are marked with the relative position.

As can be seen, the positional heads correspond to a large extent to the most confident heads and the most important heads as ranked by LRP. In fact, the average maximum attention weight exceeds 0.8 for every positional head, for all language pairs considered here.
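The 90% criterion above is straightforward to check given a head's attention maps. A hedged sketch, assuming a hypothetical input format of one weight matrix per sentence:

```python
import numpy as np

def is_positional(attn_maps, offset, threshold=0.9):
    """True if the head's max attention falls at relative position `offset`
    (e.g. -1 or +1) at least `threshold` of the time, over all positions
    for which that offset exists."""
    hits, total = 0, 0
    for a in attn_maps:
        argmax = a.argmax(axis=1)             # attended position per query
        for q, k in enumerate(argmax):
            if 0 <= q + offset < a.shape[1]:  # offset target must exist
                total += 1
                hits += (k == q + offset)
    return bool(hits / total >= threshold)

# toy head that always looks one token to the left
left = np.eye(5, k=-1)
left[0, 0] = 1.0  # first token has no left neighbour; keep rows normalised
print(is_positional([left], offset=-1))
```

Positions where the offset falls outside the sentence are excluded from the count rather than treated as misses; this detail is an assumption of the sketch.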

5.2 Syntactic heads

We hypothesize that, when used to perform translation, the Transformer’s encoder may be responsible for disambiguating the syntactic structure of the source sentence. We therefore wish to know whether a head attends to tokens corresponding to any of the major syntactic relations in a sentence. In our analysis, we looked at the following dependency relations: nominal subject (nsubj), direct object (dobj), adjectival modifier (amod) and adverbial modifier (advmod). These include the main verbal arguments of a sentence and some other common relations. They also include those relations which might inform morphological agreement or government in one or more of the target languages considered here.

5.2.1 Methodology

We evaluate to what extent each head in the Transformer’s encoder accounts for a specific dependency relation by comparing its attention weights to a predicted dependency structure generated using CoreNLP Manning et al. (2014) on a large number of held-out sentences. We calculate for each head how often it assigns its maximum attention weight (excluding EOS) to a token with which it is in one of the aforementioned dependency relations. We count each relation separately and allow the relation to hold in either direction between the two tokens.

We refer to this relative frequency as the “accuracy” of a head on a specific dependency relation in a specific direction. Note that under this definition, we may evaluate the accuracy of a head for multiple dependency relations.

Many dependency relations are frequently observed in specific relative positions (for example, they often hold between adjacent tokens; see Figure 3). We say that a head is “syntactic” if its accuracy is at least 10% higher than the accuracy of a baseline that always attends to the most frequent relative position for this dependency relation.
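The accuracy and its positional baseline can be sketched as follows. The data format is hypothetical (a per-sentence map from token position to the head's most-attended position, plus parsed (dependent, head) pairs for one relation and direction); the numbers are toy data.

```python
from collections import Counter

def head_accuracy(attn_argmax, relation_pairs):
    """Fraction of relation instances where the attention head's max
    weight lands on the related token. attn_argmax: {position: attended
    position}; relation_pairs: list of (dependent_pos, head_pos) for one
    dependency relation and direction (hypothetical format)."""
    hits = sum(attn_argmax[d] == h for d, h in relation_pairs)
    return hits / len(relation_pairs)

def positional_baseline(relation_pairs):
    """Accuracy of always guessing the most frequent relative offset."""
    offsets = Counter(h - d for d, h in relation_pairs)
    _, count = offsets.most_common(1)[0]
    return count / len(relation_pairs)

# toy verb -> subject instances: the head finds 3 of 4 related tokens,
# while the best fixed offset (-1) only covers 2 of 4
pairs = [(2, 1), (5, 4), (7, 3), (9, 6)]
attn = {2: 1, 5: 4, 7: 3, 9: 2}
print(head_accuracy(attn, pairs), positional_baseline(pairs))
```

Under the paper's criterion, a head would count as syntactic for this relation only if its accuracy beats the baseline by the stated margin, which rules out heads that merely track a fixed relative position.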

Figure 3: Distribution of the relative position of dependent for different dependency relations (WMT).

5.2.2 Results

dep.     direction          best head / baseline
                            WMT        OpenSubtitles
nsubj    v → s              45 / 35    77 / 45
nsubj    s → v              52 / 35    70 / 45
dobj     v → o              78 / 41    61 / 46
dobj     o → v              73 / 41    84 / 46
amod     noun → adj.mod.    74 / 72    81 / 80
amod     adj.mod. → noun    82 / 72    81 / 80
advmod   v → adv.mod.       48 / 46    38 / 33
advmod   adv.mod. → v       52 / 46    42 / 33

Table 1: Dependency scores for EN-RU. Models trained on 2.5m WMT data and 6m OpenSubtitles data.
Figure 4: Dependency scores for EN-RU, EN-DE, EN-FR each trained on 2.5m WMT data.

Table 1 shows the accuracy of the most accurate head for each of the considered dependency relations on the two domains for English-Russian. Figure 4 compares the scores of the models trained on WMT with different target languages.

Clearly certain heads learn to detect syntactic relations with accuracies significantly higher than the baseline. This supports the hypothesis that the encoder does indeed perform some amount of syntactic disambiguation of the source sentence.

Several heads appear to be responsible for the same dependency relation. These heads are shown in green in Figures 0(c), 1(b), 1(d).

Unfortunately, it is not possible to draw any strong conclusions from these results regarding the impact of target language morphology on the accuracy of the syntactic attention heads, although relations with strong target-side morphology are among those that are most accurately learned.

Note the difference in accuracy of the verb-subject relation heads across the two domains for English-Russian. We hypothesize that this is due to the greater variety of grammatical person in the Subtitles data (the proportions of first, second and third person subjects differ considerably between the WMT and OpenSubtitles data), which requires more attention to this relation. However, we leave proper analysis of this to future work.

5.3 Rare words

In all models (EN-RU, EN-DE, EN-FR on WMT and EN-RU on OpenSubtitles), we find that one head in the first layer is judged to be much more important to the model’s predictions than any other heads in this layer.

We find that this head points to the least frequent tokens in a sentence. For models trained on OpenSubtitles, among sentences where the least frequent token in a sentence is not in the top-500 most frequent tokens, this head points to the rarest token in 66% of cases, and to one of the two least frequent tokens in 83% of cases. For models trained on WMT, this head points to one of the two least frequent tokens in more than 50% of such cases. This head is shown in orange in Figures 1(c), 2(b) and 2(d). Examples of attention maps for this head for models trained on WMT data with different target languages are shown in Figure 5.
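The rare-words statistic follows the same pattern as the previous checks; a hedged sketch with a hypothetical input format (per-sentence attention matrices and per-token corpus frequencies, toy numbers):

```python
import numpy as np

def rare_token_hits(attn_maps, sent_freqs, top_k=2):
    """Fraction of sentences in which the head's overall most-attended
    token is among the sentence's top_k least frequent tokens.
    attn_maps / sent_freqs: hypothetical per-sentence weight matrices
    and per-token corpus frequencies."""
    hits = 0
    for a, freqs in zip(attn_maps, sent_freqs):
        rarest = np.argsort(freqs)[:top_k]             # least frequent tokens
        attended = np.bincount(a.argmax(axis=1), minlength=len(freqs))
        hits += attended.argmax() in rarest            # head's favourite token
    return hits / len(attn_maps)

# toy sentence: token 2 is by far the rarest and draws the head's attention
a = np.array([[0.1, 0.1, 0.8],
              [0.2, 0.1, 0.7],
              [0.3, 0.2, 0.5]])
freqs = [1000, 500, 3]
print(rare_token_hits([a], [freqs]))
```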

Figure 5: Attention maps of the rare words head. Models trained on WMT: (a) EN-RU, (b) EN-DE, (c) EN-FR

6 Pruning Attention Heads

We have identified certain functions of the most relevant heads at each layer and showed that to a large extent they are interpretable. What of the remaining heads? Are they redundant to translation quality or do they play equally vital but simply less easily defined roles? We introduce a method for pruning attention heads to try to answer these questions. Our method is based on Louizos et al. (2018): whereas they pruned individual neural network weights, we prune entire model components (i.e. heads). We start by describing our method and then examine how performance changes as we remove heads, identifying the functions of heads retained in the sparsified models.

6.1 Method

Figure 6: Concrete distribution: (a) Concrete and its stretched and rectified version (Hard Concrete); (b) Hard Concrete distributions with different parameters.

We modify the original Transformer architecture by multiplying the representation computed by each head_i by a scalar gate g_i. Equation (3) turns into:

    MultiHead(Q, K, V) = Concat_i(g_i · head_i) W^O    (4)

Unlike usual gates, g_i are parameters specific to heads and are independent of the input (i.e. the sentence). As we would like to disable less important heads completely rather than simply downweighting them, we would ideally apply L0 regularization to the scalars g_i. The L0 norm equals the number of non-zero components and would push the model to switch off less important heads:

    L_0(g_1, …, g_h) = sum_{i=1}^{h} (1 − [[g_i = 0]]),    (5)

where h is the number of heads, and [[·]] denotes the indicator function.

Unfortunately, the L0 norm is non-differentiable and so cannot be directly incorporated as a regularization term in the objective function. Instead, we use a stochastic relaxation: each gate g_i is now a random variable drawn independently from a head-specific distribution (in training, we resample gate values for each batch). We use the Hard Concrete distributions Louizos et al. (2018), a parameterized family of mixed discrete-continuous distributions over the closed interval [0, 1]; see Figure 6(a). The distributions have non-zero probability mass at 0 and 1, P(g_i = 0 | φ_i) and P(g_i = 1 | φ_i), where φ_i are the distribution parameters. Intuitively, the Hard Concrete distribution is obtained by stretching the binary version of the Concrete (aka Gumbel-Softmax) distribution Maddison et al. (2017); Jang et al. (2017) from the original support of (0, 1) to a wider interval (γ, ζ) with γ < 0 and ζ > 1, and then collapsing the probability mass assigned to (γ, 0] and [1, ζ) to the single points 0 and 1, respectively. These stretching and rectification operations yield a mixed discrete-continuous distribution over [0, 1]. Now the sum of the probabilities of heads being non-zero can be used as a relaxation of the L0 norm:

    L_C(φ) = sum_{i=1}^{h} (1 − P(g_i = 0 | φ_i)).    (6)

The new training objective is

    L(θ, φ) = L_xent(θ, φ) + λ L_C(φ),    (7)

where θ are the parameters of the original Transformer, L_xent(θ, φ) is cross-entropy loss for the translation model, and L_C(φ) is the regularizer described above. The objective is easy to optimize: the reparameterization trick Kingma and Welling (2014); Rezende et al. (2014) can be used to backpropagate through the sampling process for each g_i, whereas the regularizer and its gradients are available in closed form. Interestingly, we observe that the model converges to solutions where gates are either almost completely closed (i.e. the head is pruned) or completely open, the latter not being explicitly encouraged (the ‘noise’ pushes the network not to use middle values; the combination of noise and rectification has been previously used to achieve discretization, e.g. Kaiser and Bengio (2018)). This means that at test time we can treat the model as a standard Transformer and use only a subset of heads.

When applying this regularizer, we start from the converged model trained without the L0 penalty (i.e. parameters θ are initialized with the parameters of the converged model) and then add the gates and continue training the full objective. By varying the coefficient λ in the optimized objective, we obtain models with different numbers of heads retained. (The code of the model will be made freely available at the time of publication.)

6.2 Pruning encoder heads

To determine which head functions are most important in the encoder and how many heads the model needs, we conduct a series of experiments with gates applied only to encoder self-attention. Here we prune a model by fine-tuning a trained model with the regularized objective. (In preliminary experiments, we observed that fine-tuning a trained model gives slightly better results, by 0.2–0.6 BLEU, than applying the regularized objective, or training a model with the same number of self-attention heads, from scratch.) During pruning, the parameters of the decoder are fixed and only the encoder parameters and head gates are fine-tuned. By not fine-tuning the decoder, we ensure that the functions of the pruned encoder heads do not migrate to the decoder.

6.2.1 Quantitative results: BLEU score

Figure 7: BLEU score as a function of number of retained encoder heads (EN-RU). Regularization applied by fine-tuning trained model.

BLEU scores are provided in Figure 7. Surprisingly, for OpenSubtitles, we lose only 0.25 BLEU when we prune all but 4 heads out of 48. For the more complex WMT task, 10 heads in the encoder are sufficient to stay within 0.15 BLEU of the full model.

6.2.2 Functions of retained heads

Results in Figure 7 suggest that the encoder remains effective even with only a few heads. In this section, we investigate the function of those heads that remain in the encoder during pruning. Figure 8 shows all heads color-coded for their function in a pruned model. Each column corresponds to a model with a particular number of heads retained after pruning. Heads from all layers are ordered by their function. Some heads can perform several functions (e.g., tracking both a syntactic dependency and a relative position); in this case the number of functions is shown.

Figure 8: Functions of encoder heads retained after pruning. Each column represents all remaining heads after varying amount of pruning (EN-RU; Subtitles).

First, we note that the model with 17 heads retains heads with all the functions that we identified in Section 5, even though 2/3 of the heads have been pruned.

This indicates that these functions are indeed the most important. Furthermore, when we have fewer heads in the model, some functions “drift” to other heads: for example, we see positional heads starting to track syntactic dependencies; hence some heads are assigned more than one color at certain stages in Figure 8.

6.3 Pruning all types of attention heads

We found our pruning technique to be efficient at reducing the number of heads in the encoder without a major drop in translation quality. Now we investigate the effect of pruning all types of attention heads in the model (not just in the encoder). This allows us to evaluate the importance of different types of attention in the model for the task of translation. In these experiments, we add gates to all multi-head attention heads in the Transformer, i.e. encoder and decoder self-attention and attention from the decoder to the encoder.

6.3.1 Quantitative results: BLEU score

Results of experiments pruning heads in all attention layers are provided in Table 2. For models trained on WMT data, we are able to prune almost 3/4 of encoder heads and more than 1/3 of heads in decoder self-attention and decoder-encoder attention without any noticeable loss in translation quality (sparse heads, row 1). We can also prune more than half of all heads in the model and lose no more than 0.25 BLEU.

While these results show clearly that the majority of attention heads can be removed from the fully trained model without significant loss in translation quality, it is not clear whether a model can be trained from scratch with such a small number of heads. In the rightmost column in Table 2 we provide BLEU scores for models trained with exactly the same number and configuration of heads in each layer as the corresponding pruned models but starting from a random initialization of parameters. Here the degradation in translation quality is more significant than for pruned models with the same number of heads.

attention heads           BLEU            BLEU
(enc / dec / dec-enc)     from trained    from scratch

WMT, 2.5m
baseline      48/48/48    29.6
sparse heads  14/31/30    29.62           29.47
              12/21/25    29.36           28.95
              8/13/15     29.06           28.56
              5/9/12      28.90           28.41

OpenSubtitles, 6m
baseline      48/48/48    32.4
sparse heads  27/31/46    32.24           32.23
              13/17/31    32.23           31.98
              6/9/13      32.27           31.84

Table 2: BLEU scores for gates in all attention types, EN-RU. The number of attention heads is provided in the following order: encoder self-attention, decoder self-attention, decoder-encoder attention.

6.3.2 Head importance

Figure 9 shows the number of retained heads for each attention type at different pruning rates. We can see that the model prefers to prune encoder self-attention heads first, while decoder-encoder attention heads appear to be the most important for both datasets. Obviously, without decoder-encoder attention no translation can happen.

The importance of decoder self-attention heads, which function primarily as a target-side language model, varies across domains. These heads appear to be almost as important as decoder-encoder attention heads for WMT data with its long sentences (24 tokens on average), and slightly more important than encoder self-attention heads for the OpenSubtitles dataset, where sentences are shorter (8 tokens on average).

Figure 9: Number of active heads of different attention type for models with different sparsity rate
Figure 10: Number of active heads in different layers of the decoder for models with different sparsity rate (EN-RU, WMT)

Figure 10 shows the number of active self-attention and decoder-encoder attention heads at different layers in the decoder for models with different sparsity rate (to reduce noise, we plot the sum of heads remaining in pairs of adjacent layers). It can be seen that self-attention heads are retained more readily in the lower layers, while decoder-encoder attention heads are retained in the higher layers. This suggests that lower layers of the Transformer’s decoder are mostly responsible for language modeling, while higher layers are mostly responsible for conditioning on the source sentence. These observations are similar for both datasets we use.

7 Related work

One popular approach to the analysis of NMT representations is to evaluate how informative they are for various linguistic tasks. Different levels of linguistic analysis have been considered including morphology Belinkov et al. (2017a); Dalvi et al. (2017); Bisazza and Tump (2018), syntax Shi et al. (2016) and semantics Hill et al. (2017); Belinkov et al. (2017b); Raganato and Tiedemann (2018).

Bisazza and Tump (2018) showed that target language determines which information gets encoded. This agrees with our results for different domains on the English-Russian translation task in Section 5.2.2. There we observed that attention heads are more likely to track syntactic relations requiring more complex agreement in the target language (in this case the subject-verb relation).

A line of work studying the ability of language models to capture hierarchical information Linzen et al. (2016); Gulordava et al. (2018) was extended by Tang et al. (2018) and Tran et al. (2018) to machine translation models.

There are several works analyzing attention weights of different NMT models Ghader and Monz (2017); Voita et al. (2018); Tang et al. (2018); Raganato and Tiedemann (2018). Raganato and Tiedemann (2018) use the self-attention weights of the Transformer’s encoder to induce a tree structure for each sentence and compute the unlabeled attachment score of these trees. However, they do not evaluate specific syntactic relations (i.e. labeled attachment scores) or consider how different heads specialize to specific dependency relations.

Recently Bau et al. (2019) proposed a method for identifying important individual neurons in NMT models. They show that similar important neurons emerge in different models. Rather than verifying the importance of individual neurons, we identify the importance of entire attention heads using layer-wise relevance propagation and verify our findings by observing which heads are retained when pruning the model.

8 Conclusions

We evaluate the contribution made by individual attention heads to Transformer model performance on translation. We use layer-wise relevance propagation to show that the relative contribution of heads varies: only a small subset of heads appear to be important for the translation task. Important heads have one or more interpretable functions in the model, including attending to adjacent words and tracking specific syntactic relations. To determine if the remaining less-interpretable heads are crucial to the model’s performance, we introduce a new approach to pruning attention heads.

We observe that specialized heads are the last to be pruned, confirming their importance directly. Moreover, the vast majority of heads, especially the encoder self-attention heads, can be removed without seriously affecting performance. In future work, we would like to investigate how our pruning method compares to alternative methods of model compression in NMT See et al. (2016).


Acknowledgments

We would like to thank the anonymous reviewers for their comments. We thank Wilker Aziz and Joost Bastings for their helpful suggestions. The authors also thank the Yandex Machine Translation team for helpful discussions and inspiration. Ivan Titov acknowledges support of the European Research Council (ERC StG BroadSem 678254) and the Dutch National Science Foundation (NWO VIDI 639.022.518).


Appendix A Experimental setup

A.1 Data preprocessing

Sentences were encoded using byte-pair encoding Sennrich et al. (2016), with source and target vocabularies of about 32000 tokens. For OpenSubtitles data, we pick only sentence pairs with a relative time overlap of subtitle frames between source and target language subtitles of at least 0.9 to reduce noise in the data. Translation pairs were batched together by approximate sequence length. Each training batch contained a set of translation pairs containing approximately 16000 source tokens. (This batch size can be reached by using several GPUs or by accumulating the gradients for several batches and then making an update.) It has been shown that the Transformer’s performance depends heavily on batch size Popel and Bojar (2018), and we chose a large batch size to ensure that models show their best performance.
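The gradient-accumulation trick mentioned above amounts to summing gradients over several small batches before making one update, which for plain averaged gradients is equivalent to a single large batch. A framework-free sketch, with a hypothetical grad_fn standing in for backpropagation:

```python
def accumulated_update(params, batches, grad_fn, lr=0.1):
    """One optimizer step using gradients summed over several batches.
    grad_fn(params, batch) -> list of per-parameter average gradients
    for that batch (hypothetical interface)."""
    total = [0.0] * len(params)
    n = 0
    for batch in batches:
        g = grad_fn(params, batch)
        # re-weight by batch size so the final result is the full-data average
        total = [t + gi * len(batch) for t, gi in zip(total, g)]
        n += len(batch)
    return [p - lr * t / n for p, t in zip(params, total)]

# toy check: mean-squared-error gradient on scalar data; one big batch and
# two accumulated half-batches give the same update
grad = lambda params, batch: [sum(2 * (params[0] - x) for x in batch) / len(batch)]
big = accumulated_update([0.0], [[1.0, 2.0, 3.0, 4.0]], grad)
small = accumulated_update([0.0], [[1.0, 2.0], [3.0, 4.0]], grad)
print(big, small)
```

Note this exact equivalence holds for plain SGD-style averaged gradients; adaptive optimizers such as Adam apply their statistics per update, so accumulation there approximates rather than exactly reproduces large-batch training.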

A.2 Model parameters

We follow the setup of the Transformer base model Vaswani et al. (2017). More precisely, the number of layers in both the encoder and the decoder is N = 6. We employ h = 8 parallel attention layers, or heads. The dimensionality of input and output is d_model = 512, and the inner layer of the feed-forward networks has dimensionality d_ff = 2048.

We use regularization as described in Vaswani et al. (2017).

A.3 Optimizer

The optimizer we use is the same as in Vaswani et al. (2017): the Adam optimizer Kingma and Ba (2015) with β1 = 0.9, β2 = 0.98 and ε = 10^{-9}. We vary the learning rate over the course of training, according to the formula:

    lrate = d_model^{-0.5} · min(step_num^{-0.5}, step_num · warmup_steps^{-1.5})

We use warmup_steps = 16000.
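The inverse-square-root schedule with linear warmup from Vaswani et al. (2017) can be sketched directly; the d_model and warmup_steps defaults below are the base-model settings as reconstructed here and should be treated as assumptions.

```python
def lrate(step, d_model=512, warmup_steps=16000):
    """Linear warmup to the peak at `warmup_steps`, then inverse-square-root
    decay (the schedule of Vaswani et al., 2017)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# the rate rises until warmup ends, peaks, then decays
print(lrate(1), lrate(16000), lrate(64000))
```

The two branches of the min() cross exactly at step = warmup_steps, which is where the learning rate peaks.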