Alleviating the Inequality of Attention Heads for Neural Machine Translation

09/21/2020, by Zewei Sun, et al.

Recent studies show that the attention heads in the Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific variants. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
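The abstract does not spell out how the masking is applied. Below is a minimal PyTorch sketch of one plausible HeadMask-style regularizer, assuming heads are masked by randomly zeroing a subset of attention-head outputs during training only. The class name, the mask_ratio parameter, and the random-selection rule are illustrative assumptions, not the paper's specification of its two variants.

```python
# Minimal sketch: multi-head attention with random head masking during training.
# Assumption (not from the abstract): masking = zeroing the outputs of a random
# subset of heads each step, so the model cannot over-rely on specific heads.
import torch
import torch.nn as nn


class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8, mask_ratio=0.25):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.mask_ratio = mask_ratio  # fraction of heads masked per step (assumed)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        batch, seq_len, d_model = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        heads = attn @ v  # (batch, heads, seq, d_head)

        if self.training:
            # Randomly keep ~ (1 - mask_ratio) of the heads; masked heads
            # contribute nothing to the output projection this step.
            keep = (torch.rand(self.num_heads, device=x.device) > self.mask_ratio).float()
            heads = heads * keep.view(1, -1, 1, 1)

        merged = heads.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out(merged)


# Usage example on a dummy batch.
layer = MaskedMultiHeadAttention()
y = layer(torch.randn(2, 10, 512))  # -> shape (2, 10, 512)
```

The design intent, as described in the abstract, is to equalize training across heads by preventing the model from depending on any particular head; the random keep/drop rule above is just one way such a mask could be drawn.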


Related research

02/24/2020: Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
10/13/2021: Semantics-aware Attention Improves Neural Machine Translation
09/04/2022: Informative Language Representation Learning for Massively Multilingual Neural Machine Translation
09/11/2018: On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation
08/03/2021: A Dynamic Head Importance Computation Mechanism for Neural Machine Translation
08/16/2023: Fast Training of NMT Model with Data Sorting
10/24/2018: Learning to Discriminate Noises for Incorporating External Information in Neural Machine Translation
