BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention

11/09/2020
by Zhebin Zhang, et al.

BERT-enhanced neural machine translation (NMT) aims at leveraging BERT-encoded representations for translation tasks. A recently proposed approach uses attention mechanisms to fuse the Transformer's encoder and decoder layers with BERT's last-layer representation and shows enhanced performance. However, this approach does not allow attention to be flexibly distributed between the BERT representation and the encoder/decoder representation. In this work, we propose a novel BERT-enhanced NMT model called BERT-JAM, which improves upon existing models in two respects: 1) BERT-JAM uses joint-attention modules to allow the encoder/decoder layers to dynamically allocate attention between different representations, and 2) BERT-JAM allows the encoder/decoder layers to make use of BERT's intermediate representations by composing them with a gated linear unit (GLU). We train BERT-JAM with a novel three-phase optimization strategy that progressively unfreezes different components of the model. Our experiments show that BERT-JAM achieves state-of-the-art (SOTA) BLEU scores on multiple translation tasks.
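The abstract names two model-side components: joint attention over heterogeneous representations and GLU-based composition of BERT's intermediate layers. The PyTorch sketch below illustrates one plausible reading, assuming "joint attention" means pooling the layer's own states with the composed BERT representation into a single key/value memory so that one softmax allocates attention across both. The module names and wiring are illustrative assumptions, not the paper's reference implementation.

    import torch
    import torch.nn as nn

    class GLULayerComposer(nn.Module):
        # Composes BERT's intermediate layer outputs with a gated linear unit (GLU).
        # Assumption: per-layer outputs are concatenated on the feature axis,
        # projected to 2*d_model, and gated back down to d_model by the GLU.
        def __init__(self, num_layers, d_model):
            super().__init__()
            self.proj = nn.Linear(num_layers * d_model, 2 * d_model)
            self.glu = nn.GLU(dim=-1)

        def forward(self, layer_outputs):
            # layer_outputs: list of [batch, src_len, d_model] tensors, one per BERT layer
            stacked = torch.cat(layer_outputs, dim=-1)   # [batch, src_len, num_layers*d_model]
            return self.glu(self.proj(stacked))          # [batch, src_len, d_model]

    class JointAttention(nn.Module):
        # One softmax over keys/values drawn from two sources, so attention is
        # allocated dynamically between the layer's own states and the BERT memory.
        def __init__(self, d_model, n_heads):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, x, bert_repr):
            # x:         [batch, seq_len, d_model]  -- the encoder/decoder layer's input
            # bert_repr: [batch, src_len, d_model]  -- the GLU-composed BERT representation
            memory = torch.cat([x, bert_repr], dim=1)    # joint key/value pool
            out, _ = self.attn(query=x, key=memory, value=memory)
            return out

Because both pools share one attention distribution, raising the weight on BERT positions necessarily lowers it on the layer's own positions, which is what distinguishes this from running two separate attention modules and summing their outputs.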
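The three-phase optimization strategy is only outlined in the abstract. The sketch below shows one common way progressive unfreezing is wired in PyTorch; the grouping (Transformer first, then the fusion modules, then BERT) and the attribute names model.transformer, model.fusion, and model.bert are hypothetical, chosen only to make the pattern concrete.

    def three_phase_unfreeze(model, phase):
        # Progressive unfreezing across three training phases. The grouping and
        # the attribute names below are assumptions, not the paper's schedule.
        def set_trainable(module, flag):
            for p in module.parameters():
                p.requires_grad = flag

        set_trainable(model.transformer, phase >= 1)  # phase 1: NMT encoder/decoder only
        set_trainable(model.fusion, phase >= 2)       # phase 2: + joint-attention/GLU fusion
        set_trainable(model.bert, phase >= 3)         # phase 3: + BERT fine-tuning

A typical training loop would call three_phase_unfreeze(model, phase) at each phase boundary and then rebuild the optimizer over the parameters that currently have requires_grad=True, so that earlier-phase weights keep their learned values while new components come online.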


