G-Transformer for Document-level Machine Translation

05/31/2021
by Guangsheng Bao, et al.

Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when the translation unit is further enlarged to a whole document, supervised training of Transformer can fail. In this paper, we find that such failure is caused not by overfitting but by sticking around local minima during training. Our analysis shows that the increased complexity of target-to-source attention is one reason for the failure. As a solution, we propose G-Transformer, which introduces a locality assumption as an inductive bias into Transformer, reducing the hypothesis space of the attention from target to source. Experiments show that G-Transformer converges faster and more stably than Transformer, achieving new state-of-the-art BLEU scores in both non-pretraining and pretraining settings on three benchmark datasets.
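The abstract describes the locality assumption only at a high level: target-to-source attention is restricted so that each target position attends to a limited, aligned part of the source rather than the whole document. The following minimal sketch illustrates one way such a sentence-level locality constraint on cross-attention could be realized with a group mask; it assumes a PyTorch-style setup, and the names group_attention_mask and masked_cross_attention are illustrative, not taken from the paper.

```python
import torch

def group_attention_mask(src_groups: torch.Tensor,
                         tgt_groups: torch.Tensor) -> torch.Tensor:
    """Boolean mask allowing each target token to attend only to source
    tokens that belong to the same sentence (group).

    src_groups: (batch, src_len)  sentence index of each source token
    tgt_groups: (batch, tgt_len)  sentence index of each target token
    returns:    (batch, tgt_len, src_len)  True where attention is allowed
    """
    return tgt_groups.unsqueeze(-1) == src_groups.unsqueeze(1)

def masked_cross_attention(q: torch.Tensor, k: torch.Tensor,
                           v: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product cross-attention restricted by the group mask.

    q: (batch, tgt_len, d), k/v: (batch, src_len, d),
    mask: (batch, tgt_len, src_len) boolean.
    """
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # Disallowed target-source pairs get -inf before the softmax,
    # so attention mass stays within the aligned sentence.
    scores = scores.masked_fill(~mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return attn @ v
```

Restricting the mask to aligned sentence groups shrinks the hypothesis space of the target-to-source attention, which is the kind of inductive bias the paper argues helps training escape the local minima that plague whole-document Transformer training; the actual G-Transformer additionally combines such local attention with global context, which this sketch omits.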


Related research

10/16/2022 · Modeling Context With Linear Attention for Scalable Document-Level Translation
Document-level machine translation leverages inter-sentence dependencies...

12/12/2022 · P-Transformer: Towards Better Document-to-Document Neural Machine Translation
Directly training a document-to-document (Doc2Doc) neural machine transl...

10/08/2018 · Improving the Transformer Translation Model with Document-Level Context
Although the Transformer translation model (Vaswani et al., 2017) has ac...

02/19/2020 · Toward Making the Most of Context in Neural Machine Translation
Document-level machine translation manages to outperform sentence level ...

12/15/2022 · TRIP: Triangular Document-level Pre-training for Multilingual Language Models
Despite the current success of multilingual pre-training, most prior wor...

05/08/2023 · Target-Side Augmentation for Document-Level Machine Translation
Document-level machine translation faces the challenge of data sparsity ...

06/26/2019 · Sharing Attention Weights for Fast Transformer
Recently, the Transformer machine translation system has shown strong re...
