
Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

09/19/2020
by   Pei Zhang, et al.

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of an increasing number of parameters and greater computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively investigate the pros and cons of the standard transformer in document-level translation, and find that its auto-regressive property can simultaneously bring both the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard transformer, which both effectively captures long-range dependencies and reduces the propagation of errors. We evaluate our approach on two publicly available document-level datasets, achieving strong BLEU results and capturing discourse phenomena.
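The masking idea described above lends itself to a compact illustration. Below is a minimal, hypothetical NumPy sketch (not the authors' released implementation) of long-short term masking in self-attention over the concatenation of a previous sentence and the current sentence: a "short-term" mask confines current-sentence queries to the current sentence, while a "long-term" mask exposes the full document context. The function names, the single-head formulation, and the exact masking pattern are illustrative assumptions.

```python
import numpy as np

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention; mask is boolean, True = visible."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)                      # block masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

def long_short_mask(ctx_len, cur_len, short=True):
    """Mask over [previous sentence ; current sentence] tokens.

    short=True : current-sentence queries attend only to current-sentence keys.
    short=False: every query may attend across the full concatenated sequence.
    """
    n = ctx_len + cur_len
    mask = np.ones((n, n), dtype=bool)
    if short:
        mask[ctx_len:, :ctx_len] = False   # cut current-sentence -> context attention
    return mask

# Toy usage: 4 context tokens + 3 current-sentence tokens, hidden size 8.
x = np.random.randn(7, 8)
short_out = attention(x, x, x, long_short_mask(4, 3, short=True))
long_out = attention(x, x, x, long_short_mask(4, 3, short=False))
```

One plausible reading of the abstract is that different heads (or layers) use the short-term and long-term masks, so the model keeps a clean sentence-internal signal while still drawing on document context; the sketch only shows how such masks could be constructed.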

Related Research

09/05/2018
Document-Level Neural Machine Translation with Hierarchical Attention Networks
Neural Machine Translation (NMT) can be improved by including document-l...

09/16/2020
Document-level Neural Machine Translation with Document Embeddings
Standard neural machine translation (NMT) is on the assumption of docume...

10/19/2020
Diving Deep into Context-Aware Neural Machine Translation
Context-aware neural machine translation (NMT) is a promising direction ...

10/01/2019
When and Why is Document-level Context Useful in Neural Machine Translation?
Document-level context has received lots of attention for compensating n...

10/08/2018
Improving the Transformer Translation Model with Document-Level Context
Although the Transformer translation model (Vaswani et al., 2017) has ac...

09/15/2020
Attention-Aware Inference for Neural Abstractive Summarization
Inspired by Google's Neural Machine Translation (NMT) that models...