P-Transformer: Towards Better Document-to-Document Neural Machine Translation

12/12/2022
by Yachao Li, et al.

Directly training a document-to-document (Doc2Doc) neural machine translation (NMT) model with Transformer from scratch, especially on small datasets, usually fails to converge. Our dedicated probing tasks show that 1) both absolute and relative position information is gradually weakened, or even vanishes, by the time it reaches the upper encoder layers, and 2) the vanishing of absolute position information in the encoder output causes the training failure of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer (P-Transformer) that enhances both absolute and relative position information in self-attention and cross-attention. Specifically, we integrate absolute positional information, i.e., position embeddings, into the query-key pairs in both self-attention and cross-attention through a simple yet effective addition operation. Moreover, we also integrate relative position encoding in self-attention. P-Transformer uses sinusoidal position encoding and does not require any task-specific position embedding, segment embedding, or attention mechanism. With these methods, we build a Doc2Doc NMT model with P-Transformer, which ingests the source document and generates the complete target document in a sequence-to-sequence (seq2seq) manner. In addition, P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation. Extensive experimental results on Doc2Doc NMT show that P-Transformer significantly outperforms strong baselines on 9 widely-used document-level datasets in 7 language pairs, covering small-, middle-, and large-scale settings, and achieves a new state of the art. Experiments on discourse phenomena show that our Doc2Doc NMT models improve translation quality in both BLEU and discourse coherence. We make our code available on GitHub.
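To make the central idea concrete, here is a minimal PyTorch-style sketch (not the authors' released code) of how sinusoidal position embeddings could be re-injected into the query-key pair of an attention layer by simple addition; the function names (e.g., `position_aware_attention`) are hypothetical and the relative-position term is omitted for brevity.

```python
import math
import torch

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal position encoding (as in the original Transformer)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)        # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))                    # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                                            # (seq_len, d_model)

def position_aware_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with absolute position information added
    to the queries and keys (illustrative sketch of the addition operation).

    q: (batch, tgt_len, d_model); k, v: (batch, src_len, d_model)
    """
    d_model = q.size(-1)
    q = q + sinusoidal_positions(q.size(1), d_model).to(q.device)   # add positions to queries
    k = k + sinusoidal_positions(k.size(1), d_model).to(k.device)   # add positions to keys
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_model)
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

In this sketch the same addition is applied regardless of whether q and k come from the same sequence (self-attention) or from decoder/encoder states (cross-attention), which mirrors the paper's claim that no task-specific embedding or attention mechanism is needed.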
