Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

by Jean Pouget-Abadie, et al.

Cho et al. (2014a) have shown that recently introduced neural network translation systems suffer a significant drop in translation quality on long sentences, unlike existing phrase-based translation systems. In this paper, we propose a way to address this issue by automatically segmenting an input sentence into phrases that the neural network translation model can translate easily. Once the neural machine translation model has translated each segment independently, the translated segments are concatenated to form the final translation. Empirical results show a significant improvement in translation quality for long sentences.
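
The segment, translate, and concatenate pipeline described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `segment`, `translate_long_sentence`, `toy_translate`, and the `max_len` parameter are hypothetical, and the fixed-length split is a stand-in, since the abstract does not state the criterion the automatic segmentation uses.

```python
from typing import Callable, List


def segment(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split tokens into consecutive chunks of at most max_len tokens.

    Assumption: a fixed-length split stands in for the automatic
    segmentation, whose actual criterion is not given in the abstract.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def translate_long_sentence(
    sentence: str,
    translate_segment: Callable[[str], str],
    max_len: int = 10,
) -> str:
    """Translate each segment independently, then concatenate the results."""
    segments = segment(sentence.split(), max_len)
    translated = [translate_segment(" ".join(seg)) for seg in segments]
    return " ".join(translated)


# Hypothetical stand-in for a neural translation model: upper-cases its input.
def toy_translate(text: str) -> str:
    return text.upper()
```

Any callable that maps a source string to a target string can be passed as `translate_segment`, so a real model could replace `toy_translate` without changing the pipeline. Because each segment is translated in isolation, context that spans a segment boundary is not available to the model in this sketch.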



Six Challenges for Neural Machine Translation

We explore six challenges for neural machine translation: domain mismatc...

Prosodic Phrase Alignment for Machine Dubbing

Dubbing is a type of audiovisual translation where dialogues are transla...

Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext

We consider the problem of learning general-purpose, paraphrastic senten...

Intelligent Translation Memory Matching and Retrieval with Sentence Encoders

Matching and retrieving previously translated segments from a Translatio...

Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

This paper introduces Dynamic Programming Encoding (DPE), a new segmenta...

SemMT: A Semantic-based Testing Approach for Machine Translation Systems

Machine translation has wide applications in daily life. In mission-crit...

Fully automatic multi-language translation with a catalogue of phrases - successful employment for the Swiss avalanche bulletin

The Swiss avalanche bulletin is produced twice a day in four languages. ...