Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

09/03/2014
by Jean Pouget-Abadie, et al.

Cho et al. (2014a) showed that recently introduced neural machine translation systems, unlike existing phrase-based systems, suffer a significant drop in translation quality on long sentences. In this paper, we propose to address this issue by automatically segmenting an input sentence into phrases that the neural machine translation model can translate easily. Each segment is translated independently by the model, and the translated segments are then concatenated to form the final translation. Empirical results show a significant improvement in translation quality for long sentences.
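
The segment, translate, and concatenate pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how segment boundaries are chosen, so `segment_by_length` below is a hypothetical stand-in that simply cuts the sentence into fixed-size chunks. The names `translate_long_sentence`, `translate_segment`, `max_len`, and `toy_translate` are also assumptions introduced only for illustration.

```python
from typing import Callable, List


def segment_by_length(tokens: List[str], max_len: int) -> List[List[str]]:
    """Hypothetical stand-in segmenter: chunks of at most max_len tokens.

    The abstract does not specify the segmentation criterion; fixed-size
    chunking is used here only to make the pipeline concrete.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def translate_long_sentence(
    sentence: str,
    translate_segment: Callable[[str], str],
    max_len: int = 10,
) -> str:
    """Segment the input, translate each segment independently, concatenate."""
    tokens = sentence.split()
    segments = segment_by_length(tokens, max_len)
    # Each segment is translated on its own, with no context from its neighbours.
    translated = [translate_segment(" ".join(seg)) for seg in segments]
    # The final translation is the concatenation of the segment translations.
    return " ".join(translated)


def toy_translate(segment: str) -> str:
    """Placeholder for a neural translation model (assumption): upper-cases text."""
    return segment.upper()


if __name__ == "__main__":
    print(translate_long_sentence("a b c d e", toy_translate, max_len=2))
```

In practice `translate_segment` would wrap a trained translation model, and the segmenter would be replaced by whatever automatic criterion picks segments the model translates well. Because segments are translated independently, word reordering across segment boundaries is not possible in this scheme.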


