Multi-Source Neural Machine Translation with Missing Data

06/07/2018
by   Yuta Nishimura, et al.

Multi-source translation is an approach that exploits multiple inputs (e.g. the same sentence in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are incomplete because it is difficult to provide translations in all of the relevant languages (for example, most English TED talks have subtitles for only a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in two settings, multi-encoder NMT and mixture of NMT experts, and examines a very simple implementation in which missing source translations are replaced by a special symbol <NULL>. These methods allow us to use incomplete corpora at both training time and test time. In experiments with real incomplete multilingual corpora of TED Talks, multi-source NMT with <NULL> tokens achieved higher translation accuracy, measured by BLEU, than any of the one-to-one NMT systems.
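The <NULL> replacement described above can be illustrated with a minimal preprocessing sketch in Python. It is not the authors' code: the function name `fill_missing_sources`, the dictionary layout, and the language codes are assumptions made for illustration, and only the <NULL> symbol itself comes from the abstract. The sketch shows one way to give every example an entry for each source language before it is passed to a multi-source model.

```python
from typing import Dict, List, Optional

# The special symbol named in the abstract for a missing source translation.
NULL_TOKEN = "<NULL>"


def fill_missing_sources(
    example: Dict[str, Optional[str]], source_langs: List[str]
) -> Dict[str, str]:
    """Return one sentence per source language, using <NULL> where none exists.

    `example` maps a language code to a sentence; a language may be absent,
    mapped to None, or mapped to an empty or whitespace-only string.
    (Hypothetical data layout, chosen for illustration only.)
    """
    filled = {}
    for lang in source_langs:
        sentence = example.get(lang)
        # Treat absent, None, and blank entries all as "translation missing".
        if sentence is None or not sentence.strip():
            filled[lang] = NULL_TOKEN
        else:
            filled[lang] = sentence
    return filled


if __name__ == "__main__":
    # A talk with a Spanish subtitle but no French or Portuguese one.
    example = {"es": "Gracias por venir .", "fr": None}
    print(fill_missing_sources(example, ["es", "fr", "pt"]))
```

Because each example then has the same set of source slots, the same routine could be applied at both training time and test time, which matches the abstract's claim that incomplete corpora are usable in both settings.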

