Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report

06/19/2019
by Renjie Zheng, et al.

This paper describes the machine translation system developed jointly by Baidu Research and Oregon State University for the WMT 2019 Machine Translation Robustness Shared Task. Translating social media text is very challenging: its style differs sharply from that of standard parallel corpora (e.g. news), and it contains various types of noise. To make matters worse, parallel corpora of social media text are extremely limited. In this paper, we use a domain-sensitive training method that leverages a large amount of parallel data from popular domains together with a small amount of parallel data from social media. Furthermore, we generate a parallel dataset with pseudo noisy source sentences, which are back-translated from monolingual data using a model trained in a similar domain-sensitive way. We achieve an improvement of more than 10 BLEU points in both En-Fr and Fr-En translation compared with the baseline methods.
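The abstract does not say how domain sensitivity is implemented. One common approach is to prepend a domain tag token to each source sentence, and the sketch below assumes that approach purely for illustration. The tag strings, the function names, the `upsample` parameter, and the `back_translate` callable (standing in for a target-to-source model) are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical tag tokens; the abstract does not name any.
GENERAL_TAG = "<general>"
SOCIAL_TAG = "<social>"


def tag_source(pairs: List[Pair], tag: str) -> List[Pair]:
    """Prepend a domain tag to every source sentence."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def build_training_set(
    general: List[Pair], social: List[Pair], upsample: int = 1
) -> List[Pair]:
    """Mix large general-domain data with a small social-media set.

    `upsample` repeats the social-media pairs; this is an assumed knob,
    not something the abstract describes.
    """
    return tag_source(general, GENERAL_TAG) + tag_source(social, SOCIAL_TAG) * upsample


def make_pseudo_pairs(
    monolingual_targets: List[str], back_translate: Callable[[str], str]
) -> List[Pair]:
    """Back-translate target-language text into pseudo noisy sources.

    `back_translate` stands in for a target-to-source model trained on
    tagged data; tagging its input with SOCIAL_TAG requests
    social-media-style output under the tagging assumption above.
    """
    pairs = []
    for tgt in monolingual_targets:
        pseudo_src = back_translate(f"{SOCIAL_TAG} {tgt}")
        pairs.append((f"{SOCIAL_TAG} {pseudo_src}", tgt))
    return pairs
```

In such a setup, the pairs returned by `make_pseudo_pairs` would be added to the output of `build_training_set` before training the forward model. Whether the actual system uses tags, a separate domain embedding, or another mechanism is described in the full paper, not in this abstract.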


