Findings of the First Shared Task on Machine Translation Robustness

by Xian Li, et al.

We share the findings of the first shared task on improving the robustness of Machine Translation (MT). The task provides a testbed representing the challenges facing MT models deployed in the real world, and facilitates new approaches to improving models' robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments on Reddit and professionally sourced translations. For this new task, we received 23 submissions by 11 participating teams from universities, companies, national labs, etc. All submitted systems achieved large improvements over the baselines, with the best improvement being +22.33 BLEU. We evaluated submissions by both human judgment and automatic evaluation (BLEU), which show high correlations (Pearson's r = 0.94 and 0.95). Furthermore, we conducted a qualitative analysis of the submitted systems using compare-mt, which revealed salient differences in how they handle the challenges of this task. Such analysis provides additional insights when human judgment and BLEU occasionally disagree; e.g., systems better at producing colloquial expressions received higher scores from human judgment.
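The agreement between human judgment and BLEU is reported as Pearson's r over system-level scores. A minimal sketch of that computation follows, assuming one human score and one BLEU score per submitted system. The function name `pearson_r` is hypothetical, and the score lists are made-up placeholders for illustration, not the shared-task results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each left unnormalized; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical system-level scores, one entry per system (NOT real results).
bleu = [20.1, 25.4, 30.2, 35.7]
human = [55.0, 60.5, 68.0, 74.2]
print(round(pearson_r(bleu, human), 3))
```

Because r only measures linear agreement in system ranking and spacing, a high value (such as the 0.94 and 0.95 reported above) can coexist with individual systems on which BLEU and human judgment disagree, which is where the qualitative compare-mt analysis adds information.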


Naver Labs Europe's Systems for the WMT19 Machine Translation Robustness Task

This paper describes the systems that we submitted to the WMT19 Machine ...

MTNT: A Testbed for Machine Translation of Noisy Text

Noisy or non-standard input text can cause disastrous mistranslations in...

Findings of the WMT 2022 Shared Task on Translation Suggestion

We report the result of the first edition of the WMT shared task on Tran...

Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

We share a French-English parallel corpus of Foursquare restaurant revie...

Explicit Representation of the Translation Space: Automatic Paraphrasing for Machine Translation Evaluation

Following previous work on automatic paraphrasing, we assess the feasibi...

Machine Translation of Novels in the Age of Transformer

In this chapter we build a machine translation (MT) system tailored to t...

With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

This work presents our efforts to reproduce the results of the human eva...
