PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

11/04/2020
by Ryo Fujii, et al.

Neural Machine Translation (NMT) has improved dramatically in quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of noisy input, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles these expressions. Although the importance of this problem has been recognized, it is still unclear what creates the large performance gap between the translation of clean input and that of UGC. To answer this question, we present a new dataset, PheMT, for evaluating the robustness of MT systems against specific linguistic phenomena in Japanese-English translation. Our experiments with the created dataset revealed that not only our in-house models but also widely used off-the-shelf systems are greatly disturbed by the presence of certain phenomena.


