Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance

05/04/2022
by   Jingwei Ni, et al.

Human-translated text displays features distinct from those of text written natively in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. Yet we find that existing work on translationese neglects some important factors, and that its conclusions are mostly correlational rather than causal. In this work, we collect CausalMT, a dataset in which the MT training data are also labeled with the human translation directions. We inspect two critical factors: the train-test direction match (whether the human translation directions in the training and test sets are aligned) and the data-model direction match (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese. In light of our findings, we provide a set of suggestions for MT training and evaluation. Our code and data are at https://github.com/EdisonNi-hku/CausalMT


research
06/19/2019

The Effect of Translationese in Machine Translation Test Sets

The effect of translationese has been studied in the field of machine tr...
research
11/07/2021

Variance-Aware Machine Translation Test Sets

We release 70 small and discriminative test sets for machine translation...
research
10/12/2020

It's not a Non-Issue: Negation as a Source of Error in Machine Translation

As machine translation (MT) systems progress at a rapid pace, questions ...
research
06/24/2019

Translationese in Machine Translation Evaluation

The term translationese has been used to describe the presence of unusua...
research
10/20/2014

Using Mechanical Turk to Build Machine Translation Evaluation Sets

Building machine translation (MT) test sets is a relatively expensive ta...
research
10/24/2021

Understanding the Impact of UGC Specificities on Translation Quality

This work takes a critical look at the evaluation of user-generated cont...
research
10/13/2021

Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits

Training data for machine translation (MT) is often sourced from a multi...
