A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

04/03/2020
by Samuel Läubli, et al.

The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design, which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.

research
03/15/2018

Achieving Human Parity on Automatic Chinese to English News Translation

Machine translation has made rapid advances in recent years. Millions of...
research
08/30/2018

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

We reassess a recent study (Hassan et al., 2018) that claimed that machi...
research
08/21/2018

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

Recent research suggests that neural machine translation achieves parity...
research
05/12/2020

Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019

We reassess the claims of human parity and super-human performance made ...
research
03/31/2020

On the Integration of Linguistic Features into Statistical and Neural Machine Translation

New machine translations (MT) technologies are emerging rapidly and with...
research
10/01/2022

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

We present FRMT, a new dataset and evaluation benchmark for Few-shot Reg...
research
08/02/2023

Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT's Customizability

This paper explores the influence of integrating the purpose of the tran...
