Measuring Uncertainty in Translation Quality Evaluation (TQE)

11/15/2021
by Serge Gladkoff, et al.

Translation quality evaluation (TQE) is an essential task for both human translators (HT) and machine translation (MT) researchers. Translation service providers (TSPs) have to deliver large volumes of translations that meet customer specifications, under hard constraints on required quality level, time-frames, and costs. MT researchers strive to improve their models, which also requires reliable quality evaluation. While automatic machine translation evaluation (MTE) metrics and quality estimation (QE) tools are widely available and easy to access, existing automated tools are not good enough, and human assessment by professional translators (HAP) is often chosen as the gold standard <cit.>. Human evaluations, however, are often criticized for low reliability and agreement. Is this caused by subjectivity, or is it a matter of statistics? How can we avoid checking the entire text, making TQE more efficient in terms of cost and effort, and what is the optimal sample size of the translated text for reliably estimating the translation quality of the entire material? This work investigates these questions by correctly estimating the confidence intervals <cit.> as a function of the size of the translated-text sample (e.g., the number of words or sentences) that needs to be processed at the TQE workflow step for a confident and reliable evaluation of overall translation quality. The methodology we apply is Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
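The abstract does not spell out the BSDM/MCSA procedure, but a minimal sketch of the underlying idea (not the authors' actual implementation) might look as follows: model each checked word as an independent Bernoulli trial with error probability p_err, then use Monte Carlo simulation to see how the confidence interval for the observed error rate narrows as the reviewed sample grows. The function name mc_ci_width, the parameter names, and the illustrative 5% error rate are all assumptions made for this example.

```python
import numpy as np

def mc_ci_width(p_err: float, sample_size: int,
                n_trials: int = 10_000,
                confidence: float = 0.95,
                seed: int = 0) -> float:
    """Monte Carlo estimate of the confidence-interval width for the
    observed error rate when sample_size words are inspected and each
    word is an independent Bernoulli trial with error probability p_err."""
    rng = np.random.default_rng(seed)
    # Simulate many TQE runs: in each, count errors among sample_size words.
    errors = rng.binomial(n=sample_size, p=p_err, size=n_trials)
    observed_rates = errors / sample_size
    # Take the central `confidence` mass of the simulated error rates.
    alpha = (1.0 - confidence) / 2.0
    lo, hi = np.quantile(observed_rates, [alpha, 1.0 - alpha])
    return hi - lo

# How the interval narrows as the reviewed sample grows:
for n in (100, 500, 1000, 5000):
    print(f"{n:5d} words -> 95% CI width ~ {mc_ci_width(0.05, n):.4f}")
```

Larger samples yield narrower intervals, which is precisely the trade-off the paper quantifies: the smallest sample that still gives a sufficiently tight confidence interval for the overall quality estimate.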
