Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

02/10/2022
by Chantal Amrhein, et al.

Neural metrics have achieved impressive correlation with human judgements in the evaluation of machine translation systems, but before we can safely optimise towards such metrics, we should be aware of (and ideally eliminate) biases towards bad translations that receive high scores. Our experiments show that sample-based Minimum Bayes Risk decoding can be used to explore and quantify such weaknesses. When applying this strategy to COMET for en-de and de-en, we find that COMET models are not sensitive enough to discrepancies in numbers and named entities. We further show that these biases cannot be fully removed by simply training on additional synthetic data.
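Sample-based Minimum Bayes Risk (MBR) decoding works roughly as follows. Draw several translations from the model, treat the same samples as pseudo-references, and pick the candidate with the highest average utility against them. The minimal Python sketch below shows that selection step. It assumes a toy token-overlap F1 (`token_f1`, a hypothetical stand-in) as the utility. In the setting described above, the utility would be a neural metric such as COMET. Because MBR returns whatever the metric rewards most, inspecting its outputs can expose translations that score well despite being wrong.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility (hypothetical stand-in for a neural metric like COMET):
    F1 over whitespace-separated token counts."""
    h, r = hyp.split(), ref.split()
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    samples: Sequence[str],
    utility: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Sample-based MBR: every sample is both a candidate and a pseudo-reference.
    Returns the candidate with the highest mean utility and that mean."""
    best, best_score = samples[0], float("-inf")
    for hyp in samples:
        # Expected utility, estimated as the mean over all pseudo-references.
        score = sum(utility(hyp, ref) for ref in samples) / len(samples)
        if score > best_score:  # strict ">" keeps the first candidate on ties
            best, best_score = hyp, score
    return best, best_score


if __name__ == "__main__":
    samples = [
        "the meeting is at 10",
        "the meeting is at 10",
        "the meeting starts at 10",
        "a meeting is at 3",
    ]
    print(mbr_decode(samples, token_f1))
```

Swapping `token_f1` for a learned metric changes only the `utility` argument. If that metric barely penalises a changed number or named entity, MBR can prefer a candidate containing such an error. This is the kind of weakness the abstract reports for COMET.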



research
11/17/2021

Minimum Bayes Risk Decoding with Neural Metrics of Translation Quality

This work applies Minimum Bayes Risk (MBR) decoding to optimize diverse ...
research
05/18/2021

Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

Neural Machine Translation (NMT) currently exhibits biases such as produ...
research
07/06/2023

BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training

Automatic metrics play a crucial role in machine translation. Despite th...
research
04/11/2017

Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation

For extended periods of time, sequence generation models rely on beam se...
research
12/20/2022

Extrinsic Evaluation of Machine Translation Metrics

Automatic machine translation (MT) metrics are widely used to distinguis...
research
09/07/2022

Adam Mickiewicz University at WMT 2022: NER-Assisted and Quality-Aware Neural Machine Translation

This paper presents Adam Mickiewicz University's (AMU) submissions to th...
research
10/01/2022

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

We present FRMT, a new dataset and evaluation benchmark for Few-shot Reg...
