Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric

09/27/2022
by Giorgos Vernikos et al.

We hypothesize that existing sentence-level machine translation (MT) metrics become less effective when the human reference contains ambiguities. To verify this hypothesis, we present a very simple method for extending pretrained metrics to incorporate context at the document level. We apply our method to three popular metrics, BERTScore, Prism, and COMET, and to the reference-free metric COMET-QE. We evaluate the extended metrics on the WMT 2021 metrics shared task using the provided MQM annotations. Our results show that the extended metrics outperform their sentence-level counterparts in about 85% of the tested conditions, when excluding results on low-quality human references. Additionally, we show that our document-level extension of COMET-QE dramatically improves its accuracy on discourse phenomena tasks, outperforming a dedicated baseline by up to 6.1%. Our results largely support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.
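The abstract does not spell out the mechanics of the extension. The sketch below assumes one common way to give a sentence-level metric document context: prepend up to k preceding sentences to each sentence and record where the current sentence starts, so that a contextual metric could keep only the outputs for the current sentence's tokens. All names here (`add_context`, `doc_level_scores`, `toy_overlap_metric`), the window size `k`, the space separator, and the choice to give each side its own preceding sentences are hypothetical; this is an illustration under those assumptions, not the authors' implementation. The toy metric is a stand-in that only demonstrates the interface and does not itself use the context.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: metric(hyp_text, ref_text, hyp_start, ref_start) -> float,
# where *_start is the index at which the current sentence begins in the augmented text.
Metric = Callable[[str, str, int, int], float]


def add_context(doc: Sequence[str], k: int = 2, sep: str = " ") -> List[Tuple[str, int]]:
    """Prepend up to k preceding sentences to each sentence of a document.

    Returns (augmented_text, start_offset) pairs; start_offset marks where the
    current sentence begins inside augmented_text.
    """
    augmented = []
    for i, sentence in enumerate(doc):
        previous = list(doc[max(0, i - k):i])
        prefix = sep.join(previous) + sep if previous else ""
        augmented.append((prefix + sentence, len(prefix)))
    return augmented


def doc_level_scores(metric: Metric, hyps: Sequence[str], refs: Sequence[str],
                     k: int = 2) -> List[float]:
    """Score every sentence with its document context; one score per sentence."""
    if len(hyps) != len(refs):
        raise ValueError("hypothesis and reference documents must be sentence-aligned")
    # Assumption: each side receives its own preceding sentences as context.
    hyp_ctx = add_context(hyps, k)
    ref_ctx = add_context(refs, k)
    return [metric(h, r, hs, rs) for (h, hs), (r, rs) in zip(hyp_ctx, ref_ctx)]


def toy_overlap_metric(hyp: str, ref: str, hyp_start: int, ref_start: int) -> float:
    """Toy stand-in: unigram F1 over the current sentence only.

    A contextual metric would instead encode the whole augmented string and
    discard the outputs for context tokens; this toy simply slices them away.
    """
    h = set(hyp[hyp_start:].lower().split())
    r = set(ref[ref_start:].lower().split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    hyps = ["the cat sat", "it slept"]
    refs = ["the cat sat", "he slept"]
    print(doc_level_scores(toy_overlap_metric, hyps, refs, k=1))  # [1.0, 0.5]
```

Keeping the metric interface offset-based means the same wrapper could, in principle, sit in front of any model that maps character offsets back to tokens; swapping the toy for a real pretrained metric is left as an assumption the sketch does not test.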
