BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

05/30/2023
by Taisiya Glushkova, et al.

Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgments, they are sometimes unreliable in detecting certain phenomena that can be considered critical errors, such as deviations in entities and numbers. In contrast, traditional evaluation metrics, such as BLEU or chrF, which measure lexical or character overlap between translation hypotheses and human references, have lower correlations with human judgments but are sensitive to such deviations. In this paper, we investigate several ways of combining the two approaches in order to increase the robustness of state-of-the-art evaluation methods to translations with critical errors. We show that by using additional information during training, such as sentence-level features and word-level tags, the trained metrics improve their capability to penalize translations with specific troublesome phenomena, which leads to gains in correlation with human judgments and in performance on recent challenge sets across several language pairs.
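As a rough illustration of the idea (not the authors' exact training setup), the sketch below computes sentence-level lexical scores with the sacrebleu package and a neural score from a public COMET checkpoint (assumed here to be Unbabel/wmt22-comet-da via the unbabel-comet package), for a hypothesis that contains a numeric deviation. The weighted combination at the end is a hypothetical placeholder: in the paper, such lexical signals and word-level tags are used as additional information during metric training rather than in a fixed post-hoc formula.

```python
# Minimal sketch: sentence-level lexical features (BLEU, chrF) alongside a
# neural COMET score. Assumes the `sacrebleu` and `unbabel-comet` packages are
# installed; the weighting at the end is a hypothetical illustration only.
import sacrebleu
from comet import download_model, load_from_checkpoint

src = "Der Vertrag kostet 1200 Euro."
hyp = "The contract costs 2100 euros."  # numeric deviation: a critical error
ref = "The contract costs 1200 euros."

# Lexical, overlap-based scores (sensitive to the number mismatch), range 0-100.
bleu = sacrebleu.sentence_bleu(hyp, [ref]).score
chrf = sacrebleu.sentence_chrf(hyp, [ref]).score

# Neural score from a public COMET checkpoint.
model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
comet = model.predict([{"src": src, "mt": hyp, "ref": ref}],
                      batch_size=1, gpus=0).scores[0]

# Hypothetical post-hoc combination for illustration only; the paper instead
# feeds such lexical signals (and word-level tags) into the metric's training.
combined = 0.8 * comet + 0.1 * (bleu / 100.0) + 0.1 * (chrf / 100.0)
print(f"BLEU={bleu:.1f}  chrF={chrf:.1f}  COMET={comet:.3f}  combined={combined:.3f}")
```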

Related research

05/19/2023 - The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics
Neural metrics for machine translation evaluation, such as COMET, exhibi...

10/25/2022 - DEMETR: Diagnosing Evaluation Metrics for Translation
While machine translation evaluation metrics based on string overlap (e....

08/25/2023 - Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
As research on machine translation moves to translating text beyond the ...

08/10/2015 - Improve the Evaluation of Fluency Using Entropy for Machine Translation Evaluation Metrics
The widely-used automatic evaluation metrics cannot adequately reflect t...

09/23/2020 - KoBE: Knowledge-Based Machine Translation Evaluation
We propose a simple and effective method for machine translation evaluat...

04/15/2021 - Rethinking Automatic Evaluation in Sentence Simplification
Automatic evaluation remains an open research question in Natural Langua...

04/22/2017 - Lexical Features in Coreference Resolution: To be Used With Caution
Lexical features are a major source of information in state-of-the-art c...
