Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

08/13/2018
by Liane Guillou, et al.

We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements of the correctness of translations of the PROTEST test suite. Although the metrics show some correlation with the human judgements, a range of issues limits their reliability. We therefore recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
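To make the comparison concrete, the sketch below shows, on entirely hypothetical data, how agreement between a reference-based pronoun score and human correctness judgements might be measured. It is not an implementation of APT or AutoPRF; the `pronoun_match_score` function is a simplified, precision-style stand-in, and the example sentences and judgements are invented for illustration.

```python
# Illustrative sketch only; neither APT nor AutoPRF is reproduced here.
# It shows how a reference-based pronoun overlap score could be correlated
# with human correctness judgements on a small, hypothetical dataset.

def pronoun_match_score(candidate_pronouns, reference_pronouns):
    """Clipped-precision-style overlap between candidate and reference
    pronouns (a simplified stand-in for a reference-based metric)."""
    if not candidate_pronouns:
        return 0.0
    ref_counts = {}
    for p in reference_pronouns:
        ref_counts[p] = ref_counts.get(p, 0) + 1
    matched = 0
    for p in candidate_pronouns:
        if ref_counts.get(p, 0) > 0:
            matched += 1
            ref_counts[p] -= 1
    return matched / len(candidate_pronouns)

def pearson(xs, ys):
    """Plain Pearson correlation, kept dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical examples: (candidate pronouns, reference pronouns,
# human judgement: 1 = translation judged correct, 0 = incorrect).
examples = [
    (["it"],          ["it"],          1),
    (["he"],          ["it"],          0),
    (["they", "it"],  ["they", "it"],  1),
    (["she"],         ["they"],        0),
    (["it"],          ["elle"],        1),  # judged correct despite no surface match
]

metric_scores = [pronoun_match_score(c, r) for c, r, _ in examples]
human_scores = [float(h) for _, _, h in examples]
print("Pearson r between metric and human judgements:",
      round(pearson(metric_scores, human_scores), 3))
```

The last example hints at the core problem the paper identifies: a translation can use a pronoun that differs from the reference and still be correct, so surface matching against a single reference systematically penalises valid outputs.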


