A Call for Clarity in Reporting BLEU Scores

04/23/2018
by Matt Post, et al.

The field of machine translation is blessed with new challenges resulting from the regular production of fresh test sets in diverse settings. But it is also cursed with a lack of consensus in how to report scores from its dominant metric. Although people refer to "the" BLEU score, BLEU scores can vary wildly with changes to its parameterization and, especially, reference processing schemes, yet these details are absent from papers or hard to determine. We quantify this variation, finding differences as high as 1.8 between commonly used configurations. Pointing to the success of the parsing community, we suggest machine translation researchers settle upon the BLEU scheme used by the annual Conference on Machine Translation (WMT), which does not permit user-supplied preprocessing of the reference. We provide a new tool to facilitate this.
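As a rough illustration of why reference processing matters, the sketch below computes clipped unigram precision (one ingredient of BLEU, not the full metric) for the same sentence pair under two tokenizations. The function names, the toy sentences, and the punctuation-splitting rule are assumptions made for this example; they are not taken from the paper or its tool.

```python
import re
from collections import Counter


def whitespace_tokenize(text):
    """Split on whitespace only, leaving punctuation attached to words."""
    return text.split()


def punct_tokenize(text):
    """Separate each punctuation mark from adjacent words (illustrative rule only)."""
    return re.findall(r"\w+|[^\w\s]", text)


def clipped_precision(hyp_tokens, ref_tokens, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_ngrams = Counter(
        tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1)
    )
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Hypothetical sentence pair that differs only in the final punctuation mark.
hypothesis = "Hello, world!"
reference = "Hello, world."

p_whitespace = clipped_precision(
    whitespace_tokenize(hypothesis), whitespace_tokenize(reference)
)
p_punct = clipped_precision(punct_tokenize(hypothesis), punct_tokenize(reference))

print(f"whitespace tokens:  {p_whitespace:.2f}")
print(f"punctuation split:  {p_punct:.2f}")
```

The same system output is scored differently depending only on how the text is split into tokens, which is why a score reported without its tokenization and parameter settings is hard to compare with another.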


02/10/2019

Neural Machine Translation for Cebuano to Tagalog with Subword Unit Translation

The Philippines is an archipelago composed of 7,641 different islands w...
05/29/2021

Grammar Accuracy Evaluation (GAE): Quantifiable Intrinsic Evaluation of Machine Translation Models

Intrinsic evaluation by humans for the performance of natural language g...
01/07/2017

Neural Machine Translation on Scarce-Resource Condition: A case-study on Persian-English

Neural Machine Translation (NMT) is a new approach for Machine Translati...
06/29/2021

Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers

This paper presents the first large-scale meta-evaluation of machine tra...
10/15/2015

Telemedicine as a special case of Machine Translation

Machine translation is evolving quite rapidly in terms of quality. Nowad...
06/10/2021

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

The MultiWOZ dataset (Budzianowski et al.,2018) is frequently used for b...
04/12/2021

Assessing Reference-Free Peer Evaluation for Machine Translation

Reference-free evaluation has the potential to make machine translation ...
