Science is the process of formulating hypotheses, making predictions, and measuring their outcomes. In machine translation research, the predictions are made by models whose development is the focus of the research, and the measurement, more often than not, is done via BLEU (Papineni et al., 2002). BLEU’s relative language independence, its ease of computation, and its reasonable correlation with human judgments have led to its adoption as the dominant metric for machine translation research. On the whole, it has been a boon to the community, providing a fast and cheap way for researchers to gauge the performance of their models. Together with larger-scale controlled manual evaluations, BLEU has shepherded the field through a decade and a half of quality improvements (Graham et al., 2014).
This is of course not to claim there are no problems with BLEU. Its weaknesses abound, and much has been written about them (cf. Callison-Burch et al. (2006)). This paper is not, however, concerned with the shortcomings of using BLEU as a proxy for human evaluation of quality; instead, our goal is to bring attention to problems with the reporting of BLEU scores. These can be summarized as follows:
BLEU is not a single metric, but requires a number of parameters (§2.1).
Preprocessing schemes have a large effect on scores (§2.2). Importantly, BLEU scores computed against differently-processed references are not comparable.
Papers vary in the hidden parameters and schemes they use, yet rarely report them (§2.3). Even when they do, it is often hard to discover the details.
Together, these issues make it difficult to evaluate and compare BLEU scores across papers, impeding comparison and replication. We quantify these issues and show that they are serious, with variances bigger than many reported gains. In particular, we identify user-supplied reference tokenizations as a source of incompatibility. As a solution, we suggest the community use only “detokenized” reference processing, as done by the annual Conference on Machine Translation (WMT; Bojar et al., 2017). In support of this, we release a Python script, SacreBLEU (installable via pip3 install sacrebleu), which computes this metric and reports a version string recording the parameters. It also provides a number of other features, such as automatic download and management of common test sets.
2 Problem Description
2.1 Problem: BLEU is underspecified
“BLEU” does not signify any one thing, but rather refers to a constellation of parameterized methods. Among these parameters are:
the number of references used;
for multi-reference settings, the computation of the length penalty;
the maximum n-gram length; and
smoothing applied to 0-count n-grams.
It is true that many of these are often not problems in practice. Most often, there is only one reference, and the length penalty calculation is therefore moot. The maximum n-gram length is virtually always set to four, and since BLEU is corpus level, it is rare that there are any zero counts.
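To make these parameters concrete, here is a self-contained sketch of corpus-level BLEU. This is not the official mteval implementation (real implementations differ in tokenization, length-penalty details, and smoothing variants), and the sentences are made up; it only exposes the maximum n-gram order and a simple add-k smoothing constant as arguments to show how they move the score:

```python
# Sketch of corpus-level BLEU with explicit parameters (not the official
# implementation): max_n is the maximum n-gram order, smooth is an add-k
# constant applied to zero-count n-gram orders.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4, smooth=0.0):
    """hyps, refs: parallel lists of token lists (single reference)."""
    hyp_len = sum(len(h) for h in hyps)
    ref_len = sum(len(r) for r in refs)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        match, total = 0, 0
        for h, r in zip(hyps, refs):
            r_counts = ngrams(r, n)
            # clipped n-gram matches, as in BLEU's modified precision
            match += sum(min(c, r_counts[g]) for g, c in ngrams(h, n).items())
            total += max(0, len(h) - n + 1)
        if match == 0:
            match = smooth           # add-k smoothing of a zero count
            if match == 0:
                return 0.0           # unsmoothed BLEU: any zero precision -> 0
        log_prec += math.log(match / total) / max_n
    # brevity penalty
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)

hyp = ["the cat sat on the mat".split()]
ref = ["the cat was sitting on the mat".split()]
print(corpus_bleu(hyp, ref, max_n=4))                # 0.0: no 4-gram matches
print(corpus_bleu(hyp, ref, max_n=4, smooth=0.1))    # nonzero once smoothed
print(corpus_bleu(hyp, ref, max_n=2))                # much higher at order 2
```

On this toy pair, the same output yields three different “BLEU scores” depending on the smoothing and maximum order, which is why these settings need to be reported.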
But it is also true that people use BLEU scores as very rough guides to MT performance across test sets and languages (comparing, for example, Chinese and Arabic). And traps exist. For example, WMT 2017 includes two references for English–Finnish. Scoring the online-B system with one reference produces a BLEU score of 22.04, and with two, 25.25. How sure are you that the paper you just reviewed with those good EN–FI results used just one reference?
2.2 Problem: Different reference preprocessings cannot be compared
The first problem dealt with parameters used in BLEU scores, and was more theoretical. We now discuss a second problem, preprocessing, and demonstrate its existence in practice.
Preprocessing includes input text modifications such as normalization (e.g., collapsing punctuation, removing special characters), tokenization (e.g., splitting off punctuation), compound-splitting, the removal of case, and so on. Its general goal is to deliver meaningful white-space delimited tokens to the MT system. Of these, tokenization is one of the most important and central. This is because BLEU is a precision metric, and changing the reference processing changes the set of n-grams against which system n-gram precision is computed. Rehbein and Genabith (2007) showed that the analogous use in the parsing community of F1 scores as rough estimates of cross-lingual parsing difficulty was unreliable, for this exact reason. We note that BLEU scores are often reported as being tokenized or detokenized. But for computing BLEU, both the system output and reference are always tokenized; what this distinction refers to is whether the reference preprocessing is user-supplied or metric-internal (i.e., handled by the code implementing the metric), respectively. And since BLEU scores can only be compared when the reference processing is the same, user-supplied preprocessing is error-prone and inadequate for comparing across papers.
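A tiny illustration of this point, using a made-up sentence and bigram precision only: the same system output scores perfectly against one tokenization of the reference and not at all against another.

```python
# The same reference string, tokenized two ways, yields different n-gram
# sets, so the same system output gets different measured precisions.
from collections import Counter
import re

def bigrams(tokens):
    return Counter(zip(tokens, tokens[1:]))

ref = 'He said, "No."'
hyp = 'He said , " No . "'.split()   # system output, already tokenized

# tokenization A: split punctuation off the words
ref_a = re.sub(r'([.,"])', r' \1 ', ref).split()
# tokenization B: whitespace only
ref_b = ref.split()

for name, ref_toks in [("split punctuation:", ref_a), ("whitespace only:", ref_b)]:
    ref_counts = bigrams(ref_toks)
    matches = sum(min(c, ref_counts[g]) for g, c in bigrams(hyp).items())
    print(name, matches, "of", len(hyp) - 1, "bigrams match")
```

With punctuation split off, all six of the hypothesis bigrams match; with whitespace-only tokenization, none do.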
Table 1 demonstrates the effect of computing BLEU scores with different reference tokenizations. We took a single system (online-B) from the WMT 2017 system outputs, and processed both it and the reference in the following ways:
basic. User-supplied preprocessing with the Moses tokenizer (Koehn et al., 2007), run with the arguments -q -no-escape -protected basic-protected-patterns -l LANG.
unk. All word types not appearing at least twice in the target side of the WMT training data (with “basic” tokenization) are mapped to UNK. This hypothetical scenario could easily happen if this common user-supplied preprocessing were inadvertently applied to the reference.
metric. Only the metric-internal tokenization of the official WMT scoring script, mteval-v13a.pl, is applied (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl).
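For illustration, here is a simplified Python approximation of the metric-internal tokenization that mteval-v13a.pl applies. The actual Perl script handles additional normalization (skipped-segment markers, end-of-line hyphenation, and more), so treat this as a sketch of the kind of rules involved, not a reimplementation:

```python
# Simplified approximation of mteval-v13a.pl's internal tokenization.
import re

def tokenize_v13a_like(line):
    # unescape a few XML entities, as the WMT data format uses them
    line = line.replace('&quot;', '"').replace('&amp;', '&')
    line = line.replace('&lt;', '<').replace('&gt;', '>')
    # split off most punctuation symbols
    line = re.sub(r'([\{-\~\[-\` -\&\(-\+\:-\@\/])', r' \1 ', line)
    # split period/comma unless they sit between digits (e.g. "1,000")
    line = re.sub(r'([^0-9])([\.,])', r'\1 \2 ', line)
    line = re.sub(r'([\.,])([^0-9])', r' \1 \2', line)
    # split a dash that follows a digit
    line = re.sub(r'([0-9])(-)', r'\1 \2 ', line)
    return line.split()

print(tokenize_v13a_like('The "1,000-strong" crowd cheered.'))
```

Note how the comma inside “1,000” survives while the sentence-final period is split off; user-supplied tokenizers typically make different choices on exactly such cases, which is where score incompatibilities creep in.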
The changes in each column show the effect of these different schemes, as high as 1.8 BLEU for one language arc and averaging around 1.0. The largest effect is the treatment of case, which is well known; yet many papers are not clear about whether they report cased or case-insensitive BLEU.
Allowing the user to handle pre-processing of the reference has other traps. For example, many systems (particularly before sub-word splitting (Sennrich et al., 2016) was proposed) limited the vocabulary in their attempt to deal with unknown words. How sure are you that they didn’t apply the same unknown-word masking to the reference, making word matches much more likely? Such mistakes are easy to introduce. (The observations in this paper stem in part from an early version of the authors’ research workflow, which applied preprocessing to the reference, affecting scores by half a point.)
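The masking trap can be demonstrated in a few lines. The vocabulary and sentences here are hypothetical, and per-position token matches stand in for unigram precision:

```python
# If rare-word masking is (wrongly) applied to the reference as well as the
# hypothesis, every masked token becomes a guaranteed match.
vocab = {"the", "committee", "met", "on", "."}   # hypothetical training vocab

def mask(tokens):
    return [t if t in vocab else "<unk>" for t in tokens]

hyp = "the committee met on Friday .".split()
ref = "the committee met on Tuesday .".split()

# per-position matches, a stand-in for unigram precision
plain  = sum(h == r for h, r in zip(hyp, ref))
masked = sum(h == r for h, r in zip(mask(hyp), mask(ref)))
print(plain, "matches without masking;", masked, "with masking the reference")
```

The wrong word “Friday” is rewarded as a match once both sides are masked to `<unk>`, so the error silently inflates the score.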
2.3 Problem: Details are hard to come by
User-supplied reference processing precludes direct comparison of published numbers, but if enough detail is specified in the paper, it is at least possible to reconstruct comparable numbers. Unfortunately, this is not the trend, and even for meticulous researchers, it is often unwieldy to include this level of technical detail. In any case, it creates uncertainty and work for the reader. One has to read the experiments section, scour the footnotes, and look for other clues which are sometimes scattered throughout the paper. Figuring out what another team did is not easy.
| Paper | Reference processing |
|---|---|
| Bahdanau et al. (2014) | (?) |
| Luong et al. (2015b) | (?) user or metric |
| Jean et al. (2015) | user |
| Wu et al. (2016) | (?) user or user |
| Vaswani et al. (2017) | (?) user or user |
| Gehring et al. (2017) | user, metric |
The variations in Table 1 are only some of the possible configurations, since there is no limit to the preprocessing that a group could apply. But assuming these represent common, concrete configurations, one might wonder how easy it is to determine which of them was used by a particular paper. In Table 2, we attempt to do this for a handful of influential papers in the literature. Not only are systems not comparable due to different schemes; in many cases, no easy determination can be made.
Reference tokenization must be identical in order for scores to be comparable (see Figure 1 below). The widespread use of user-supplied reference preprocessing prevents this, needlessly complicating comparisons. The lack of details about preprocessing pipelines exacerbates this problem. This situation should be fixed.
3 A way forward
3.1 The example of PARSEVAL
An instructive comparison is the PARSEVAL metric for computing parser accuracy (Black et al., 1991). PARSEVAL works by taking labeled spans of the form (N, i, j), representing a nonterminal N spanning a constituent from word i to word j. These are extracted from the parser output and used to compute precision and recall against the gold-standard set taken from the correct parse tree. Precision and recall are then combined to compute the F1 metric that is commonly reported and compared across parsing papers.
Computing parser F1 is not without its own set of edge cases. Do we count the TOP (ROOT) node? Do we count punctuation? Do we count empty elements? Should any labels be considered equivalent?
These boundary cases are resolved by that community’s adoption of a standard codebase, evalb (http://nlp.cs.nyu.edu/evalb/), which includes a parameters file that answers each of these questions. (The standard configuration file, COLLINS.PRM, answers them as no, no, no, and ADVP=PRT.) This has facilitated nearly thirty years of comparable cross-paper comparisons on treebanks in the parsing community, in some cases (WSJ section 23; Marcus et al., 1993) on the very same test set.
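For a sense of what this looks like in practice, evalb parameter files consist of simple directives. The excerpt below is illustrative, written in the style of COLLINS.PRM rather than copied verbatim from it:

```text
## Illustrative evalb parameter file, in the style of COLLINS.PRM
LABELED 1            ## compare (label, i, j) triples, not just spans
DELETE_LABEL TOP     ## do not count the root node
DELETE_LABEL -NONE-  ## do not count empty elements
DELETE_LABEL ,       ## do not count punctuation
DELETE_LABEL .
EQ_LABEL ADVP PRT    ## treat these two labels as equivalent
```

Because every paper runs the same binary with the same parameter file, the boundary cases are settled once, for everyone.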
3.2 Existing scripts
Moses (http://statmt.org/moses) has a number of scoring scripts. Unfortunately, each of them has problems. multi-bleu.perl cannot be used because it requires user-supplied preprocessing. The same is true of another evaluation framework, MultEval (Clark et al., 2011; https://github.com/jhclark/multeval), which explicitly advocates for user-supplied tokenization.
A good candidate is Moses’ mteval-v13a.pl, which makes use of metric-internal preprocessing and is used in the annual WMT evaluations. However, this script requires the data to be wrapped in XML. Nematus (Sennrich et al., 2017) contains a version (multi-bleu-detok.perl) that removes the XML requirement. This is a good idea, but it still requires the user to manually handle the reference translations. A better approach is to keep the reference away from the user entirely.
SacreBLEU is a Python script that aims to treat BLEU with a bit more reverence:
It expects detokenized outputs, applying its own metric-internal preprocessing, and produces the same values as WMT;
it produces a short version string that documents the settings used; and
it automatically downloads and manages WMT (2008–2018) and IWSLT 2017 (Cettolo et al., 2017) test sets and processes them to plain text.
SacreBLEU can be installed via the Python package management system:
pip3 install sacrebleu
It is open source software under the Apache 2.0 license (https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu).
4 Summary
Machine translation benefits from the regular introduction of test sets for many different language arcs, from academic, government, and industry sources. This should make it easy to share and compare scores on a constant set of fresh data. It is a shame, therefore, that we are in a situation where we cannot in fact easily do so. One might be tempted to shrug this off as an unimportant detail, but as we have shown, these differences are in fact quite important, resulting in variances in the score that are often much larger than the gains reported by a new method.
Fixing the problem is relatively simple. Groups should only report BLEU computed using a metric-internal tokenization and preprocessing scheme for the reference. With the reference processed the same way every time, scores can be directly compared across papers. We recommend the version used by WMT, and provide a new tool that makes it even easier.
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
- Black et al. (1991) E. Black, S. Abney, D. Flickenger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. In Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991.
- Bojar et al. (2017) Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, pages 169–214. Association for Computational Linguistics.
- Callison-Burch et al. (2006) Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the role of BLEU in machine translation research. In 11th Conference of the European Chapter of the Association for Computational Linguistics.
- Cettolo et al. (2017) Mauro Cettolo, Marcello Federico, Luisa Bentivogli, Jan Niehues, Sebastian Stüker, Katsuhito Sudoh, Koichiro Yoshino, and Christian Federmann. 2017. Overview of the IWSLT 2017 evaluation campaign. In 14th International Workshop on Spoken Language Translation, pages 2–14, Tokyo, Japan.
- Chiang (2005) David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 263–270. Association for Computational Linguistics.
- Clark et al. (2011) Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 176–181. Association for Computational Linguistics.
- Gehring et al. (2017) Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. A convolutional encoder model for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 123–135. Association for Computational Linguistics.
- Graham et al. (2014) Yvette Graham, Timothy Baldwin, Alistair Moffat, and Justin Zobel. 2014. Is machine translation getting better over time? In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 443–451. Association for Computational Linguistics.
- Jean et al. (2015) Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10. Association for Computational Linguistics.
- Koehn et al. (2007) Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180. Association for Computational Linguistics.
- Luong et al. (2015a) Thang Luong, Hieu Pham, and Christopher D. Manning. 2015a. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.
- Luong et al. (2015b) Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. 2015b. Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 11–19. Association for Computational Linguistics.
- Marcus et al. (1993) Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II.
- Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
- Rehbein and Genabith (2007) Ines Rehbein and Josef van Genabith. 2007. Treebank annotation schemes and parser evaluation for German. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
- Sennrich et al. (2017) Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nadejde. 2017. Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68. Association for Computational Linguistics.
- Sennrich et al. (2016) Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725. Association for Computational Linguistics.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR, abs/1706.03762.
- Wu et al. (2016) Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.