A global analysis of metrics used for measuring performance in natural language processing

04/25/2022
by   Kathrin Blagec, et al.

Measuring the performance of natural language processing models is challenging. Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine translation and summarization, have been shown to suffer from low correlation with human judgment and a lack of transferability to other tasks and languages. In the past 15 years, a wide range of alternative metrics has been proposed. However, it is unclear to what extent this has had an impact on NLP benchmarking efforts. Here we provide the first large-scale cross-sectional analysis of metrics used for measuring performance in natural language processing. We curated, mapped and systematized more than 3500 machine learning model performance results from the open repository 'Papers with Code' to enable a global and comprehensive analysis. Our results suggest that the large majority of natural language processing metrics currently used have properties that may result in an inadequate reflection of a model's performance. Furthermore, we found that ambiguities and inconsistencies in the reporting of metrics may lead to difficulties in interpreting and comparing model performances, impairing transparency and reproducibility in NLP research.


research
08/06/2020

A critical analysis of metrics used for measuring progress in artificial intelligence

Comparing model performances on benchmark datasets is an integral part o...
research
03/30/2022

Reproducibility Issues for BERT-based Evaluation Metrics

Reproducibility is of utmost concern in machine learning and natural lan...
research
11/18/2020

Inspecting state of the art performance and NLP metrics in image-based medical report generation

Several deep learning architectures have been proposed over the last yea...
research
03/13/2020

Masakhane – Machine Translation For Africa

Africa has over 2000 languages. Despite this, African languages account ...
research
02/06/2022

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

The search for effective and robust generalization metrics has been the ...
research
05/26/2023

NLP Reproducibility For All: Understanding Experiences of Beginners

As natural language processing (NLP) has recently seen an unprecedented ...
research
04/20/2011

Understanding Exhaustive Pattern Learning

Pattern learning is an important problem in Natural Language Processing ...
