How not to Lie with a Benchmark: Rearranging NLP Leaderboards

12/02/2021
by Shavrina Tatiana, et al.

Comparison with human performance is an essential requirement for a benchmark to be a reliable measure of model capabilities. Nevertheless, the methods used for model comparison may have a fundamental flaw: the arithmetic mean of separate metrics is used across tasks that differ in complexity and in the size of their test and training sets. In this paper, we examine the overall scoring methods of popular NLP benchmarks and re-rank the models by geometric and harmonic mean (appropriate for averaging rates) according to their reported results. We analyze several popular benchmarks, including GLUE, SuperGLUE, XGLUE, and XTREME. The analysis shows that, for example, human-level performance on SuperGLUE has still not been reached, and that current models still have room for improvement.
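
To illustrate why the choice of mean can change a leaderboard, here is a minimal sketch in Python. The model names, task names, and scores are hypothetical and chosen only for illustration; they are not taken from the paper or from any real benchmark. The sketch uses the standard-library `statistics` module and does not reproduce the authors' actual procedure.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Map a label to the standard-library function that computes that mean.
MEANS = {
    "arithmetic": fmean,
    "geometric": geometric_mean,
    "harmonic": harmonic_mean,
}


def aggregate(scores):
    """Return all three overall scores for one model's per-task results."""
    values = list(scores.values())
    return {name: mean(values) for name, mean in MEANS.items()}


def ranking(models, mean_name):
    """Return model names ordered from best to worst under the given mean."""
    return sorted(
        models,
        key=lambda model: aggregate(models[model])[mean_name],
        reverse=True,
    )


# Hypothetical per-task scores in [0, 1]; NOT real leaderboard numbers.
# model_a is strong on two tasks and weak on one; model_b is uniform.
model_a = {"task_1": 0.95, "task_2": 0.95, "task_3": 0.50}
model_b = {"task_1": 0.79, "task_2": 0.79, "task_3": 0.79}
models = {"model_a": model_a, "model_b": model_b}

if __name__ == "__main__":
    for mean_name in MEANS:
        print(mean_name, ranking(models, mean_name))
```

With these made-up numbers, the arithmetic mean places `model_a` first, while the geometric and harmonic means place `model_b` first, because both penalize a single weak task more heavily than the arithmetic mean does.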


