EXPLAINABOARD: An Explainable Leaderboard for NLP

04/13/2021
by Pengfei Liu, et al.

With the rapid development of NLP research, leaderboards have emerged as one tool to track the performance of various systems on various NLP tasks. They serve this goal to some extent, but generally present a rather simplistic, one-dimensional view of the submitted systems, communicated only through holistic accuracy numbers. In this paper, we present a new conceptualization and implementation of NLP evaluation: ExplainaBoard, which, in addition to inheriting the functionality of the standard leaderboard, also allows researchers to (i) diagnose strengths and weaknesses of a single system (e.g., what is the best-performing system bad at?), (ii) interpret relationships between multiple systems (e.g., where does system A outperform system B? What if we combine systems A, B, and C?), and (iii) examine prediction results closely (e.g., what are common errors made by multiple systems, and in what contexts do particular errors occur?). ExplainaBoard has been deployed at <http://explainaboard.nlpedia.ai/>, and we have additionally released our interpretable evaluation code at <https://github.com/neulab/ExplainaBoard> and output files from more than 300 systems, 40 datasets, and 9 tasks to motivate future "output-driven" research.
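The idea of diagnosing a single system beyond a holistic accuracy number can be illustrated with a bucketed breakdown: performance is reported per slice of some input attribute (e.g., sentence length) rather than as one aggregate figure. The sketch below is illustrative only; the attribute, bucket edges, and data are hypothetical and not taken from ExplainaBoard's actual implementation.

```python
# Minimal sketch of fine-grained (bucketed) evaluation: instead of one
# holistic accuracy number, report accuracy per slice of an input
# attribute (here, sentence length). All data below is illustrative.
from collections import defaultdict

def bucket_accuracy(examples, bucket_edges):
    """Group (length, correct) pairs into length buckets and return
    per-bucket accuracy, revealing where a system is weak."""
    buckets = defaultdict(list)
    for length, correct in examples:
        # Assign each example to the first bucket covering its length.
        for lo, hi in bucket_edges:
            if lo <= length <= hi:
                buckets[(lo, hi)].append(correct)
                break
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Hypothetical predictions: (sentence_length, prediction_was_correct)
examples = [(3, True), (5, True), (12, False), (15, True), (30, False), (28, False)]
edges = [(0, 9), (10, 19), (20, 99)]
print(bucket_accuracy(examples, edges))
```

A breakdown like this might show, for instance, that a system with strong overall accuracy degrades sharply on long sentences, which a single leaderboard number would hide.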


