Interpretable Multi-dataset Evaluation for Named Entity Recognition

11/13/2020
by   Jinlan Fu, et al.

With the proliferation of models for natural language processing tasks, it becomes increasingly difficult to understand the differences between models and their relative merits. Simply comparing holistic metrics such as accuracy, BLEU, or F1 does not tell us why or how particular methods perform differently, or how diverse datasets influence model design choices. In this paper, we present a general methodology for interpretable evaluation of the named entity recognition (NER) task. The proposed evaluation method enables us to interpret the differences between models and datasets, as well as the interplay between them, identifying the strengths and weaknesses of current systems. By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area: https://github.com/neulab/InterpretEval.
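As a rough illustration of this style of interpretable evaluation, the sketch below buckets gold and predicted entity spans by a single attribute (entity length) and reports span-level precision, recall, and F1 per bucket, so that differences between systems can be localized rather than summarized by one number. This is a minimal, assumption-laden toy example, not the InterpretEval implementation: the choice of attribute, the bucket boundaries, and the helper names (bucket_of, bucketed_f1) are invented for illustration.

# Minimal sketch of attribute-bucketed NER evaluation (illustrative only).
from collections import defaultdict

def bucket_of(entity):
    """Assign an entity span (start, end, label) to a length bucket."""
    start, end, _label = entity
    length = end - start + 1
    return "len=1" if length == 1 else ("len=2-3" if length <= 3 else "len>=4")

def bucketed_f1(gold, pred):
    """gold, pred: dicts mapping sentence id -> list of (start, end, label) spans.
    Returns span-level precision/recall/F1 for each length bucket."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for sent_id in set(gold) | set(pred):
        g = set(gold.get(sent_id, []))
        p = set(pred.get(sent_id, []))
        for ent in p & g:          # correctly predicted spans
            tp[bucket_of(ent)] += 1
        for ent in p - g:          # spurious predictions
            fp[bucket_of(ent)] += 1
        for ent in g - p:          # missed gold spans
            fn[bucket_of(ent)] += 1
    results = {}
    for b in set(tp) | set(fp) | set(fn):
        prec = tp[b] / (tp[b] + fp[b]) if tp[b] + fp[b] else 0.0
        rec = tp[b] / (tp[b] + fn[b]) if tp[b] + fn[b] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        results[b] = {"precision": prec, "recall": rec, "f1": f1}
    return results

# Example: one sentence, gold vs. predicted spans (start, end, label).
gold = {0: [(0, 1, "PER"), (5, 5, "LOC")]}
pred = {0: [(0, 1, "PER"), (5, 6, "LOC")]}
print(bucketed_f1(gold, pred))

In the actual paper, buckets are defined over several attributes (for example, entity length, sentence length, and label consistency), and comparing per-bucket scores across models and datasets is what makes the evaluation interpretable.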


