
Addressing Barriers to Reproducible Named Entity Recognition Evaluation

by Chester Palen-Michel, et al.

To address what we believe is a looming crisis of unreproducible evaluation for named entity recognition (NER) tasks, we present guidelines for reproducible evaluation. The guidelines we propose are extremely simple, focusing on transparency regarding how chunks are encoded and scored, yet very few papers currently being published fully comply with them. We demonstrate that despite the apparent simplicity of NER evaluation, unreported differences in the scoring procedure can result in changes to scores that are both of noticeable magnitude and statistically significant. We provide SeqScore, an open-source toolkit that addresses many of the issues that cause replication failures and makes following our guidelines easy.
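One source of the scoring differences described above is how invalid BIO label sequences are handled before chunks are extracted. The sketch below (illustrative only, not SeqScore's actual implementation; function names are hypothetical) contrasts two common conventions: conlleval-style repair, which treats a stray `I-` tag as implicitly beginning a new chunk, and a stricter convention that ignores `I-` tags with no valid chunk start. The same label sequence yields different chunk sets, and therefore different precision/recall/F1.

```python
def chunks_conll_style(labels):
    """Extract (type, start, end) chunks, repairing invalid sequences:
    a stray I-X (no preceding B-X) implicitly starts a new chunk."""
    chunks, start, etype = [], None, None
    for i, lab in enumerate(labels):
        if lab.startswith("B-") or (lab.startswith("I-") and etype != lab[2:]):
            if start is not None:
                chunks.append((etype, start, i))  # close the open chunk
            start, etype = i, lab[2:]
        elif lab == "O":
            if start is not None:
                chunks.append((etype, start, i))
            start, etype = None, None
    if start is not None:
        chunks.append((etype, start, len(labels)))
    return chunks

def chunks_discard_invalid(labels):
    """Extract chunks strictly: only B- starts a chunk; an I- tag that
    does not continue a matching open chunk is ignored."""
    chunks, start, etype = [], None, None
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            if start is not None:
                chunks.append((etype, start, i))
            start, etype = i, lab[2:]
        elif lab.startswith("I-") and start is not None and etype == lab[2:]:
            continue  # valid continuation of the open chunk
        else:  # "O" or an invalid I- tag: close any open chunk
            if start is not None:
                chunks.append((etype, start, i))
            start, etype = None, None
    if start is not None:
        chunks.append((etype, start, len(labels)))
    return chunks

# An invalid sequence: I-PER appears without a preceding B-PER.
labels = ["O", "I-PER", "I-PER", "O", "B-LOC"]
print(chunks_conll_style(labels))      # [('PER', 1, 3), ('LOC', 4, 5)]
print(chunks_discard_invalid(labels))  # [('LOC', 4, 5)]
```

Under the repair convention the PER span counts toward the score; under the strict convention it silently disappears. If a paper does not report which convention its scorer used, the reported F1 is not reproducible from the predictions alone.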

