Detecting Label Errors in Token Classification Data

10/08/2022
by Wei-Chen Wang, et al.

Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification, where many labels must be chosen on a fine-grained basis. Here we consider the task of finding sentences that contain label errors in token classification datasets. We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities output by any token classification model, regardless of the procedure used to train it. In precision-recall evaluations based on real-world label errors in entity recognition data from CoNLL-2003, we identify a simple and effective method that consistently detects sentences containing label errors when applied with different token classification models.
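
As an illustration of what scoring tokens and sentences from predicted class probabilities can look like, here is a minimal Python sketch. It is not taken from the paper and is not claimed to be the method the abstract singles out. It assumes one plausible scheme: each token is scored by the probability the model assigns to its given label, and each sentence is scored by the lowest of its token scores. The function names and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def token_scores(pred_probs: Sequence[Sequence[float]],
                 labels: Sequence[int]) -> List[float]:
    """Score each token by the model's probability for its given label.

    Assumption: a low probability for the given label suggests a possible error.
    """
    return [probs[label] for probs, label in zip(pred_probs, labels)]


def sentence_score(pred_probs: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Aggregate token scores with min, so a single suspicious token
    is enough to give the whole sentence a low score (assumed aggregation)."""
    return min(token_scores(pred_probs, labels))


def rank_sentences(all_pred_probs, all_labels) -> List[int]:
    """Return sentence indices ordered from most to least suspicious."""
    scores = [sentence_score(p, l) for p, l in zip(all_pred_probs, all_labels)]
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Toy data (made up): two sentences, three classes, one probability row per token.
example_pred_probs = [
    [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]],
    [[0.70, 0.20, 0.10], [0.05, 0.90, 0.05], [0.60, 0.30, 0.10]],
]
example_labels = [
    [0, 1],
    [0, 2, 0],  # the second token's given label has probability 0.05
]

ranking = rank_sentences(example_pred_probs, example_labels)
print(ranking)
```

On this toy input the second sentence is ranked first, because one of its tokens carries a label the model considers very unlikely. Other per-token scores or aggregations (for example a mean over tokens) fit the same structure by swapping out `token_scores` or the `min` call.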


