Analysing the Noise Model Error for Realistic Noisy Label Data

01/24/2021
by Michael A. Hedderich, et al.

Distant and weak supervision allow one to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain many errors. A popular technique for overcoming the negative effects of these noisy labels is noise modelling, where the underlying noise process is modelled explicitly. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns, allowing different noise levels to be evaluated on the same instances. Parallel, clean labels are also available, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence noise model estimation, such as the noise distribution and the sampling technique.
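For intuition, noise modelling in this setting is commonly formulated as estimating a noise (confusion) matrix T, where T[i, j] = p(noisy label = j | true label = i), from a small sample of instances that carry both a gold-standard and a noisy label. The sketch below is illustrative only, not the authors' code: the function name, the synthetic noise process, and the error metric are assumptions chosen to show how such a matrix can be estimated and how its estimation error depends on the size of the clean sample.

```python
# A minimal sketch of noise matrix estimation from a small clean sample,
# assuming the standard confusion-matrix formulation of label noise.
# All names and the simulated noise process below are illustrative.
import numpy as np

def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Row-normalized count matrix: a maximum-likelihood estimate
    of p(noisy label | clean label)."""
    counts = np.zeros((num_classes, num_classes))
    for c, n in zip(clean_labels, noisy_labels):
        counts[c, n] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against classes that never appear in the clean sample.
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

# Illustrative simulation: draw noisy labels through a known noise
# process and observe the estimation error shrink with sample size.
rng = np.random.default_rng(0)
num_classes = 4
true_T = np.full((num_classes, num_classes), 0.1 / (num_classes - 1))
np.fill_diagonal(true_T, 0.9)  # 90% of labels kept, 10% flipped uniformly

for sample_size in [50, 500, 5000]:
    clean = rng.integers(0, num_classes, size=sample_size)
    noisy = np.array([rng.choice(num_classes, p=true_T[c]) for c in clean])
    T_hat = estimate_noise_matrix(clean, noisy, num_classes)
    err = np.abs(T_hat - true_T).mean()  # mean absolute entrywise error
    print(f"n={sample_size:5d}  mean |T_hat - T| = {err:.4f}")
```

Running the simulation shows the estimated matrix converging toward the true one as the clean sample grows, which is the kind of sample-size dependence the paper's expected-error analysis characterizes.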


Related research

- Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification (04/20/2022)
  Incorrect labels in training data occur when human annotators make mista...
- Label Noise Types and Their Effects on Deep Learning (03/23/2020)
  The recent success of deep learning is mostly due to the availability of...
- Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages (06/03/2022)
  For high-resource languages like English, text classification is a well-...
- What happens when self-supervision meets Noisy Labels? (10/13/2019)
  The major driving force behind the immense success of deep learning mode...
- Learning to Combat Noisy Labels via Classification Margins (02/01/2021)
  A deep neural network trained on noisy labels is known to quickly lose i...
- Lifting Weak Supervision To Structured Prediction (11/24/2022)
  Weak supervision (WS) is a rich set of techniques that produce pseudolab...
- Distant IE by Bootstrapping Using Lists and Document Structure (01/04/2016)
  Distant labeling for information extraction (IE) suffers from noisy trai...
