Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

10/13/2020
by Clemens-Alexander Brust, et al.

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. Prior research has focused on mitigating the effects of label noise, which is typically modeled as inaccuracy: the correct label is replaced by an incorrect label from the same label set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting may be labeled simply as "bird". This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they treat all classes as mutually exclusive, whereas "non-breeding snow bunting" and "bird" are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification that makes full use of labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.
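To make the mutual-exclusivity problem concrete, here is a minimal sketch of one common way a class hierarchy lets a classifier learn from an imprecise label. It is not CHILLAX itself, whose exact formulation is not given here. The assumption is a softmax over leaf classes only, where a label may name any node of the hierarchy and its likelihood is the summed probability of all leaves beneath that node. The toy hierarchy and the names `HIERARCHY`, `leaves_under` and `imprecise_label_nll` are hypothetical, chosen for illustration.

```python
import math

# Hypothetical toy hierarchy: each internal node maps to its children.
# Nodes that never appear as keys are leaf classes.
HIERARCHY = {
    "animal": ["bird", "mammal"],
    "bird": ["snow bunting", "sparrow"],
    "snow bunting": ["snow bunting (breeding)", "snow bunting (non-breeding)"],
    "mammal": ["fox"],
}


def leaves_under(node, hierarchy):
    """Return the set of leaf classes below `node` (or {node} if it is a leaf)."""
    children = hierarchy.get(node)
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child, hierarchy)
    return result


def softmax(logits):
    """Numerically stable softmax over a dict of leaf-class logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def imprecise_label_nll(logits, label, hierarchy):
    """Negative log-likelihood of a label that may name any hierarchy node.

    The probability of a coarse label such as "bird" is the total softmax
    mass on its descendant leaves, so predicting a specific bird species
    is never penalized by a correct but imprecise "bird" label.
    """
    probs = softmax(logits)
    mass = sum(probs[leaf] for leaf in leaves_under(label, hierarchy))
    return -math.log(mass)


# Usage: a model that is undecided between all four leaf classes.
logits = {
    "snow bunting (breeding)": 0.0,
    "snow bunting (non-breeding)": 0.0,
    "sparrow": 0.0,
    "fox": 0.0,
}
print(imprecise_label_nll(logits, "bird", HIERARCHY))                         # -log(0.75)
print(imprecise_label_nll(logits, "snow bunting (non-breeding)", HIERARCHY))  # -log(0.25)
```

The design choice to marginalize over leaves is what removes the conflict: a flat softmax that treated "bird" as just another class would push probability away from "snow bunting (non-breeding)" whenever it saw a "bird" label, whereas here the coarse label only penalizes mass placed outside the bird subtree.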

02/16/2020

Multi-Class Classification from Noisy-Similarity-Labeled Data

A similarity label indicates whether two instances belong to the same cl...
06/14/2020

Class2Simi: A New Perspective on Learning with Label Noise

Label noise is ubiquitous in the era of big data. Deep learning algorith...
10/26/2021

Addressing out-of-distribution label noise in webly-labelled data

A recurring focus of the deep learning community is towards reducing the...
12/04/2019

Epoch-wise label attacks for robustness against label noise

The current accessibility to large medical datasets for training convolu...
02/20/2019

Learning with Inadequate and Incorrect Supervision

Practically, we are often in the dilemma that the labeled data at hand a...
08/03/2022

Noise tolerance of learning to rank under class-conditional label noise

Often, the data used to train ranking models is subject to label noise. ...
01/10/2022

The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence

As the production of and reliance on datasets to produce automated decis...