Learning From Long-Tailed Data With Noisy Labels

08/25/2021
by Shyamgopal Karthik, et al.

Class imbalance and noisy labels are the norm rather than the exception in many large-scale classification datasets. Nevertheless, most works in machine learning assume balanced and clean data. There have been some recent attempts to tackle, on one side, the problem of learning from noisy labels and, on the other side, learning from long-tailed data. Each group of methods makes simplifying assumptions about the other problem. Due to this separation, the proposed solutions often underperform when both assumptions are violated. In this work, we present a simple two-stage approach based on recent advances in self-supervised learning to treat both challenges simultaneously. It consists of, first, task-agnostic self-supervised pre-training, followed by task-specific fine-tuning using an appropriate loss. Most significantly, we find that self-supervised learning approaches can cope effectively with severe class imbalance. In addition, the resulting learned representations are remarkably robust to label noise when fine-tuned with an imbalance- and noise-resistant loss function. We validate our claims with experiments on CIFAR-10 and CIFAR-100 augmented with synthetic imbalance and noise, as well as the large-scale inherently noisy Clothing-1M dataset.
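The abstract does not say how the synthetic imbalance and noise are generated. As a rough illustration only, the sketch below assumes two common conventions: an exponentially decaying class-size profile and symmetric (uniform) label flipping. The function names, the imbalance ratio, and the noise rate are hypothetical and not taken from the paper.

```python
import random

def long_tailed_counts(num_classes, max_per_class, imbalance_ratio):
    """Per-class sample counts that decay exponentially from max_per_class
    (class 0) down to max_per_class / imbalance_ratio (last class).
    Assumed profile; the paper's exact protocol is not given in the abstract."""
    if num_classes == 1:
        return [max_per_class]
    counts = []
    for c in range(num_classes):
        frac = c / (num_classes - 1)
        counts.append(max(1, round(max_per_class * imbalance_ratio ** (-frac))))
    return counts

def subsample_long_tailed(labels, counts, rng):
    """Return sorted indices that keep at most counts[c] samples of class c."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    keep = []
    for c, n in enumerate(counts):
        pool = by_class.get(c, [])
        keep.extend(rng.sample(pool, min(n, len(pool))))
    return sorted(keep)

def add_symmetric_noise(labels, num_classes, noise_rate, rng):
    """With probability noise_rate, replace a label with a different class
    chosen uniformly at random (symmetric noise, an assumed noise model)."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

# Toy balanced label set: 10 classes, 100 samples each.
rng = random.Random(0)
clean = [c for c in range(10) for _ in range(100)]
counts = long_tailed_counts(num_classes=10, max_per_class=100, imbalance_ratio=10)
keep = subsample_long_tailed(clean, counts, rng)
tail_labels = [clean[i] for i in keep]
noisy_labels = add_symmetric_noise(tail_labels, num_classes=10, noise_rate=0.2, rng=rng)
```

Here the imbalance is applied first and the noise second, so the observed class frequencies are distorted by the noise as well. That ordering is itself an assumption of this sketch.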

research
10/11/2021

Self-supervised Learning is More Robust to Dataset Imbalance

Self-supervised learning (SSL) is a scalable way to learn general visual...
research
07/13/2022

Task Agnostic Representation Consolidation: a Self-supervised based Continual Learning Approach

Continual learning (CL) over non-stationary data streams remains one of ...
research
07/23/2022

Self-Supervised Learning of Echocardiogram Videos Enables Data-Efficient Clinical Diagnosis

Given the difficulty of obtaining high-quality labels for medical image ...
research
02/06/2023

APAM: Adaptive Pre-training and Adaptive Meta Learning in Language Model for Noisy Labels and Long-tailed Learning

Practical natural language processing (NLP) tasks are commonly long-tail...
research
02/11/2022

Investigating Power laws in Deep Representation Learning

Representation learning that leverages large-scale labelled datasets, is...
research
12/22/2022

Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

Class-imbalanced datasets are known to cause the problem of model being ...
