BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets

by Suraj Kothawade, et al.

Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled data sets. However, most real-world datasets are naturally class imbalanced. Training models on such imbalanced datasets is known to produce biased models, whose predictions in turn favor the more frequent classes. This issue is even more pronounced in SSL methods, because they use the biased model to obtain pseudo-labels (on the unlabeled data) during training. In this paper, we tackle this problem by attempting to select a balanced labeled dataset for SSL that would result in an unbiased model. Unfortunately, acquiring a balanced labeled dataset from a class imbalanced distribution in one shot is challenging. We propose BASIL (Balanced Active Semi-supervIsed Learning), a novel algorithm that optimizes the submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop. Importantly, our technique can be efficiently used to improve the performance of any SSL method. Our experiments on the Path-MNIST and Organ-MNIST medical datasets for a wide array of SSL methods show the effectiveness of BASIL. Furthermore, we observe that BASIL outperforms the state-of-the-art diversity-based and uncertainty-based active learning methods, since the SMI functions select a more balanced dataset.
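The per-class selection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which SMI function is used, so the sketch assumes one facility-location style instantiation, I(A; Q) = sum over q in Q of max over a in A of s(q, a), plus eta times the sum over a in A of max over q in Q of s(a, q), where Q holds the labeled examples of one class and A is the set being selected from the unlabeled pool. The cosine similarity kernel, the naive greedy maximizer, and all function and parameter names (`flqmi`, `greedy_select`, `select_balanced_batch`, `eta`, `per_class_budget`) are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def flqmi(sim, selected, eta=1.0):
    """Assumed facility-location style SMI value of `selected`.

    sim has shape (n_pool, n_query): similarity of each pool point
    to each query (labeled, same-class) point.
    """
    if not selected:
        return 0.0
    sub = sim[selected]  # rows of the currently selected pool points
    # Query coverage term plus eta-weighted query relevance term.
    return sub.max(axis=0).sum() + eta * sub.max(axis=1).sum()


def greedy_select(sim, budget, excluded=(), eta=1.0):
    """Naive greedy maximization: add the point with the largest gain."""
    selected, banned = [], set(excluded)
    for _ in range(budget):
        current = flqmi(sim, selected, eta)
        best, best_gain = None, -np.inf
        for i in range(sim.shape[0]):
            if i in banned:
                continue
            gain = flqmi(sim, selected + [i], eta) - current
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # pool exhausted
            break
        selected.append(best)
        banned.add(best)
    return selected


def select_balanced_batch(pool_feats, labeled_feats, labeled_y,
                          num_classes, per_class_budget, eta=1.0):
    """One selection round: an equal budget is spent on every class."""
    labeled_feats = np.asarray(labeled_feats)
    labeled_y = np.asarray(labeled_y)
    chosen = []
    for c in range(num_classes):
        query = labeled_feats[labeled_y == c]
        if len(query) == 0:
            continue  # no labeled exemplar of this class to query with
        sim = cosine_similarity(np.asarray(pool_feats), query)
        chosen += greedy_select(sim, per_class_budget,
                                excluded=chosen, eta=eta)
    return chosen  # pool indices to send for labeling
```

In an active learning loop, the returned indices would be labeled, moved from the pool into the labeled set, and the SSL model retrained before the next round. Because each class receives the same budget regardless of how rare it is in the pool, the labeled set tends toward balance, provided the similarity kernel separates the classes reasonably well.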



ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

Existing semi-supervised learning (SSL) algorithms typically assume clas...

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

While semi-supervised learning (SSL) has proven to be a promising way fo...

BMB: Balanced Memory Bank for Imbalanced Semi-supervised Learning

Exploring a substantial amount of unlabeled data, semi-supervised learni...

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

Many pairwise classification tasks, such as paraphrase detection and ope...

Multi-Centroid Hyperdimensional Computing Approach for Epileptic Seizure Detection

Long-term monitoring of patients with epilepsy presents a challenging pr...

Land Cover and Land Use Detection using Semi-Supervised Learning

Semi-supervised learning (SSL) has made significant strides in the field...

PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Few-shot classification (FSC) requires training models using a few (typi...
