VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning

by Jongwon Choi, et al.

Active Learning for discriminative models has largely been studied with a focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that neglecting these factors is harmful. We propose a method based on Bayes' rule that naturally incorporates class imbalance into the Active Learning framework. We show that three terms should be considered together when estimating the probability of a classifier making a mistake for a given sample: i) the probability of mislabelling a class, ii) the likelihood of the data given a predicted class, and iii) the prior probability on the abundance of a predicted class. Implementing these terms requires a generative model and an intractable likelihood estimation. Therefore, we train a Variational Auto Encoder (VAE) for this purpose. To further tie the VAE to the classifier and facilitate VAE training, we use the classifier's deep feature representations as input to the VAE. By considering all three probabilities, and especially the data imbalance, we can substantially improve the potential of existing methods under a limited data budget. We show that our method can be applied to classification tasks on multiple different datasets, including a real-world dataset with heavy data imbalance, significantly outperforming the state of the art.
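As an illustration only, the way the three terms could combine under Bayes' rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `mistake_probability`, the per-class input lists, and the exact way the terms are combined are all hypothetical, and in the paper the likelihood comes from a VAE rather than being supplied directly.

```python
def mistake_probability(p_mislabel, likelihood, prior):
    """Hypothetical sketch: estimate the chance that a classifier errs on one sample.

    p_mislabel[c]  -- assumed probability of a mistake when class c is predicted
    likelihood[c]  -- assumed likelihood of the sample given predicted class c
    prior[c]       -- assumed prior probability (abundance) of predicted class c
    """
    if not (len(p_mislabel) == len(likelihood) == len(prior)):
        raise ValueError("all inputs must have one entry per class")

    # Bayes' rule: posterior over predicted classes is proportional to
    # likelihood times prior.
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)
    if evidence <= 0:
        raise ValueError("likelihood * prior must have positive total mass")
    posterior = [j / evidence for j in joint]

    # Weight each class's mislabelling probability by that posterior.
    return sum(m * q for m, q in zip(p_mislabel, posterior))


if __name__ == "__main__":
    # Two classes: an abundant, easy one and a rare, hard one (made-up numbers).
    score = mistake_probability(
        p_mislabel=[0.1, 0.5],
        likelihood=[0.2, 0.8],
        prior=[0.9, 0.1],
    )
    print(score)
```

In an Active Learning loop, a score like this could be computed for every unlabelled sample, with the highest-scoring samples sent for annotation; the sketch only shows why a rare but hard class can dominate the estimate once the prior and the mislabelling probability are considered together.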




