GALAXY: Graph-based Active Learning at the Extreme

02/03/2022
by Jifan Zhang, et al.

Active learning is a label-efficient approach to training highly effective models while interactively selecting only small subsets of unlabeled data for labeling and training. In "open world" settings, the classes of interest can make up a small fraction of the overall dataset; most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other active learning methods. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY's superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.
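To make the contrast between vanilla uncertainty sampling and a graph-based query rule more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not spell out GALAXY's procedure, so the sketch assumes a binary model that outputs a score in [0, 1], orders the pool by that score to form a simple chain graph, and bisects between oppositely labeled neighbors. The function names `uncertainty_sample` and `bisect_on_chain` are hypothetical and exist only for illustration.

```python
import numpy as np


def uncertainty_sample(scores, labeled, batch):
    """Vanilla uncertainty sampling (binary case, illustrative only).

    Picks the `batch` unlabeled points whose score is closest to 0.5.
    """
    margin = np.abs(np.asarray(scores, dtype=float) - 0.5)
    # Exclude points that already have a label.
    margin[np.array(sorted(labeled), dtype=int)] = np.inf
    return np.argsort(margin, kind="stable")[:batch].tolist()


def bisect_on_chain(scores, labels_known):
    """Graph-style query on a chain (an assumed, simplified setup).

    Sorts the pool by score, finds the closest pair of labeled points
    that carry different labels, and returns the index of the point
    halfway between them. Returns None when no such gap remains.
    """
    order = np.argsort(np.asarray(scores, dtype=float), kind="stable").tolist()
    # Positions along the chain that already have labels.
    pos = [i for i, idx in enumerate(order) if idx in labels_known]
    best = None
    for a, b in zip(pos, pos[1:]):
        differs = labels_known[order[a]] != labels_known[order[b]]
        if differs and b - a > 1:
            if best is None or b - a < best[1] - best[0]:
                best = (a, b)
    if best is None:
        return None
    return order[(best[0] + best[1]) // 2]


scores = np.array([0.05, 0.1, 0.2, 0.45, 0.55, 0.9])
near_boundary = uncertainty_sample(scores, labeled={0}, batch=2)
midpoint = bisect_on_chain(scores, labels_known={0: 0, 5: 1})
```

In this toy setup the bisection rule keeps querying between a known example of each class, so the labels it collects tend to come from both sides of the boundary; that is the intuition behind a more class-balanced labeled set, not a reproduction of the paper's guarantees.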
