In Defense of Core-set: A Density-aware Core-set Selection for Active Learning

06/10/2022
by Yeachan Kim, et al.

Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distance between samples. However, the approach performs poorly compared to uncertainty-based approaches, which select the most difficult samples, those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to have more informative samples than dense regions. Motivated by this analysis, we empower the core-set approach with density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. To reduce the computational bottleneck of density estimation, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results demonstrate the efficacy of DACS in both classification and regression tasks and, specifically, show that DACS can produce state-of-the-art performance in a practical scenario. Since DACS depends only weakly on neural architectures, we present a simple yet effective combination method to show that existing methods can be beneficially combined with DACS.
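
To make the idea in the abstract concrete, here is a minimal sketch of a density-aware core-set selection. It is an illustration under stated assumptions, not the authors' DACS implementation: the abstract does not give the exact density estimator or selection rule, so this sketch assumes random-hyperplane (sign) hashing as the locality-sensitive hash, uses bucket sizes as a rough density proxy, and runs a standard greedy k-center (farthest-first) selection restricted to the sparsest candidates. The function names and the parameters `n_bits`, `n_tables`, and `sparse_fraction` are hypothetical.

```python
import numpy as np


def lsh_density(features, n_bits=8, n_tables=4, seed=0):
    """Rough density proxy: mean size of the LSH buckets a sample falls into.

    Assumption: random-hyperplane (sign) hashing; the paper's own
    approximation may differ.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    centered = features - features.mean(axis=0)
    weights = 1 << np.arange(n_bits)          # turn bit vectors into integer codes
    density = np.zeros(n)
    for _ in range(n_tables):
        planes = rng.standard_normal((d, n_bits))
        bits = (centered @ planes > 0).astype(np.int64)
        codes = bits @ weights
        _, inverse, counts = np.unique(
            codes, return_inverse=True, return_counts=True
        )
        density += counts[inverse.reshape(-1)]  # bucket size for each sample
    return density / n_tables


def density_aware_k_center(features, budget, sparse_fraction=0.5, seed=0):
    """Pick `budget` diverse samples, restricted to low-density candidates."""
    n = len(features)
    if budget > n:
        raise ValueError("budget exceeds the number of samples")
    density = lsh_density(features, seed=seed)
    # Keep the sparsest candidates (hypothetical filtering rule).
    n_keep = max(budget, int(n * sparse_fraction))
    candidates = np.argsort(density, kind="stable")[:n_keep]
    pts = features[candidates]

    # Greedy k-center: repeatedly take the point farthest from the selection.
    selected = [0]                              # start at the sparsest candidate
    min_dist = np.linalg.norm(pts - pts[0], axis=1)
    min_dist[0] = -np.inf                       # never pick the same point twice
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist = np.linalg.norm(pts - pts[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
        min_dist[nxt] = -np.inf
    return candidates[selected]
```

In an active learning loop, `features` would be the model's embeddings of the unlabeled pool, and the returned indices would be the samples sent for labeling. Hashing keeps the density step roughly linear in the pool size, which is the motivation the abstract gives for using locality-sensitive hashing instead of exact neighbor search.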

research
07/25/2022

Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning

The availability of large labeled datasets is the key component for the ...
research
07/17/2023

Active Learning for Object Detection with Non-Redundant Informative Sampling

Curating an informative and representative dataset is essential for enha...
research
05/12/2018

Pool-Based Sequential Active Learning for Regression

Active learning is a machine learning approach for reducing the data lab...
research
08/10/2023

Composable Core-sets for Diversity Approximation on Multi-Dataset Streams

Core-sets refer to subsets of data that maximize some function that is c...
research
10/24/2019

Preventing Adversarial Use of Datasets through Fair Core-Set Construction

We propose improving the privacy properties of a dataset by publishing o...
research
05/06/2021

Bayesian Active Learning by Disagreements: A Geometric Perspective

We present geometric Bayesian active learning by disagreements (GBALD), ...
research
05/24/2021

Mapping oil palm density at country scale: An active learning approach

Accurate mapping of oil palm is important for understanding its past and...
