Uniform versus uncertainty sampling: When being active is less efficient than staying passive

12/01/2022
by Alexandru Tifrea, et al.

It is widely believed that given the same labeling budget, active learning algorithms like uncertainty sampling achieve better predictive performance than passive learning (i.e. uniform sampling), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as uncertainty sampling can sometimes perform even worse than passive learning. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that passive learning outperforms uncertainty sampling even for noiseless data and when using the uncertainty of the Bayes optimal classifier. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional datasets spanning a diverse range of applications, from finance and histology to chemistry and computer vision.
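To make the two strategies compared above concrete, here is a minimal sketch of the query rules for a logistic model. It is an illustration under stated assumptions, not the paper's experimental code: the function names (`predict_proba`, `uniform_query`, `uncertainty_query`), the toy pool `X`, and the weight vector `w` are all hypothetical. Uncertainty is measured here as the distance of the predicted probability from 0.5, which is one common choice.

```python
import math
import random


def predict_proba(w, x):
    """Logistic model: P(y = 1 | x) = sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def uniform_query(unlabeled_idx, rng):
    """Passive learning: choose the next point to label uniformly at random."""
    return rng.choice(unlabeled_idx)


def uncertainty_query(unlabeled_idx, X, w):
    """Uncertainty sampling: choose the point whose predicted probability
    is closest to 0.5, i.e. the point nearest the decision boundary of w."""
    return min(unlabeled_idx, key=lambda i: abs(predict_proba(w, X[i]) - 0.5))


# Hypothetical toy pool of unlabeled points and a hypothetical weight vector.
# w may stand for the current model or, as in the setting of the abstract,
# for an oracle such as the Bayes optimal classifier.
X = [[2.0, 0.0], [0.1, 0.0], [-1.5, 0.0], [-0.05, 1.0]]
w = [1.0, 0.0]
pool = [0, 1, 2, 3]

rng = random.Random(0)
print("uniform pick:", uniform_query(pool, rng))
print("uncertainty pick:", uncertainty_query(pool, X, w))
```

With this rule, the uncertainty-based query always concentrates on points with small |w . x|, whereas the uniform query ignores the model entirely. The abstract's claim is that, in high dimensions, the second behavior can yield better predictive performance for the same labeling budget.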
