On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?

11/16/2020
by Yao-Chun Chan, et al.

Active learning comprises techniques for intelligently selecting which examples in a large unlabeled dataset to label, with the goal of reducing the overall labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data and have led to significant reductions in the labeling required on standard machine-learning benchmarks. A natural question is whether these paradigms can be unified to obtain superior results. To this end, this paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training. We conduct extensive experiments with our framework on the CIFAR10 and CIFAR100 datasets. These experiments enable us to isolate and assess the benefits of the individual components, which are evaluated using state-of-the-art methods (e.g., Core-Set, VAAL, SimCLR, FixMatch). Our experiments reveal two key insights: (i) self-supervised pretraining significantly improves semi-supervised learning, especially in the few-label regime; (ii) the benefit of active learning is undermined and subsumed by S4L techniques. Specifically, we fail to observe any additional benefit from state-of-the-art active learning algorithms when they are combined with state-of-the-art S4L techniques.
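To make the three components of the framework concrete, the following is a minimal PyTorch sketch of one plausible instantiation: SimCLR's NT-Xent contrastive loss for pretraining, greedy k-center (Core-Set) selection for the active-learning query, and a FixMatch-style confidence-thresholded consistency loss for self-training. The function names, hyperparameters (e.g., tau=0.5, threshold=0.95), and the simplified greedy query below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent contrastive loss over a batch of paired views."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2N, d) unit vectors
    sim = z @ z.t() / tau                                # cosine similarity logits
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                # exclude self-similarity
    # The positive for view i is its augmented counterpart at i + N (mod 2N).
    targets = torch.cat([torch.arange(n, 2 * n, device=z.device),
                         torch.arange(0, n, device=z.device)])
    return F.cross_entropy(sim, targets)

def coreset_query(embeddings, labeled_idx, budget):
    """Greedy k-center (Core-Set) selection of `budget` new points to label."""
    dist = torch.cdist(embeddings, embeddings[labeled_idx]).min(dim=1).values
    chosen = []
    for _ in range(budget):
        i = int(dist.argmax())                           # farthest from labeled set
        chosen.append(i)
        dist = torch.minimum(
            dist, torch.cdist(embeddings, embeddings[i:i + 1]).squeeze(1))
    return chosen

def fixmatch_loss(model, x_labeled, y, x_weak, x_strong, threshold=0.95):
    """Supervised loss plus confidence-thresholded consistency loss."""
    sup = F.cross_entropy(model(x_labeled), y)
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)          # weak-view pseudo-labels
        conf, pseudo = probs.max(dim=1)
    keep = (conf >= threshold).float()                   # only confident examples
    unsup = (F.cross_entropy(model(x_strong), pseudo,
                             reduction="none") * keep).mean()
    return sup + unsup
```

In the combined pipeline, one would pretrain the encoder with `nt_xent_loss`, embed the unlabeled pool and call `coreset_query` at each labeling round, and fine-tune with `fixmatch_loss`; the paper's finding is that, once the first and third components are strong, the choice of query strategy in the second adds little over random sampling.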


Related research

08/25/2021 · Reducing Label Effort: Self-Supervised meets Active Learning
Active learning is a paradigm aimed at reducing the annotation effort by...

01/25/2023 · Toward Realistic Evaluation of Deep Active Learning Algorithms in Image Classification
Active Learning (AL) aims to reduce the labeling burden by interactively...

03/27/2023 · Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need
Self-Supervised Learning (SSL) has emerged as the solution of choice to...

11/10/2021 · A Histopathology Study Comparing Contrastive Semi-Supervised and Fully Supervised Learning
Data labeling is often the most challenging task when developing computa...

01/25/2022 · DebtFree: Minimizing Labeling Cost in Self-Admitted Technical Debt Identification using Semi-Supervised Learning
Keeping track of and managing Self-Admitted Technical Debts (SATDs) is i...

03/01/2020 · Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision
Active learning (AL) aims to minimize labeling efforts for data-demandin...

06/07/2023 · NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage
High annotation cost for training machine learning classifiers has drive...
