Batch Active Learning at Scale

07/29/2021
by Gui Citovsky, et al.

Training complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach to this problem. The practical benefits of batch sampling come at the price of reduced adaptivity and the risk of sampling redundant examples within a batch, a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm that focuses on the large-batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies, and that it provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.
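
To make the idea of combining uncertainty and diversity within one batch concrete, the following is a minimal sketch and not the algorithm analyzed in the paper, whose details the abstract does not give. It assumes uncertainty is measured by the margin between the two largest predicted class probabilities and that diversity is enforced by greedy farthest-point selection in an embedding space; both choices, along with the names `margin_scores`, `select_batch`, and `pool_factor`, are illustrative assumptions.

```python
import numpy as np


def margin_scores(probs):
    """Gap between the two largest class probabilities (small = uncertain)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def select_batch(probs, embeddings, batch_size, pool_factor=4):
    """Pick a batch that is both uncertain and spread out (illustrative only).

    probs:       (n, k) predicted class probabilities for unlabeled examples
    embeddings:  (n, d) feature vectors used to measure redundancy
    pool_factor: hypothetical knob; the candidate pool holds
                 pool_factor * batch_size of the most uncertain examples
    """
    margins = margin_scores(probs)
    pool_size = min(len(margins), pool_factor * batch_size)
    # Uncertainty step: keep only the lowest-margin examples as candidates.
    pool = np.argsort(margins, kind="stable")[:pool_size]
    pool_emb = embeddings[pool]

    # Diversity step: start from the most uncertain candidate, then repeatedly
    # add the candidate farthest from everything chosen so far.
    chosen = [0]
    dist = np.linalg.norm(pool_emb - pool_emb[0], axis=1)
    dist[0] = -np.inf  # never pick the same candidate twice
    while len(chosen) < min(batch_size, pool_size):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        new_dist = np.linalg.norm(pool_emb - pool_emb[nxt], axis=1)
        dist = np.minimum(dist, new_dist)
        dist[nxt] = -np.inf
    return pool[np.array(chosen)]


if __name__ == "__main__":
    probs = np.array([[0.5, 0.5], [0.55, 0.45], [0.6, 0.4],
                      [0.7, 0.3], [0.9, 0.1], [0.99, 0.01]])
    emb = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0],
                    [1.0, 1.0], [100.0, 100.0], [-100.0, -100.0]])
    print(select_batch(probs, emb, batch_size=2, pool_factor=2))
```

In this toy run the two most uncertain examples are near-duplicates in embedding space, so the second slot goes to a more distant candidate from the uncertain pool instead. The greedy loop costs time proportional to the pool size times the batch size, so this sketch would need a cheaper diversity step to reach the batch sizes the abstract mentions.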

research
09/27/2019

Learning in Confusion: Batch Active Learning with Noisy Oracle

We study the problem of training machine learning models incrementally u...
research
05/31/2019

Minimum-Margin Active Learning

We present a new active sampling method we call min-margin which trains ...
research
01/28/2023

Leveraging Importance Weights in Subset Selection

We present a subset selection algorithm designed to work with arbitrary ...
research
06/09/2019

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

We design a new algorithm for batch active learning with deep neural net...
research
01/24/2018

Impact of Batch Size on Stopping Active Learning for Text Classification

When using active learning, smaller batch sizes are typically more effic...
research
05/23/2020

Active Learning for Skewed Data Sets

Consider a sequential active learning problem where, at each round, an a...
research
11/20/2019

Deep Active Learning: Unified and Principled Method for Query and Training

In this paper, we proposed a unified and principled method for both quer...