Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff

02/05/2020
by Vahid Jamali, et al.
Semi-supervised classification, one of the most prominent fields in machine learning, studies how to combine the statistical knowledge carried by the often abundant unlabeled data with the often limited labeled data in order to maximize overall classification accuracy. In this context, the process of actively choosing which data to label is referred to as active learning. In this paper, we initiate the non-asymptotic analysis of the optimal policy for semi-supervised classification with actively obtained labeled data. Considering a general Bayesian classification model, we provide the first characterization of the jointly optimal active learning and semi-supervised classification policy in terms of the cost-performance tradeoff driven by the label query budget (the number of data items to be labeled) and the overall classification accuracy. Leveraging recent results on the Rényi entropy, we derive tight information-theoretic bounds on this active learning cost-performance tradeoff.
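
For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of the standard Rényi entropy of order α for a discrete distribution, H_α(P) = (1/(1-α)) log₂ Σᵢ pᵢ^α, with the Shannon entropy as the α → 1 limit. The abstract does not say which order or variant the paper relies on, and this sketch does not reproduce the paper's bounds; the function name `renyi_entropy` and the example class posterior are illustrative assumptions, not taken from the paper.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy (in bits) of order `alpha` for a discrete distribution.

    Hypothetical helper for illustration only; it implements the textbook
    definition, not any bound from the paper.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    # Zero-probability outcomes do not contribute to the entropy.
    support = [p for p in probs if p > 0]
    if abs(sum(support) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if math.isclose(alpha, 1.0):
        # Limit alpha -> 1 recovers the Shannon entropy.
        return -sum(p * math.log2(p) for p in support)
    if math.isinf(alpha):
        # Limit alpha -> infinity is the min-entropy.
        return -math.log2(max(support))
    return math.log2(sum(p ** alpha for p in support)) / (1.0 - alpha)


# Assumed example: a posterior over three classes for one unlabeled item.
posterior = [0.5, 0.25, 0.25]
for order in (0.0, 0.5, 1.0, 2.0, math.inf):
    print(order, renyi_entropy(posterior, order))
```

The Rényi entropy is non-increasing in α, so lower orders weight rare classes more heavily while higher orders are dominated by the most likely class; for a uniform distribution every order gives the same value.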
