Active Tolerant Testing

11/01/2017
by   Avrim Blum, et al.

In this work, we give the first algorithms for tolerant testing of nontrivial classes in the active model: estimating the distance of a target function to a hypothesis class C with respect to some arbitrary distribution D, using only a small number of label queries to a polynomial-sized pool of unlabeled examples drawn from D. Specifically, we show that for the class of unions of d intervals on the line, we can estimate the error rate of the best hypothesis in the class to an additive error ε from only O(1/ε^6 · log(1/ε)) label queries to an unlabeled pool of size O(d/ε^2 · log(1/ε)). The key point is that the number of labels needed is independent of the VC-dimension of the class. This extends the work of Balcan et al. [2012], who solved the non-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than ε).

We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled? We focus on k-Nearest-Neighbor-style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem).
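
To make concrete what quantity is being estimated, here is a minimal Python sketch (an illustration, not the paper's algorithm): it queries labels for a small random subsample of an unlabeled pool on the line and computes, by dynamic programming, the error of the best union of at most d intervals on that subsample. The function names, the synthetic pool, the hidden target, and the naive uniform-subsampling step are all illustrative assumptions; a direct subsample of this kind needs a number of labels growing with d, whereas the paper's contribution is achieving additive error ε with a label budget independent of the VC-dimension.

import numpy as np

def best_union_of_intervals_error(xs, ys, d):
    # Fraction of the labeled sample misclassified by the best union of at most
    # d intervals on the line (points inside an interval predicted 1, outside 0),
    # found by dynamic programming over the points in sorted order.
    order = np.argsort(xs)
    labels = np.asarray(ys)[order]
    n = len(labels)
    INF = float("inf")
    # cost[j][s]: minimum mistakes so far using j intervals (opened so far),
    # with s = 1 if the current point lies inside an open interval.
    cost = [[INF, INF] for _ in range(d + 1)]
    cost[0][0] = 0.0
    for y in labels:
        new = [[INF, INF] for _ in range(d + 1)]
        for j in range(d + 1):
            for s in (0, 1):
                c = cost[j][s]
                if c == INF:
                    continue
                # stay outside / stay inside the current interval
                new[j][s] = min(new[j][s], c + (y != s))
                # open a new interval (consumes one of the d intervals)
                if s == 0 and j < d:
                    new[j + 1][1] = min(new[j + 1][1], c + (y != 1))
                # close the current interval
                if s == 1:
                    new[j][0] = min(new[j][0], c + (y != 0))
        cost = new
    return min(min(row) for row in cost) / n

# Toy usage: estimate the best-in-class error from a small labeled subsample.
rng = np.random.default_rng(0)
pool = rng.uniform(size=5000)                            # unlabeled pool drawn from D
target = lambda x: ((x > 0.2) & (x < 0.5)).astype(int)   # hidden target function
m = 400                                                  # label budget (illustrative)
idx = rng.choice(len(pool), size=m, replace=False)
sample_x, sample_y = pool[idx], target(pool[idx])        # the only label queries made
print(best_union_of_intervals_error(sample_x, sample_y, d=2))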
