Semi-supervised Active Regression

06/12/2021
by Fnu Devvrit, et al.

Labelled data often comes at a high cost, as it may require recruiting human labellers or running costly experiments. At the same time, in many practical scenarios one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate the study of semi-supervised active learning through the frame of linear regression. In this setting, the learner has access to a dataset X ∈ ℝ^((n_1+n_2) × d) composed of n_1 unlabelled examples that the algorithm can actively query, and n_2 examples labelled a priori. Concretely, denoting the true labels by Y ∈ ℝ^(n_1+n_2), the learner's objective is to find β̂ ∈ ℝ^d such that ‖Xβ̂ − Y‖_2^2 ≤ (1 + ϵ) min_{β ∈ ℝ^d} ‖Xβ − Y‖_2^2, while making as few additional label queries as possible. To bound the number of label queries, we introduce an instance-dependent parameter called the reduced rank, denoted R_X, and propose an efficient algorithm with query complexity O(R_X/ϵ). This result directly yields improved upper bounds for two important special cases: (i) active ridge regression and (ii) active kernel ridge regression, where the reduced rank equals the statistical dimension sd_λ and the effective dimension d_λ of the problem, respectively, with λ ≥ 0 denoting the regularization parameter. For active ridge regression we also prove a matching lower bound of Ω(sd_λ/ϵ) on the query complexity of any algorithm. This subsumes prior work that only considered the unregularized case, i.e., λ = 0.
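To make the query model concrete, here is a minimal sketch of a standard active-regression baseline: sampling rows of X by their leverage scores, querying only the sampled labels, and solving a reweighted least-squares problem. This illustrates the general label-efficient regression paradigm, not the paper's specific reduced-rank algorithm; the function names and the sampling budget m are illustrative assumptions.

```python
import numpy as np

def leverage_scores(X):
    # Leverage score of row i is ||U_i||^2, where X = U S V^T is the thin SVD.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)

def active_regression(X, query_label, m, rng=None):
    """Sample m rows with probability proportional to their leverage scores,
    query labels only for those rows, and solve the reweighted least squares.

    query_label(i) returns the label y_i (a costly oracle call in practice).
    """
    rng = np.random.default_rng(rng)
    scores = leverage_scores(X)
    p = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])            # importance-sampling weights
    Xs = w[:, None] * X[idx]                 # reweighted design matrix
    ys = w * np.array([query_label(i) for i in idx])
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta
```

In the semi-supervised setting studied here, the n_2 a-priori labelled rows would be used directly, and only the unlabelled block would be queried through the oracle; the reduced rank R_X then governs how many oracle calls suffice for a (1 + ϵ)-approximation.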


