Reverse iterative volume sampling for linear regression

06/06/2018
by Michal Derezinski et al.

We study the following basic machine learning task: given a fixed set of d-dimensional input points for a linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to obtain the responses for a small subset of the points, which are then used to construct linear predictions for all points in the dataset. The performance of the predictions is evaluated by the total square loss on all responses (the obtained as well as the hidden ones). We show that a good approximate solution to this least squares problem can be obtained from as few as d responses by using a joint sampling technique called volume sampling. Moreover, the least squares solution obtained for the volume-sampled subproblem is an unbiased estimator of the optimal solution based on all n responses. This unbiasedness is a desirable property that is not shared by other common subset selection techniques. Motivated by these basic properties, we develop a theoretical framework for studying volume sampling, resulting in a number of new matrix expectation equalities and statistical guarantees which are of importance not only to least squares regression but also to numerical linear algebra in general. Our methods also lead to a regularized variant of volume sampling, and we propose the first efficient algorithms for volume sampling which make this technique a practical tool in the machine learning toolbox. Finally, we provide experimental evidence which confirms our theoretical findings.
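The unbiasedness property described above can be checked directly on a toy problem. The sketch below (an illustration, not the paper's algorithm: it brute-forces all size-d subsets, which is feasible only for tiny n) samples a subset S with probability proportional to det(X_S)^2 and shows that the exact expectation of the subsampled least-squares solution matches the full-data solution:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 2
X = rng.standard_normal((n, d))   # fixed design matrix (all inputs known)
y = rng.standard_normal(n)        # responses (only a sampled subset is observed)

# Least-squares solution using all n responses.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

# Volume sampling over size-d subsets: P(S) proportional to det(X_S)^2.
# Brute-force enumeration for illustration; practical only for tiny n.
subsets = list(itertools.combinations(range(n), d))
weights = np.array([np.linalg.det(X[list(S)]) ** 2 for S in subsets])

# Cauchy-Binet identity: the unnormalized weights sum to det(X^T X).
assert np.isclose(weights.sum(), np.linalg.det(X.T @ X))
probs = weights / weights.sum()

# Exact expectation of the least-squares solution on the sampled subset.
expected_w = np.zeros(d)
for S, p in zip(subsets, probs):
    idx = list(S)
    w_S = np.linalg.solve(X[idx], y[idx])  # a size-d subset interpolates exactly
    expected_w += p * w_S

# Unbiasedness: E[w_S] equals the full least-squares solution w*.
print(np.allclose(expected_w, w_star))  # True
```

With a uniform or leverage-score sampler in place of volume sampling, the analogous expectation generally does not equal w*, which is the contrast the abstract draws with other subset selection techniques.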


Related research:

- Unbiased estimators for random design regression (07/08/2019)
  In linear regression we wish to estimate the optimum linear least square...

- Tail bounds for volume sampled linear regression (02/19/2018)
  The n × d design matrix in a linear regression problem is given, but the...

- Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression (02/04/2019)
  In experimental design, we are given a large collection of vectors, each...

- Correcting the bias in least squares regression with volume-rescaled sampling (10/04/2018)
  Consider linear regression where the examples are generated by an unknow...

- Semi-Infinite Linear Regression and Its Applications (04/12/2021)
  Finite linear least squares is one of the core problems of numerical lin...

- Conditional Linear Regression (06/06/2018)
  Work in machine learning and statistics commonly focuses on building mod...

- A Structured Perspective of Volumes on Active Learning (07/24/2018)
  Active Learning (AL) is a learning task that requires learners interacti...
