Tail bounds for volume sampled linear regression

02/19/2018
by Michal Derezinski, et al.

The n × d design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to observe only a small number k ≪ n of the responses and then produce a weight vector whose sum of squared losses over all n points is at most 1+ϵ times the minimum. A standard approach to this problem is i.i.d. leverage score sampling, but this approach is known to perform poorly when k is small (e.g., k = d); in such cases it is dominated by volume sampling, a joint sampling method that explicitly promotes diversity. How these methods compare for larger k was not previously understood. We prove that volume sampling can have poor behavior for large k: indeed, worse than leverage score sampling. We also show how to repair volume sampling using a new padding technique. We prove that padded volume sampling has at least as good a tail bound as leverage score sampling: sample size k = O(d log d + d/ϵ) suffices to guarantee total loss at most 1+ϵ times the minimum with high probability. The main technical challenge is proving tail bounds for the sums of dependent random matrices that arise from volume sampling.
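For context: volume sampling draws a size-k subset S of rows jointly, with probability proportional to det(A_S^T A_S), so it favors diverse subsets, whereas leverage score sampling draws rows i.i.d. in proportion to their leverage scores. The sketch below is a minimal illustration of the i.i.d. leverage score baseline only, not the paper's padded volume sampling (whose construction is in the full text). The function name and the get_response callback are hypothetical; it assumes a full-rank n × d design matrix A.

```python
import numpy as np

def leverage_score_regression(A, get_response, k, seed=None):
    """Illustrative sketch: sample k rows i.i.d. proportional to their
    leverage scores, query only those k responses, and solve the
    importance-reweighted least squares problem on the subsample."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Leverage score of row i is l_i = ||q_i||^2, where q_i is the i-th
    # row of Q from a thin QR decomposition; the scores sum to rank(A).
    Q, _ = np.linalg.qr(A)
    lev = (Q ** 2).sum(axis=1)
    p = lev / lev.sum()                     # sampling distribution
    idx = rng.choice(n, size=k, p=p)        # i.i.d. sample of k indices
    w = 1.0 / np.sqrt(k * p[idx])           # importance-sampling weights
    SA = w[:, None] * A[idx]                # reweighted design rows
    # Only the k sampled responses are ever requested.
    Sb = w * np.array([get_response(i) for i in idx])
    # Weight vector minimizing the subsampled loss ||SA x - Sb||^2.
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

# Hypothetical usage: responses b are hidden behind a query callback.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(1000)
x_hat = leverage_score_regression(A, lambda i: b[i], k=200, seed=1)
```

Standard analyses of this baseline require on the order of d log d + d/ϵ samples for a 1+ϵ loss approximation with high probability, which is the tail bound the abstract states padded volume sampling matches.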


Related research

06/06/2018
Reverse iterative volume sampling for linear regression
We study the following basic machine learning task: Given a fixed set of...

07/08/2019
Unbiased estimators for random design regression
In linear regression we wish to estimate the optimum linear least square...

02/04/2019
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
In experimental design, we are given a large collection of vectors, each...

10/04/2018
Correcting the bias in least squares regression with volume-rescaled sampling
Consider linear regression where the examples are generated by an unknow...

09/21/2020
Generalized Leverage Score Sampling for Neural Networks
Leverage score sampling is a powerful technique that originates from the...

06/01/2023
Sharper Bounds for ℓ_p Sensitivity Sampling
In large scale machine learning, random sampling is a popular way to app...

05/19/2021
L1 Regression with Lewis Weights Subsampling
We consider the problem of finding an approximate solution to ℓ_1 regres...
