Tail bounds for volume sampled linear regression

02/19/2018
by Michal Derezinski, et al.

The n × d design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to observe only a small number k ≪ n of the responses and then produce a weight vector whose sum of squared losses over all points is at most 1+ϵ times the minimum. A standard approach to this problem is i.i.d. leverage score sampling, but this approach is known to perform poorly when k is small (e.g., k = d); in such cases, it is dominated by volume sampling, a joint sampling method that explicitly promotes diversity. How these methods compare for larger k was not previously understood. We prove that volume sampling can have poor behavior for large k, indeed worse than leverage score sampling. We also show how to repair volume sampling using a new padding technique. We prove that padded volume sampling has at least as good a tail bound as leverage score sampling: sample size k = O(d log d + d/ϵ) suffices to guarantee total loss at most 1+ϵ times the minimum with high probability. The main technical challenge is proving tail bounds for the sums of dependent random matrices that arise from volume sampling.
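For context, the i.i.d. leverage score sampling baseline that the abstract compares against can be sketched in a few lines of NumPy. The sketch below is illustrative only and is not the paper's padded volume sampling procedure; the function names, the importance-weighting scheme, and the toy data are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's algorithm): i.i.d. leverage score
# sampling for linear regression, followed by a reweighted least-squares solve
# on the k observed responses.
import numpy as np

def leverage_scores(X):
    """Leverage scores l_i = x_i^T (X^T X)^{-1} x_i, computed via a thin QR."""
    Q, _ = np.linalg.qr(X)           # Q has orthonormal columns spanning range(X)
    return np.sum(Q**2, axis=1)      # squared row norms of Q are the leverage scores

def leverage_sampled_regression(X, y, k, seed=0):
    """Sample k responses i.i.d. proportional to leverage scores and solve
    the importance-weighted least-squares problem on the sampled rows."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = leverage_scores(X)
    p = p / p.sum()                       # sampling distribution over rows
    idx = rng.choice(n, size=k, p=p)      # i.i.d. sample of k row indices
    w = 1.0 / np.sqrt(k * p[idx])         # importance weights for an unbiased sketch
    Xs, ys = w[:, None] * X[idx], w * y[idx]
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Usage: compare the subsampled solution's total loss to the optimum.
n, d, k = 1000, 10, 60
rng = np.random.default_rng(1)
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_sub = leverage_sampled_regression(X, y, k)
loss = lambda b: np.sum((X @ b - y) ** 2)
print(loss(beta_sub) / loss(beta_full))   # ratio close to 1 for moderate k
```

Volume sampling instead draws the k rows jointly, with probability proportional to the squared volume they span, which is what produces the diversity the abstract refers to for small k.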

