Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression

02/04/2019
by   Michal Derezinski, et al.

In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gaussian noise, in which case the goal is to provide an unbiased estimator with smallest mean squared error (A-optimal design). A related approach, more common in computer science, is to assume the responses are arbitrary but fixed, in which case the goal is to estimate the least squares solution using few responses, as quickly as possible, for worst-case inputs. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a framework for experimental design where the responses are produced by an arbitrary unknown distribution. We show that there is an efficient randomized experimental design procedure that achieves strong variance bounds for an unbiased estimator using few responses in this general model. Nearly tight bounds for the classical A-optimality criterion, as well as improved bounds for worst-case responses, emerge as special cases of this result. In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d. importance sampling method: inverse score sampling. A key novelty of our analysis is in developing new expected error bounds for worst-case regression by controlling the tail behavior of i.i.d. sampling via the jointness of volume sampling. Our result motivates a new minimax-optimality criterion for experimental design which can be viewed as an extension of both A-optimal design and sampling for worst-case regression.
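
As a rough illustration of two notions named in the abstract, the sketch below computes the A-optimality value tr((X_S^T X_S)^{-1}) of a subset S and checks by brute force that standard size-d volume sampling (subsets drawn with probability proportional to det(X_S)^2, a known construction from earlier work) gives an unbiased estimate of the least squares solution even when the responses are arbitrary. It is not the paper's new algorithm or its inverse score sampling method. The function names, the toy data, and the exhaustive enumeration are assumptions made for illustration, and the enumeration is only feasible for very small inputs.

```python
from itertools import combinations

import numpy as np


def a_optimality_value(X, subset):
    """Return tr((X_S^T X_S)^{-1}) for the rows of X indexed by `subset`.

    Under i.i.d. zero-mean noise with variance sigma^2, the mean squared
    error of least squares fitted on those rows is sigma^2 times this value.
    """
    X_S = X[list(subset)]
    return float(np.trace(np.linalg.inv(X_S.T @ X_S)))


def volume_sampling_expectation(X, y):
    """Exact expectation of the subset estimator under size-d volume sampling.

    Enumerates every d-subset S, weights it by det(X_S)^2, and averages the
    solutions of X_S w = y_S. Illustrative only: the cost grows as C(n, d).
    """
    n, d = X.shape
    total_weight = 0.0
    weighted_sum = np.zeros(d)
    for subset in combinations(range(n), d):
        idx = list(subset)
        X_S = X[idx]
        weight = np.linalg.det(X_S) ** 2
        if weight < 1e-12:
            continue  # singular subsets have (numerically) zero probability
        weighted_sum += weight * np.linalg.solve(X_S, y[idx])
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical toy data: the responses are arbitrary, no linear model assumed.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
y = rng.standard_normal(6)

w_full = np.linalg.lstsq(X, y, rcond=None)[0]      # least squares on all rows
w_expected = volume_sampling_expectation(X, y)     # average over sampled subsets

print("full least squares:   ", w_full)
print("volume-sampling mean: ", w_expected)
print("A-value, all rows:    ", a_optimality_value(X, range(6)))
print("A-value, rows 0..2:   ", a_optimality_value(X, (0, 1, 2)))
```

The two printed vectors should agree up to floating-point error, and the A-optimality value for the full set should be no larger than for the three-row subset, since adding rows can only shrink the trace of the inverse.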


research
06/06/2018

Reverse iterative volume sampling for linear regression

We study the following basic machine learning task: Given a fixed set of...
research
12/22/2022

Randomizing the trapezoidal rule gives the optimal RMSE rate in Gaussian Sobolev spaces

Randomized quadratures for integrating functions in Sobolev spaces of or...
research
02/23/2018

Approximate Positively Correlated Distributions and Approximation Algorithms for D-optimal Design

Experimental design is a classical problem in statistics and has also fo...
research
10/23/2020

LowCon: A design-based subsampling approach in a misspecified linear model

We consider a measurement constrained supervised learning problem, that ...
research
02/19/2018

Tail bounds for volume sampled linear regression

The n × d design matrix in a linear regression problem is given, but the...
research
02/02/2019

Learning Linear Dynamical Systems with Semi-Parametric Least Squares

We analyze a simple prefiltered variation of the least squares estimator...
research
05/25/2015

Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares -- ICML

We consider statistical and algorithmic aspects of solving large-scale l...
