Query Complexity of Least Absolute Deviation Regression via Robust Uniform Convergence

by Xue Chen, et al.

Consider a regression problem where the learner is given a large collection of d-dimensional data points, but can only query a small subset of the real-valued labels. How many queries are needed to obtain a (1+ϵ) relative error approximation of the optimum? While this problem has been extensively studied for least squares regression, little is known for other losses. An important example is least absolute deviation regression (ℓ_1 regression), which enjoys superior robustness to outliers compared to least squares. We develop a new framework for analyzing importance sampling methods in regression problems, which enables us to show that the query complexity of least absolute deviation regression is Θ(d/ϵ^2) up to logarithmic factors. We further extend our techniques to show the first bounds on the query complexity for any ℓ_p loss with p∈(1,2). As a key novelty in our analysis, we introduce the notion of robust uniform convergence, which is a new approximation guarantee for the empirical loss. While it is inspired by uniform convergence in statistical learning, our approach additionally incorporates a correction term to avoid unnecessary variance due to outliers. This can be viewed as a new connection between statistical learning theory and variance reduction techniques in stochastic optimization, which should be of independent interest.
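To make the setup concrete, the sketch below illustrates the general shape of a label-efficient importance-sampling scheme for ℓ_1 regression: score the (label-free) data points, query only a small sample of labels with probability proportional to the scores, and solve an inverse-probability-reweighted ℓ_1 problem on the sample. This is only an illustrative sketch, not the paper's algorithm: the ℓ_2 leverage scores and the IRLS solver used here are stand-ins for the paper's actual sampling distribution and optimization procedure, and the function names (`irls_l1`, `sampled_l1_fit`) are hypothetical.

```python
import numpy as np

def l1_loss(X, y, w):
    """Least absolute deviation objective sum_i |<x_i, w> - y_i|."""
    return np.abs(X @ w - y).sum()

def irls_l1(X, y, s=None, iters=100, eps=1e-8):
    """Approximate weighted l1 regression via iteratively reweighted
    least squares (a generic solver, not the paper's method)."""
    n, d = X.shape
    if s is None:
        s = np.ones(n)
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # least squares warm start
    for _ in range(iters):
        r = np.abs(X @ w - y)
        # reweight rows by s_i / |r_i| so the quadratic surrogate tracks l1
        q = np.sqrt(s / np.maximum(r, eps))
        w = np.linalg.lstsq(q[:, None] * X, q * y, rcond=None)[0]
    return w

def sampled_l1_fit(X, y, m, rng):
    """Query only m labels, sampled with importance proportional to a
    label-free row score; here l2 leverage scores serve as a crude proxy."""
    Q, _ = np.linalg.qr(X)
    p = (Q ** 2).sum(axis=1)
    p /= p.sum()
    idx = rng.choice(len(y), size=m, p=p)  # only y[idx] is ever queried
    s = 1.0 / (m * p[idx])  # inverse-probability weights keep the loss unbiased
    return irls_l1(X[idx], y[idx], s=s)

# Demo: l1 regression with a few gross outliers in the labels.
rng = np.random.default_rng(0)
n, d, m = 2000, 5, 400
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
y[:20] += 50.0  # outliers that l1 regression tolerates
w_full = irls_l1(X, y)           # uses all n labels
w_sub = sampled_l1_fit(X, y, m, rng)  # queries only m labels
ratio = l1_loss(X, y, w_sub) / l1_loss(X, y, w_full)
```

Under the paper's guarantee, a suitable sampling distribution achieves `ratio ≤ 1 + ϵ` with roughly d/ϵ² queries; the proxy scores above are merely meant to show where such a distribution plugs in.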




