Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers

05/30/2023
by Matteo Vilucchio, et al.

We study robust linear regression in high dimensions, where the dimension d and the number of data points n both diverge at a fixed ratio α = n/d, under a data model that includes outliers. We provide exact asymptotics for the performance of empirical risk minimisation (ERM) with the ℓ_2-regularised ℓ_2, ℓ_1, and Huber losses, which are the standard approaches to such problems. We focus on two performance metrics: the generalisation error on similar datasets with outliers, and the estimation error of the original, unpolluted function. We compare our results with the information-theoretic Bayes-optimal estimation bound. For the generalisation error, we find that optimally regularised ERM is asymptotically consistent in the large sample complexity limit if one performs a simple calibration, and we compute the rates of convergence. For the estimation error, however, we show that, due to a norm calibration mismatch, consistency of the estimator requires an oracle estimate of the optimal norm or a cross-validation set not corrupted by outliers. We examine in detail how performance depends on the loss function and on the degree of outlier corruption in the training set, and we identify a region of parameters where the optimal performance of the Huber loss is identical to that of the ℓ_2 loss, offering insights into the use cases of different loss functions.
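
To make the objects named in the abstract concrete, the following is a minimal sketch of the three losses and of an ℓ_2-regularised empirical risk. It is not taken from the paper: the function names, the Huber threshold `a`, the regularisation strength `lam`, the 1/2 factors, and the unnormalised sum over samples are all assumptions chosen for illustration, and the paper's own scaling conventions may differ.

```python
# Illustrative sketch only; names and normalisations are assumptions,
# not the paper's definitions.

def l2_loss(r):
    """Squared loss on a residual r."""
    return 0.5 * r * r

def l1_loss(r):
    """Absolute loss on a residual r."""
    return abs(r)

def huber_loss(r, a=1.0):
    """Huber loss: quadratic for |r| <= a, linear beyond the threshold a."""
    if abs(r) <= a:
        return 0.5 * r * r
    return a * (abs(r) - 0.5 * a)

def regularised_risk(w, X, y, loss, lam):
    """Sum of per-sample losses plus an l2 penalty (lam / 2) * ||w||^2.

    w: list of d weights, X: list of n feature lists, y: list of n targets.
    """
    data_term = 0.0
    for x, target in zip(X, y):
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        data_term += loss(target - prediction)
    penalty = 0.5 * lam * sum(wi * wi for wi in w)
    return data_term + penalty
```

ERM would then pick the weights `w` that minimise `regularised_risk` for a given loss and `lam`. In this sketch the Huber loss coincides with the ℓ_2 loss on every residual no larger than `a` in absolute value, and grows only linearly on larger residuals, which is what limits the influence of outlying targets.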

