How Robust is the Median-of-Means? Concentration Bounds in Presence of Outliers

06/09/2020
by Pierre Laforgue, et al.

In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean θ of a square integrable r.v. Z around which accurate nonasymptotic confidence bounds can be built, even when Z does not exhibit sub-Gaussian tail behavior. Because of the high confidence it achieves when applied to heavy-tailed data, MoM has recently found applications in statistical learning, where it is used to design training procedures that are sensitive to neither atypical nor corrupted observations. For the first time, we provide concentration bounds for the MoM estimator in the presence of outliers that depend explicitly on the fraction of contaminated data present in the sample. These results are also extended to "Medians-of-U-statistics" (i.e. averages over tuples of observations), and are shown to furnish generalization guarantees for pairwise learning techniques (e.g. ranking, metric learning) based on contaminated training data. Beyond the theoretical analysis, numerical results are presented that provide strong empirical evidence of the robustness properties claimed by the learning rate bounds established.
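As an illustration of the estimator the abstract refers to, here is a minimal Python sketch of a standard Median-of-Means construction: split the sample into disjoint blocks, average each block, and take the median of the block means. It is not taken from the paper. The function names `median_of_means` and `contaminated_demo`, the random shuffling before blocking, the choice to drop leftover observations, and all numerical settings in the demo (Student-t data, 20 outliers, 101 blocks) are assumptions made for this example only.

```python
import numpy as np


def median_of_means(sample, n_blocks, seed=None):
    """Median of the means of n_blocks disjoint, equal-size blocks.

    Hypothetical helper for illustration; leftover observations that do
    not fill a complete block are dropped (an assumption of this sketch).
    """
    sample = np.asarray(sample, dtype=float).ravel()
    if not 1 <= n_blocks <= sample.size:
        raise ValueError("n_blocks must lie between 1 and the sample size")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(sample)            # random block assignment
    usable = (sample.size // n_blocks) * n_blocks
    blocks = shuffled[:usable].reshape(n_blocks, -1)
    return float(np.median(blocks.mean(axis=1)))


def contaminated_demo(seed=0):
    """Compare the empirical mean and MoM on a contaminated sample."""
    rng = np.random.default_rng(seed)
    theta = 2.0
    # Heavy-tailed but square integrable inliers: Student-t with 3 degrees
    # of freedom, shifted so that the true mean is theta.
    inliers = theta + rng.standard_t(df=3, size=1000)
    # Assumed contamination: 20 gross outliers (about 2% of the sample).
    outliers = np.full(20, 1e4)
    sample = np.concatenate([inliers, outliers])
    # With 101 blocks, at most 20 blocks can contain an outlier, so the
    # majority of block means are computed from inliers only.
    mom = median_of_means(sample, n_blocks=101, seed=seed)
    return theta, float(sample.mean()), mom
```

Calling `contaminated_demo()` returns the true mean, the empirical mean, and the MoM estimate for one simulated sample. The design choice that matters is the number of blocks: in this sketch it is taken larger than twice the number of outliers, so that uncontaminated blocks remain a majority and the median is driven by them.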

research
11/01/2022

On Medians of (Randomized) Pairwise Means

Tournament procedures, recently introduced in Lugosi Mendelson (2016...
research
07/06/2023

The Geometric Median and Applications to Robust Mean Estimation

This paper is devoted to the statistical and numerical properties of the...
research
12/06/2018

Median of means principle as a divide-and-conquer procedure for robustness, sub-sampling and hyper-parameters tuning

Many learning methods have poor risk estimates with large probability un...
research
01/22/2021

On the robustness to adversarial corruption and to heavy-tailed data of the Stahel-Donoho median of means

We consider median of means (MOM) versions of the Stahel-Donoho outlying...
research
06/12/2020

Concentration Bounds for the Collision Estimator

We prove a strong concentration result about the natural collision estim...
research
05/30/2023

Efficient median of means estimator

The goal of this note is to present a modification of the popular median...
research
10/15/2018

Robust descent using smoothed multiplicative noise

To improve the off-sample generalization of classical procedures minimiz...
