On the robustness to adversarial corruption and to heavy-tailed data of the Stahel-Donoho median of means

by Jules Depersin, et al.

We consider median-of-means (MOM) versions of the Stahel-Donoho outlyingness (SDO) [Stahel 1981, Donoho 1982] and of the Median Absolute Deviation (MAD) functions to construct subgaussian estimators of a mean vector under adversarial contamination and heavy-tailed data. We develop a single analysis of the MOM version of the SDO which covers all cases ranging from the Gaussian case to the L2 case. It is based on isomorphic and almost isometric properties of the MOM versions of SDO and MAD. This analysis also covers cases where the mean does not exist but a location parameter does; in those cases we still recover the same subgaussian rates and the same price for adversarial contamination, even without a first moment. These properties are achieved by the classical SDO median and are therefore the first non-asymptotic statistical bounds on the Stahel-Donoho median, complementing the √n-consistency [Maronna 1995] and asymptotic normality [Zuo, Cui, He 2004] of the Stahel-Donoho estimators. We also show that the MOM version of MAD can be used to construct an estimator of the covariance matrix under only an L2-moment assumption, or of a scale parameter if a second moment does not exist.
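To make the objects in the abstract concrete, here is a minimal sketch of a median-of-means location, a MAD-type scale computed on block means, and an outlyingness score built from them. It is an illustration under stated assumptions, not the paper's construction: the function names (`block_means`, `median_of_means`, `mom_mad`, `mom_outlyingness`), the use of contiguous equal-size blocks, and the replacement of the supremum over all directions by a finite, user-supplied list of directions are all choices made here for readability. The paper's exact definitions and normalizations may differ.

```python
import statistics


def block_means(values, n_blocks):
    """Average each of n_blocks contiguous, equal-size blocks (leftovers are dropped)."""
    size = len(values) // n_blocks
    if size == 0:
        raise ValueError("need at least one observation per block")
    return [statistics.fmean(values[k * size:(k + 1) * size]) for k in range(n_blocks)]


def median_of_means(values, n_blocks):
    """MOM location: the median of the block means."""
    return statistics.median(block_means(values, n_blocks))


def mom_mad(values, n_blocks):
    """Assumed MOM analogue of MAD: median absolute deviation of the block means."""
    means = block_means(values, n_blocks)
    center = statistics.median(means)
    return statistics.median([abs(m - center) for m in means])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mom_outlyingness(point, data, n_blocks, directions):
    """Largest standardized deviation of `point` over the given directions.

    Assumption: a finite list of directions stands in for the supremum over
    the unit sphere used in Stahel-Donoho-type definitions.
    """
    worst = 0.0
    for v in directions:
        projections = [dot(v, x) for x in data]
        scale = mom_mad(projections, n_blocks)
        if scale == 0:
            continue  # degenerate direction: skipped in this sketch
        deviation = abs(dot(v, point) - median_of_means(projections, n_blocks))
        worst = max(worst, deviation / scale)
    return worst
```

A point minimizing such an outlyingness over candidate locations plays the role of a Stahel-Donoho-type median. In one dimension with the single direction `(1.0,)`, the score is zero exactly at the median of the block means, which shows how one gross outlier in the sample leaves the location unchanged.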







Robust subgaussian estimation of a mean vector in nearly linear time

We construct an algorithm, running in nearly-linear time, which is robus...

Robust subgaussian estimation with VC-dimension

Median-of-means (MOM) based procedures provide non-asymptotic and strong...

Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms

We consider the problem of robust mean and location estimation w.r.t. an...

How Robust is the Median-of-Means? Concentration Bounds in Presence of Outliers

In contrast to the empirical mean, the Median-of-Means (MoM) is an estim...

U-statistics of growing order and sub-Gaussian mean estimators with sharp constants

This paper addresses the following question: given a sample of i.i.d. ra...

ICA based on Split Generalized Gaussian

Independent Component Analysis (ICA) - one of the basic tools in data an...

Estimating the concentration parameter of a von Mises distribution: a systematic simulation benchmark

In directional statistics, the von Mises distribution is a key element i...