Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

05/05/2021
by Yijun Zuo, et al.

Robust estimation of a mean vector, a topic regarded as obsolete in the traditional robust statistics community, has resurged in the machine learning literature over the last decade. The latest focus is on the sub-Gaussian performance and computability of the estimators in a non-asymptotic setting. Many traditional robust estimators are computationally intractable, which partly explains the renewed interest in robust mean estimation. Computationally tractable robust centrality estimators, however, include the trimmed mean and the sample median. The latter has the best robustness but suffers from low efficiency. Versions of the trimmed mean and the median of means that achieve sub-Gaussian performance have been proposed and studied in the literature. This article investigates the robustness of the leading sub-Gaussian mean estimators and shows that none of them can resist more than 25% contamination in the data. It then introduces an outlyingness induced winsorized mean, which has the best possible robustness (it can resist up to 50% contamination without breaking down) while achieving high efficiency. Furthermore, it has sub-Gaussian performance for uncontaminated samples and a bounded estimation error for contaminated samples at a given confidence level in a finite-sample setting. It can be computed in linear time.
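
To make the robustness comparison concrete, the sketch below contrasts three univariate estimators on N(0, 1) data in which 40% of the points are replaced by a huge value. It is an illustration under stated assumptions, not the paper's method: the winsorizing rule (outlyingness O(x) = |x - med| / MAD, with points whose outlyingness exceeds a cutoff beta = 2 pulled back to med ± beta·MAD) and every function name are hypothetical choices of this sketch. The sub-Gaussian versions of the trimmed mean and median of means tune the trimming fraction and number of blocks to the confidence level; here they are simply fixed (alpha = 0.1, k = 10), so the example does not reproduce the 25% bound from the abstract. Also, `statistics.median` sorts, so this runs in O(n log n); a linear-time version would use a selection algorithm for the medians.

```python
import random
import statistics
from typing import List, Sequence


def trimmed_mean(xs: Sequence[float], alpha: float = 0.1) -> float:
    """Drop the floor(alpha * n) smallest and largest points, average the rest."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    ys = sorted(xs)
    k = int(alpha * len(ys))
    kept = ys[k:len(ys) - k]
    return sum(kept) / len(kept)


def median_of_means(xs: Sequence[float], k: int = 10) -> float:
    """Split xs (in the given order) into k equal blocks; return the median of block means.

    Points left over when len(xs) is not divisible by k are ignored.
    """
    size = len(xs) // k
    if size == 0:
        raise ValueError("need at least k points")
    means = [sum(xs[i * size:(i + 1) * size]) / size for i in range(k)]
    return statistics.median(means)


def outlyingness_winsorized_mean(xs: Sequence[float], beta: float = 2.0) -> float:
    """Hypothetical univariate sketch, NOT the paper's exact definition.

    Outlyingness O(x) = |x - med| / MAD. Points with O(x) > beta are pulled
    back to med +/- beta * MAD, then all n points are averaged.
    """
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        return med  # degenerate case: no spread around the median
    lo, hi = med - beta * mad, med + beta * mad
    return sum(min(max(x, lo), hi) for x in xs) / len(xs)


def contaminated_sample(n: int = 1000, eps: float = 0.4,
                        outlier: float = 1e6, seed: int = 0) -> List[float]:
    """N(0, 1) data in which a fraction eps of the points is replaced by `outlier`."""
    rng = random.Random(seed)
    n_bad = int(eps * n)
    data = [rng.gauss(0.0, 1.0) for _ in range(n - n_bad)] + [outlier] * n_bad
    rng.shuffle(data)  # spread the outliers across all positions (and blocks)
    return data


if __name__ == "__main__":
    xs = contaminated_sample()  # 40% of points sit at 1e6; the true mean is 0
    print("sample mean      ", sum(xs) / len(xs))
    print("trimmed mean 10% ", trimmed_mean(xs, 0.1))
    print("median of means  ", median_of_means(xs, 10))
    print("winsorized mean  ", outlyingness_winsorized_mean(xs))
```

With 40% contamination, the 10% trimmed mean keeps hundreds of outliers and every block of the median of means contains outliers, so both are dragged toward 1e6. The median/MAD-based winsorized mean stays bounded (biased, but not broken down), because the median and MAD themselves tolerate just under 50% contamination.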

Related research

Optimal Robust Linear Regression in Nearly Linear Time (07/16/2020)
We study the problem of high-dimensional robust linear regression where ...

Sub-Gaussian estimators of the mean of a random vector (02/01/2017)
We study the problem of estimating the mean of a random vector X given a...

Finite sample breakdown point of multivariate regression depth median (09/01/2020)
Depth induced multivariate medians (multi-dimensional maximum depth esti...

Finite-sample Rousseeuw-Croux scale estimators (09/25/2022)
The Rousseeuw-Croux S_n, Q_n scale estimators and the median absolute de...

MONK -- Outlier-Robust Mean Embedding Estimation by Median-of-Means (02/13/2018)
Mean embeddings provide an extremely flexible and powerful tool in machi...

Robustifying Markowitz (12/28/2022)
Markowitz mean-variance portfolios with sample mean and covariance as in...

Debiased Machine Learning without Sample-Splitting for Stable Estimators (06/03/2022)
Estimation and inference on causal parameters is typically reduced to a ...