Robust Mean Estimation under Coordinate-level Corruption

02/10/2020
by Zifan Liu, et al.

Data corruption, whether systematic or adversarial, can severely skew statistical estimation. Recent work provides computationally efficient estimators that nearly match the information-theoretically optimal statistic. However, the corruption model considered in that work measures corruption at the sample level and is not fine-grained enough for many real-world applications. In this paper, we propose a coordinate-level metric of distribution shift for high-dimensional settings with n coordinates. We introduce and analyze robust mean estimation techniques against an adversary who may hide individual coordinates of samples, subject to a bound under that metric. We show that, for structured distributions, methods that leverage the structure to fill in missing entries before mean estimation can improve estimation accuracy by a factor of approximately n compared to structure-agnostic methods. We also leverage recent progress in matrix completion to obtain estimators that recover the true mean of the samples when the structure is unknown. We demonstrate on real-world data that our methods capture the dependencies across attributes and provide accurate mean estimates even under high-magnitude corruption.
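The idea of filling in hidden coordinates from structure before averaging can be illustrated with a small sketch. It is not the paper's estimator: it assumes, purely for illustration, that every sample lies exactly in a known linear subspace x = A z, and it uses a hypothetical adversary that hides the first coordinate whenever it exceeds its median. The matrix `A`, the hiding rule, and the function names are all assumptions made for this example.

```python
import numpy as np


def impute_with_linear_structure(X, A):
    """Fill NaN entries of each row, assuming the row equals A @ z for some z.

    Illustrative assumption: the structure matrix A is known exactly and the
    observed coordinates of each row determine z (A[observed] has full
    column rank).
    """
    X_filled = X.copy()
    for i, x in enumerate(X):
        observed = ~np.isnan(x)
        if observed.all():
            continue
        # Least-squares fit of the latent vector z from the observed coordinates.
        z, *_ = np.linalg.lstsq(A[observed], x[observed], rcond=None)
        # Reconstruct the hidden coordinates from the fitted z.
        X_filled[i, ~observed] = A[~observed] @ z
    return X_filled


def demo(seed=0, num_samples=2000):
    """Compare a structure-agnostic mean with a structure-aware mean."""
    rng = np.random.default_rng(seed)
    # Hypothetical structure: 4 observed coordinates driven by 2 latent ones.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
    Z = rng.normal(loc=1.0, scale=1.0, size=(num_samples, 2))
    X_clean = Z @ A.T
    # Reference value: the mean of the uncorrupted samples.
    clean_mean = X_clean.mean(axis=0)

    # Hypothetical adversary: hide coordinate 0 whenever it is above its median.
    X = X_clean.copy()
    X[X[:, 0] > np.median(X[:, 0]), 0] = np.nan

    # Structure-agnostic baseline: average whatever is observed per coordinate.
    naive_mean = np.nanmean(X, axis=0)
    # Structure-aware estimate: impute from the other coordinates, then average.
    aware_mean = impute_with_linear_structure(X, A).mean(axis=0)

    naive_err = np.linalg.norm(naive_mean - clean_mean)
    aware_err = np.linalg.norm(aware_mean - clean_mean)
    return naive_err, aware_err


if __name__ == "__main__":
    naive_err, aware_err = demo()
    print(f"structure-agnostic error: {naive_err:.4f}")
    print(f"structure-aware error:    {aware_err:.2e}")
```

In this noiseless toy setting the hidden coordinate is fully determined by the remaining ones, so imputation removes the bias that the adversary introduces into the per-coordinate average. With noise, an unknown structure, or more aggressive hiding, the gap would depend on the details the paper analyzes.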


