Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians

01/21/2023
by   Arshak Minasyan, et al.

Assume that $X_1, \ldots, X_N$ is an $\varepsilon$-contaminated sample of $N$ independent Gaussian vectors in $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$. In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors. We show that there is an estimator $\widehat{\mu}$ of the mean satisfying, with probability at least $1 - \delta$, a bound of the form $\|\widehat{\mu} - \mu\|_2 \le c\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} + \sqrt{\frac{\|\Sigma\| \log(1/\delta)}{N}} + \varepsilon\sqrt{\|\Sigma\|}\right)$, where $c > 0$ is an absolute constant and $\|\Sigma\|$ denotes the operator norm of $\Sigma$. In the same contaminated Gaussian setup, we construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$, $\|\widehat{\Sigma} - \Sigma\| \le c\left(\sqrt{\frac{\|\Sigma\| \operatorname{Tr}(\Sigma)}{N}} + \|\Sigma\|\sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon\|\Sigma\|\right)$. Both results are optimal up to multiplicative constant factors. Despite the recent significant interest in robust statistics, achieving both dimension-free bounds in the canonical Gaussian case remained open. In fact, several previously known results were either dimension-dependent and required $\Sigma$ to be close to identity, or had a sub-optimal dependence on the contamination level $\varepsilon$. As a part of the analysis, we derive sharp concentration inequalities for central order statistics of Gaussian, folded normal, and chi-squared distributions.
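To get a feel for the quantities in these bounds, the following is a minimal Python sketch, not the estimators constructed in the paper. It evaluates the three terms of each bound with the absolute constant $c$ dropped, and builds a contaminated sample using one simple, non-adaptive adversary. The function names `mean_rate`, `cov_rate`, and `contaminate`, the choice of outlier, and the example spectrum of $\Sigma$ are all assumptions made for illustration.

```python
import numpy as np


def mean_rate(Sigma, N, delta, eps):
    """Sum of the three terms in the mean bound (absolute constant c omitted)."""
    op_norm = np.linalg.norm(Sigma, 2)  # operator (spectral) norm of Sigma
    trace = np.trace(Sigma)
    return (np.sqrt(trace / N)
            + np.sqrt(op_norm * np.log(1.0 / delta) / N)
            + eps * np.sqrt(op_norm))


def cov_rate(Sigma, N, delta, eps):
    """Sum of the three terms in the covariance bound (constant c omitted)."""
    op_norm = np.linalg.norm(Sigma, 2)
    trace = np.trace(Sigma)
    return (np.sqrt(op_norm * trace / N)
            + op_norm * np.sqrt(np.log(1.0 / delta) / N)
            + eps * op_norm)


def contaminate(X, eps, outlier, rng):
    """Replace floor(eps * N) rows of X by a fixed outlier vector.

    This is only one very simple adversary; the strong contamination
    model allows arbitrary replacements chosen after seeing the sample.
    """
    Xc = X.copy()
    k = int(np.floor(eps * X.shape[0]))
    idx = rng.choice(X.shape[0], size=k, replace=False)
    Xc[idx] = outlier
    return Xc, idx


rng = np.random.default_rng(0)
d, N, eps, delta = 50, 2000, 0.05, 0.01

# Anisotropic covariance: eigenvalues decay as 1/j^2 (an illustrative choice).
Sigma = np.diag(1.0 / np.arange(1, d + 1) ** 2)
mu = np.zeros(d)

X = rng.multivariate_normal(mu, Sigma, size=N)
Xc, idx = contaminate(X, eps, outlier=np.full(d, 10.0), rng=rng)

# The plain sample mean is not robust: compare its error before and after.
clean_err = np.linalg.norm(X.mean(axis=0) - mu)
contaminated_err = np.linalg.norm(Xc.mean(axis=0) - mu)

print("mean rate (c omitted):", mean_rate(Sigma, N, delta, eps))
print("cov rate  (c omitted):", cov_rate(Sigma, N, delta, eps))
print("sample-mean error, clean vs contaminated:", clean_err, contaminated_err)
```

In this setup the rates depend on $\Sigma$ only through $\operatorname{Tr}(\Sigma)$ and $\|\Sigma\|$, so they do not grow with $d$ when the spectrum decays, whereas the error of the unprotected sample mean under contamination is driven by the outlier.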
