Improved covariance estimation: optimal robustness and sub-Gaussian guarantees under heavy tails
We present an estimator of the covariance matrix Σ of a random d-dimensional vector from an i.i.d. sample of size n. Our sole assumption is that this vector satisfies a bounded L^p-L^2 moment assumption over its one-dimensional marginals, for some p ≥ 4. Given this, we show that Σ can be estimated from the sample with the same high-probability error rates that the sample covariance matrix achieves in the case of Gaussian data. This holds even though we allow for very general distributions that may not have moments of order greater than p. Moreover, our estimator can be made optimally robust to adversarial contamination. This result improves on recent results in the literature by Mendelson and Zhivotovskiy and by Catoni and Giulini, and matches parallel work by Abdalla and Zhivotovskiy (the exact relationship with this last work is described in the paper).
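For orientation, the two ingredients named above can be written out as follows. This is a sketch under stated assumptions, not the paper's exact statement: the abstract fixes neither the notation nor the constants, so the symbols X, κ, C, δ, r(Σ) and \widehat{\Sigma}_n are introduced here for illustration, X is taken to be centered for simplicity, and the error is measured in operator norm, as is common for bounds of this kind.

```latex
% Assumed form of the L^p-L^2 moment condition on one-dimensional marginals
% (kappa is a hypothetical name for the bounding constant):
\[
  \bigl(\mathbb{E}\,|\langle X, v\rangle|^{p}\bigr)^{1/p}
  \;\le\; \kappa \,\bigl(\mathbb{E}\,\langle X, v\rangle^{2}\bigr)^{1/2}
  \;=\; \kappa \sqrt{v^{\top}\Sigma\, v}
  \qquad \text{for all } v \in \mathbb{R}^{d}.
\]

% Gaussian benchmark: for Gaussian data the sample covariance
% \widehat{\Sigma}_n satisfies, with probability at least 1 - \delta
% and for n \gtrsim r(\Sigma) + \log(1/\delta),
\[
  \bigl\|\widehat{\Sigma}_n - \Sigma\bigr\|_{\mathrm{op}}
  \;\le\; C\,\|\Sigma\|_{\mathrm{op}}
  \left( \sqrt{\frac{r(\Sigma)}{n}} + \sqrt{\frac{\log(1/\delta)}{n}} \right),
  \qquad
  r(\Sigma) \;=\; \frac{\operatorname{tr}(\Sigma)}{\|\Sigma\|_{\mathrm{op}}},
\]
% where C is an absolute constant and r(\Sigma) is the effective rank.
```

Read this way, the claim is that a bound of the second type remains achievable by a suitable estimator when only the first condition holds, with constants that may depend on κ and p.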