Reliable Covariance Estimation

06/05/2020
by Ilya Soloveychik, et al.

Covariance or scatter matrix estimation is ubiquitous in most modern statistical and machine learning applications. The task becomes especially challenging since most real-world datasets are essentially non-Gaussian. The data is often contaminated by outliers and/or has a heavy-tailed distribution, causing the sample covariance to behave very poorly and calling for robust estimation methodology. The natural framework for robust scatter matrix estimation is based on elliptical populations. Here, Tyler's estimator stands out by being distribution-free within the elliptical family and easy to compute. Existing works thoroughly study the performance of Tyler's estimator assuming ellipticity, but without providing any tools to verify this assumption when the covariance is unknown in advance. We address the following open question: given the sampled data and no prior on the data-generating process, how can one assess the quality of the scatter matrix estimator? In this work we show that this question can be reformulated as an asymptotic uniformity test for certain sequences of exchangeable vectors on the unit sphere. We develop a consistent and easily applicable goodness-of-fit test against all alternatives to ellipticity when the scatter matrix is unknown. The findings are supported by numerical simulations demonstrating the power of the suggested technique.
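
As a rough illustration of the objects mentioned in the abstract, the sketch below computes Tyler's estimator with the standard fixed-point iteration and then forms the whitened, normalized samples, which should look approximately uniform on the unit sphere if the data are elliptical. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names `tyler_estimator` and `whitened_directions`, the trace normalization, the stopping rule, and the assumption that the data are already centered with no zero rows are all choices made here for illustration. The paper's actual test statistic is not given in the abstract and is not reproduced.

```python
import numpy as np


def tyler_estimator(X, max_iter=200, tol=1e-8):
    """Tyler's scatter estimator via fixed-point iteration (illustrative sketch).

    Assumes X has shape (n, p), is centered at the origin, has no zero rows,
    and n > p. The result is normalized so that its trace equals p.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        # Quadratic forms q_i = x_i^T sigma^{-1} x_i for every sample.
        q = np.einsum("ij,ij->i", X @ np.linalg.inv(sigma), X)
        # Weighted sample covariance: (p / n) * sum_i x_i x_i^T / q_i.
        new = (p / n) * (X / q[:, None]).T @ X
        # Fix the scale, since Tyler's estimator is defined up to a constant.
        new *= p / np.trace(new)
        if np.linalg.norm(new - sigma, "fro") < tol:
            return new
        sigma = new
    return sigma


def whitened_directions(X, sigma):
    """Whiten each sample with a Cholesky factor of sigma and project to the sphere."""
    L = np.linalg.cholesky(sigma)
    Y = np.linalg.solve(L, X.T).T  # rows are L^{-1} x_i
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)
```

A typical use would be `U = whitened_directions(X, tyler_estimator(X))`, after which any uniformity check on the sphere can be applied to the rows of `U`. Note that the rows of `U` are not independent, because they all depend on the same estimated scatter matrix; this is the reason the abstract speaks of exchangeable rather than independent vectors.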

research
05/23/2016

Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries

Estimation of the covariance matrix has attracted a lot of attention of ...
research
11/05/2018

User-Friendly Covariance Estimation for Heavy-Tailed Distributions

We propose user-friendly covariance matrix estimators that are robust ag...
research
10/10/2020

Effective Data-aware Covariance Estimator from Compressed Data

Estimating covariance matrix from massive high-dimensional and distribut...
research
04/11/2020

Covariance Estimation for Matrix-valued Data

Covariance estimation for matrix-valued data has received an increasing ...
research
10/26/2022

R-NL: Fast and Robust Covariance Estimation for Elliptical Distributions in High Dimensions

We combine Tyler's robust estimator of the dispersion matrix with nonlin...
research
12/19/2022

Direct covariance matrix estimation with compositional data

Compositional data arise in many areas of research in the natural and bi...
research
09/06/2022

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

We revisit heavy-tailed corrupted least-squares linear regression assumi...
