Robust subgaussian estimation of a mean vector in nearly linear time
We construct an algorithm, running in nearly linear time, that is robust to outliers and heavy-tailed data and that achieves the subgaussian rate of [Lugosi, Mendelson], $\sqrt{\mathrm{Tr}(\Sigma)/N} + \sqrt{\|\Sigma\|_{op} K/N}$, with probability at least $1-\exp(-c_0 K)$, where $\Sigma$ is the covariance matrix of the informative data. This rate is achieved when $K \geq c_1 |\mathcal{O}|$, where $|\mathcal{O}|$ is the number of outliers in the database, under the sole assumption that the informative data have a second moment. The algorithm is fully data-dependent: its construction uses neither the proportion of outliers nor the rate above. It combines recently developed tools for Median-of-Means estimators with covering semidefinite programming [Chen, Diakonikolas, Ge], [Peng, Tangwongsan, Zhang].
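The Median-of-Means building block mentioned above can be illustrated with a small sketch. This is not the paper's algorithm: the data are split into K blocks and averaged as in Median-of-Means, but the aggregation step here is a plain coordinate-wise median, a simple stand-in for the covering-SDP-based aggregation described in the abstract, and it does not in general attain the subgaussian rate. The function names (`block_means`, `coordinatewise_mom`, `subgaussian_rate`, `demo`), the Gaussian inliers, and the outlier pattern are all hypothetical choices for illustration; the rate is computed ignoring the absolute constants.

```python
import math
import random
from statistics import median


def block_means(data, K):
    """Split the N points into K consecutive blocks of size N // K (any
    remainder is dropped) and return the mean vector of each block."""
    n = len(data) // K
    if n == 0:
        raise ValueError("need at least K data points")
    d = len(data[0])
    means = []
    for k in range(K):
        block = data[k * n:(k + 1) * n]
        means.append([sum(x[j] for x in block) / n for j in range(d)])
    return means


def coordinatewise_mom(data, K):
    """Hypothetical stand-in aggregator (NOT the paper's covering-SDP step):
    the coordinate-wise median of the K block means."""
    means = block_means(data, K)
    d = len(means[0])
    return [median([m[j] for m in means]) for j in range(d)]


def subgaussian_rate(trace_sigma, op_norm_sigma, K, N):
    """sqrt(Tr(Sigma)/N) + sqrt(||Sigma||_op * K / N), ignoring constants."""
    return math.sqrt(trace_sigma / N) + math.sqrt(op_norm_sigma * K / N)


def demo(seed=0, d=10, N=2000, n_outliers=5, K=50):
    """Isotropic Gaussian inliers (true mean 0, Sigma = I) plus a few far
    outliers; returns (MOM error, empirical-mean error, rate)."""
    rng = random.Random(seed)
    outliers = [[1e6] * d for _ in range(n_outliers)]
    inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
    data = outliers + inliers  # hypothetical contamination pattern

    def norm(v):
        # Euclidean distance to the true mean, which is 0 here.
        return math.sqrt(sum(t * t for t in v))

    est = coordinatewise_mom(data, K)
    emp = [sum(x[j] for x in data) / len(data) for j in range(d)]
    return norm(est), norm(emp), subgaussian_rate(d, 1.0, K, N)


if __name__ == "__main__":
    mom_err, mean_err, rate = demo()
    print(f"MOM error {mom_err:.3f}, mean error {mean_err:.1f}, rate {rate:.3f}")
```

With K much larger than the number of outliers, at most a few of the K block means are corrupted, so the median of the block means stays close to the true mean while the empirical mean is dragged far away. Replacing the coordinate-wise median by an aggregation that attains the rate above is precisely where the paper's covering-SDP machinery comes in.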