Out-of-sample error estimate for robust M-estimators with convex penalty
A generic out-of-sample error estimate is proposed for robust M-estimators regularized with a convex penalty in high-dimensional linear regression, where (X, y) is observed and p, n are of the same order. If ψ is the derivative of the robust data-fitting loss ρ, the estimate depends on the observed data only through the quantities ψ̂ = ψ(y − Xβ̂), X^⊤ψ̂ and the derivatives (∂/∂y) ψ̂ and (∂/∂y) Xβ̂ for fixed X. The out-of-sample error estimate enjoys a relative error of order n^{-1/2} in a linear model with Gaussian covariates and independent noise, either non-asymptotically when p/n ≤ γ or asymptotically in the high-dimensional regime p/n → γ' ∈ (0, ∞). General differentiable loss functions ρ are allowed provided that ψ = ρ' is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the ℓ_1-penalized Huber M-estimator provided that the number of corrupted observations and the sparsity of the true β are bounded from above by s_* n for some small enough constant s_* ∈ (0, 1) independent of n, p. For the square loss and in the absence of corruption in the response, the results additionally yield n^{-1/2}-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty, estimates that were previously known for the Lasso.
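The abstract states that the error estimate depends on the data only through a handful of computable quantities. As a minimal illustrative sketch (not the paper's estimator itself), the following computes the first two of these, ψ̂ = ψ(y − Xβ̂) and X^⊤ψ̂, for the Huber loss, whose derivative ψ is 1-Lipschitz as the abstract requires; the fitted β̂ here is a hypothetical placeholder standing in for any penalized M-estimator.

```python
import numpy as np

def huber_psi(r, delta=1.0):
    """Derivative psi = rho' of the Huber loss with threshold delta.

    psi(r) = r for |r| <= delta and sign(r)*delta otherwise,
    so psi is 1-Lipschitz, as required in the abstract.
    """
    return np.clip(r, -delta, delta)

# Synthetic data standing in for the observed (X, y); sizes are arbitrary.
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                      # sparse true coefficient vector
y = X @ beta_true + rng.standard_normal(n)

# Placeholder for a fitted penalized M-estimator beta_hat (hypothetical;
# in practice it would come from, e.g., an l1-penalized Huber regression).
beta_hat = np.zeros(p)

# The two data-dependent quantities named in the abstract.
psi_hat = huber_psi(y - X @ beta_hat)    # psi_hat = psi(y - X beta_hat)
score = X.T @ psi_hat                    # X^T psi_hat
```

The remaining ingredients, the derivatives (∂/∂y) ψ̂ and (∂/∂y) Xβ̂ at fixed X, require differentiating the fitted estimator with respect to the response and are specific to the chosen penalty, so they are not sketched here.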