A geometrical viewpoint on the benign overfitting property of the minimum l_2-norm interpolant estimator
Practitioners have observed that some deep learning models generalize well even when they perfectly fit noisy training data [5,45,44]. Since then, many theoretical works have revealed facets of this phenomenon, known as benign overfitting [4,2,1,8]. In particular, in the linear regression model, the minimum l_2-norm interpolant estimator β̂ has received a lot of attention [1,39], since it has been proved to be consistent, even though it perfectly fits noisy data, under suitable conditions on the covariance matrix Σ of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [39]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [2]: β̂ can be written as the sum of a ridge estimator β̂_1:k and an overfitting component β̂_k+1:p, following the orthogonal decomposition of the feature space ℝ^p = V_1:k ⊕^⊥ V_k+1:p, where V_1:k is spanned by the top k eigenvectors of Σ and V_k+1:p by the remaining p-k ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from [1,39] and is the key tool for handling the behavior of the design matrix restricted to the subspace V_k+1:p where overfitting happens.
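To make the decomposition concrete, here is a minimal numerical sketch (not the paper's construction) of the self-induced regularization picture: a toy Gaussian design with an assumed spiked diagonal covariance, the minimum l_2-norm interpolant computed via the pseudo-inverse, its split into the components on V_1:k and V_k+1:p, the effective rank r_k(Σ) = (Σ_{i>k} λ_i)/λ_{k+1} of [1,39], and the excess prediction risk. The dimensions, eigenvalue profile, and noise level below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes and spiked spectrum: k large eigenvalues, p - k small ones.
n, p, k = 50, 500, 5
eigvals = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.05)])
Sigma = np.diag(eigvals)            # eigenvectors of Sigma are the canonical basis here

# Gaussian design with covariance Sigma and noisy labels from a k-sparse target.
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Minimum l_2-norm interpolant beta_hat = X^+ y; since p > n it fits the noisy data exactly.
beta_hat = np.linalg.pinv(X) @ y
print("interpolates:", np.allclose(X @ beta_hat, y))

# Split along R^p = V_1:k ⊕^⊥ V_k+1:p; with a diagonal Sigma the projections onto the two
# eigenspaces are simply coordinate restrictions.
beta_1k = np.concatenate([beta_hat[:k], np.zeros(p - k)])   # ridge-like component on V_1:k
beta_kp = np.concatenate([np.zeros(k), beta_hat[k:]])       # overfitting component on V_k+1:p
print("decomposition holds:", np.allclose(beta_hat, beta_1k + beta_kp))

# Effective rank r_k(Sigma) = (sum_{i>k} lambda_i) / lambda_{k+1}, the quantity that the
# abstract matches with the Dvoretzky dimension and that enters the conditions of [1,39].
r_k = eigvals[k:].sum() / eigvals[k]
print("effective rank r_k(Sigma):", r_k)

# Excess prediction risk ||Sigma^{1/2}(beta_hat - beta_star)||_2^2 for a Gaussian test point.
risk = float((beta_hat - beta_star) @ Sigma @ (beta_hat - beta_star))
print("excess prediction risk:", round(risk, 4))
```

In this toy setting the overfitting component β̂_k+1:p absorbs the exact fit to the noise while contributing little to the risk, since the last p-k eigenvalues of Σ are small; this is the mechanism the geometrical analysis quantifies on the subspace V_k+1:p.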