Benign overfitting in ridge regression

09/29/2020 ∙ by A. Tsigler, et al. ∙ 0

Classical learning theory suggests that strong regularization is needed to learn a class with large complexity. This intuition is in contrast with the modern practice of machine learning, in particular learning neural networks, where the number of parameters often exceeds the number of data points. It has been observed empirically that such overparametrized models can show good generalization performance even if trained with vanishing or negative regularization. The aim of this work is to understand theoretically how this effect can occur, by studying the setting of ridge regression. We provide non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and show that those bounds are tight for a range of regularization parameter values. To our knowledge this is the first work that studies overparametrized ridge regression in such a general setting. We identify when small or negative regularization is sufficient for obtaining small generalization error. On the technical side, our bounds only require the data vectors to be i.i.d. sub-gaussian, while most previous work assumes independence of the components of those vectors.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.