Superior generalization of smaller models in the presence of significant label noise

08/17/2022
by Yihao Xue, et al.

The benefits of over-parameterization in achieving superior generalization performance have been shown in several recent studies, justifying the trend of using larger models in practice. In the context of robust learning, however, the effect of neural network size has not been well studied. In this work, we find that in the presence of a substantial fraction of mislabeled examples, increasing the network size beyond some point can be harmful. In particular, the originally monotonic or 'double descent' test loss curve (w.r.t. network width) turns into a U-shaped or a double U-shaped curve as label noise increases, suggesting that the best generalization is achieved by a model of intermediate size. We observe similar test-loss behavior when network size is instead controlled by density through random pruning. We take a closer look at both phenomena through the bias-variance decomposition and theoretically characterize how label noise shapes the variance term. Similar behavior of the test loss is observed even when state-of-the-art robust methods are applied, indicating that limiting network size could further boost existing methods. Finally, we empirically examine the effect of network size on the smoothness of the learned functions, and find that the originally negative correlation between size and smoothness is flipped by label noise.
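To make the kind of experiment described above concrete, the following is a minimal, hypothetical sketch (not the paper's code): it trains small MLPs of increasing hidden width on a synthetic binary classification task in which a fraction of training labels is flipped uniformly at random, and records the test loss on clean labels. The dataset, the 40% noise rate, the widths, and the training budget are illustrative assumptions only.

import torch
import torch.nn as nn

def make_data(n, d=20, noise_rate=0.4, seed=0):
    # Synthetic binary task: labels come from a random linear teacher,
    # and a fraction of them is flipped uniformly at random.
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(n, d, generator=g)
    w = torch.randn(d, generator=g)
    y_clean = (X @ w > 0).long()
    flip = torch.rand(n, generator=g) < noise_rate
    y_noisy = torch.where(flip, 1 - y_clean, y_clean)
    return X, y_noisy, y_clean

def train_mlp(width, X, y, epochs=200, lr=1e-2):
    # One-hidden-layer MLP; width is the quantity being swept.
    model = nn.Sequential(nn.Linear(X.shape[1], width), nn.ReLU(), nn.Linear(width, 2))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

# Train on noisy labels, evaluate on clean labels, and sweep the width
# to trace out the test-loss-vs-width curve discussed in the abstract.
X_tr, y_tr_noisy, _ = make_data(500, noise_rate=0.4, seed=0)
X_te, _, y_te_clean = make_data(2000, noise_rate=0.0, seed=1)
eval_loss = nn.CrossEntropyLoss()
for width in [4, 16, 64, 256, 1024]:
    model = train_mlp(width, X_tr, y_tr_noisy)
    with torch.no_grad():
        test_loss = eval_loss(model(X_te), y_te_clean).item()
    print(f"width={width:5d}  clean test loss={test_loss:.3f}")

Under heavy label noise, a sweep of this form is where a U-shaped (rather than monotonic or double-descent) test-loss curve would show up; the exact shape depends on the data, noise rate, and training regime.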


