Over-training with Mixup May Hurt Generalization

03/02/2023
by Zixuan Liu, et al.

Mixup, which creates synthetic training instances by linearly interpolating random sample pairs, is a simple yet effective regularization technique for boosting the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand this behavior, we show theoretically that Mixup training may introduce undesired data-dependent label noise into the synthesized data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels cause the U-shaped curve: Mixup improves generalization by fitting the clean patterns in the early training stage, but as training progresses it begins to over-fit the noise in the synthetic data. Extensive experiments on a variety of benchmark datasets validate this explanation.
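For concreteness, here is a minimal NumPy sketch of the Mixup interpolation described above; the function name `mixup_batch` and the conventional Beta-distribution parameter `alpha` are illustrative choices, not the paper's own code:

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Create synthetic Mixup instances by linearly interpolating
    random sample pairs: x_mix = lam*x_i + (1-lam)*x_j, same for y.

    x:     (batch, ...) float array of inputs
    y:     (batch, num_classes) one-hot or soft label array
    alpha: Beta(alpha, alpha) parameter controlling mixing strength
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # random pairing of samples
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix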


Related research

06/05/2023  On Emergence of Clean-Priority Learning in Early Stopped Neural Networks
When random label noise is added to a training dataset, the prediction e...

06/21/2021  Open-set Label Noise Can Improve Robustness Against Inherent Label Noise
Learning with noisy labels is a practically challenging problem in weakl...

07/29/2022  Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels
Deep models trained with noisy labels are prone to over-fitting and stru...

12/21/2020  Regularization in neural network optimization via trimmed stochastic gradient descent with noisy label
Regularization is essential for avoiding over-fitting to training data i...

08/17/2022  Superior generalization of smaller models in the presence of significant label noise
The benefits of over-parameterization in achieving superior generalizati...

12/11/2020  Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics
System identification aims to build models of dynamical systems from dat...

06/23/2023  Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
This paper focuses on predicting the occurrence of grokking in neural ne...
