The Curious Case of Benign Memorization

10/25/2022
by Sotiris Anagnostidis, et al.

Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success remains very limited. One of the key challenges is the overparametrized nature of modern models, which enables complete overfitting of the data even when the labels are randomized, i.e., networks can fully memorize all given patterns. While such memorization capacity seems worrisome, in this work we show that, under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way: they learn embeddings that yield highly non-trivial performance under nearest-neighbour probing. We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the tasks of memorization and feature learning across different layers. As a result, only the very last layers are used for memorization, while the preceding layers encode performant features that remain largely unaffected by the label noise. We explore the intricate role of the augmentations used for training and identify a memorization-generalization trade-off in terms of their diversity, marking a clear distinction from all previous works. Finally, we give a first explanation for the emergence of benign memorization by showing that malign memorization under data augmentation is infeasible due to the insufficient capacity of the model for the increased sample size. As a consequence, the network is forced to leverage the correlated nature of the augmentations and thus learns meaningful features. To complete the picture, a better theory of feature learning in deep neural networks is required to fully understand the origins of this phenomenon.
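The following is a minimal PyTorch sketch of the kind of protocol described above, not the authors' exact setup: the dataset (CIFAR-10), architecture, augmentations, and hyperparameters are illustrative assumptions. It trains a small CNN on fully randomized labels under standard augmentations, then probes the penultimate-layer embeddings with a 1-nearest-neighbour classifier that uses the true training labels.

```python
# A minimal sketch of the setup described above (illustrative, not the paper's
# exact protocol): train a small CNN on CIFAR-10 with *fully randomized* labels
# under standard augmentations, then evaluate the learned embeddings with a
# 1-nearest-neighbour probe that uses the true training labels.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Augmentations for training; plain tensors for the probing passes.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
eval_tf = transforms.ToTensor()

train_set = datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR10("data", train=False, download=True, transform=eval_tf)

# Keep the true labels for probing, then randomize every training label so the
# network can only memorize them.
true_labels = torch.tensor(train_set.targets)
train_set.targets = torch.randint(0, 10, (len(train_set),)).tolist()

class SmallCNN(nn.Module):
    """Toy convolutional network; self.head is the 'last layer' that memorizes."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, x, return_embedding=False):
        z = self.features(x)
        return z if return_embedding else self.head(z)

model = SmallCNN().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=2)

for epoch in range(100):  # fitting random labels typically needs long training
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Nearest-neighbour probing: embed the (un-augmented) train and test sets, then
# classify each test point by the true label of its nearest training embedding.
model.eval()

@torch.no_grad()
def embed(dataset):
    zs = []
    for x, _ in DataLoader(dataset, batch_size=512):
        zs.append(model(x.to(device), return_embedding=True).cpu())
    return torch.cat(zs)

train_eval = datasets.CIFAR10("data", train=True, transform=eval_tf)
z_train, z_test = embed(train_eval), embed(test_set)
test_labels = torch.tensor(test_set.targets)

preds = []
for z in z_test.split(1024):  # chunked to keep the distance matrix small
    d = torch.cdist(z, z_train)
    preds.append(true_labels[d.argmin(dim=1)])
pred = torch.cat(preds)
print("1-NN probe accuracy:", (pred == test_labels).float().mean().item())
```

Benign memorization would show up here as probe accuracy well above the 10% chance level even though the training labels carry no signal; removing the augmentations would be expected to collapse it.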

Related research

06/24/2022 - Learning sparse features can lead to overfitting in neural networks
It is widely believed that the success of deep networks lies in their ab...

06/08/2021 - What Data Augmentation Do We Need for Deep-Learning-Based Finance?
The main task we consider is portfolio construction in a speculative mar...

03/15/2023 - The Benefits of Mixup for Feature Learning
Mixup, a simple data augmentation method that randomly mixes two data po...

08/03/2020 - Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment
We propose a new metric (m-coherence) to experimentally study the alignm...

12/18/2013 - Unsupervised feature learning by augmenting single images
When deep learning is applied to visual object recognition, data augment...

06/03/2022 - A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features
An important characteristic of neural networks is their ability to learn...

05/31/2023 - Multi-Epoch Learning for Deep Click-Through Rate Prediction Models
The one-epoch overfitting phenomenon has been widely observed in industr...
