Does Learning Require Memorization? A Short Tale about a Long Tail

06/12/2019
by Vitaly Feldman, et al.

State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set. This phenomenon is referred to as data interpolation or, informally, as memorization of the training data. The question of why such algorithms generalize well to unseen data is not adequately addressed by standard theoretical frameworks, and as a result significant theoretical and experimental effort has been devoted to understanding their properties. We provide a simple and generic model for prediction problems in which interpolating the dataset is necessary for achieving close-to-optimal generalization error. The model is motivated and supported by the results of several recent empirical works. In our model, data is sampled from a mixture of subpopulations, and the frequencies of these subpopulations are chosen from some prior. The model allows us to quantify the effect that not fitting the training data has on the generalization performance of the learned classifier, and it demonstrates that memorization is necessary whenever the distribution of subpopulation frequencies is long-tailed. Image and text data are known to follow such distributions, so our results establish a formal link between these empirical phenomena. To the best of our knowledge, this is the first general framework that demonstrates statistical benefits of plain memorization for learning. Our results also have concrete implications for the cost of ensuring differential privacy in learning.
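The core argument can be made concrete with a small simulation. The sketch below is not from the paper: the Zipf-shaped prior over subpopulation frequencies, the parameters N, n, and alpha, and all variable names are illustrative assumptions. It draws a training set from a long-tailed mixture and measures how much test mass sits in subpopulations represented exactly once; a learner that refuses to fit (memorize) those singleton examples forfeits roughly that much accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instantiation of the paper's setup (parameters are assumptions, not
# values from the paper): N subpopulations whose frequencies follow a
# long-tailed Zipf-shaped prior, normalized to sum to 1.
N = 10_000      # number of subpopulations
n = 10_000      # number of training examples
alpha = 1.0     # tail exponent; larger alpha concentrates more mass on the head

freqs = 1.0 / np.arange(1, N + 1) ** alpha
freqs /= freqs.sum()

# Sample a training set: counts[i] is how many training examples fall in
# subpopulation i.
counts = rng.multinomial(n, freqs)

# "Singleton" subpopulations are seen exactly once. A classifier that does
# not fit (memorize) these lone examples can do no better than guessing on
# the corresponding subpopulations at test time, so their total frequency
# lower-bounds its excess generalization error in this toy model.
singleton_mass = float(freqs[counts == 1].sum())
unseen_mass = float(freqs[counts == 0].sum())

print(f"test mass in subpopulations seen exactly once: {singleton_mass:.3f}")
print(f"test mass in subpopulations never seen:        {unseen_mass:.3f}")
```

Under these assumed parameters, a noticeable fraction of the test distribution lands in singleton subpopulations, which is the long-tail regime where interpolating the training set pays off.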



Related research

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation (08/09/2020)
Deep learning algorithms are well-known to have a propensity for fitting...

Adaptive Statistical Learning with Bayesian Differential Privacy (11/02/2019)
In statistical learning, a dataset is often partitioned into two parts: ...

Long-Tail Theory under Gaussian Mixtures (07/20/2023)
We suggest a simple Gaussian mixture model for data generation that comp...

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models (03/11/2021)
Recent studies indicate that NLU models are prone to rely on shortcut fe...

Reconstructing Training Data from Trained Neural Networks (06/15/2022)
Understanding to what extent neural networks memorize training data is a...

Privacy Preserving Recalibration under Domain Shift (08/21/2020)
Classifiers deployed in high-stakes real-world applications must output ...
