A Closer Look at Memorization in Deep Networks

06/16/2017
by Devansh Arpit, et al.

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that, with appropriately tuned explicit regularization (e.g., dropout), we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
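The abstract contrasts training on noise data with training on real data but does not say how the noise datasets are built. As a minimal sketch, assuming two common constructions (replacing a fraction of labels with uniformly random classes, and replacing a fraction of inputs with Gaussian noise), the data preparation might look like the following. The function names `corrupt_labels` and `replace_inputs_with_noise`, the `noise_fraction` parameter, and the standard-normal noise distribution are all hypothetical choices for illustration, not taken from the paper.

```python
import random


def corrupt_labels(labels, num_classes, noise_fraction, seed=0):
    """Return a copy of `labels` in which a fraction of entries is
    redrawn uniformly at random from the `num_classes` classes.

    Assumption: uniform label noise; a redrawn label may coincide
    with the original one by chance.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = round(noise_fraction * len(noisy))
    noisy_idx = rng.sample(range(len(noisy)), n_noisy)
    for i in noisy_idx:
        noisy[i] = rng.randrange(num_classes)
    return noisy, sorted(noisy_idx)


def replace_inputs_with_noise(inputs, noise_fraction, seed=0):
    """Return a copy of `inputs` in which a fraction of rows is
    replaced by i.i.d. standard-normal values.

    Assumption: Gaussian input noise with mean 0 and std 1.
    """
    rng = random.Random(seed)
    noisy = [list(row) for row in inputs]
    n_noisy = round(noise_fraction * len(noisy))
    noisy_idx = rng.sample(range(len(noisy)), n_noisy)
    for i in noisy_idx:
        noisy[i] = [rng.gauss(0.0, 1.0) for _ in noisy[i]]
    return noisy, sorted(noisy_idx)


# Toy stand-in for a real dataset: 1000 examples, 8 features, 10 classes.
data_rng = random.Random(123)
x_real = [[data_rng.random() for _ in range(8)] for _ in range(1000)]
y_real = [data_rng.randrange(10) for _ in range(1000)]

y_noisy, label_idx = corrupt_labels(y_real, num_classes=10, noise_fraction=0.5)
x_noisy, input_idx = replace_inputs_with_noise(x_real, noise_fraction=0.5)
```

Training the same network on `(x_real, y_real)`, `(x_real, y_noisy)`, and `(x_noisy, y_real)` and comparing the training curves is one way to probe the kind of noise-versus-real-data differences the abstract describes.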
