Generalization in Deep Networks: The Role of Distance from Initialization

01/07/2019
by Vaishnavh Nagarajan, et al.

Why does training deep neural networks with stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that depends on a given random initialization of the network, and not just on the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the ℓ_2 distance from the initialization. We also provide theoretical arguments that further highlight the need for initialization-dependent notions of model capacity. We leave as open questions how and why distance from initialization is regularized, and whether it is sufficient to explain generalization.
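The central quantity in the abstract, the ℓ_2 distance of the trained weights from their random initialization, is easy to measure. The sketch below (not the authors' code; a tiny linear regression stands in for a deep network, and all names are illustrative) tracks ‖w_t − w_0‖_2 over the course of SGD:

```python
# Minimal sketch: track the l2 distance of SGD-trained weights from their
# random initialization -- the quantity the paper argues is implicitly
# regularized. A small linear model stands in for a deep network.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w0 = rng.normal(size=d)              # random initialization w_0
w = w0.copy()
lr = 0.01
distances = []
for step in range(500):
    i = rng.integers(n)              # single-example SGD step
    grad = (X[i] @ w - y[i]) * X[i]  # gradient of squared error on example i
    w -= lr * grad
    distances.append(np.linalg.norm(w - w0))  # ||w_t - w_0||_2

print(f"final distance from init: {distances[-1]:.3f}")
```

For a deep network the same measurement applies per run: flatten all parameters into one vector at initialization and at each checkpoint, then take the ℓ_2 norm of the difference.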


research · 05/30/2019
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
We study the training and generalization of deep neural networks (DNNs) ...

research · 06/17/2022
How You Start Matters for Generalization
Characterizing the remarkable generalization properties of over-paramete...

research · 01/28/2023
On the Lipschitz Constant of Deep Networks and Double Descent
Existing bounds on the generalization error of deep networks assume some...

research · 06/16/2017
A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connection...

research · 10/06/2020
Usable Information and Evolution of Optimal Representations During Training
We introduce a notion of usable information contained in the representat...

research · 08/14/2021
Neuron Campaign for Initialization Guided by Information Bottleneck Theory
Initialization plays a critical role in the training of deep neural netw...

research · 09/30/2022
Sparse tree-based initialization for neural networks
Dedicated neural network (NN) architectures have been designed to handle...
