
Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks
Neural networks trained via gradient descent with random initialization ...

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
We provide a detailed asymptotic study of gradient flow trajectories and...

Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions
We prove a precise geometric description of all one layer ReLU networks ...

Gradient Dynamics of Shallow Univariate ReLU Networks
We present a theoretical and empirical study of the gradient dynamics of...

Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, Gradient Flow Dynamics
Understanding the learning dynamics and inductive bias of neural network...

How implicit regularization of Neural Networks affects the learned function – Part I
Today, various forms of neural networks are trained to perform approxima...
Implicit bias of gradient descent for mean squared error regression with wide neural networks
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. Focusing on 1D regression, we show that the solution of training a width-n shallow ReLU network is within n^{-1/2} of the function which fits the training data and whose difference from initialization has the smallest 2-norm of the second derivative weighted by a curvature penalty 1/ζ. The curvature penalty function 1/ζ is expressed in terms of the probability distribution used to initialize the network parameters, and we compute it explicitly for several common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. The statement generalizes to the training trajectories, which in turn are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
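The limiting function named in the abstract, the natural cubic spline through the training data, is easy to construct directly. The sketch below is our own illustration, not code from the paper: a pure-Python natural cubic spline interpolator (second derivative zero at both ends), which under the abstract's uniform asymmetric initialization is the function that wide-network gradient descent approaches up to O(n^{-1/2}).

```python
def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys).

    Natural boundary conditions: the second derivative vanishes at both
    endpoints. xs must be strictly increasing.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the knot second derivatives M_1..M_{n-1};
    # the natural boundary conditions fix M_0 = M_n = 0.
    a = [0.0] * (n + 1)  # sub-diagonal
    b = [1.0] * (n + 1)  # diagonal
    c = [0.0] * (n + 1)  # super-diagonal
    d = [0.0] * (n + 1)  # right-hand side
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * (n + 1)
    for i in range(n - 1, 0, -1):
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped to range).
        i = max(0, min(n - 1, next((j for j in range(n) if x <= xs[j + 1]), n - 1)))
        hi = h[i]
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return (M[i] * t0 ** 3 / (6 * hi) + M[i + 1] * t1 ** 3 / (6 * hi)
                + (ys[i] / hi - M[i] * hi / 6) * t0
                + (ys[i + 1] / hi - M[i + 1] * hi / 6) * t1)

    return spline
```

By construction the spline interpolates the data exactly, and on linear data it reduces to linear interpolation, since all second derivatives vanish.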