Implicit Regularization in Over-parameterized Neural Networks

03/05/2019
by Masayoshi Kubo, et al.

Over-parameterized neural networks generalize well in practice without any explicit regularization. Although this has not been proven, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and prevents the network from overfitting. In this work, we introduce the gradient gap deviation and the gradient deflection as statistical measures corresponding to the network curvature and the Hessian matrix, in order to analyze how network derivatives vary with the inputs, and we investigate how implicit regularization works in ReLU neural networks from both theoretical and empirical perspectives. Our result reveals that the network output between each pair of input samples is properly controlled by random initialization and stochastic gradient descent, so that the interpolation between samples stays almost straight, which results in low complexity of over-parameterized neural networks.
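The following is a minimal sketch of the kind of measurement the abstract describes: for a randomly initialized ReLU network, evaluate the output along the straight segment between a pair of input samples and check how far it deviates from the straight chord joining the endpoint outputs. The exact definitions of the gradient gap deviation and the gradient deflection are given in the paper; the function names, layer widths, and the "maximum chord deviation" proxy used here are illustrative assumptions, not the authors' formal statistics.

    # Illustrative sketch (not the paper's exact definitions): measure how
    # straight a ReLU MLP's output is along the line between two inputs.
    import numpy as np

    rng = np.random.default_rng(0)

    def init_mlp(dims):
        """He-style random initialization for a ReLU MLP."""
        return [(rng.normal(0, np.sqrt(2.0 / m), size=(n, m)), np.zeros(n))
                for m, n in zip(dims[:-1], dims[1:])]

    def forward(params, x):
        """Forward pass: ReLU on hidden layers, linear output layer."""
        for i, (W, b) in enumerate(params):
            x = W @ x + b
            if i < len(params) - 1:
                x = np.maximum(x, 0.0)
        return x

    def straightness_gap(params, x0, x1, n_points=51):
        """Max deviation of f(x_t) from the chord between f(x0) and f(x1),
        where x_t interpolates linearly between the two inputs."""
        ts = np.linspace(0.0, 1.0, n_points)
        f0, f1 = forward(params, x0), forward(params, x1)
        gaps = [np.linalg.norm(forward(params, (1 - t) * x0 + t * x1)
                               - ((1 - t) * f0 + t * f1)) for t in ts]
        return max(gaps)

    d_in = 16
    # Over-parameterized widths chosen purely for illustration.
    params = init_mlp([d_in, 512, 512, 1])
    x0, x1 = rng.normal(size=d_in), rng.normal(size=d_in)
    print("max deviation from the straight chord:", straightness_gap(params, x0, x1))

Under the paper's claim, this deviation stays small at random initialization and remains controlled during stochastic gradient descent, which is what keeps the effective complexity of the over-parameterized network low.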


