Implicit Regularization in ReLU Networks with the Square Loss

12/09/2020
by Gal Vardi et al.

Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although, on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result: in general, it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018]. Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and they provide some clues as to what this framework should be.
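To make the "balancedness" property concrete: Du et al. [2018] showed that under gradient flow on a one-hidden-layer ReLU network x ↦ Σ_j a_j · max(⟨w_j, x⟩, 0) trained with the square loss, each per-neuron difference a_j² − ‖w_j‖² is conserved throughout training. The sketch below is not code from the paper; the data, network width, and step size are arbitrary illustrative choices, with a small gradient-descent step standing in for gradient flow:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 4                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = 0.5 * rng.standard_normal((k, d))    # hidden-layer weights w_j (rows)
a = 0.5 * rng.standard_normal(k)         # output weights a_j

def balancedness(a, W):
    # Per-neuron quantity a_j^2 - ||w_j||^2, conserved under gradient flow
    return a**2 - np.sum(W**2, axis=1)

print("before:", balancedness(a, W))

lr = 1e-3                                # small step approximates gradient flow
for _ in range(5000):
    H = np.maximum(X @ W.T, 0.0)         # hidden activations, shape (n, k)
    r = H @ a - y                        # residuals, shape (n,)
    grad_a = H.T @ r / n
    # d loss / d w_j = (1/n) * sum_i r_i * a_j * 1{<w_j, x_i> > 0} * x_i
    grad_W = ((r[:, None] * (H > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

print("after: ", balancedness(a, W))     # approximately unchanged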
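Exact conservation holds only in the continuous-time (gradient flow) limit; with a finite step size the per-neuron quantities drift slightly, which is why the printed values match only approximately.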


Related research

- Support Vectors and Gradient Dynamics for Implicit Bias in ReLU Networks (02/11/2022): Understanding implicit bias of gradient descent has been an important go...
- Implicit Regularization Towards Rank Minimization in ReLU Networks (01/30/2022): We study the conjectured relationship between the implicit regularizatio...
- A global convergence theory for deep ReLU implicit networks via over-parameterization (10/11/2021): Implicit deep learning has received increasing attention recently due to...
- Limitation of characterizing implicit regularization by data-independent functions (01/28/2022): In recent years, understanding the implicit regularization of neural net...
- Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, Gradient Flow Dynamics (08/04/2020): Understanding the learning dynamics and inductive bias of neural network...
- Implicit Regularization in Tensor Factorization (02/19/2021): Implicit regularization in deep learning is perceived as a tendency of g...
- The Geometric Occam's Razor Implicit in Deep Learning (11/30/2021): In over-parameterized deep neural networks there can be many possible pa...