Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets

10/24/2020
by Depen Morwani, et al.

We analyze the inductive bias of gradient descent for weight-normalized smooth homogeneous neural nets trained on exponential or cross-entropy loss. Our analysis focuses on exponential weight normalization (EWN), which encourages weight updates along the radial direction. This paper shows that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate, and hence causes the weights to be updated in a way that prefers asymptotic relative sparsity. These results can be extended to hold for gradient descent via an appropriate adaptive learning rate. The asymptotic convergence rate of the loss in this setting is given by Θ(1/(t(log t)^2)), and is independent of the depth of the network. We contrast these results with the inductive bias of standard weight normalization (SWN) and unnormalized architectures, and demonstrate their implications on synthetic data sets. Experimental results on simple data sets and architectures support our claim on sparse EWN solutions, even with SGD. This demonstrates its potential application in learning prunable neural networks.
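To make the EWN parameterization concrete, the following is a minimal sketch (not the authors' code) of a linear layer where each outgoing weight vector is parameterized as w_i = exp(s_i) * v_i / ||v_i||, in contrast to SWN's w_i = g_i * v_i / ||v_i||; the class and variable names (EWNLinear, s, v) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EWNLinear(nn.Module):
    """Linear layer with exponential weight normalization (illustrative sketch)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Direction parameter v (one row per output unit) and log-scale s.
        self.v = nn.Parameter(torch.randn(out_features, in_features))
        self.s = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Effective weight: w_i = exp(s_i) * v_i / ||v_i||.
        # Gradient updates to s rescale w_i along the radial direction,
        # which is the mechanism the abstract refers to.
        w = torch.exp(self.s).unsqueeze(1) * F.normalize(self.v, dim=1)
        return F.linear(x, w)

# Usage: drop-in replacement for nn.Linear in a small homogeneous net.
layer = EWNLinear(4, 3)
out = layer(torch.randn(8, 4))
```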


