The Implicit Bias of Depth: How Incremental Learning Drives Generalization

09/26/2019
by   Daniel Gissin, et al.
0

A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear models. Our main theoretical contribution is a dynamical depth separation result, proving that while shallow models can exhibit incremental learning dynamics, they require the initialization to be exponentially small for these dynamics to present themselves. However, once the model becomes deeper, the dependence becomes polynomial and incremental learning can arise in more natural settings. We complement our theoretical findings by experimenting with deep matrix sensing, quadratic neural networks and with binary classification using diagonal and convolutional linear networks, showing all of these models exhibit incremental learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/27/2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

It is believed that Gradient Descent (GD) induces an implicit bias towar...
research
08/31/2022

Incremental Learning in Diagonal Linear Networks

Diagonal linear networks (DLNs) are a toy simplification of artificial n...
research
05/30/2018

The Dynamics of Learning: A Random Matrix Approach

Understanding the learning dynamics of neural networks is one of the key...
research
09/23/2018

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

In this note, we study the dynamics of gradient descent on objective fun...
research
06/12/2023

Transformers learn through gradual rank increase

We identify incremental learning dynamics in transformers, where the dif...
research
05/24/2022

Quadratic models for understanding neural network dynamics

In this work, we propose using a quadratic model as a tool for understan...
research
05/12/2021

Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias

Convolutional Neural networks of different architectures seem to learn t...

Please sign up or login with your details

Forgot password? Click here to reset