Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

09/19/2022
by Thomas George, et al.

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called 'lazy' regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in the feature learning mode, resulting in their faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways of quantifying example difficulty, including c-score, label noise, and the presence of spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.
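As a rough illustration (not code from the paper), the sketch below contrasts a small MLP with its first-order Taylor expansion around initialization, which is the 'lazy' approximation the abstract refers to. It is written in JAX; all names (init_params, mlp, linearize) and the architecture are illustrative assumptions.

```python
# Minimal sketch, assuming a small tanh MLP; not the paper's implementation.
import jax
import jax.numpy as jnp

def init_params(key, sizes=(2, 64, 1)):
    """Random initialization of a small MLP: a list of (W, b) pairs."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def mlp(params, x):
    """Forward pass of the full non-linear (feature learning) model."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def linearize(f, params0):
    """Lazy approximation: first-order Taylor expansion around params0,
    f_lin(p, x) = f(params0, x) + J(params0, x) . (p - params0)."""
    def f_lin(params, x):
        delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        out0, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (delta,))
        return out0 + jvp_out
    return f_lin

key = jax.random.PRNGKey(0)
params0 = init_params(key)
f_lin = linearize(mlp, params0)

x = jax.random.normal(jax.random.PRNGKey(1), (5, 2))
# At initialization the two models agree exactly; they diverge as training
# moves the parameters, and the size of that gap is what separates the
# lazy (linear) regime from the feature learning (non-linear) regime.
print(jnp.allclose(mlp(params0, x), f_lin(params0, x)))
```

In the lazy regime, training the full network stays close to training f_lin, whose dynamics are governed by a fixed kernel; the abstract's claim is that leaving this regime reweights examples by difficulty, so easy examples are learned first.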


Related research

- The dynamics of representation learning in shallow, non-linear autoencoders (01/06/2022): Autoencoders are the simplest neural network for unsupervised learning, ...
- Example Perplexity (03/16/2022): Some examples are easier for humans to classify than others. The same sh...
- On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization (04/13/2020): In recent years, a critical initialization scheme with orthogonal initia...
- Disentangling feature and lazy learning in deep neural networks: an empirical study (06/19/2019): Two distinct limits for deep learning as the net width h→∞ have been pro...
- Learning Parities with Neural Networks (02/18/2020): In recent years we see a rapidly growing line of research which shows le...
- Regression with Deep Learning for Sensor Performance Optimization (02/22/2020): Neural networks with at least two hidden layers are called deep networks...
- Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias (05/12/2021): Convolutional Neural networks of different architectures seem to learn t...
