An analytic theory of generalization dynamics and transfer learning in deep linear networks

09/27/2018
by   Andrew K. Lampinen, et al.

Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual tasks. However, we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks. We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. In particular, our theory provides analytic solutions to the training and testing error of deep networks as a function of training time, number of examples, network size and initialization, and the task structure and signal-to-noise ratio (SNR). Our theory reveals that deep networks progressively learn the most important task structure first, so that generalization error at the early stopping time primarily depends on task structure and is independent of network size. This suggests that any tight bound on generalization error must take into account task structure, and explains observations about real data being learned faster than random data. Intriguingly, our theory also reveals the existence of a learning algorithm that provably outperforms neural network training through gradient descent. Finally, for transfer learning, our theory reveals that knowledge transfer depends sensitively, but computably, on the SNRs and input feature alignments of pairs of tasks.
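The mode-by-mode learning the abstract describes can be illustrated with a small simulation. The sketch below is a hypothetical toy, not the paper's code: it trains a two-layer linear network y = W2 W1 x by full-batch gradient descent on a rank-3 teacher corrupted by noise. For deep linear networks with small balanced initialization, prior work on their learning dynamics (Saxe et al., 2013) gives each composite singular value a sigmoidal trajectory of the form u(t) = s / (1 - (1 - s/u0) * exp(-2st/tau)), so a mode of strength s is learned on a timescale proportional to 1/s: the strongest task structure is acquired first and noise modes are fit last, which is why early stopping helps. All dimensions, the learning rate, and the noise scale 0.5 below are arbitrary choices made for illustration.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's code): a two-layer deep
# linear network y = W2 @ W1 @ x trained by full-batch gradient descent on
# a low-rank teacher corrupted by noise. The theory predicts that modes are
# learned in order of their singular value strength, so early stopping
# recovers the signal modes before the network begins to fit the noise.

rng = np.random.default_rng(0)
n_in, n_hid, n_out, n_train = 30, 30, 30, 60

# Teacher: rank-3 map with descending mode strengths (the "task structure").
U, _ = np.linalg.qr(rng.standard_normal((n_out, 3)))
V, _ = np.linalg.qr(rng.standard_normal((n_in, 3)))
W_teacher = U @ np.diag([4.0, 2.0, 1.0]) @ V.T

X = rng.standard_normal((n_in, n_train))
noise = 0.5 * rng.standard_normal((n_out, n_train))   # sets the SNR
Y = W_teacher @ X + noise

# Small balanced initialization, as assumed by the analytic solutions.
scale = 1e-3
W1 = scale * rng.standard_normal((n_hid, n_in))
W2 = scale * rng.standard_normal((n_out, n_hid))

lr, steps = 5e-3, 4000
for t in range(steps):
    E = W2 @ W1 @ X - Y                      # residual on the training set
    W2 -= lr * (E @ (W1 @ X).T) / n_train    # gradient of 0.5*||E||^2 / N
    W1 -= lr * (W2.T @ E @ X.T) / n_train
    if t % 500 == 0:
        sv = np.linalg.svd(W2 @ W1, compute_uv=False)[:4]
        gen_err = np.linalg.norm(W2 @ W1 - W_teacher) ** 2
        print(f"step {t:5d}  top singular values {np.round(sv, 2)}  "
              f"distance to teacher {gen_err:.3f}")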

Related research

11/20/2022 - Overfreezing Meets Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks
We study the generalization behavior of transfer learning of deep neural...

06/12/2021 - What can linearized neural networks actually say about generalization?
For certain infinitely-wide neural networks, the neural tangent kernel (...

06/12/2020 - Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
We study the transfer learning process between two linear regression pro...

10/04/2018 - The Dynamics of Differential Learning I: Information-Dynamics and Task Reachability
We study the topology of the space of learning tasks, which is critical ...

11/28/2019 - A Generalization Theory based on Independent and Task-Identically Distributed Assumption
Existing generalization theories analyze the generalization performance ...

08/23/2023 - Critical Learning Periods Emerge Even in Deep Linear Networks
Critical learning periods are periods early in development where tempora...

04/10/2023 - Simulated Annealing in Early Layers Leads to Better Generalization
Recently, a number of iterative learning methods have been introduced to...
