Generalisation dynamics of online learning in over-parameterised neural networks

01/25/2019
by Sebastian Goldt, et al.

Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data. Here we study the generalisation dynamics of two-layer neural networks in a teacher-student setup, where one network, the student, is trained using stochastic gradient descent (SGD) on data generated by another network, called the teacher. We show how for this problem, the dynamics of SGD are captured by a set of differential equations. In particular, we demonstrate analytically that the generalisation error of the student increases linearly with the network size, with other relevant parameters held constant. Our results indicate that achieving good generalisation in neural networks depends on the interplay of at least the algorithm, its learning rate, the model architecture, and the data set.
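The teacher-student setup described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes an erf activation, second-layer weights fixed to one, noiseless teacher labels, and arbitrary small sizes. All names (`train_online`, `generalisation_error`, and the parameters) are hypothetical. In online learning, each SGD step uses a fresh input, so the student never sees the same sample twice.

```python
import math
import random


def g(z):
    """Sigmoidal activation (an assumption of this sketch): erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2))


def g_prime(z):
    """Derivative of g with respect to z."""
    return math.sqrt(2 / math.pi) * math.exp(-z * z / 2)


def preactivations(weights, x):
    """Hidden-unit inputs w . x / sqrt(N) for every hidden unit."""
    scale = math.sqrt(len(x))
    return [sum(wi * xi for wi, xi in zip(w, x)) / scale for w in weights]


def network(weights, x):
    """Two-layer network output with second-layer weights fixed to 1."""
    return sum(g(p) for p in preactivations(weights, x))


def generalisation_error(student, teacher, n_inputs, rng, n_test=500):
    """Monte Carlo estimate of the mean of 0.5 * (student - teacher)^2."""
    total = 0.0
    for _ in range(n_test):
        x = [rng.gauss(0, 1) for _ in range(n_inputs)]
        total += 0.5 * (network(student, x) - network(teacher, x)) ** 2
    return total / n_test


def train_online(n_inputs=20, m_teacher=1, k_student=2, lr=0.2,
                 steps=4000, seed=0):
    """Online SGD: one fresh Gaussian input per step, labelled by the teacher."""
    rng = random.Random(seed)
    teacher = [[rng.gauss(0, 1) for _ in range(n_inputs)]
               for _ in range(m_teacher)]
    student = [[rng.gauss(0, 1) for _ in range(n_inputs)]
               for _ in range(k_student)]
    before = generalisation_error(student, teacher, n_inputs, rng)
    scale = math.sqrt(n_inputs)
    for _ in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n_inputs)]
        y = network(teacher, x)  # noiseless label (assumption)
        pre = preactivations(student, x)
        err = sum(g(p) for p in pre) - y
        # Gradient step on the squared error 0.5 * err^2 for each hidden unit.
        for w, p in zip(student, pre):
            step = lr / scale * err * g_prime(p)
            for i in range(n_inputs):
                w[i] -= step * x[i]
    after = generalisation_error(student, teacher, n_inputs, rng)
    return before, after


if __name__ == "__main__":
    before, after = train_online()
    print("generalisation error before training:", before)
    print("generalisation error after training: ", after)
```

Here the student has more hidden units than the teacher (`k_student > m_teacher`), which is the over-parameterised case. Varying `k_student` while holding the other parameters fixed is one way to probe how the student's error depends on network size, although this toy run omits ingredients, such as label noise, that may matter for the scaling stated in the abstract.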

