On the Learning Dynamics of Deep Neural Networks

09/18/2018
by Remi Tachet des Combes et al.

While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day poorly understood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that, given proper initialization, learning proceeds along parallel independent modes and that certain regions of parameter space may lead to failed training. We also demonstrate that the input norm and the frequency of features in the dataset lead to distinct convergence speeds, which may shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful for understanding recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we call gradient starvation, in which the most frequent features in a dataset prevent the learning of other, less frequent but equally informative, features.
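
The convergence-speed and gradient starvation claims can be illustrated with a small toy experiment. The sketch below is a hypothetical illustration, not the paper's code or experimental setup: it builds a binary dataset with two equally informative features, one active in roughly 90% of examples and the other in the remaining 10%, and trains logistic regression by full-batch gradient descent. The weight on the frequent feature grows markedly faster. All frequencies, learning rates, and variable names are illustrative assumptions; in this decoupled setup the rare feature is merely slower to converge, whereas full gradient starvation additionally involves co-occurring features, where the dominant feature suppresses the gradient signal of the others.

import numpy as np

# Toy data (illustrative, not from the paper): labels in {-1, +1};
# feature 0 carries the label in ~90% of examples, feature 1 in the
# remaining ~10%. Both are perfectly informative when active.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n) * 2 - 1
frequent = rng.random(n) < 0.9
X = np.zeros((n, 2))
X[frequent, 0] = y[frequent]
X[~frequent, 1] = y[~frequent]

# Logistic regression trained by full-batch gradient descent.
w = np.zeros(2)
lr = 0.5
for step in range(1, 501):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad
    if step % 100 == 0:
        print(f"step {step:3d}  w_frequent = {w[0]:.3f}  w_rare = {w[1]:.3f}")

Running this, w_frequent outpaces w_rare by roughly the ratio of the feature frequencies, consistent with the abstract's claim that feature frequency drives distinct convergence speeds.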

