Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data

02/28/2020
by   Ziang Long, et al.

In this paper, we study the dynamics of gradient descent in training neural networks for classification problems. Unlike existing works, we consider the linearly non-separable case, where the training data of different classes lie in orthogonal subspaces. We show that when the network has a sufficient (but not excessively large) number of neurons, (1) the corresponding minimization problem has a desirable landscape in which all critical points are global minima with perfect classification, and (2) gradient descent is guaranteed to converge to these global minima. Moreover, we discover a geometric condition on the network weights such that, once it is satisfied, the weight evolution transitions from a slow phase of weight-direction spreading to a fast phase of weight convergence. The condition is that the convex hull of the weights projected onto the unit sphere contains the origin.
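The geometric condition is easy to test numerically. As a minimal sketch (in two dimensions only, with a hypothetical helper name of our own choosing), the origin lies in the convex hull of the normalized weights exactly when no open half-plane contains all of the projected directions, i.e. when the largest angular gap between consecutive directions is less than pi:

```python
import math

def origin_in_hull_2d(weights):
    """Illustrative check of the paper's geometric condition for 2-D
    weights: does the convex hull of the weights projected onto the
    unit circle (strictly) contain the origin?  Equivalent to: the
    largest angular gap between consecutive directions is < pi."""
    angles = sorted(math.atan2(w[1], w[0]) for w in weights)
    # Gaps between consecutive sorted directions ...
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    # ... plus the wrap-around gap closing the circle.
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return max(gaps) < math.pi

# Directions spread around the circle: condition holds.
print(origin_in_hull_2d([(1, 0), (-1, 1), (0, -1)]))  # True
# All directions in the right half-plane: condition fails.
print(origin_in_hull_2d([(1, 0), (1, 1), (1, -1)]))   # False
```

Only the direction of each weight matters here, which is why the check works on angles of the projected (normalized) weights rather than on their magnitudes.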

Related research:

- 11/08/2018, "A Geometric Approach of Gradient Descent Algorithms in Neural Networks": In this article we present a geometric framework to analyze convergence ...
- 12/03/2017, "Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima": We consider the problem of learning a one-hidden-layer neural network wi...
- 02/12/2019, "Towards moderate overparameterization: global convergence guarantees for training shallow neural networks": Many modern neural network architectures are trained in an overparameter...
- 10/11/2021, "Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations": Understanding the training dynamics of deep learning models is perhaps a...
- 12/03/2019, "Stationary Points of Shallow Neural Networks with Quadratic Activation Function": We consider the problem of learning shallow neural networks with quadrat...
- 05/10/2017, "Learning ReLUs via Gradient Descent": In this paper we study the problem of learning Rectified Linear Units (R...
- 07/16/2020, "Data-driven effective model shows a liquid-like deep learning": Geometric structure of an optimization landscape is argued to be fundame...
