On the Convergence Rate of Training Recurrent Neural Networks

10/29/2018
by Zeyuan Allen-Zhu, et al.

Despite the huge success of deep learning, our understanding of how non-convex neural networks are trained remains rather limited. Most existing theoretical works only tackle neural networks with one hidden layer, and little is known for multi-layer neural networks. Recurrent neural networks (RNNs) are special multi-layer networks extensively used in natural language processing applications. They are particularly hard to analyze, compared to feedforward networks, because the weight parameters are reused across the entire time horizon. We provide arguably the first theoretical understanding of the convergence speed of training RNNs. Specifically, when the number of neurons is sufficiently large, meaning polynomial in the training data size and in the time horizon, and when the weights are randomly initialized, we show that gradient descent and stochastic gradient descent both minimize the training loss at a linear convergence rate, that is, ε ∝ e^{-Ω(T)}.
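As a toy illustration of this setup (not the paper's construction or its proof), the sketch below trains a wide, randomly initialized Elman RNN with plain SGD on a small synthetic regression task and logs the training loss. The width, data size, horizon, and learning rate are arbitrary illustrative choices, not the polynomial bounds the theorem requires.

import torch

# Minimal sketch: over-parameterized RNN, random initialization, SGD on the
# squared training loss. Hyperparameters below are illustrative assumptions.
torch.manual_seed(0)

n, horizon, d_in, width = 20, 10, 8, 512   # samples, time steps, input dim, hidden width
x = torch.randn(n, horizon, d_in)          # random input sequences
y = torch.randn(n, 1)                      # random scalar targets

rnn = torch.nn.RNN(d_in, width, nonlinearity="relu", batch_first=True)
head = torch.nn.Linear(width, 1)           # linear readout of the last hidden state
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=1e-2)

for step in range(500):
    out, _ = rnn(x)                        # out has shape (n, horizon, width)
    pred = head(out[:, -1, :])             # prediction from the final time step
    loss = 0.5 * ((pred - y) ** 2).mean()  # squared training loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step:4d}  training loss {loss.item():.6f}")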


Related research

A Convergence Theory for Deep Learning via Over-Parameterization (11/09/2018)
Deep neural networks (DNNs) have demonstrated dominating performance in ...

Learning with Local Gradients at the Edge (08/17/2022)
To enable learning on edge devices with fast convergence and low memory, ...

Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods (10/28/2016)
The optimization problem behind neural networks is highly non-convex. Tr...

Symmetry Teleportation for Accelerated Optimization (05/21/2022)
Existing gradient-based optimization methods update the parameters local...

Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing (08/08/2022)
We present a weight similarity measure method that can quantify the weig...

A Review of Designs and Applications of Echo State Networks (12/05/2020)
Recurrent Neural Networks (RNNs) have demonstrated their outstanding abi...

CNNs are Globally Optimal Given Multi-Layer Support (12/07/2017)
Stochastic Gradient Descent (SGD) is the central workhorse for training ...
