A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

05/20/2023
by Arunselvan Ramaswamy, et al.

We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios. The algorithm combines standard stochastic gradient descent with gradient clipping: the output layer is updated using clipped gradients, while the rest of the network is updated using standard gradients. Updating the output layer with clipped gradients stabilizes it, and we show that the remaining layers are then automatically stabilized, provided the network is composed only of squashing (compact-range) activations. We also present a novel squashing activation function, obtained by modifying the Gaussian Error Linear Unit (GELU) to have a compact range; we call it the Truncated GELU (tGELU). Unlike other squashing activations, such as the sigmoid, the range of tGELU can be explicitly specified. As a consequence, the vanishing-gradient problem that arises from a small range, e.g., with a sigmoid activation, is eliminated. We prove that a neural network composed of squashing activations (tGELU, sigmoid, etc.), when updated using the algorithm presented herein, is numerically stable and performs consistently (low variance). The theory is supported by extensive experiments. Within reinforcement learning, as a consequence of our study, we show that target networks in Deep Q-Learning can be omitted, greatly speeding up learning and reducing memory requirements. Cross-entropy-based classification algorithms that suffer from high-variance issues become more consistent when trained within our framework. One symptom of numerical instability during training is high variance in the neural network's update values. We show, in theory and through experiments, that our algorithm's updates have low variance and that the training loss decreases smoothly.
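
To make the split update rule concrete, here is a minimal PyTorch sketch of the scheme described above: only the output layer's gradients are clipped before the step, while the hidden layers receive standard gradients. The architecture, learning rate, clipping threshold, and the choice of norm clipping (rather than some other clipping rule) are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Illustrative network: hidden layers use a squashing activation (tanh here),
# as the stability result requires; sizes and learning rate are assumptions.
model = nn.Sequential(
    nn.Linear(16, 32), nn.Tanh(),
    nn.Linear(32, 1),  # output layer
)
output_params = list(model[2].parameters())
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def training_step(x, y, clip_norm=1.0):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Clip only the output layer's gradients; hidden layers keep raw gradients.
    torch.nn.utils.clip_grad_norm_(output_params, max_norm=clip_norm)
    opt.step()
    return loss.item()
```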

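The abstract does not give tGELU's exact formula, so the following is a hedged sketch of one plausible construction of a GELU with an explicitly specifiable compact range: the GELU is applied to the input clamped to a user-chosen interval. The class name, default bounds, and the clamp-then-activate construction are assumptions for illustration; the paper's precise definition may differ.

```python
import torch
import torch.nn as nn

class TGELU(nn.Module):
    """Hypothetical truncated GELU: a GELU forced onto a compact range by
    clamping its pre-activation to [t_left, t_right]. Because the bounds are
    user-specified, the output range can be made as wide as desired."""

    def __init__(self, t_left: float = -3.0, t_right: float = 3.0):
        super().__init__()
        self.t_left, self.t_right = t_left, t_right

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike a plain GELU, whose range is unbounded above, the output
        # here lies in a compact interval determined by t_left and t_right.
        return nn.functional.gelu(torch.clamp(x, self.t_left, self.t_right))
```

A layer such as `nn.Linear` followed by `TGELU(-5.0, 5.0)` would then act as a squashing layer whose range is wide and explicitly chosen, avoiding the vanishing gradients associated with the sigmoid's narrow range.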
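
The claim that target networks can be omitted from Deep Q-Learning suggests an update like the sketch below: the bootstrap target is computed from the same online network (no frozen copy), and stability is obtained instead by clipping the output layer's gradients. The environment dimensions, discount factor, loss, and clipping threshold are assumptions for illustration, not the paper's exact experimental setup.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network for a 4-dimensional state space with 2 actions.
q_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def q_step(s, a, r, s_next, done, gamma=0.99, clip_norm=1.0):
    # s, s_next: float tensors [B, 4]; a: long tensor [B];
    # r, done: float tensors [B].
    with torch.no_grad():
        # Bootstrap from the same online network: no frozen target copy.
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    # Stability comes from clipping the output layer's gradients instead.
    torch.nn.utils.clip_grad_norm_(q_net[2].parameters(), max_norm=clip_norm)
    opt.step()
    return loss.item()
```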

