Layer-Specific Adaptive Learning Rates for Deep Networks

10/15/2015
by Bharat Singh, et al.

The increasing complexity of deep learning architectures is resulting in training times of weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely large for weights connecting deep layers (layers near the output layer) and extremely small for shallow layers (layers near the input layer); this results in slow learning in the shallow layers. It has also been shown that highly non-convex problems, such as deep neural networks, exhibit a proliferation of high-error, low-curvature saddle points, which slows learning dramatically. In this paper, we attempt to overcome these two problems by proposing an optimization method for training deep neural networks that uses learning rates which are both specific to each layer in the network and adaptive to the curvature of the loss function, increasing the learning rate at low-curvature points. This enables us to speed up learning in the shallow layers of the network and to quickly escape high-error, low-curvature saddle points. We test our method on standard image classification datasets such as MNIST, CIFAR-10, and ImageNet, and demonstrate that our method improves accuracy while reducing the required training time relative to standard algorithms.
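
The abstract outlines the idea but not the exact update rule. The following is a minimal sketch of one way layer-specific, curvature-adaptive learning rates could be implemented: each parameter tensor (roughly, each layer) keeps its own learning rate, scaled by a secant-style curvature proxy so that flat regions get larger steps. The class name LayerAdaptiveSGD, the curvature estimate, and the 2/(1 + curvature) scaling are illustrative assumptions, not the authors' method.

import torch

class LayerAdaptiveSGD(torch.optim.Optimizer):
    """Sketch of per-tensor (roughly per-layer) learning rates that grow
    where an estimate of local curvature is small. Illustrative only."""

    def __init__(self, params, base_lr=0.01, eps=1e-8):
        super().__init__(params, dict(base_lr=base_lr, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                prev_grad = state.get("prev_grad")
                prev_step = state.get("prev_step")
                if prev_grad is None:
                    # First step: plain SGD, just record history.
                    update = -group["base_lr"] * p.grad
                else:
                    # Secant curvature proxy along the last step:
                    # ||g_t - g_{t-1}|| / ||w_t - w_{t-1}||; small => flat region.
                    curvature = (p.grad - prev_grad).norm() / (prev_step.norm() + group["eps"])
                    # Up to 2x base_lr in flat regions, shrinking as curvature
                    # grows. The 2x cap is an arbitrary illustrative choice.
                    lr = group["base_lr"] * 2.0 / (1.0 + curvature)
                    update = -lr * p.grad
                state["prev_grad"] = p.grad.clone()
                state["prev_step"] = update.clone()
                p.add_(update)

This drops in wherever torch.optim.SGD would (construct with model.parameters(), then the usual zero_grad / backward / step loop); because state is kept per parameter tensor, each layer's weights and biases get their own effective learning rate.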


Related research

06/14/2022  Flatten the Curve: Efficiently Training Low-Curvature Neural Networks
The highly non-linear nature of deep neural networks causes them to be s...

12/27/2022  Langevin algorithms for very deep Neural Networks with application to image classification
Training a very deep neural network is a challenging task, as the deeper...

06/05/2018  On layer-level control of DNN training and its impact on generalization
The generalization ability of a neural network depends on the optimizati...

06/08/2017  Forward Thinking: Building and Training Neural Networks One Layer at a Time
We present a general framework for training deep neural networks without...

06/13/2017  Deep Control - a simple automatic gain control for memory efficient and high performance training of deep convolutional neural networks
Training a deep convolutional neural net typically starts with a random ...

03/30/2016  Deep Networks with Stochastic Depth
Very deep convolutional networks with hundreds of layers have led to sig...
