
Train Feedforward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

02/27/2018
by Huishuai Zhang, et al.
Microsoft

Stochastic gradient descent (SGD) has achieved great success in training deep neural networks, where the gradient is computed through back-propagation. However, the magnitudes of the back-propagated values vary dramatically across layers. This inconsistency in gradient magnitude makes optimizing a deep neural network with a single learning rate problematic. We introduce back-matching propagation, which computes the backward values for a layer's parameters and input by matching the backward values at the layer's output. This amounts to solving a set of least-squares problems, which is computationally expensive. We then simplify back-matching propagation with approximations and obtain an algorithm that turns out to be regular SGD with a layer-wise adaptive learning rate strategy. This allows an easy implementation of our algorithm in current machine learning frameworks equipped with auto-differentiation. We apply our algorithm to training modern deep neural networks and achieve favorable results over SGD.
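Because the method reduces to regular SGD with per-layer learning-rate scaling, it can be sketched as a thin wrapper around an ordinary SGD step. Below is a minimal PyTorch sketch, assuming a hypothetical scaling rule (parameter norm over gradient norm) to equalize relative step sizes across layers; the paper's actual per-layer factors come from its approximation of back-matching propagation and are not reproduced here. The names layerwise_sgd_step, base_lr, and eps are illustrative, not from the paper.

import torch
import torch.nn as nn

def layerwise_sgd_step(model: nn.Module, base_lr: float = 0.1, eps: float = 1e-8):
    # One SGD step with a layer-wise adaptive learning rate.
    # Call after loss.backward() has populated the gradients.
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            # Hypothetical per-layer scale: ratio of parameter norm to
            # gradient norm, so each layer takes a step of comparable
            # relative size despite gradient-magnitude disparities.
            scale = param.norm() / (param.grad.norm() + eps)
            param -= base_lr * scale * param.grad

A typical loop would compute the forward pass and loss, call loss.backward(), invoke layerwise_sgd_step(model), and then zero the gradients; this mirrors how such a scheme plugs into any auto-differentiation framework, as the abstract notes.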



Related Research

11/19/2018 · Deep Frank-Wolfe For Neural Network Optimization
Learning a deep neural network requires solving a challenging optimizati...

06/05/2018 · On layer-level control of DNN training and its impact on generalization
The generalization ability of a neural network depends on the optimizati...

05/27/2019 · Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
We propose NovoGrad, a first-order stochastic gradient method with layer...

12/31/2019 · AdderNet: Do We Really Need Multiplications in Deep Learning?
Compared with cheap addition operation, multiplication operation is of m...

02/13/2022 · Reverse Back Propagation to Make Full Use of Derivative
The development of the back-propagation algorithm represents a landmark ...

03/31/2022 · Exploiting Explainable Metrics for Augmented SGD
Explaining the generalization characteristics of deep learning is an eme...

02/26/2021 · Experiments with Rich Regime Training for Deep Learning
In spite of advances in understanding lazy training, recent work attribu...