Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches

03/11/2020
by Naeem Paeedeh, et al.

Least mean squares (LMS) is a special case of the backpropagation (BP) algorithm applied to single-layer neural networks with the mean squared error (MSE) loss. One drawback of LMS is that the instantaneous weight update is proportional to the squared norm of the input vector. The normalized least mean squares (NLMS) algorithm remedies this drawback by dividing the weight update by the squared norm of the input vector. The affine projection algorithm (APA) extends NLMS by updating the weights over a batch of recently seen samples. However, the application of NLMS and APA has been limited to single-layer networks and adaptive filters. In this paper, we consider a virtual target for each neuron of a multi-layer neural network and show that the BP algorithm is equivalent to training the weights of each layer toward these virtual targets with the LMS algorithm. We also introduce a consequentialism interpretation of the NLMS and APA algorithms that justifies their use in multi-layer neural networks. Building on any BP-based optimization algorithm over mini-batches, we propose a novel consequentialism method for updating the weights. Consequently, our proposed weight update can be applied both to plain stochastic gradient descent (SGD) and to momentum methods such as RMSProp, Adam, and NAG. These ideas allow the weights to be updated more carefully, so that minimizing the loss for one sample of a mini-batch does not interfere with the other samples in that mini-batch. Our experiments show the usefulness of the proposed method in optimizing deep neural network architectures.
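Because the abstract hinges on the distinction between the LMS and NLMS updates, here is a minimal NumPy sketch of the textbook single-layer versions of those two rules (not the paper's multi-layer consequentialism update); the function names, the step size mu, and the regularizer eps are illustrative assumptions.

```python
import numpy as np

def lms_step(w, x, d, mu=0.1):
    """Textbook LMS update: the step is proportional to the input vector,
    so its magnitude grows with the squared norm of x."""
    e = d - w @ x          # instantaneous error against target d
    return w + mu * e * x

def nlms_step(w, x, d, mu=0.1, eps=1e-8):
    """Textbook NLMS update: the LMS step divided by the squared input norm,
    which makes the effective step size independent of the input scale."""
    e = d - w @ x
    return w + mu * e * x / (eps + x @ x)

# Toy usage: adapt a 3-tap linear model toward fixed target weights.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.2, 0.1])
w = np.zeros(3)
for _ in range(200):
    x = rng.normal(size=3)
    d = w_true @ x
    w = nlms_step(w, x, d)
print(np.round(w, 3))  # converges toward w_true
```

The normalization by `x @ x` is the property the paper carries over: treating each layer's activations as inputs to a single-layer LMS problem with virtual targets, and normalizing (or, with APA, solving over a mini-batch) so that the update for one sample does not disrupt the others.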


