Learning with Local Gradients at the Edge

08/17/2022
by Michael Lomnitz, et al.

To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD generalizes direct random target projection to work with arbitrary loss functions and extends target projection for training recurrent neural networks (RNNs) in addition to feedforward networks. tpSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer-by-layer with only forward passes. tpSGD does not require retaining gradients during optimization, greatly reducing memory allocation compared to SGD backpropagation (BP) methods that require multiple instances of the entire neural network weights, inputs/outputs, and intermediate results. Our method performs comparably to BP gradient descent, within 5% accuracy, on relatively shallow networks of fully connected layers, convolutional layers, and recurrent layers. tpSGD also outperforms other state-of-the-art gradient-free algorithms in shallow models consisting of multi-layer perceptrons, convolutional neural networks (CNNs), and RNNs, with competitive accuracy and less memory and time. We evaluate the performance of tpSGD in training deep neural networks (e.g., VGG) and extend the approach to multi-layer RNNs. These experiments highlight new research directions related to optimized layer-based adaptor training for domain shift using tpSGD at the edge.
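To make the layer-wise scheme concrete, here is a minimal PyTorch sketch of training a small MLP with fixed random projections of the labels as per-layer targets, in the spirit of the description above. The architecture, the MSE local loss, the ReLU activation, and every name (layer_dims, projections, train_step) are illustrative assumptions, not the authors' implementation; the key point is that each layer's update uses only its own local gradient, so no end-to-end computation graph is retained.

```python
# Sketch of layer-wise training with random target projections (assumptions noted above).
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 10
layer_dims = [784, 256, 128, num_classes]  # assumed MLP sizes (illustrative)

# One module and one optimizer per layer, so weight updates stay local.
layers = nn.ModuleList(
    [nn.Linear(d_in, d_out) for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:])]
)
optims = [torch.optim.SGD(layer.parameters(), lr=1e-2) for layer in layers]

# Fixed random matrices projecting one-hot labels onto each hidden layer's width.
projections = [torch.randn(num_classes, d) for d in layer_dims[1:-1]]

act = torch.relu
local_loss = nn.MSELoss()          # assumed local target-matching loss
task_loss = nn.CrossEntropyLoss()  # task loss applied only at the output layer


def train_step(x, y):
    """One layer-wise update: each layer sees only its own local gradient."""
    y_onehot = torch.nn.functional.one_hot(y, num_classes).float()
    h = x
    for i, layer in enumerate(layers):
        out = layer(h.detach())        # detach: nothing propagates to earlier layers
        if i < len(layers) - 1:
            h_next = act(out)
            target = y_onehot @ projections[i]  # local target via random projection
            loss = local_loss(h_next, target)
        else:
            h_next = out
            loss = task_loss(out, y)            # true labels at the final layer
        optims[i].zero_grad()
        loss.backward()                # gradient exists for this layer's weights only
        optims[i].step()
        h = h_next
    return loss.item()


# Toy usage with random data standing in for a real dataset.
x = torch.randn(32, layer_dims[0])
y = torch.randint(0, num_classes, (32,))
print(train_step(x, y))
```

Because the input to each layer is detached, backpropagation never crosses a layer boundary, which is where the memory savings relative to end-to-end BP come from in this sketch.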

Related research

10/29/2018  On the Convergence Rate of Training Recurrent Neural Networks
Despite the huge success of deep learning, our understanding to how the ...

03/11/2020  Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches
Least mean squares (LMS) is a particular case of the backpropagation (BP...

06/13/2021  Low-memory stochastic backpropagation with multi-channel randomized trace estimation
Thanks to the combination of state-of-the-art accelerators and highly op...

05/20/2023  A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks
We present a novel algorithm for training deep neural networks in superv...

12/07/2017  CNNs are Globally Optimal Given Multi-Layer Support
Stochastic Gradient Descent (SGD) is the central workhorse for training ...

07/17/2020  Partial local entropy and anisotropy in deep weight spaces
We refine a recently-proposed class of local entropic loss functions by ...

06/24/2018  Beyond Backprop: Alternating Minimization with co-Activation Memory
We propose a novel online algorithm for training deep feedforward neural...