meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

06/19/2017
by   Xu Sun, et al.

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified so that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, giving a linear reduction (k divided by the vector dimension) in computational cost. Surprisingly, experimental results demonstrate that we can update only 1--4% of the weights at each back propagation pass without increasing the number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/jklj077/meProp.
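As a rough illustration of the idea, the following sketch applies top-k sparsification to the output gradient of a single linear layer before forming the weight gradient. This is not the paper's implementation (see the repository above for that); the function name and shapes here are assumptions for the example, using numpy.

```python
import numpy as np

def meprop_backward(x, grad_y, k):
    """Sketch of top-k sparsified backprop for a linear layer y = x @ W.

    x:       input activations, shape (batch, d_in)
    grad_y:  gradient of the loss w.r.t. y, shape (batch, d_out)
    k:       number of gradient components to keep per example

    Only the k largest-magnitude entries of grad_y are kept, so at most
    k columns of W receive nonzero updates per example.
    """
    sparse = np.zeros_like(grad_y)
    for i in range(grad_y.shape[0]):
        # indices of the k entries with the largest absolute value
        top = np.argsort(np.abs(grad_y[i]))[-k:]
        sparse[i, top] = grad_y[i, top]
    # weight gradient: only the selected columns are nonzero
    grad_W = x.T @ sparse
    # gradient passed to the previous layer would also use `sparse`
    return grad_W, sparse
```

Because `sparse` has at most k nonzero entries per example, `x.T @ sparse` touches at most k columns of the weight gradient, which is the source of the k/d cost reduction described above.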


