Accelerated CNN Training Through Gradient Approximation

08/15/2019
by Ziheng Wang, et al.

Training deep convolutional neural networks such as VGG and ResNet by gradient descent is an expensive exercise that requires specialized hardware such as GPUs. Recent work has examined the possibility of approximating the gradient computation while maintaining the same convergence properties. While promising, these approximations have only been demonstrated on relatively small datasets such as MNIST. They also fail to achieve real wall-clock speedups, owing to the lack of efficient GPU implementations of the proposed approximation methods. In this work, we explore three alternative methods for approximating gradients, with an efficient GPU kernel implementation for one of them. We achieve wall-clock speedups with ResNet-20 and VGG-19 on the CIFAR-10 dataset of upwards of 7%, with minimal loss in validation accuracy.
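The abstract does not spell out the three approximation methods, so the sketch below is only a generic illustration of the underlying idea: keep the forward pass and the input-gradient computation exact, but estimate the expensive weight gradient cheaply. The class name SubsampledConv2dGrad, the batch-subsampling scheme, and the sample_frac parameter are all assumptions made here for illustration, not details from the paper.

```python
import torch
import torch.nn.functional as F

class SubsampledConv2dGrad(torch.autograd.Function):
    """Conv2d whose weight gradient is estimated from a random subset
    of the minibatch. Hypothetical illustration of gradient
    approximation, not the method proposed in the paper."""

    @staticmethod
    def forward(ctx, x, weight, sample_frac=0.25):
        ctx.save_for_backward(x, weight)
        ctx.sample_frac = sample_frac
        # Exact forward pass: a standard 3x3 convolution.
        return F.conv2d(x, weight, padding=1)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        n = x.shape[0]
        k = max(1, int(ctx.sample_frac * n))
        idx = torch.randperm(n, device=x.device)[:k]
        # Input gradient is computed exactly so that backpropagation
        # through earlier layers is unaffected.
        grad_x = torch.nn.grad.conv2d_input(x.shape, weight, grad_out,
                                            padding=1)
        # Weight gradient is computed on k of n samples and rescaled by
        # n / k, giving an unbiased estimate of the full-batch gradient.
        grad_w = torch.nn.grad.conv2d_weight(
            x[idx], weight.shape, grad_out[idx], padding=1) * (n / k)
        return grad_x, grad_w, None
```

Usage is the same as any custom autograd function; the approximation only changes what backward() computes:

```python
x = torch.randn(32, 16, 8, 8, requires_grad=True)
w = torch.randn(16, 16, 3, 3, requires_grad=True)
y = SubsampledConv2dGrad.apply(x, w)
y.sum().backward()          # w.grad is a rescaled subsample estimate
print(w.grad.shape)         # torch.Size([16, 16, 3, 3])
```

In practice, whether a scheme like this yields a real wall-clock speedup depends on a fused GPU kernel for the approximate backward pass, which is precisely the implementation gap the paper says it addresses for one of its three methods.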

