AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

05/08/2019
by   Jiong Zhang, et al.

Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the improvement that a stochastic gradient update on each instance brings to the current model varies dynamically. In AutoAssist, we exploit this fact and design a simple instance shrinking operation that filters out instances with relatively low marginal improvement to the current model, so that the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance sampling approaches for linear SVM problems, and establish an O(1/k) convergence rate for strongly convex problems. To apply the proposed technique to deep models, we jointly train a very lightweight Assistant network alongside the original deep network, referred to as the Boss. The Assistant network is designed to gauge the importance of a given instance with respect to the current Boss so that the shrinking operation can be applied in the batch generator. With careful design, we train the Boss and Assistant in a non-blocking and asynchronous fashion such that the overhead is minimal. We demonstrate that AutoAssist reduces the number of epochs by 40% when training a ResNet to reach the same test accuracy on an image classification dataset, and saves 30% of the training time needed to reach the same BLEU scores on a translation dataset.
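To make the Boss/Assistant interplay concrete, below is a minimal, synchronous sketch of the idea described in the abstract. All class names, the tiny linear Assistant, the keep-probability floor, and the "loss above the batch median" labeling rule are illustrative assumptions rather than the authors' exact design; the paper additionally trains the Assistant asynchronously so its overhead does not block Boss updates.

```python
# Illustrative sketch only: the Assistant scores each instance, the batch
# generator keeps instances stochastically according to that score (the
# "shrinking" step), and the expensive Boss gradient is computed only on the
# kept instances. The Assistant is then fit to predict which instances had
# high loss under the current Boss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Boss(nn.Module):
    """The main (expensive) model whose training we want to accelerate."""
    def __init__(self, in_dim=784, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))
    def forward(self, x):
        return self.net(x)

class Assistant(nn.Module):
    """Lightweight scorer: predicts whether an instance is still informative."""
    def __init__(self, in_dim=784):
        super().__init__()
        self.lin = nn.Linear(in_dim, 1)  # deliberately tiny
    def forward(self, x):
        return torch.sigmoid(self.lin(x)).squeeze(-1)

def train_step(boss, assistant, boss_opt, asst_opt, x, y, keep_floor=0.1):
    # 1) Shrinking: sample a keep mask from the Assistant's scores.
    with torch.no_grad():
        keep_prob = assistant(x).clamp(min=keep_floor)  # floor avoids starving instances
        kept = torch.bernoulli(keep_prob).bool()
    if not kept.any():
        return
    # 2) Expensive Boss update only on the kept instances.
    logits = boss(x[kept])
    loss = F.cross_entropy(logits, y[kept], reduction="none")
    boss_opt.zero_grad()
    loss.mean().backward()
    boss_opt.step()
    # 3) Assistant update: label kept instances by whether their loss exceeds
    #    the batch median (a stand-in for "high marginal improvement").
    target = (loss.detach() > loss.detach().median()).float()
    asst_loss = F.binary_cross_entropy(assistant(x[kept]), target)
    asst_opt.zero_grad()
    asst_loss.backward()
    asst_opt.step()

# Usage on random data (stand-in for a real batch generator):
boss, assistant = Boss(), Assistant()
boss_opt = torch.optim.SGD(boss.parameters(), lr=0.1)
asst_opt = torch.optim.SGD(assistant.parameters(), lr=0.01)
for _ in range(10):
    x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
    train_step(boss, assistant, boss_opt, asst_opt, x, y)
```

In this toy version the Assistant update runs in the same loop as the Boss update; the non-blocking, asynchronous scheduling described in the abstract is what keeps the added cost negligible in practice.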
