Large batch size training of neural networks with adversarial training and second-order information

10/02/2018
by   Zhewei Yao, et al.

Stochastic Gradient Descent (SGD) methods using randomly selected batches are widely used to train neural network (NN) models. Design exploration to find the best NN for a particular task often requires extensive training of different models on a large dataset, which is computationally expensive. The most straightforward way to accelerate this computation is to distribute the SGD batch over multiple processors. Keeping the distributed processors fully utilized requires growing the batch size commensurately; however, large batch training often leads to degraded accuracy, poor generalization, and even poor robustness to adversarial attacks. Existing solutions for large batch training either significantly degrade accuracy or require massive hyper-parameter tuning. To address this issue, we propose a novel large batch training method that combines recent results in adversarial training (to regularize against "sharp minima") and second-order optimization (to use curvature information to change the batch size adaptively during training). We extensively evaluate our method on the CIFAR-10/100, SVHN, TinyImageNet, and ImageNet datasets, using multiple NNs, including residual networks as well as smaller networks for mobile applications such as SqueezeNext. Our new approach exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (by up to 1% and 5×, respectively). We emphasize that this is achieved without any additional hyper-parameter tuning to tailor our method to any of these experiments.
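
The abstract describes the method only at a high level, so the following is a minimal sketch of the two ingredients under assumptions of our own: a PyTorch model, FGSM-style perturbations standing in for the adversarial regularizer, and the top Hessian eigenvalue (estimated by power iteration on Hessian-vector products) as the curvature signal that triggers batch-size growth. It illustrates the idea rather than the authors' exact algorithm; names such as fgsm_perturb, top_hessian_eigenvalue, and curvature_threshold, as well as the doubling schedule and thresholds, are hypothetical.

```python
# Sketch only: FGSM adversarial regularization + Hessian-based adaptive batch size.
# Model, epsilon, thresholds, and the doubling schedule are illustrative assumptions,
# not the paper's exact procedure.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader


def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return an FGSM-perturbed copy of x (input clipping omitted for brevity)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + epsilon * grad.sign()).detach()


def top_hessian_eigenvalue(model, x, y, iters=10):
    """Estimate the largest Hessian eigenvalue of the loss via power iteration
    on Hessian-vector products (no explicit Hessian is formed)."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eig = sum((h * vi).sum() for h, vi in zip(hv, v)).item()
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]
    return eig


def train_adaptive(model, dataset, base_bs=128, max_bs=4096, lr=0.1,
                   curvature_threshold=1.0, epochs=10):
    """Toy training loop: train on adversarially perturbed batches and grow the
    batch size when the local curvature estimate is small (a flat region)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    batch_size = base_bs
    for _ in range(epochs):
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for x, y in loader:
            x_adv = fgsm_perturb(model, x, y)  # adversarial regularization step
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
        # Small top eigenvalue -> flatter region, so a larger batch is assumed safe.
        x, y = next(iter(loader))
        if top_hessian_eigenvalue(model, x, y) < curvature_threshold and batch_size < max_bs:
            batch_size = min(2 * batch_size, max_bs)
```

The sketch only shows where each ingredient enters an ordinary SGD loop: the adversarial step perturbs the inputs before each update, and the curvature probe runs once per epoch to decide whether the batch size can grow; how the paper combines and schedules these is specified in the full text.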

Related research

07/28/2020 - Stochastic Normalized Gradient Descent with Momentum for Large Batch Training
Stochastic gradient descent (SGD) and its variants have been the dominat...

06/01/2021 - Concurrent Adversarial Learning for Large-Batch Training
Large-batch training has become a commonly used technique when training ...

06/20/2020 - How do SGD hyperparameters in natural training affect adversarial robustness?
Learning rate, batch size and momentum are three important hyperparamete...

03/14/2019 - Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, large batch training can leverage parallel r...

02/22/2018 - Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Large batch size training of Neural Networks has been shown to incur acc...

12/16/2020 - Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Large batch size training in deep neural networks (DNNs) possesses a wel...

06/10/2020 - Extrapolation for Large-batch Training in Deep Learning
Deep learning networks are typically trained by Stochastic Gradient Desc...
