Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

02/22/2018
by Zhewei Yao, et al.

Large batch size training of neural networks has been shown to incur accuracy loss when trained with current methods. The precise underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian-based study to analyze how the landscape of the loss function differs for large batch size training. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Our results on multiple networks show that, when training with large batch sizes, one tends to stop at points in parameter space with a noticeably larger Hessian spectrum, i.e., where the eigenvalues of the Hessian are much larger. We then study how batch size affects the robustness of the model to adversarial attacks. All of the results show that models trained with large batches are more susceptible to adversarial attacks than models trained with small batch sizes. Furthermore, we prove a theoretical result showing that the problem of finding an adversarial perturbation is a saddle-free optimization problem. Finally, we present empirical results demonstrating that adversarial training leads to regions with a smaller Hessian spectrum. We report detailed experiments with five different network architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets.
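
The central object measured in the paper is the exact Hessian spectrum, obtained by back-propagating second derivatives rather than by approximating the Hessian. As a minimal sketch of the standard recipe (assuming PyTorch; the function name, iteration count, and tolerance below are illustrative choices, not the paper's code), the dominant eigenvalue can be estimated by power iteration on Hessian-vector products, without ever forming the Hessian explicitly:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50, tol=1e-4):
    """Estimate the Hessian eigenvalue of largest magnitude for `loss`
    w.r.t. `params`, using power iteration on Hessian-vector products."""
    # First backward pass; keep the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit starting vector, stored as one tensor per parameter.
    vec = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((v ** 2).sum() for v in vec))
    vec = [v / norm for v in vec]

    eig = 0.0
    for _ in range(iters):
        # Second backward pass: the gradient of <grads, vec> equals H @ vec.
        dot = sum((g * v).sum() for g, v in zip(grads, vec))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        # Rayleigh quotient vec^T H vec approximates the top eigenvalue.
        new_eig = sum((h * v).sum() for h, v in zip(hv, vec)).item()
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        vec = [h / (norm + 1e-12) for h in hv]
        if abs(new_eig - eig) <= tol * (abs(eig) + 1e-12):
            return new_eig
        eig = new_eig
    return eig
```

A typical call is `top_hessian_eigenvalue(loss, list(model.parameters()))` immediately after computing `loss` on a batch; comparing sharpness across batch sizes, as the abstract describes, amounts to comparing eigenvalues obtained this way at the final iterates.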

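The robustness claims rest on finding adversarial perturbations by maximizing the loss with respect to the input under a norm constraint, the optimization problem the paper shows to be saddle-free. As a hedged illustration only (the paper's exact attack methods and interfaces are not reproduced here; this helper is a standard one-step gradient-sign method and its signature is an assumption), such a perturbation can be generated as follows:

```python
import torch

def fgsm_perturb(model, loss_fn, x, y, epsilon):
    """One-step gradient-sign attack (illustrative helper, not from the
    paper): move the input in the direction that locally increases the
    loss, within an L-infinity budget of epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    (grad_x,) = torch.autograd.grad(loss, x_adv)
    return (x_adv + epsilon * grad_x.sign()).detach()
```

In adversarial training, batches perturbed this way replace or augment the clean batches before the usual SGD step, which is the setting in which the abstract reports a smaller Hessian spectrum.
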
Related research:

06/20/2020 · How do SGD hyperparameters in natural training affect adversarial robustness?
  Learning rate, batch size and momentum are three important hyperparamete...

12/16/2019 · PyHessian: Neural Networks Through the Lens of the Hessian
  We present PyHessian, a new scalable framework that enables fast computa...

06/16/2020 · Flatness is a False Friend
  Hessian based measures of flatness, such as the trace, Frobenius and spe...

10/02/2018 · Large batch size training of neural networks with adversarial training and second-order information
  Stochastic Gradient Descent (SGD) methods using randomly selected batche...

01/29/2019 · An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
  To understand the dynamics of optimization in deep neural networks, we d...

01/31/2022 · On the Power-Law Spectrum in Deep Learning: A Bridge to Protein Science
  It is well-known that the Hessian matters to optimization, generalizatio...

11/16/2018 · The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size
  Previous works observed the spectrum of the Hessian of the training loss...
