Revisiting Outer Optimization in Adversarial Training

09/02/2022
by Ali Dabouei, et al.

Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper examines that choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces a higher gradient norm and variance than NT. This phenomenon hinders the outer optimization in AT, since the convergence rate of MSGD depends strongly on the variance of the gradients. To address this, we propose an optimization method, ENGM, that regularizes the contribution of each input example to the average mini-batch gradient. We prove that the convergence rate of ENGM is independent of the gradient variance and is therefore suitable for AT. We also introduce a trick that reduces the computational cost of ENGM, based on an empirical correlation between the gradient norms w.r.t. the network parameters and w.r.t. the input examples. Extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT, including robust overfitting and high sensitivity to hyperparameter settings.
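
The abstract does not spell out ENGM's exact update rule, so the following is only a minimal sketch of the general idea it describes: regularizing each example's contribution to the averaged mini-batch gradient during the outer step of adversarial training, here by rescaling per-example parameter gradients before averaging. All names (`outer_step`, `c`, `adv_x`) and the clipping-style rescaling are illustrative assumptions, not the paper's method; the cost-reduction trick based on input-gradient norms is not shown.

```python
# Hypothetical sketch of per-example gradient-norm regularization in the
# outer optimization of adversarial training (not the paper's exact ENGM rule).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # outer MSGD
c = 1.0  # assumed threshold on each example's gradient norm


def outer_step(adv_x, y):
    """One outer step on already-crafted adversarial examples adv_x."""
    per_example_grads = []
    for xi, yi in zip(adv_x, y):
        model.zero_grad()
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        # Rescale so this example contributes at most norm c to the average.
        scale = torch.clamp(c / (g.norm() + 1e-12), max=1.0)
        per_example_grads.append(g * scale)

    avg = torch.stack(per_example_grads).mean(dim=0)

    # Write the regularized average back into .grad and let momentum SGD step.
    model.zero_grad()
    offset = 0
    for p in model.parameters():
        n = p.numel()
        p.grad = avg[offset:offset + n].view_as(p).clone()
        offset += n
    opt.step()


# Usage on a random mini-batch standing in for adversarial examples.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
outer_step(x, y)
```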
