Second Order Optimization for Adversarial Robustness and Interpretability

09/10/2020
by Theodoros Tsiligkaridis et al.

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique aimed at learning features robust to such attacks and is widely regarded as a very effective defense. However, the computational cost of such training can be prohibitive as the network size and input dimensions grow. Inspired by the relationship between robustness and curvature, we propose a novel regularizer which incorporates first- and second-order information via a quadratic approximation to the adversarial loss. The worst-case quadratic loss is approximated via an iterative scheme. We show that using only a single iteration in our regularizer achieves stronger robustness than prior gradient and curvature regularization schemes and avoids gradient obfuscation, and that with additional iterations it achieves strong robustness at significantly lower training time than AT. Further, it retains a key property of AT: the networks learn features that are well aligned with human perception. We demonstrate experimentally that our method produces higher-quality human-interpretable features than other geometric regularization techniques. These robust features are then used to provide human-friendly explanations of model predictions.
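To make the idea concrete: the adversarial loss around an input x can be modeled by its second-order Taylor expansion, l(x + d) ≈ l(x) + ∇l(x)^T d + (1/2) d^T ∇²l(x) d, and the regularizer targets the worst case of this quadratic over a small ball ||d|| ≤ ε. Below is a minimal PyTorch sketch of such a scheme, not the authors' released code: Hessian-vector products are estimated by finite differences of input gradients, and the worst-case direction is refined by a short fixed-point iteration. The names quadratic_adv_reg, eps, h, and n_iter, and the choice of an FGSM-like starting direction, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def _unit(v, tiny=1e-12):
    # Normalize each example in a batch to unit L2 norm.
    norm = v.flatten(1).norm(dim=1).view(-1, *([1] * (v.dim() - 1)))
    return v / (norm + tiny)

def _input_grad(model, x, y, create_graph=False):
    # Gradient of the cross-entropy loss with respect to the input x.
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (g,) = torch.autograd.grad(loss, x, create_graph=create_graph)
    return g

def quadratic_adv_reg(model, x, y, eps=8 / 255, h=1e-2, n_iter=1):
    """Approximate max over ||d|| <= eps of g^T d + 0.5 d^T H d.

    The Hessian H is never formed explicitly; H d is estimated by the
    finite difference (grad(x + h*d) - grad(x)) / h.
    """
    # Direction search: detached gradients, no second-order graph needed.
    g0 = _input_grad(model, x, y)
    d = eps * _unit(g0)                      # first-order starting direction
    for _ in range(n_iter):
        Hd = (_input_grad(model, x + h * d, y) - g0) / h
        d = eps * _unit(g0 + Hd)             # fixed-point refinement
    d = d.detach()
    # Evaluate the quadratic model with create_graph=True so the
    # regularizer back-propagates into the model parameters.
    g = _input_grad(model, x, y, create_graph=True)
    Hd = (_input_grad(model, x + h * d, y, create_graph=True) - g) / h
    reg = ((g + 0.5 * Hd) * d).flatten(1).sum(dim=1)
    return reg.mean()
```

In training, such a regularizer would simply be added to the clean loss, e.g. loss = F.cross_entropy(model(x), y) + lam * quadratic_adv_reg(model, x, y); with n_iter=1 this corresponds to the cheap single-iteration variant discussed above, costing a few extra gradient evaluations per batch rather than a full multi-step attack.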
