On Frank-Wolfe Optimization for Adversarial Robustness and Interpretability

12/22/2020
by Theodoros Tsiligkaridis, et al.

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense against such attacks. While projected gradient descent (PGD) has received the most attention as a method for approximately solving the inner maximization of AT, Frank-Wolfe (FW) optimization is projection-free and can be adapted to any L^p norm. A Frank-Wolfe adversarial training approach is presented and shown to provide robustness competitive with PGD-AT, without much tuning, across a variety of architectures. We empirically show that robustness is strongly connected to the L^2 magnitude of the adversarial perturbation, and that more locally linear loss landscapes tend to produce larger L^2 distortions despite having the same L^∞ distortion. We provide theoretical guarantees on the magnitude of the distortion for FW that depend on the local geometry that FW-AT exploits. FW-AT is empirically shown to achieve strong robustness to both white-box and black-box attacks and to offer improved resistance to gradient masking. Further, FW-AT allows networks to learn high-quality, human-interpretable features, which are then used to generate counterfactual explanations for model predictions via dense and sparse adversarial perturbations.
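To illustrate why Frank-Wolfe is projection-free in this setting, the following is a minimal sketch of an FW-style attack inside an L^∞ ball (one common choice of threat model); the `grad_fn` callback, the step schedule, and all parameter values are illustrative assumptions, not the paper's exact algorithm. For the L^∞ ball the linear maximization oracle has the closed form x0 + eps * sign(g), so each iterate is a convex combination of feasible points and never needs a projection step.

```python
import numpy as np

def fw_linf_attack(x0, grad_fn, eps=0.03, steps=20):
    """Sketch of a projection-free Frank-Wolfe attack over the
    L-inf ball {x : ||x - x0||_inf <= eps}.

    grad_fn(x) should return the gradient of the loss being maximized.
    """
    x = x0.copy()
    for t in range(steps):
        g = grad_fn(x)
        # Linear maximization oracle over the L-inf ball:
        # argmax_{v in ball} <v, g> = x0 + eps * sign(g)
        v = x0 + eps * np.sign(g)
        gamma = 2.0 / (t + 2.0)            # standard FW step size
        x = (1.0 - gamma) * x + gamma * v  # convex combination stays feasible
    return x
```

Because every update is a convex combination of points already inside the constraint set, feasibility is maintained by construction; for other L^p norms only the oracle changes, which is the adaptability the abstract refers to.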

Related research

Second Order Optimization for Adversarial Robustness and Interpretability (09/10/2020)
Deep neural networks are easily fooled by small perturbations known as a...

Sparse and Imperceivable Adversarial Attacks (09/11/2019)
Neural networks have been proven to be vulnerable to a variety of advers...

Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization (12/23/2021)
Adversarial training (AT) has become a widely recognized defense mechani...

Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning (11/24/2020)
Prediction credibility measures, in the form of confidence intervals or ...

Optimization and Optimizers for Adversarial Robustness (03/23/2023)
Empirical robustness evaluation (RE) of deep learning models against adv...

You Only Propagate Once: Painless Adversarial Training Using Maximal Principle (05/02/2019)
Deep learning achieves state-of-the-art results in many areas. However r...
