MORA: Improving Ensemble Robustness Evaluation with Model-Reweighing Attack

11/15/2022
by Yunrui Yu, et al.

Adversarial attacks can deceive neural networks by adding tiny perturbations to their input data. Ensemble defenses, which are trained to minimize attack transferability among sub-models, offer a promising research direction to improve robustness against such attacks while maintaining high accuracy on natural inputs. We discover, however, that recent state-of-the-art (SOTA) adversarial attack strategies cannot reliably evaluate ensemble defenses, sizeably overestimating their robustness. This paper identifies two factors that contribute to this behavior. First, these defenses form ensembles that are notably difficult for existing gradient-based methods to attack, due to gradient obfuscation. Second, ensemble defenses diversify sub-model gradients, presenting a challenge to defeating all sub-models simultaneously: simply summing their contributions may counteract the overall attack objective. Yet we observe that an ensemble can still be fooled even when most of its sub-models classify correctly. We therefore introduce MORA, a model-reweighing attack that steers adversarial example synthesis by reweighing the importance of sub-model gradients. With MORA, we find that recent ensemble defenses all exhibit varying degrees of overestimated robustness. Compared against recent SOTA white-box attacks, MORA converges orders of magnitude faster while achieving higher attack success rates across all ensemble models examined under three different ensemble modes (i.e., ensembling by softmax, voting, or logits). In particular, most ensemble defenses exhibit near-zero or exactly zero robustness against MORA with ℓ∞ perturbations within 0.02 on CIFAR-10 and 0.01 on CIFAR-100. We make MORA open source with reproducible results and pre-trained models, and provide a leaderboard of ensemble defenses under various attack strategies.
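To make the reweighing idea concrete, here is a minimal PyTorch sketch of a PGD-style loop that combines per-sub-model gradients with adaptive weights instead of a uniform sum. The abstract does not specify MORA's actual reweighing scheme; the rule shown (up-weighting sub-models that still classify the input correctly) is only one illustrative assumption, and the function name `reweighed_pgd`, its parameters, and the per-batch weighting are hypothetical.

```python
# Hypothetical sketch of a model-reweighing PGD attack on an ensemble.
# Assumption: sub-models that are not yet fooled receive higher weight,
# so their gradients dominate the update; MORA's real rule may differ.
import torch
import torch.nn.functional as F

def reweighed_pgd(sub_models, x, y, eps=0.02, alpha=0.005, steps=20):
    """PGD under an l-inf budget that reweighs sub-model losses
    rather than summing them uniformly."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        losses, weights = [], []
        for m in sub_models:
            logits = m(x_adv)
            losses.append(F.cross_entropy(logits, y))
            # Illustrative weight: fraction of the batch this
            # sub-model still gets right (i.e., not yet fooled).
            weights.append((logits.argmax(dim=1) == y).float().mean().detach())
        w = torch.stack(weights)
        if w.sum() == 0:                # every sub-model already fooled:
            w = torch.ones_like(w)      # fall back to uniform weights
        w = w / w.sum()                 # normalize so weights sum to 1
        total = sum(wi * li for wi, li in zip(w, losses))
        grad = torch.autograd.grad(total, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()               # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)          # project to l-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)                     # valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```

A plain sum of sub-model losses corresponds to fixing `w` uniform; the intuition from the paper is that when diversified sub-model gradients point in conflicting directions, such a sum can cancel out, whereas adaptively concentrating weight on a subset of sub-models can still flip the ensemble's prediction even while most sub-models remain correct.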


