Adversarial Example Games

by   Avishek Joey Bose, et al.

The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks, in order to guide the development of safeguards against them. It includes attack methods in the highly challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior works in this setting have relied mainly on algorithmic innovations derived from empirical observations (e.g., that momentum helps), and the field currently lacks a firm theoretical basis for understanding transferability in adversarial attacks. In this work, we address this gap and lay the theoretical foundations for crafting transferable adversarial examples to entire function classes. We introduce Adversarial Examples Games (AEG), a novel framework that models adversarial examples as two-player min-max games between an attack generator and a representative classifier. We prove that the saddle point of an AEG game corresponds to a generating distribution of adversarial examples against entire function classes. Training the generator only requires the ability to optimize a representative classifier from a given hypothesis class, enabling BlackBox transfer to unseen classifiers from the same class. We demonstrate the efficacy of our approach on the MNIST and CIFAR-10 datasets against both undefended and robustified models, achieving competitive performance with state-of-the-art BlackBox transfer approaches.


On the reversibility of adversarial attacks

Adversarial attacks modify images with perturbations that change the pre...

Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates

To deflect adversarial attacks, a range of "certified" classifiers have ...

Generating Adversarial Examples with Graph Neural Networks

Recent years have witnessed the deployment of adversarial attacks to eva...

Adversarial Examples for Cost-Sensitive Classifiers

Motivated by safety-critical classification problems, we investigate adv...

A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks

Adversarial examples for neural network image classifiers are known to b...

CycleGAN: a Master of Steganography

CycleGAN is one of the latest successful approaches to learn a correspon...

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

An adversarial attack paradigm explores various scenarios for the vulner...

Code Repositories


Code for Adversarial Example Games NeurIPS 2020 Paper

view repo