Perception-in-the-Loop Adversarial Examples

01/21/2019
by Mahmoud Salamati, et al.

We present a scalable, black-box, perception-in-the-loop technique to find adversarial examples for deep neural network classifiers. Black box means that our procedure only has input-output access to the classifier, and not to its internal structure, parameters, or intermediate confidence values. Perception-in-the-loop means that the notion of proximity between inputs is queried directly from human participants rather than fixed by an arbitrarily chosen metric. Our technique is based on covariance matrix adaptation evolution strategy (CMA-ES), a black-box optimization approach. CMA-ES explores the search space iteratively: it generates a population of candidates from a sampling distribution, selects the best candidates according to a cost function, and updates the distribution to favor those candidates. We run CMA-ES with human participants providing the fitness function, using the insight that the selection step of CMA-ES can be naturally modeled as a perception task: pick the k inputs perceptually closest to a fixed input. We empirically demonstrate that finding adversarial examples is feasible using small populations and few iterations. We compare the performance of CMA-ES on the MNIST benchmark with other black-box approaches using L_p norms as the cost function, and show that it performs favorably both in its success rate in finding adversarial examples and in minimizing the distance between the original and the adversarial input. In experiments on the MNIST, CIFAR10, and GTSRB benchmarks, we demonstrate that with perception in the loop, CMA-ES can find perceptually similar adversarial inputs using a small number of iterations and small population sizes. Finally, we show that networks trained specifically to be robust against the L_∞ norm can still be susceptible to perceptually similar adversarial examples.
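To make the search loop described above concrete, the following is a minimal sketch of a CMA-ES-based black-box attack, written against the pycma package (import cma). It is not the paper's implementation: the classify function, the penalty constant, and all hyperparameters (sigma0, popsize, max_iters) are illustrative assumptions, and the human top-k perceptual selection is approximated by an L2 distance to the original input, since a scripted example cannot include human participants.

    # Sketch of a black-box CMA-ES adversarial search (hypothetical helper names).
    # The paper's perceptual fitness from human participants is replaced here by
    # an L2 distance, because the human-in-the-loop step cannot be scripted.
    import numpy as np
    import cma  # pycma: pip install cma

    def attack(classify, x_orig, true_label, sigma0=0.1, popsize=8, max_iters=50):
        """Search for x_adv with classify(x_adv) != true_label, close to x_orig.

        classify : black-box function mapping a flat input vector to a label
                   (input-output access only; no gradients or confidence values).
        x_orig   : original input as a flat numpy array with entries in [0, 1].
        """
        es = cma.CMAEvolutionStrategy(
            x_orig.tolist(), sigma0,
            {'popsize': popsize, 'bounds': [0.0, 1.0],
             'maxiter': max_iters, 'verbose': -9})
        best = None
        while not es.stop():
            candidates = es.ask()                  # sample a population of candidates
            fitness = []
            for c in candidates:
                c = np.asarray(c)
                dist = np.linalg.norm(c - x_orig)  # proxy for perceptual distance
                misclassified = classify(c) != true_label
                # Penalize candidates the classifier still labels correctly, so the
                # search first crosses the decision boundary, then minimizes distance.
                fitness.append(dist if misclassified else dist + 1e3)
                if misclassified and (best is None
                                      or dist < np.linalg.norm(best - x_orig)):
                    best = c
            es.tell(candidates, fitness)           # update the sampling distribution
        return best

In the perception-in-the-loop setting of the paper, the per-candidate distance computation would be replaced by asking participants to pick the k candidates that look closest to x_orig, and feeding that ranking back to CMA-ES as the selection signal.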


