White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks

04/04/2019
by Yotam Gil, et al.

Adversarial examples are important for understanding the behavior of neural models, and can improve their robustness through adversarial training. Recent work in natural language processing generated adversarial examples by assuming white-box access to the attacked model, and optimizing the input directly against it (Ebrahimi et al., 2018). In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another, more efficient neural network. We train a model to emulate the behavior of a white-box attack and show that it generalizes well across examples. Moreover, it reduces adversarial example generation time by 19x-39x. We also show that our approach transfers to a black-box setting, by attacking the Google Perspective API and exposing its vulnerability. Our attack flips the API-predicted label in 42% of the generated examples, while humans maintain high accuracy in predicting the gold label.


Related research

11/15/2018 - A note on hyperparameters in black-box adversarial examples
Since Biggio et al. (2013) and Szegedy et al. (2013) first drew attention...

11/02/2020 - The Vulnerability of the Neural Networks Against Adversarial Examples in Deep Learning Algorithms
With further development in the fields of computer vision, network security...

08/19/2019 - Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries
In a black-box setting, the adversary only has API access to the target...

03/14/2018 - Defensive Collaborative Multi-task Training - Defending against Adversarial Attack towards Deep Neural Networks
Deep neural networks (DNNs) have shown impressive performance on hard perc...

05/01/2023 - Attack-SAM: Towards Evaluating Adversarial Robustness of Segment Anything Model
Segment Anything Model (SAM) has attracted significant attention recently...

03/24/2023 - Effective black box adversarial attack with handcrafted kernels
We propose a new, simple framework for crafting adversarial examples for...

09/10/2019 - Toward Finding The Global Optimal of Adversarial Examples
Current machine learning models are vulnerable to adversarial examples (...
