Thwarting finite difference adversarial attacks with output randomization

05/23/2019
by Haidar Khan, et al.

Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from settings where the adversary has complete knowledge of the model to the opposite "black box" setting. Black box attacks are particularly threatening because the adversary only needs access to the input and output of the model. Defending against black box adversarial example generation attacks is paramount, as currently proposed defenses are not effective. Since these attacks rely on repeated queries to the model to estimate gradients over the input dimensions, we investigate the use of randomization to prevent such adversaries from successfully creating adversarial examples. Randomization applied to the output of the deep neural network model can confuse an attacker, but it introduces a tradeoff between accuracy and robustness. We show that for certain types of randomization, we can bound the probability of introducing errors by carefully setting distributional parameters. For the particular case of finite difference black box attacks, we quantify the error the defense introduces into the finite difference estimate of the gradient. Lastly, we show empirically that the defense can thwart two adaptive black box adversarial attack algorithms.
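To make the mechanism concrete, the sketch below is a minimal NumPy illustration, not the paper's implementation: it shows the central-difference gradient estimate that finite difference black box attacks compute, and how additive Gaussian noise on the model's output (one possible form of output randomization, assumed here for illustration) corrupts that estimate. The function names, the scalar toy model, and the noise parameter sigma are all illustrative assumptions.

```python
import numpy as np

def finite_difference_grad(f, x, h=1e-4):
    """Central-difference estimate of the gradient of scalar f at x.
    This is the black-box attack's core subroutine: two queries to f
    per input dimension."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

def randomize_output(f, sigma):
    """Hypothetical output randomization: add i.i.d. Gaussian noise
    with standard deviation sigma to every query's scalar output."""
    return lambda x: f(x) + np.random.normal(0.0, sigma)

# Toy "model": f(x) = ||x||^2, whose true gradient is 2x.
f = lambda x: float(np.dot(x, x))
x = np.ones(5)

clean = finite_difference_grad(f, x)                          # ~[2, 2, 2, 2, 2]
noisy = finite_difference_grad(randomize_output(f, 1e-3), x)  # heavily corrupted
print(clean)
print(noisy)
```

Because the two noisy queries per coordinate differ by a term with standard deviation sigma * sqrt(2), the error in each gradient coordinate has standard deviation sigma * sqrt(2) / (2h); with sigma = 1e-3 and h = 1e-4 that is roughly 7, several times larger than the true gradient entries of 2. This is the accuracy/robustness lever the abstract describes: the noise must be small enough to leave the model's predictions intact, yet large enough to drown out the finite difference signal.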

Related research

- Output Randomization: A Novel Defense for both White-box and Black-box Adversarial Models (07/08/2021)
- Stateful Detection of Black-Box Adversarial Attacks (07/12/2019)
- Improved Adversarial Robustness via Logit Regularization Methods (06/10/2019)
- SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation (02/07/2021)
- Black-box Adversarial Attacks on Network-wide Multi-step Traffic State Prediction Models (10/17/2021)
- Neural Network Laundering: Removing Black-Box Backdoor Watermarks from Deep Neural Networks (04/22/2020)
- PredCoin: Defense against Query-based Hard-label Attack (02/04/2021)
