Tricking Adversarial Attacks To Fail

06/08/2020
by Blerta Lindqvist, et al.

Recent adversarial defense approaches have failed. Untargeted gradient-based attacks cause classifiers to choose any wrong class. Our novel white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes, from which the real classes can be derived. Our Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks: minimize the sum of (1) perturbation and (2) classifier adversarial loss. Target Training changes the classifier minimally and trains it with additional duplicated points (at 0 distance) labeled with designated classes. These differently labeled duplicated samples minimize both terms (1) and (2) of the minimization, steering attack convergence toward samples of designated classes, from which correct classification is derived. Importantly, Target Training eliminates both the need to know the attack in advance and the overhead of generating adversarial samples for attacks that minimize perturbations. We obtain an 86.2% accuracy, exceeding even the unsecured classifier's accuracy on non-adversarial samples. Target Training represents a fundamental change in adversarial defense strategy.
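
A minimal sketch may help make the mechanism concrete. Attacks in the Carlini-Wagner family minimize a sum of the form ||delta|| + c * L(x + delta); duplicating each training point at 0 distance under a designated label makes the duplicate a candidate minimizer for both terms at once. The function names, the choice of "original label + num_classes" as the designated class, and the doubled output head are illustrative assumptions, not details taken from the paper:

    import numpy as np

    def target_training_augment(X, y, num_classes):
        """Duplicate every training point at 0 distance and relabel the
        copy with its designated class (here: original label + num_classes).
        The classifier is then trained with 2 * num_classes output classes."""
        X_dup = X.copy()             # exact duplicates: perturbation term (1) is 0
        y_dup = y + num_classes      # hypothetical designated (decoy) labels
        X_aug = np.concatenate([X, X_dup], axis=0)
        y_aug = np.concatenate([y, y_dup], axis=0)
        return X_aug, y_aug

    def recover_class(predicted_class, num_classes):
        """Map a prediction in the designated range back to the real class."""
        return predicted_class % num_classes

    # Toy usage: 3 original classes -> classifier trained with 6 output classes.
    X = np.random.rand(10, 4).astype(np.float32)
    y = np.random.randint(0, 3, size=10)
    X_aug, y_aug = target_training_augment(X, y, num_classes=3)
    assert recover_class(5, num_classes=3) == 2   # designated class 5 -> real class 2

Under this labeling scheme, an untargeted attack steered to a designated class c >= num_classes still reveals the real class as c mod num_classes, which is what lets correct classification be derived from the attack's own output.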
