Adversarial Detection and Correction by Matching Prediction Distributions

02/21/2020
by Giovanni Vacanti, et al.

We present a novel adversarial detection and correction method for machine learning classifiers. The detector consists of an autoencoder trained with a custom loss function based on the Kullback-Leibler divergence between the classifier predictions on the original and reconstructed instances. The method is unsupervised, easy to train and does not require any knowledge about the underlying attack. The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST, and remains very effective on CIFAR-10 when the attack is granted full access to the classification model but not the defence. We show that our method is still able to detect adversarial examples in the case of a white-box attack, where the attacker has full knowledge of both the model and the defence, and investigate the robustness of the attack. The method is very flexible and can also be used to detect common data corruptions and perturbations which negatively impact the model performance. We illustrate this capability on the CIFAR-10-C dataset.
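The core idea can be sketched in a few lines: train an autoencoder so that the classifier's prediction distribution on the reconstruction matches its distribution on the original input, then reuse the same divergence as an adversarial score at test time. The snippet below is a minimal, illustrative PyTorch version of that loss, not the authors' implementation; all names (`DetectorAE`, `kl_matching_loss`, `adversarial_score`) are assumptions made for the sketch.

```python
# Minimal sketch of the KL-matching idea described in the abstract -- not the
# authors' implementation. All class/function names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DetectorAE(nn.Module):
    """Autoencoder used purely as the adversarial detector/corrector."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def kl_matching_loss(classifier: nn.Module, ae: DetectorAE, x: torch.Tensor) -> torch.Tensor:
    """KL divergence between classifier predictions on x and on its reconstruction.

    The classifier is assumed pretrained and frozen (e.g. via
    classifier.requires_grad_(False)); gradients only update the autoencoder,
    which is trained on clean data without any adversarial examples.
    """
    x_recon = ae(x)
    with torch.no_grad():                               # target distribution, no gradient
        p_orig = F.softmax(classifier(x), dim=-1)
    log_p_recon = F.log_softmax(classifier(x_recon), dim=-1)
    # KL(p_orig || p_recon), averaged over the batch
    return F.kl_div(log_p_recon, p_orig, reduction="batchmean")


@torch.no_grad()
def adversarial_score(classifier: nn.Module, ae: DetectorAE, x: torch.Tensor) -> torch.Tensor:
    """Per-instance score: instances above a threshold (set on clean validation
    data) are flagged, and the reconstruction ae(x) can serve as the corrected input."""
    p_orig = F.softmax(classifier(x), dim=-1)
    log_p_recon = F.log_softmax(classifier(ae(x)), dim=-1)
    return F.kl_div(log_p_recon, p_orig, reduction="none").sum(dim=-1)
```

Because only the autoencoder is trained and the classifier stays fixed, the defence can be added to an existing model without retraining it, which is what makes the method unsupervised and attack-agnostic.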


