On Detecting Adversarial Perturbations

02/14/2017
by   Jan-Hendrik Metzen, et al.

Machine learning, and deep learning in particular, has advanced tremendously on perceptual tasks in recent years. However, these systems remain vulnerable to adversarial perturbations of the input: perturbations crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose augmenting deep neural networks with a small "detector" subnetwork trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, although each detector is trained to detect only a specific adversary, it generalizes to similar and weaker adversaries. Finally, we propose an adversarial attack that fools both the classifier and the detector, and a novel training procedure for the detector that counteracts this attack.
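The core idea can be sketched in a toy setting. The following is a minimal, hypothetical illustration, not the paper's method: the paper attaches a convolutional detector subnetwork to intermediate feature maps of a CNN, whereas here the "classifier" is a logistic regression on synthetic 2-class data, the attack is a DeepFool-style step that moves each input most of the way to the decision boundary, and the "detector" is a second binary classifier reading only the classifier's logit magnitude as a 1-D stand-in for intermediate features.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy stand-in for the classification network: logistic regression on
# two well-separated Gaussian classes (the paper uses CNNs on images).
n, d = 400, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + (2 * y[:, None] - 1) * 0.8

w = np.zeros(d)
for _ in range(200):  # plain gradient descent on the cross-entropy loss
    w -= 0.1 * X.T @ (sigmoid(X @ w) - y) / n

# DeepFool-style attack (a stand-in for the paper's adversaries): move each
# input 90% of the way to the decision boundary along w, so its logit --
# and hence the classifier's confidence -- shrinks by a factor of 10.
z = X @ w
X_adv = X - 0.9 * (z / (w @ w))[:, None] * w

# Detector: a second binary classifier (clean = 0, adversarial = 1) trained
# on the logit magnitude, a crude proxy for the intermediate representations
# the paper's detector subnetwork reads.
feat = np.abs(np.concatenate([X @ w, X_adv @ w]))
feat = (feat - feat.mean()) / feat.std()  # standardize for stable training
labels = np.concatenate([np.zeros(n), np.ones(n)])

v, b = 0.0, 0.0
for _ in range(2000):  # gradient descent for the 1-D logistic detector
    q = sigmoid(v * feat + b)
    v -= 0.5 * np.mean((q - labels) * feat)
    b -= 0.5 * np.mean(q - labels)

det_acc = np.mean((sigmoid(v * feat + b) > 0.5) == labels)
print(f"detector accuracy on clean vs. adversarial: {det_acc:.2f}")
```

The attack leaves the classifier's predicted labels mostly wrong or barely confident, yet the perturbed inputs are easy for the detector to flag because their logits cluster near the decision boundary, echoing the paper's empirical finding that quasi-imperceptible perturbations are nonetheless detectable.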

