Natural and Adversarial Error Detection using Invariance to Image Transformations

02/01/2019
by Yuval Bahat, et al.

We propose an approach to distinguish between correct and incorrect image classifications. Our approach detects misclassifications that occur either unintentionally ("natural errors") or due to intentional adversarial attacks ("adversarial errors"), both within a single unified framework. It is based on the observation that correctly classified images tend to exhibit robust and consistent classifications under certain image transformations (e.g., horizontal flip, small image translation, etc.). In contrast, incorrectly classified images, whether due to adversarial or natural errors, tend to exhibit large variations in classification results under such transformations. Our approach requires no modifications or retraining of the classifier, and can hence be applied to any pre-trained classifier. We further use state-of-the-art targeted adversarial attacks to demonstrate that even when the adversary has full knowledge of our method, the adversarial distortion needed to bypass our detector is no longer imperceptible to the human eye. Our approach obtains state-of-the-art results compared to previous adversarial detection methods, surpassing them by a large margin.
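The transformation-invariance idea in the abstract can be sketched in a few lines: classify the original image and several transformed copies, and flag the prediction as a suspected error when the transformed predictions disagree with the original one. The function names, the toy classifiers, and the 0.8 agreement threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def suspect_error(classify, image, transforms, agreement_threshold=0.8):
    # Classify the original image and each transformed copy, then measure
    # how often the transformed predictions agree with the original one.
    # Low agreement suggests a (natural or adversarial) misclassification.
    # The 0.8 threshold is an illustrative choice, not from the paper.
    base = classify(image)
    votes = [classify(t(image)) == base for t in transforms]
    return float(np.mean(votes)) < agreement_threshold

# Label-preserving transformations of the kind the abstract mentions:
transforms = [
    np.fliplr,                           # horizontal flip
    lambda im: np.roll(im, 1, axis=1),   # small horizontal translation
    lambda im: np.roll(im, -1, axis=0),  # small vertical translation
]

# Toy classifiers for illustration only (assumptions, not real models):
robust = lambda im: int(im.mean() > 0.5)   # invariant to flips and shifts
fragile = lambda im: int(im[0, 0] > 0.5)   # depends on a single pixel

stable_img = np.ones((8, 8))
edge_img = np.zeros((8, 8))
edge_img[0, 0] = 1.0

print(suspect_error(robust, stable_img, transforms))   # consistent -> False
print(suspect_error(fragile, edge_img, transforms))    # inconsistent -> True
```

In practice `classify` would wrap a pre-trained network's top-1 prediction; the point of the sketch is that the detector only needs black-box access to that prediction, which is why no retraining is required.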


Related research

- 07/05/2023 · GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations
  Deep neural networks tend to make overconfident predictions and often re...
- 04/02/2018 · Confidence from Invariance to Image Transformations
  We develop a technique for automatically detecting the classification er...
- 01/04/2022 · Towards Understanding and Harnessing the Effect of Image Transformation in Adversarial Detection
  Deep neural networks (DNNs) are threatened by adversarial examples. Adve...
- 10/08/2022 · Symmetry Subgroup Defense Against Adversarial Attacks
  Adversarial attacks and defenses disregard the lack of invariance of con...
- 02/28/2020 · Detecting Patch Adversarial Attacks with Image Residuals
  We introduce an adversarial sample detection algorithm based on image re...
- 06/17/2020 · Adversarial Defense by Latent Style Transformations
  Machine learning models have demonstrated vulnerability to adversarial a...
- 09/12/2023 · Using Reed-Muller Codes for Classification with Rejection and Recovery
  When deploying classifiers in the real world, users expect them to respo...
