Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

07/24/2021
by Florian Tramèr, et al.

Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance ϵ (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance ϵ/2. Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated. To illustrate, we revisit 13 detector defenses. In 11 of the 13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
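To make the reduction concrete, here is a minimal, runnable sketch in a toy one-dimensional setting. All names (toy_detect, toy_classify, reduced_classify) and the grid search are illustrative assumptions, not the paper's code or exact construction: given a detector assumed robust at radius ϵ, the wrapped classifier searches the ϵ/2-ball around its input for a point the detector accepts and returns the base classifier's label there.

    # Illustrative sketch only: toy_detect, toy_classify and
    # reduced_classify are hypothetical names, and the toy 1-D setup is
    # a simplification, not the paper's construction or proof.
    import itertools
    import numpy as np

    EPS = 0.2  # assumed detector robustness radius epsilon (L-inf metric)

    def toy_classify(x):
        # Toy base classifier: label 1 iff the first coordinate is >= 0.
        return int(x[0] >= 0.0)

    def toy_detect(x):
        # Toy detector, assumed robust at radius EPS: flag any point
        # within EPS of the base classifier's decision boundary.
        return abs(x[0]) < EPS

    def reduced_classify(x, grid_steps=9):
        # The reduction: search the L-inf ball of radius EPS/2 around x
        # for a point the detector accepts ("clean"), and return the base
        # classifier's label on that point. The grid enumeration is
        # exponential in the input dimension, which mirrors why the
        # reduction yields an inefficient classifier.
        offsets = np.linspace(-EPS / 2, EPS / 2, grid_steps)
        for delta in itertools.product(offsets, repeat=x.size):
            candidate = x + np.asarray(delta)
            if not toy_detect(candidate):
                return toy_classify(candidate)
        return -1  # abstain if every nearby point is flagged

    x = np.array([0.3])               # cleanly classified as 1
    print(reduced_classify(x))        # -> 1
    print(reduced_classify(x - 0.1))  # perturbed by EPS/2: still 1

The intuition for the ϵ/2 factor: if the input lies within ϵ/2 of a clean point, that clean point is itself inside the search ball and is accepted, so the search succeeds; and any accepted point in the ball is within ϵ of the clean point, where a detector robust at radius ϵ only accepts correctly classified inputs.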

Related research

02/05/2022
Adversarial Detector with Robust Classifier
Deep neural network (DNN) models are well-known to easily misclassify pre...

03/19/2020
Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates
To deflect adversarial attacks, a range of "certified" classifiers have ...

05/05/2019
Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples
A large body of recent work has investigated the phenomenon of evasion a...

06/14/2021
Audio Attacks and Defenses against AED Systems – A Practical Study
Audio Event Detection (AED) systems capture audio from the environment a...

10/24/2021
ADC: Adversarial attacks against object Detection that evade Context consistency checks
Deep Neural Networks (DNNs) have been shown to be vulnerable to adversar...

11/15/2018
Adversarial Examples from Cryptographic Pseudo-Random Generators
In our recent work (Bubeck, Price, Razenshteyn, arXiv:1805.10204) we arg...

07/03/2023
Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)
Transformer-based text classifiers like BERT, RoBERTa, T5, and GPT-3 hav...
