The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

02/13/2019
by   Kevin Roth, et al.
18

We investigate conditions under which test statistics exist that can reliably detect examples, which have been adversarially manipulated in a white-box attack. These statistics can be easily computed and calibrated by randomly corrupting inputs. They exploit certain anomalies that adversarial attacks introduce, in particular if they follow the paradigm of choosing perturbations optimally under p-norm constraints. Access to the log-odds is the only requirement to defend models. We justify our approach empirically, but also provide conditions under which detectability via the suggested test statistics is guaranteed to be effective. In our experiments, we show that it is even possible to correct test time predictions for adversarial attacks with high accuracy.

READ FULL TEXT
research
05/23/2022

Learning to Ignore Adversarial Attacks

Despite the strong performance of current NLP models, they can be brittl...
research
11/07/2019

White-Box Target Attack for EEG-Based BCI Regression Problems

Machine learning has achieved great success in many applications, includ...
research
03/22/2023

Test-time Defense against Adversarial Attacks: Detection and Reconstruction of Adversarial Examples via Masked Autoencoder

Existing defense methods against adversarial attacks can be categorized ...
research
05/13/2020

Adversarial examples are useful too!

Deep learning has come a long way and has enjoyed an unprecedented succe...
research
09/30/2013

On statistics, computation and scalability

How should statistical procedures be designed so as to be scalable compu...
research
05/19/2020

Adversarial Attacks for Embodied Agents

Adversarial attacks are valuable for providing insights into the blind-s...
research
02/09/2023

Exploiting Certified Defences to Attack Randomised Smoothing

In guaranteeing that no adversarial examples exist within a bounded regi...

Please sign up or login with your details

Forgot password? Click here to reset