Your Out-of-Distribution Detection Method is Not Robust!

09/30/2022
by Mohammad Azizmalayeri, et al.

Out-of-distribution (OOD) detection has recently gained substantial attention due to the importance of identifying out-of-domain samples for reliability and safety. Although OOD detection methods have advanced considerably, they remain susceptible to adversarial examples, which defeats their purpose. To mitigate this issue, several defenses have recently been proposed. Nevertheless, these efforts remain ineffective, as their evaluations are based on either small perturbation sizes or weak attacks. In this work, we re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g., up to the commonly used ϵ=8/255 for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than random detection under this adversarial setting. Next, we aim to provide a robust OOD detection method. An ideal defense should expose the model during training to nearly all possible adversarial perturbations, which can be achieved through adversarial training. Crucially, such training perturbations should be based on both in-distribution and out-of-distribution samples. Therefore, unlike OOD detection in the standard setting, access to OOD samples, as well as in-distribution ones, seems necessary in the adversarial training setup. These considerations lead us to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We subsequently propose the Adversarially Trained Discriminator (ATD), which uses a pre-trained robust model to extract robust features and a generator model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the in-distribution data, we significantly outperform all previous methods in robust AUROC while maintaining high standard AUROC and classification accuracy. The code repository is available at https://github.com/rohban-lab/ATD .
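To make the threat model concrete, below is a minimal PyTorch sketch of the kind of end-to-end PGD attack described above, run jointly on in- and out-of-distribution inputs. This is an illustrative sketch, not the paper's exact attack: the score_fn interface is a hypothetical stand-in for any differentiable detector that assigns higher scores to inputs it considers in-distribution, and the step size alpha and iteration count are assumed values (only eps=8/255 comes from the abstract).

import torch

def pgd_attack_ood(score_fn, x, is_ood, eps=8/255, alpha=2/255, steps=10):
    # score_fn: hypothetical differentiable detector; maps a batch of images
    # to a per-sample score, where higher means "looks in-distribution".
    # is_ood: boolean mask marking which samples in x are out-of-distribution.
    # Random start inside the L-inf eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        scores = score_fn(x_adv)
        # Push OOD samples toward high (in-distribution-looking) scores
        # and in-distribution samples toward low scores.
        loss = torch.where(is_ood, scores, -scores).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the eps-ball around the clean input.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv

Under this setup, the robust AUROC reported above is simply the standard AUROC computed on detector scores of inputs attacked this way, rather than on clean inputs.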

