Playing it Safe: Adversarial Robustness with an Abstain Option

11/25/2019
by Cassidy Laidlaw, et al.

We explore adversarial robustness in the setting in which it is acceptable for a classifier to abstain—that is, output no class—on adversarial examples. Adversarial examples are small perturbations of normal inputs to a classifier that cause the classifier to give incorrect output; they present security and safety challenges for machine learning systems. In many safety-critical applications, it is less costly for a classifier to abstain on adversarial examples than to give incorrect output for them. We first introduce a novel objective function for adversarial robustness with an abstain option which characterizes an explicit tradeoff between robustness and accuracy. We then present a simple baseline in which an adversarially-trained classifier abstains on all inputs within a certain distance of the decision boundary, which we theoretically and experimentally evaluate. Finally, we propose Combined Abstention Robustness Learning (CARL), a method for jointly learning a classifier and the region of the input space on which it should abstain. We explore different variations of the PGD and DeepFool adversarial attacks on CARL in the abstain setting. Evaluating against these attacks, we demonstrate that training with CARL results in a more accurate, robust, and efficient classifier than the baseline.
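Below is a minimal sketch, in PyTorch, of the margin-based baseline described in the abstract: an already-trained classifier abstains whenever an input appears to lie too close to its decision boundary. Since the exact input-space distance to the boundary is expensive to compute, this sketch uses the logit margin (top-1 minus top-2 logit) as a cheap proxy; the `predict_with_abstention` helper, the `ABSTAIN` sentinel, and the threshold `tau` are illustrative assumptions, not names or values from the paper.

```python
# Hedged sketch of a distance-to-boundary abstention baseline (not the
# authors' implementation). The logit margin is used as a proxy for how
# close an input sits to the decision boundary.

import torch
import torch.nn as nn

ABSTAIN = -1  # sentinel returned in place of a class index


@torch.no_grad()
def predict_with_abstention(model: nn.Module, x: torch.Tensor, tau: float = 1.0):
    """Return class predictions, replacing low-margin predictions with ABSTAIN."""
    logits = model(x)                      # shape: (batch, num_classes)
    top2 = logits.topk(2, dim=1).values    # two largest logits per example
    margin = top2[:, 0] - top2[:, 1]       # proxy for distance to the boundary
    preds = logits.argmax(dim=1)
    preds[margin < tau] = ABSTAIN          # abstain when the margin is small
    return preds
```

CARL, by contrast, does not rely on a fixed threshold of this kind: it learns the abstain region jointly with the classifier during training, which is what the abstract credits for its better accuracy-robustness tradeoff.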

Related research

ATRO: Adversarial Training with a Rejection Option (10/24/2020)
This paper proposes a classification framework with a rejection option t...

Using Reed-Muller Codes for Classification with Rejection and Recovery (09/12/2023)
When deploying classifiers in the real world, users expect them to respo...

Defending with Errors: Approximate Computing for Robustness of Deep Neural Networks (11/02/2022)
Machine-learning architectures, such as Convolutional Neural Networks (C...

Security Matters: A Survey on Adversarial Machine Learning (10/16/2018)
Adversarial machine learning is a fast growing research area, which cons...

Adversarial machine learning for protecting against online manipulation (11/23/2021)
Adversarial examples are inputs to a machine learning system that result...

A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples (12/01/2016)
Most machine learning classifiers, including deep neural networks, are v...

Adversarial Vulnerability Bounds for Gaussian Process Classification (09/19/2019)
Machine learning (ML) classification is increasingly used in safety-crit...
