Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness

05/20/2022
by Jiankai Jin, et al.

Adversarial examples pose a security risk as they can alter a classifier's decision through slight perturbations to a benign input. Certified robustness has been proposed as a mitigation strategy where, given an input x, a classifier returns a prediction and a radius with a provable guarantee that any perturbation to x within this radius (e.g., under the L_2 norm) will not alter the classifier's prediction. In this work, we show that these guarantees can be invalidated due to limitations of floating-point representation that cause rounding errors. We design a rounding search method that can efficiently exploit this vulnerability to find adversarial examples within the certified radius. We show that the attack can be carried out against several linear classifiers that have exact certifiable guarantees, and against neural network verifiers that return a certified lower bound on a robust radius. Our experiments demonstrate attack success rates of over 50% against linear classifiers, and of up to 35% and 9% against neural networks whose certified radii were verified by a prominent bound propagation method. We also show that state-of-the-art randomized smoothing classifiers for neural networks are susceptible to such adversarial examples (e.g., with an attack success rate of up to 2% on CIFAR10), validating the importance of accounting for the error rate of the robustness guarantees of such classifiers in practice. Finally, as a mitigation, we advocate the use of rounded interval arithmetic to account for rounding errors.

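To make the attack surface concrete, consider a binary linear classifier f(x) = sign(w·x + b): in exact real arithmetic, |w·x + b| / ||w||_2 is an exact L_2 certified radius, since no perturbation of smaller norm can cross the decision hyperplane. Evaluated in floating point, however, the dot product accumulates rounding error, and inputs strictly inside that radius can occasionally compute to the opposite sign. The sketch below is a minimal illustration of this idea; the brute-force ulp enumeration is a simplified stand-in for the paper's more efficient rounding search, and all function names are ours.

    import math  # math.nextafter requires Python 3.9+
    import numpy as np

    def certified_radius(w, b, x):
        """Exact L2 certified radius of f(x) = sign(w @ x + b)
        in real arithmetic: |w @ x + b| / ||w||_2."""
        return abs(w @ x + b) / np.linalg.norm(w)

    def rounding_search(w, b, x, n_steps=3):
        """Brute-force stand-in for a rounding search: nudge each
        coordinate of x by a few ulps and look for a point strictly
        inside the certified radius whose computed prediction flips."""
        R = certified_radius(w, b, x)
        base = np.sign(w @ x + b)
        for i in range(len(x)):
            for direction in (-math.inf, math.inf):
                x_adv = x.copy()
                for _ in range(n_steps):
                    # Step to the next representable float in coordinate i.
                    x_adv[i] = math.nextafter(x_adv[i], direction)
                    if np.linalg.norm(x_adv - x) < R and np.sign(w @ x_adv + b) != base:
                        return x_adv  # certificate violated in float64
        return None

Whether a flip is found depends on the particular weights and input; points whose score is within a few ulps of zero relative to the accumulated dot-product error are the natural targets.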

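The randomized smoothing certificates mentioned above are probabilistic to begin with. As a reminder of their standard form (the Cohen et al. style bound, not something specific to this paper; the sampling and confidence-interval machinery for estimating the class probability is omitted), the certified L_2 radius is sigma * Phi^{-1}(p_A), and it holds only with probability 1 - alpha:

    from statistics import NormalDist

    def smoothing_radius(p_a_lower, sigma):
        """Certified L2 radius R = sigma * Phi^{-1}(p_A) of a randomized
        smoothing classifier, where p_a_lower is a (1 - alpha)-confidence
        lower bound on the top-class probability under Gaussian noise.
        The certificate therefore carries an alpha error rate even before
        floating-point rounding is taken into account."""
        if p_a_lower <= 0.5:
            return 0.0  # abstain: cannot certify any radius
        return sigma * NormalDist().inv_cdf(p_a_lower)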

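Finally, the advocated mitigation, rounded interval arithmetic, can be sketched for the linear case. The version below is a minimal illustration rather than the paper's implementation: instead of switching hardware rounding modes, it over-approximates directed rounding by widening every intermediate result by one ulp in each direction, which is sound but slightly loose. A prediction is certified only when the enclosing score interval excludes zero, and the reported radius is rounded down at every step.

    import math

    def interval_score(w, b, x):
        """Soundly enclose w @ x + b: widen each intermediate result by
        one ulp in each direction (over-approximates directed rounding)."""
        lo = hi = b
        for wi, xi in zip(w, x):
            p = wi * xi
            lo = math.nextafter(lo + math.nextafter(p, -math.inf), -math.inf)
            hi = math.nextafter(hi + math.nextafter(p, math.inf), math.inf)
        return lo, hi

    def sound_radius(w, b, x):
        """Lower bound on the L2 certified radius that holds despite rounding."""
        lo, hi = interval_score(w, b, x)
        if lo <= 0.0 <= hi:
            return 0.0  # score interval straddles the boundary: cannot certify
        # Over-approximate ||w||_2 with the same outward-rounding trick.
        s = 0.0
        for wi in w:
            s = math.nextafter(s + math.nextafter(wi * wi, math.inf), math.inf)
        norm_hi = math.nextafter(math.sqrt(s), math.inf)
        # Round the final quotient down so the reported radius is conservative.
        return math.nextafter(min(abs(lo), abs(hi)) / norm_hi, -math.inf)

Under this discipline, every point the rounding search can reach within the reported radius still evaluates to the certified sign, because the certificate now bounds the worst-case rounding of each operation.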

Related research

03/06/2020 - Exploiting Verified Neural Networks via Floating Point Numerical Error
We show how to construct adversarial examples for neural networks with e...

06/12/2019 - A Stratified Approach to Robustness for Randomly Smoothed Classifiers
Strong theoretical guarantees of robustness can be given for ensembles o...

10/12/2022 - Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity
In response to subtle adversarial examples flipping classifications of n...

08/21/2021 - Integer-arithmetic-only Certified Robustness for Quantized Neural Networks
Adversarial data examples have drawn significant attention from the mach...

01/11/2022 - Quantifying Robustness to Adversarial Word Substitutions
Deep-learning-based NLP models are found to be vulnerable to word substi...

06/30/2020 - Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications
Recent work has shown that it is possible to learn neural networks with ...

11/19/2020 - Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams
Adversarial examples are a widely studied phenomenon in machine learning...
