Improving Adversarial Robustness by Data-Specific Discretization

05/20/2018
by Jiefeng Chen, et al.

A recent line of research proposed (either implicitly or explicitly) gradient-masking preprocessing techniques to improve adversarial robustness. However, as shown by Athalye-Carlini-Wagner, essentially all of these defenses can be circumvented if an attacker leverages approximate gradient information with respect to the preprocessing. This raises a natural question: is there a useful preprocessing technique in the context of white-box attacks, even just for mildly complex datasets such as MNIST? In this paper we provide an affirmative answer to this question. Our key observation is that for several popular datasets, one can approximately encode the entire dataset using a small set of separable codewords derived from the training set, while retaining high accuracy on natural images. The separability of the codewords in turn prevents the small perturbations used in ℓ_∞ attacks from changing the feature encoding, leading to adversarial robustness. For example, for MNIST our code consists of only two codewords, 0 and 1, and the encoding of any pixel x is simply 1[x > 0.5] (i.e., whether the pixel value x exceeds 0.5). Applying this code to a naturally trained model already gives high adversarial robustness, even under strong white-box attacks based on the Backward Pass Differentiable Approximation (BPDA) method of Athalye-Carlini-Wagner that take the codes into account. We give density-estimation-based algorithms to construct such codes, and provide theoretical analysis and certificates of when our method can be effective. Systematic evaluation demonstrates that our method is effective in improving adversarial robustness on MNIST, CIFAR-10, and ImageNet, for both naturally and adversarially trained models.
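The preprocessing step described above can be sketched concretely. The snippet below, in NumPy, snaps every pixel to its nearest codeword from a small, separable codebook; the `discretize` helper and its nearest-codeword rule are illustrative assumptions rather than the paper's exact interface, but for MNIST, where the abstract's code is {0, 1}, it reduces exactly to the indicator 1[x > 0.5].

```python
import numpy as np

def discretize(images, codewords):
    # Snap every pixel to its nearest codeword (per-pixel scalar quantization).
    # `images`: array of intensities in [0, 1]; `codewords`: 1-D array of
    # separable code values derived from the training set. This helper and its
    # signature are illustrative assumptions, not the paper's exact interface.
    codewords = np.asarray(codewords, dtype=np.float32)
    dists = np.abs(images[..., None] - codewords)    # shape (..., n_codewords)
    return codewords[np.argmin(dists, axis=-1)]

# For MNIST the code in the abstract has only two codewords, 0 and 1, so the
# quantization reduces to the indicator 1[x > 0.5]:
x = np.random.rand(4, 28, 28).astype(np.float32)    # stand-in image batch
binarized = discretize(x, codewords=[0.0, 1.0])
assert np.array_equal(binarized, (x > 0.5).astype(np.float32))
```

For richer datasets such as CIFAR-10 and ImageNet, the abstract indicates that the codebook is constructed by density estimation over training-set values rather than fixed at {0, 1}; the quantization step applied at inference time stays the same.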


