The Manifold Assumption and Defenses Against Adversarial Perturbations

11/21/2017
by Xi Wu, et al.

In the adversarial-perturbation problem of neural networks, an adversary starts with a neural network model F and a point x that F classifies correctly, and applies a small perturbation to x to produce another point x' that F classifies incorrectly. In this paper, we propose taking into account the inherent confidence information produced by models when studying adversarial perturbations, where a natural measure of "confidence" is ‖F(x)‖_∞ (i.e., how confident F is about its prediction). Motivated by a thought experiment based on the manifold assumption, we propose a "goodness property" of models, which states that the confident regions of a good model should be well separated. We formalize this property and examine existing robust training objectives in light of it. Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies our formal goodness property well, but exerts only weak control over points that are classified incorrectly with low confidence. However, if Madry et al.'s model is indeed a good solution to their objective, then good and bad points become distinguishable, and we can try to embed uncertain points back into the closest confident region to obtain (hopefully) correct predictions. We therefore propose embedding objectives and algorithms, and perform an empirical study of this method. Our experimental results are encouraging: Madry et al.'s model wrapped with our embedding procedure achieves an almost perfect success rate in defending against attacks that the base model fails on, while retaining good generalization behavior.
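The core idea of the embedding defense can be sketched in a few lines. The toy model, the confidence threshold `tau`, and the gradient-ascent `embed` routine below are all illustrative assumptions, not the paper's actual objective or algorithm: confidence is taken as ‖F(x)‖_∞ (the probability of the predicted class), and an uncertain point is nudged toward the nearest confident region by ascending that confidence.

```python
import numpy as np

# Toy linear-softmax classifier standing in for the model F.
# The weights are arbitrary; they exist only to make the sketch runnable.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # 2-dim inputs, 3 classes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def F(x):
    return softmax(W.T @ x)

def confidence(x):
    # ||F(x)||_inf: the probability assigned to the predicted class.
    return F(x).max()

def embed(x, tau=0.9, steps=50, lr=0.5, eps=1e-4):
    """If F is uncertain about x (confidence below tau), move x toward
    a confident region by gradient ascent on the confidence.
    Uses a finite-difference gradient to stay framework-free."""
    x = x.copy()
    for _ in range(steps):
        if confidence(x) >= tau:
            break
        g = np.zeros_like(x)
        for i in range(len(x)):  # numerical gradient of confidence at x
            d = np.zeros_like(x)
            d[i] = eps
            g[i] = (confidence(x + d) - confidence(x - d)) / (2 * eps)
        x += lr * g
    return x

x = np.array([0.01, -0.02])   # a low-confidence point near the decision regions
x_emb = embed(x)
assert confidence(x_emb) >= confidence(x)
```

In the paper's setting, `embed` would wrap a robustly trained base model: confident predictions are returned as-is, while low-confidence (possibly adversarial) inputs are first pulled back to the closest confident region before classifying.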
