
Explaining and Harnessing Adversarial Examples

by Ian J. Goodfellow, et al.

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input causes the model to output an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results and offers the first account of the most intriguing fact about adversarial examples: that they generalize across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
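The "simple and fast method" referred to above is the fast gradient sign method (FGSM): perturb each input coordinate by epsilon in the direction that increases the loss, i.e. x_adv = x + epsilon * sign(grad_x J(theta, x, y)). A minimal sketch of why a small per-coordinate change can flip a linear model's prediction, using a made-up logistic-regression model on random data (the weights and inputs here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """Fast gradient sign perturbation for logistic loss J = -log p(y|x).

    For a logistic model p = sigmoid(w.x + b), the gradient of the loss
    with respect to the input is (p - y) * w; FGSM steps epsilon in the
    sign of that gradient.
    """
    p = sigmoid(w @ x + b)          # predicted probability of class 1
    grad_x = (p - y) * w            # dJ/dx for the logistic loss
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)            # high-dimensional weights (illustrative)
b = 0.0
x = rng.normal(size=100)
y = 1.0                             # true label

epsilon = 0.1
x_adv = fgsm(x, y, w, b, epsilon)

# Each coordinate moves by at most epsilon, yet the logit shifts by
# epsilon * sum(|w_i|) -- the linear accumulation across many dimensions
# that the paper identifies as the root cause of the vulnerability.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

The design point illustrated here is the paper's linearity argument: the perturbation is imperceptibly small per coordinate, but its effect on the pre-activation grows with the dimensionality of the input.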





Code Repositories

Convolutional neural networks for MATLAB for classification and segmentation, including Invariant Backpropagation (IBP) and Adversarial Training (AT) algorithms. Trained on GPU; requires cuDNN v5.

PyTorch implementation of convolutional neural network adversarial attack techniques.

Python code for two popular methods of generating adversarial examples: the L-BFGS and fast gradient sign methods.

LFH method for robust deep neural network models for malware classification.

An MNIST classifier and adversarial image generator using a simple genetic algorithm and TensorFlow.