A simple jpeg defense for the OpenAI attack
Deep neural networks (DNNs) have achieved great success in solving a variety of machine learning (ML) problems, especially in the domain of image recognition. However, recent research has shown that DNNs can be highly vulnerable to adversarially generated instances, which look normal to human observers but completely confuse DNNs. These adversarial samples are crafted by adding small perturbations to normal, benign images. Such perturbations, while imperceptible to the human eye, are picked up by DNNs and cause them to misclassify the manipulated instances with high confidence. In this work, we explore and demonstrate how systematic JPEG compression can work as an effective pre-processing step in the classification pipeline to counter adversarial attacks (e.g., Fast Gradient Sign Method, DeepFool) and dramatically reduce their effects. An important component of JPEG compression is its ability to remove high-frequency signal components inside square blocks of an image. Such an operation is equivalent to selective blurring of the image, which helps remove additive perturbations. Further, we propose an ensemble-based technique that can be constructed quickly from a given well-performing DNN, and empirically show how such an ensemble that leverages JPEG compression can protect a model from multiple types of adversarial attacks without requiring knowledge about the model.
Over the past few years, deep neural networks have achieved huge success in many important applications. Computer vision, in particular, has seen some of the biggest improvements over traditional methods. As DNN models become more powerful, people tend to do less data pre-processing or manual feature engineering and prefer so-called end-to-end learning. For example, instead of manually normalizing or standardizing features, one can add batch normalization layers and learn the best way to do it from the data distribution. Image denoising can also be performed by stacking a DNN on top of an auto-encoder.
However, finding a good network architecture and hyperparameters for a particular dataset can be hard, and the resulting model may only be resistant to certain kinds of attacks.
In this work, we propose to use JPEG compression as a simple and effective pre-processing step to remove adversarial noise.
Our intuition is that, since adversarial noise is often indiscernible to the human eye, JPEG compression, which is designed to selectively discard information unnoticeable to humans, has strong potential for combating such manipulations.
Our approach has several desirable properties. First, JPEG is a widely used encoding technique, and many images are already stored in the JPEG format. Most operating systems also have built-in support for encoding and decoding JPEG images, so even non-expert users can easily apply this pre-processing step. Second, this approach requires knowledge of neither the model nor the attack, and it can be applied to a wide range of image datasets.
This work presents the following contributions:
A pre-processing step to neural network image classifiers that uses JPEG compression to remove adversarial noise from a given dataset.
Empirical tests on two datasets, CIFAR-10 and GTSRB, that systematically study how varying the JPEG compression quality affects prediction accuracy.
Results showing the effect of including various amounts of JPEG-compressed images in the training process. We find that this significantly boosts accuracy on adversarial images and does not hurt performance on benign images.
In this section, we discuss existing adversarial attack algorithms and defense mechanisms. We then give a brief overview of JPEG compression, which plays a crucial role in our defense approach.
Consider the scenario where a trained machine learning classifier $F$ is deployed. An attacker, assumed to have full knowledge of the classifier $F$, tries to compute a small distortion $\delta$ for some test example $x$ such that the perturbed example $x' = x + \delta$ is misclassified by the model, i.e., $F(x') \neq F(x)$.
Prior work has shown that even if the machine learning model is unknown, one can train a substitute model and use it to compute the perturbation.
This approach is very effective in practice when both the target model and the substitute model are deep neural networks, due to the property of transferability [24, 19].
The seminal work by Szegedy et al. proposed the first effective adversarial attack on DNN image classifiers by solving a box-constrained L-BFGS optimization problem, and showed that the computed perturbations to the images were imperceptible to the human eye: a rather troublesome property for people trying to identify adversarial images. This discovery has attracted tremendous interest, and many new attack algorithms have been invented [4, 18, 17, 21] and applied to other domains such as malware detection [5, 7] and reinforcement learning [14, 8].
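Fast Gradient Sign Method (FGSM). FGSM computes perturbations subject to an $L_\infty$ constraint. The perturbation is computed by linearizing the loss function $J$:
$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big),$$
where $\theta$ is the set of parameters of the model and $y$ is the true label of the instance. The parameter $\epsilon$ controls the magnitude of the perturbation.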
Intuitively, this method uses the gradient of the loss function to determine in which direction each pixel's intensity should be changed to increase the loss, and updates all pixels accordingly by the same fixed magnitude.
It is important to note that FGSM was designed to be a computationally fast attack rather than an optimal one. Therefore, it is not meant to produce minimal adversarial perturbations.
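To make the update concrete, here is a minimal Python sketch of a single FGSM step. It assumes a toy logistic-regression classifier with binary cross-entropy loss, chosen only because its input gradient, (p - y)·w, has a closed form; for a DNN the gradient would come from automatic differentiation. The function names and the pixel range [0, 1] are illustrative assumptions, not part of the original work.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(y = 1 | x) for a logistic-regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y) -> float:
    """Binary cross-entropy J(theta, x, y) with theta = (w, b)."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps, lo=0.0, hi=1.0) -> List[float]:
    """One FGSM step x' = x + eps * sign(grad_x J), clipped to [lo, hi]."""
    p = predict_proba(w, b, x)
    # Closed-form input gradient of the BCE loss for logistic regression.
    grad = [(p - y) * wi for wi in w]
    # Every feature moves by exactly eps (before clipping) in the loss-increasing direction.
    return [min(hi, max(lo, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]
```

For example, with `w = [2.0, -1.0, 0.5]`, `b = -0.2`, `x = [0.6, 0.3, 0.5]`, `y = 1` and `eps = 0.1`, the first feature is pushed down, the second up, and the loss increases. No coordinate moves by more than `eps`, which is how the L∞ budget shows up in practice.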
DeepFool (DF). DF constructs an adversarial instance under an $L_2$ constraint by assuming the decision boundary to be hyperplanar. The authors leverage this simplification to compute a minimal adversarial perturbation that yields a sample close to the original instance but orthogonally crossing the nearest decision boundary. In this respect, DF is an untargeted attack. Since the underlying assumption that the decision boundary is completely linear in high dimensions is an oversimplification, DF iterates until a true adversarial instance is found. The resulting perturbations are harder for humans to detect than perturbations produced by other techniques.
Although making a DNN model completely immune to adversarial attacks is still an open problem, there have been various attempts to mitigate the threat. We summarize these approaches in four categories.
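The iterative linearization can be sketched for the simplest case: a binary classifier whose label is the sign of a differentiable score f. The sketch below is a simplified, illustrative reconstruction of the binary DeepFool update r_i = -f(x_i) ∇f(x_i) / ||∇f(x_i)||², with the small overshoot (0.02) used by the DeepFool authors; the multi-class version, which picks the closest of several linearized boundaries, is omitted. `f` and `grad_f` are caller-supplied callables assumed here for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def deepfool_binary(
    f: Callable[[Sequence[float]], float],
    grad_f: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    max_iter: int = 50,
    overshoot: float = 0.02,
) -> Tuple[Vector, Vector]:
    """Return (x_adv, r_total) with sign(f(x_adv)) != sign(f(x)) if one is found."""
    original_sign = math.copysign(1.0, f(x))
    r_total = [0.0] * len(x)
    x_i = list(x)
    for _ in range(max_iter):
        if math.copysign(1.0, f(x_i)) != original_sign:
            break  # crossed the decision boundary
        g = grad_f(x_i)
        norm_sq = sum(gi * gi for gi in g)
        if norm_sq == 0.0:
            break  # flat region: the linearization gives no direction
        # Minimal L2 step onto the linearized boundary f(x_i) + g . r = 0.
        step = -f(x_i) / norm_sq
        r_total = [rt + step * gi for rt, gi in zip(r_total, g)]
        # Small overshoot so the point lands past the boundary, not on it.
        x_i = [xi + (1.0 + overshoot) * rt for xi, rt in zip(x, r_total)]
    return x_i, r_total
```

For an affine score such as f(v) = 3v₀ - 4v₁ + 1 at the origin, a single step suffices and the accumulated perturbation has norm |f(x)| / ||∇f|| = 1/5; for curved scores the loop keeps re-linearizing, which is the "reiterating" behaviour described above.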
Detecting adversarial examples before performing classification. One line of work uses density estimates to detect examples that lie far from the natural data manifold, and Bayesian uncertainty estimates to detect when examples lie in low-confidence regions.
Modifying network architecture. The Deep Contractive Network is a generalization of the contractive autoencoder that imposes a layer-wise contractive penalty in a feed-forward neural network. This approximately minimizes the variance of the network outputs with respect to perturbations in the inputs. The Dense Associative Memory model tries to enforce higher-order interactions between neurons by replacing rectified linear units (ReLUs) with rectified polynomials. The idea is inspired by the hypothesis that adversarial examples are caused by the high-dimensional linearity of DNN models.
Modifying the training process. The most common and straightforward approach is to directly use adversarial examples to augment the training set. However, this is computationally expensive. Goodfellow et al. simulate this process more efficiently by using a modified loss function that takes a perturbed example into account. Papernot et al. use distillation, training a second model on the soft outputs of the first model as labels.
Pre-processing input examples to remove adversarial perturbation. A major advantage of this approach is that it can be used with any machine learning model, so it can be combined with any of the other methods described above. Bhagoji et al. apply principal component analysis to images to reduce their dimensionality and discard noise. Luo et al. propose a foveation-based mechanism that applies a DNN model to a certain region of an image and discards information from other regions.
JPEG is a standard and widely used image encoding and compression technique that consists of the following steps:
converting the given image from RGB to YCbCr color space: this is done because the human visual system relies more on spatial content and acuity than on color for interpretation. Converting the color space isolates the components that matter most.
performing spatial subsampling of the chrominance channels in YCbCr space: the human eye is much more sensitive to changes in luminance than in chrominance, so downsampling the chrominance information does not affect human perception of the image very much.
transforming a blocked (8×8 pixel) representation of the YCbCr spatial image data into a frequency-domain representation using the Discrete Cosine Transform (DCT): by computing DCT coefficients, this step allows the JPEG algorithm to further compress the image data in the next step.
performing quantization of the blocked frequency-domain data according to a user-defined quality factor: this is where the JPEG algorithm achieves the majority of its compression, at the expense of image quality. This step suppresses higher frequencies more, since these coefficients contribute less to human perception of the image.
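The DCT and quantization steps are where high-frequency content is discarded, so a small, dependency-free Python sketch of them for a single 8×8 luminance block may help. It assumes the example luminance quantization table from Annex K of the JPEG standard, scaled with the IJG (libjpeg) quality convention; the encoder used in the experiments is not specified here, and color conversion, chroma subsampling, and entropy coding are omitted. All function names are illustrative.

```python
import math

# Example luminance quantization table from the JPEG standard (ITU-T T.81, Annex K),
# row by row: row index = vertical frequency, column index = horizontal frequency.
LUMA_Q50 = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_table(quality):
    """Scale LUMA_Q50 with the IJG/libjpeg quality convention (assumed encoder)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (q * scale + 50) // 100)) for q in LUMA_Q50]


# COS[u][x] = cos((2x + 1) * u * pi / 16), the 8-point DCT-II basis.
COS = [[math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8)] for u in range(8)]


def _c(k):
    """Normalization factor that makes the 2-D DCT orthonormal."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct8x8(block):
    """Forward 2-D DCT of an 8x8 block of 0..255 values, level-shifted by 128."""
    return [[0.25 * _c(u) * _c(v) * sum((block[x][y] - 128) * COS[u][x] * COS[v][y]
                                        for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct8x8(coeffs):
    """Inverse 2-D DCT, undoing the level shift."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[u][v] * COS[u][x] * COS[v][y]
                        for u in range(8) for v in range(8)) + 128
             for y in range(8)] for x in range(8)]


def jpeg_block_roundtrip(block, quality):
    """Quantize and dequantize one 8x8 luminance block.

    Returns (decoded_pixels, number_of_nonzero_quantized_coefficients).
    """
    table = scaled_table(quality)
    coeffs = dct8x8(block)
    # Divide by the table entry and round: large entries (high frequencies,
    # low quality) push small coefficients to exactly zero.
    quant = [[round(coeffs[u][v] / table[8 * u + v]) for v in range(8)] for u in range(8)]
    dequant = [[quant[u][v] * table[8 * u + v] for v in range(8)] for u in range(8)]
    pixels = [[min(255, max(0, round(p))) for p in row] for row in idct8x8(dequant)]
    nonzero = sum(1 for row in quant for q in row if q != 0)
    return pixels, nonzero
```

On a block that is pure ±8 checkerboard noise around mid-gray, every quantized coefficient rounds to zero at quality 20, so the block decodes to flat gray, while at quality 100 (all table entries equal to 1) the pattern survives almost unchanged.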
Experiments in this paper were conducted with convolutional neural networks on two image datasets: the CIFAR-10 dataset and the German Traffic Sign Recognition Benchmark (GTSRB) dataset.
with a pooling stride of (2, 2). This is followed by a fully connected layer of 512 units that feeds into a softmax output layer of 10 classes. The same architecture is extended for the GTSRB dataset with an additional Conv-Conv-Pooling block of filter depth 128 and a softmax output layer of 43 classes. The pooling filter size is made.
Both models were trained for 400 epochs using categorical cross-entropy loss with dropout regularization. We used the Adam optimizer to find the best weights. The final models had test accuracies of 82.88% and 97.83% on CIFAR-10 and GTSRB, respectively.
A core principle behind JPEG compression is based on the human psychovisual system: it aims to suppress high-frequency information, such as sharp transitions in intensity and color hue, using the Discrete Cosine Transform. Because adversarial attacks often introduce perturbations that are not compatible with human psychovisual awareness (hence these attacks are sometimes imperceptible to humans), we believe JPEG compression has the potential to remove these artifacts. Thus, we propose to use JPEG compression as a pre-processing step before running an instance through the classification model. We demonstrate how using JPEG compression reduces the mistakes a model makes on datasets that have been adversarially manipulated.
Benign, everyday images lie on a very narrow manifold.
An image with completely random pixel colors is highly unlikely to be perceived as natural by human beings.
However, the objectives of classification models such as DNNs are often not aligned with such considerations.
DNNs may be viewed as constructing decision boundaries that linearly separate the data in high-dimensional spaces.
In doing so, these models assume that the subspaces of natural images extend beyond the actual manifold.
Adversarial attacks take advantage of this by perturbing images just enough so that they cross over the decision boundary of the model.
However, this crossover does not guarantee that the perturbed images would lie in the original narrow manifold.
Indeed, perturbed images could lie in artificially expanded subspaces where natural images would not be found.
Since JPEG compression takes the human psychovisual system into account, we pursue the hypothesis that the manifold in which JPEG images occur would have some semblance with the manifold of naturally occurring images, and that using JPEG compression as a pre-processing step during classification would re-project any adversarially perturbed instances back onto this manifold.
To test our hypothesis, we applied JPEG compression to images from the CIFAR-10 and GTSRB datasets, adversarially perturbed by FGSM and DF, and varied the quality parameter of the JPEG algorithm. Figure 2 shows the experiment results.
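The experiment amounts to a sweep over the JPEG quality parameter. The sketch below is a hypothetical harness, not the authors' code: `classify` and `jpeg_roundtrip` are caller-supplied callables (for example, a trained CNN and an encode/decode round trip through any JPEG library), the quality grid is an assumption, and an attack is counted as successful on an example when the prediction on the pre-processed adversarial image differs from the true label, which is also an assumed definition.

```python
from typing import Callable, Dict, Optional, Sequence, TypeVar

Img = TypeVar("Img")


def success_rate_by_quality(
    adv_images: Sequence[Img],
    labels: Sequence[int],
    classify: Callable[[Img], int],
    jpeg_roundtrip: Callable[[Img, int], Img],
    qualities: Sequence[int] = tuple(range(100, 0, -10)),
) -> Dict[Optional[int], float]:
    """Attack success rate with no pre-processing (key None) and after JPEG at each quality."""

    def rate(preprocess: Callable[[Img], Img]) -> float:
        # Fraction of adversarial images that are still misclassified.
        wrong = sum(classify(preprocess(img)) != y for img, y in zip(adv_images, labels))
        return wrong / len(labels)

    rates: Dict[Optional[int], float] = {None: rate(lambda img: img)}
    for q in qualities:
        rates[q] = rate(lambda img: jpeg_roundtrip(img, q))
    return rates
```

Plotting the returned values against quality gives curves of the kind discussed next, including any inflection point where compression artifacts start to hurt.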
Overall, we observe that applying JPEG compression (dashed lines with symbols) can counter FGSM and DeepFool attacks on the CIFAR-10 and GTSRB datasets. means no compression has been applied.
Increasing compression (decreasing image quality) generally leads to better removal of the adversarial effect at first, but the benefit reaches an inflection point, after which the success rate starts increasing again. Besides the adversarial perturbation, this inflection may also be attributed to the artifacts introduced by JPEG compression itself at lower image qualities, which confuse the model.
With CIFAR-10, we observe that slightly compressing its images dramatically lowers DeepFool's attack success rate, as indicated by the steep orange line (left plot). The steepest drop occurs when JPEG compression at image quality 100 is applied to uncompressed images, which introduces extremely little compression in the frequency domain. Since the JPEG algorithm also downsamples the chrominance channels irrespective of the image quality, one hypothesis that supports this observation is that DF perturbs the chrominance channels much more than the luminance channel in YCbCr color space. Since DF introduces a minimal perturbation, it is easily removed by JPEG compression.
Testing adversarial images with JPEG compression suggests that the algorithm is able to remove perturbations by re-projecting the images onto the manifold of JPEG images.
Since our initial model was trained on the original benign image dataset (without any adversarial manipulation), testing with compressed images of lower image quality unsurprisingly leads to higher misclassification rates, likely due to artifacts introduced by the compression algorithm itself.
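The chrominance hypothesis can be illustrated with a toy example. The sketch below assumes the JFIF (full-range BT.601) RGB-to-YCbCr conversion and 4:2:0 subsampling implemented as plain 2×2 averaging; real encoders may filter differently, and the perturbation is constructed by hand rather than produced by DF. It adds a checkerboard perturbation that leaves the luminance Y unchanged, so it lives entirely in the chroma channels, and shows that 2×2 averaging cancels it. The function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF (full-range BT.601) RGB -> YCbCr for a single pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_planes(img):
    """Convert rows of (r, g, b) tuples into separate Y, Cb and Cr planes."""
    pixels = [[rgb_to_ycbcr(*px) for px in row] for row in img]
    return tuple([[px[c] for px in row] for row in pixels] for c in range(3))


def subsample_420(plane):
    """Average every 2x2 block of a plane (assumes even height and width)."""
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


# A flat 4x4 image and a chroma-only perturbation: 0.587*dG + 0.114*dB == 0, so Y is unchanged.
clean = [[(120.0, 90.0, 60.0)] * 4 for _ in range(4)]
k = 10.0
delta = (0.0, -0.114 * k, 0.587 * k)
# Alternate the sign of the perturbation in a checkerboard pattern.
adv = [[tuple(c + (1 if (i + j) % 2 == 0 else -1) * d for c, d in zip(clean[i][j], delta))
        for j in range(4)] for i in range(4)]

y_clean, cb_clean, cr_clean = to_planes(clean)
y_adv, cb_adv, cr_adv = to_planes(adv)
```

At full resolution the Cb and Cr planes of `adv` differ from those of `clean`, but after `subsample_420` they match, so no quality setting is needed to remove this particular perturbation. Real DF perturbations are not this neatly aligned, which is why this only illustrates the hypothesis rather than confirming it.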
This can also be explained by the notion that the manifold of JPEG compressed images of a particular image quality may be similar to that of another quality, but not completely aligned.
We now propose that, by training the model on the manifold corresponding to a particular image quality, the model can potentially learn to classify images even in the presence of JPEG artifacts.
From the perspective of adversarial images, applying JPEG compression would remove the perturbations and re-training with compressed images could help ensure that the model is not confused by the JPEG artifacts.
We call this approach of re-training the model with JPEG-compressed images “vaccinating” the model against adversarial attacks.
We re-trained the model with images of JPEG qualities 100 through 20 (increasing compression) with a step size of 10, and hence obtained 9 models (besides the original model). We refer to each of these re-trained models as $M_x$, where $x$ corresponds to the image quality the model was re-trained with. The original model is referred to as $M$.
While re-training, the weights of $M_x$ were initialized with the weights of $M_{x+10}$ for faster convergence. For example, the weights of $M_{100}$ were initialized with the weights of $M$, the weights of $M_{90}$ were initialized with the weights of $M_{100}$, and so on. The intuition behind our approach comes from the proposition that the manifolds of images corresponding to successive levels of compression exist co-locally, so the decision boundaries learned by the model would not have to shift significantly to account for each new manifold. This means that, given any model, our approach can quickly generate new vaccinated models.
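The warm-started re-training schedule can be sketched as a short loop. This is a hypothetical outline, not the authors' training code: `train(init_model, quality)` stands in for fine-tuning a copy of `init_model` on the training set JPEG-compressed at `quality` and returning the result.

```python
from typing import Callable, Dict, Iterable, TypeVar

Model = TypeVar("Model")


def vaccinate(
    original: Model,
    train: Callable[[Model, int], Model],
    qualities: Iterable[int] = range(100, 10, -10),
) -> Dict[int, Model]:
    """Return {quality: M_quality}, warm-starting each model from the previous one."""
    vaccinated: Dict[int, Model] = {}
    previous = original  # M_100 is initialized from the original model M
    for q in qualities:  # 100, 90, ..., 20: increasing compression
        previous = train(previous, q)  # M_q starts from the weights of M_{q+10}
        vaccinated[q] = previous
    return vaccinated
```

The default `range(100, 10, -10)` yields the nine qualities 100, 90, ..., 20. Because each call starts from the previous model rather than from scratch, the loop reflects the co-locality assumption: each fine-tuning run only has to nudge the decision boundaries.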
Figure 3 shows clear benefits of our vaccination idea: vaccinated models generally perform better than the original model on the CIFAR-10 test set, especially at lower image qualities. For example, the model re-trained on quality-20 images ($M_{20}$) performs the best for images with quality 20 and the worst for images with quality 100. Correspondingly, the model re-trained on quality-100 images ($M_{100}$) performs the best for images with quality 100 and the worst for images with quality 20. The performance of $M_{100}$ closely follows that of the original model $M$ across all image qualities. All these observations are consistent with our fundamental intuition that JPEG manifolds coexist in the same hyperlocality.
Figure 4 visualizes the performance of the vaccinated models on adversarially perturbed datasets as the image quality they are tested on varies. Again, the general trend shows that increasing JPEG compression removes adversarial perturbations. The effect of the adversarial attacks on the original model $M$ does transfer to the vaccinated models as well, but as the compression applied to the training images increases, the transferability of the attack subsides. Interestingly, with CIFAR-10, the accuracy decreases at lower image qualities. This suggests that the artifacts introduced by JPEG may be overtaking the adversarial attack in bringing down the accuracy, which may be attributed to the small image size of the CIFAR-10 dataset (32×32 pixels). We do not see such a significant decrease in accuracy at lower image qualities with the GTSRB dataset, which contains larger images.
Table 1: Original scenario vs. with our ensemble.
If adversaries are able to gain access to the original model, they may also be able to recreate the vaccinated models and attack them individually.
To protect the classification pipeline against such an attack, we propose to use an ensemble of vaccinated models that vote on images with varying image qualities.
Hence, in our ensemble, the nine vaccinated models $M_{100}$ through $M_{20}$ vote on a given image compressed at image qualities of 100 through 20 with a step size of 10.
This yields 81 votes (9 models × 9 image qualities). The final label assigned to the sample is simply the label that receives the most votes.
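The voting rule can be sketched in a few lines. This is an illustrative sketch under assumptions: `models` maps each training quality to a callable classifier, `jpeg_roundtrip(image, quality)` is a hypothetical encode/decode helper, and ties are broken by `collections.Counter.most_common`, which orders equal counts by first occurrence; the text does not specify a tie-breaking rule.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

Img = TypeVar("Img")


def ensemble_predict(
    image: Img,
    models: Dict[int, Callable[[Img], Hashable]],
    jpeg_roundtrip: Callable[[Img, int], Img],
    qualities: Iterable[int] = range(100, 10, -10),
) -> Tuple[Hashable, Counter]:
    """Every model votes on every compressed copy; return the top label and the tally."""
    # Compress once per quality and reuse each copy for all models.
    compressed = [jpeg_roundtrip(image, q) for q in qualities]
    votes = Counter(model(c) for model in models.values() for c in compressed)
    label, _ = votes.most_common(1)[0]
    return label, votes
```

With nine models and nine qualities the tally holds 81 votes, so an attack that fully fools one model, on every compressed copy, still contributes only 9 of them. The nested loop is also embarrassingly parallel, since each (model, copy) pair is independent.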
Since each of the vaccinated models is trained on a different manifold of images, the ensemble essentially models separate subspaces of the data, and current attacks can only distort the samples in one of these subspaces. Hence, no matter which model an adversary targets, the other models should make up for the attack. Figure 5 illustrates this idea: a majority of the models are not significantly affected, irrespective of the model being attacked. Increasing JPEG compression also protects the attacked model to some extent. Even if the introduced perturbation is very strong, training on different compression levels helps ensure that the decision boundaries learned by the vaccinated models are dissimilar and that their verdicts are largely uncorrelated.
We present empirical results for the original model $M$ in Table 1 for comparison with our ensemble approach: $M$ was targeted with adversarial attacks, and our approach was able to recover from the attack by employing JPEG compression. Since the ensemble queries several models on instances compressed at varying levels, these evaluations can be run in parallel to make the process faster.
Note that we chose an arbitrary combination of image qualities for our analysis, and better combinations may exist. If an adversary gains access to a classifier model and is also aware of our scheme of protecting it with an ensemble of vaccinated models, the defender can simply opt for a different combination of image qualities, which yields a completely different ensemble. Since our approach is built for fast convergence, the ensemble can be constructed quickly while retaining the network architecture that works well for a given problem.
We have presented our preliminary empirical analysis of how systematic use of JPEG compression, especially in ensembles, can counter adversarial attacks and dramatically reduce their effects. In our ongoing work, we are evaluating our approaches against more attack strategies and datasets.
Crafting adversarial input sequences for recurrent neural networks. In 2016 IEEE Military Communications Conference (MILCOM), pages 49–54, 2016.