Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression

05/08/2017 ∙ by Nilaksh Das, et al. ∙ 0

Deep neural networks (DNNs) have achieved great success in solving a variety of machine learning (ML) problems, especially in the domain of image recognition. However, recent research has shown that DNNs can be highly vulnerable to adversarially generated instances, which look normal to human observers but completely confuse DNNs. These adversarial samples are crafted by adding small perturbations to normal, benign images. Such perturbations, while imperceptible to the human eye, are picked up by DNNs and cause them to misclassify the manipulated instances with high confidence. In this work, we explore and demonstrate how systematic JPEG compression can work as an effective pre-processing step in the classification pipeline to counter adversarial attacks (e.g., Fast Gradient Sign Method, DeepFool) and dramatically reduce their effects. An important component of JPEG compression is its ability to remove high-frequency signal components inside square blocks of an image. Such an operation is equivalent to selective blurring of the image, helping remove additive perturbations. Further, we propose an ensemble-based technique that can be constructed quickly from a given well-performing DNN, and empirically show how such an ensemble that leverages JPEG compression can protect a model from multiple types of adversarial attacks, without requiring knowledge about the model.



1 Introduction

Over the past few years, deep neural networks have achieved huge success in many important applications. Computer vision, in particular, enjoys some of the biggest improvements over traditional methods [11]. As the DNN models become more powerful, people tend to do less data pre-processing or manual feature engineering, and prefer so-called end-to-end learning. For example, instead of manual feature normalization or standardization, one can add batch normalization layers and learn the best way to do it from the data distribution [9]. Image denoising can also be performed by stacking a DNN on top of an auto-encoder [6].

However, recent research has revealed a serious potential vulnerability in DNN models [24]: adding small, human-imperceptible perturbations to an image can mislead a DNN model into predicting an arbitrary class. These perturbations can be computed using the gradient information of the DNN model, which indicates the direction in the input space that will most drastically change the model outputs [4, 21]. Even more troubling, it is possible to compute a single “universal” perturbation that can be applied to any image and mislead the classification results of the model [17]. One can also perform black-box attacks without knowing the exact DNN model being used [19].

Many defense methods have been proposed to counteract adversarial attacks. A common way is to design new network architectures or optimization techniques [6, 4]. However, finding a good network architecture and hyperparameters for a particular dataset can be hard, and the resulting model may only be resistant to certain kinds of attacks.

1.1 Our Contributions

In this work, we propose to use JPEG compression as a simple and effective pre-processing step to remove adversarial noise. Our intuition is that, because adversarial noise is often indiscernible to the human eye, JPEG compression (which is designed to selectively discard information unnoticeable to humans) has strong potential in combating such manipulations.

Our approach has multiple desirable advantages. First, JPEG is a widely-used encoding technique and many images are already stored in the JPEG format. Most operating systems also have built-in support to encode and decode JPEG images, so even non-expert users can easily apply this pre-processing step. Second, this approach does not require knowledge of either the model or the attack, and can be applied to a wide range of image datasets.

This work presents the following contributions:

  • A pre-processing step to neural network image classifiers that uses JPEG compression to remove adversarial noise from a given dataset.

  • Empirical tests on two datasets, CIFAR-10 and GTSRB, that systematically study how varying the JPEG compression quality affects prediction accuracy.

  • Results showing the effect of including various amounts of JPEG compressed images in the training process. We find that this significantly boosts accuracy on adversarial images and does not hurt the performance on benign images.

Figure 1: A comparison of the classification results of an exemplar image from the German Traffic Sign Recognition Benchmark (GTSRB) dataset. A benign image (left) is originally classified as a stop sign, but after the addition of an adversarial perturbation to the image (middle) the resulting image is classified as a max speed 100 sign. Using JPEG compression on the adversarial image (right), we recover the original classification of stop sign.

2 Background

In this section, we discuss existing adversarial attack algorithms and defense mechanisms. We then give a brief overview of JPEG compression, which plays a crucial role in our defense approach.

2.1 Adversarial Attacks

Consider the scenario where a trained machine learning classifier $C$ is deployed. An attacker, assumed to have full knowledge of the classifier $C$, tries to compute a small distortion $\delta x$ for some test example $x$ such that the perturbed example $x' = x + \delta x$ is misclassified by the model, i.e., $C(x') \neq C(x)$. Prior work has shown that even if the machine learning model is unknown, one can train a substitute model and use it to compute the perturbation. This approach is very effective in practice when both the target model and the substitute model are deep neural networks, due to the property of transferability [24, 19].

The seminal work by Szegedy et al. [24] proposed the first effective adversarial attack on DNN image classifiers by solving a box-constrained L-BFGS optimization problem and showed that the computed perturbations to the images were indistinguishable to the human eye: a rather troublesome property for people trying to identify adversarial images. This discovery has gained tremendous interest, and many new attack algorithms have been invented [4, 18, 17, 21] and applied to other domains such as malware detection [5, 7], sentiment analysis [22], and reinforcement learning [14, 8].

In this paper, we explore the Fast Gradient Sign Method (FGSM) [4] and the DeepFool (DF) [18] attacks, each of which constructs instance-specific perturbations to confuse a given model. We choose the FGSM attack for our evaluation since it is the most efficient approach in terms of computation time, and the DF attack because it computes minimal (i.e., highly unnoticeable) perturbations.

Since we plan to evaluate these attacks in a practical setting, we only consider the parameters of these attacks that produce adversarial instances having low pathology, i.e., the images are not obviously perceivable as having been manipulated, as seen in the middle image in Figure 1. Below, we briefly review the two attacks.

Fast Gradient Sign Method [4]. FGSM is a fast algorithm which computes perturbations subject to an $L_\infty$ constraint. The adversarial example $x'$ is computed from the original instance $x$ by linearizing the loss function $J$:

$x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$,

where $\theta$ is the set of parameters of the model and $y$ is the true label of the instance. The parameter $\epsilon$ controls the magnitude of the perturbation. Intuitively, this method uses the gradient of the loss function to determine in which direction each pixel’s intensity should be changed to increase the loss, and updates all pixels accordingly by a fixed magnitude. It is important to note here that FGSM was designed to be a computationally fast attack rather than an optimal attack. Therefore, it is not meant to produce minimal adversarial perturbations.
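As a minimal sketch of the sign-gradient update described above, the following uses a toy logistic-regression classifier rather than a DNN, so the input gradient has a closed form. The model, its weights, the sample, and the value of `eps` are illustrative assumptions, not settings from the paper.

```python
import math

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x.

    Assumes a toy model p = sigmoid(w.x + b), for which the gradient
    has the closed form (p - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """x' = x + eps * sign(grad_x J): every coordinate moves by at most eps."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

def predict(w, b, x):
    """Hard label of the toy classifier."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical weights and sample, chosen only to illustrate a label flip.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
```

Here `predict(w, b, x)` returns the true label while `predict(w, b, x_adv)` does not, even though no coordinate of `x_adv` differs from `x` by more than `eps`, which is the $L_\infty$ constraint at work.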

DeepFool [18]. DF constructs an adversarial instance under an $L_2$ constraint by assuming the decision boundary to be hyperplanar. The authors leverage this simplification to compute a minimal adversarial perturbation that results in a sample that is close to the original instance but orthogonally cuts across the nearest decision boundary. In this respect, DF is an untargeted attack. Since the assumption that the decision boundary is completely linear in higher dimensions is an oversimplification of the actual case, DF keeps iterating until a true adversarial instance is found. The resulting perturbations are harder for humans to detect compared to perturbations applied by other techniques.
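The geometric idea can be sketched for the simplest case, a binary classifier whose label is the sign of a scalar function `f`. This is a simplified illustration, not the multi-class algorithm of [18]: the `overshoot` factor is applied at every step, and `f`, `grad_f`, and the example point are hypothetical.

```python
import math

def deepfool_binary(f, grad_f, x, overshoot=0.02, max_iter=50):
    """Step repeatedly onto the linearized boundary f(x) = 0 until the sign of f flips.

    Assumes grad_f never returns the zero vector.
    """
    original_side = f(x) > 0
    x_adv = list(x)
    for _ in range(max_iter):
        fx = f(x_adv)
        if (fx > 0) != original_side:
            break  # the label has flipped: a true adversarial instance
        g = grad_f(x_adv)
        norm_sq = sum(gi * gi for gi in g)
        # Orthogonal projection onto the hyperplane of the local linear model.
        step = [-(fx / norm_sq) * gi for gi in g]
        # Overshoot slightly so the point lands just past the boundary.
        x_adv = [xi + (1 + overshoot) * si for xi, si in zip(x_adv, step)]
    return x_adv

# Hypothetical linear classifier f(x) = w.x + b; one step suffices here.
w, b = [3.0, 4.0], -2.0

def f(x):
    return w[0] * x[0] + w[1] * x[1] + b

def grad_f(x):
    return list(w)

x = [2.0, 1.0]
x_adv = deepfool_binary(f, grad_f, x)
dist = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_adv, x)))
```

For a truly linear `f`, the $L_2$ distance `dist` equals `(1 + overshoot) * |f(x)| / ||w||`, the shortest path to the boundary plus the small overshoot. For a nonlinear model the loop re-linearizes at each iterate.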

2.2 Defense Mechanisms

Although making a DNN model completely immune to adversarial attacks is still an open problem, there have been various attempts to mitigate the threat. We summarize the approaches in four categories.

  1. Detecting adversarial examples before performing classification.

    Metzen et al. [16] propose to distinguish genuine examples from the adversarially perturbed ones by augmenting deep neural networks with a small “detector” subnetwork. Feinman et al. [3] use density estimates to detect examples that lie far from the natural data manifold, and use Bayesian uncertainty estimates to detect when examples lie in the low-confidence regions.

  2. Modifying network architecture. Deep Contractive Network [6] is a generalization of the contractive autoencoder, which imposes a layer-wise contractive penalty in a feed-forward neural network. This approximately minimizes the variance of the network outputs with respect to perturbations in the inputs. The Dense Associative Memory model [12] tries to enforce higher-order interactions between neurons by changing rectified linear units (ReLU) to rectified polynomials. The idea is inspired by the hypothesis that adversarial examples are caused by the high-dimensional linearity of DNN models.

  3. Modifying the training process. The most common and straightforward approach is to directly use adversarial examples to augment the training set. However, this is computationally expensive. Goodfellow et al. [4] simulate this process in a more efficient way by using a modified loss function that takes a perturbed example into account. Papernot et al. [20] use the distillation method that uses the soft outputs of the first model as labels to train a second model.

  4. Pre-processing input examples to remove adversarial perturbation. A major advantage of this approach is that it can be used with any machine learning model, and therefore alongside any other method described above. Bhagoji et al. [1] apply principal component analysis on images to reduce dimension and discard noise. Luo et al. [15] propose to use a foveation-based mechanism that applies a DNN model on a certain region of an image and discards information from other regions.

    Our work belongs to this category. Prior works most relevant to our proposed method are [2] and [13], both of which include JPEG compression as their defense mechanism. However, previous work did not focus on how JPEG compression may be systematically leveraged as a defense mechanism. For example, [2] only studied JPEG compression with quality 75 and did not evaluate how varying the amount of compression would affect performance. In this work, we conduct an extensive study to understand the capability of the compression approach.

    [13] tried pre-processing techniques besides JPEG compression, such as printing out adversarial images and taking pictures of them using a cell phone, changing contrast, and adding Gaussian noise. However, they only processed the images during the testing phase. In contrast, we also consider training our models with JPEG compressed images. We further show that constructing an ensemble of the models obtained by training on images of different levels of compression quality can significantly boost the success rates in recovering the correct answers in adversarial images.

2.3 JPEG compression

JPEG is a standard and widely-used image encoding and compression technique that consists of the following steps:

  1. converting the given image from RGB to the YCbCr color space: this is done because the human visual system relies more on spatial content and acuity than it does on color for interpretation. Converting the color space isolates the components that matter more.

  2. performing spatial subsampling of the chrominance channels in the YCbCr space: the human eye is much more sensitive to changes in luminance, and downsampling the chrominance information does not affect the human perception of the image very much.

  3. transforming a blocked representation of the YCbCr spatial image data to a frequency domain representation using the Discrete Cosine Transform (DCT): this step allows the JPEG algorithm to further compress the image data as outlined in the next steps by computing DCT coefficients.

  4. performing quantization of the blocked frequency domain data according to a user-defined quality factor: this is where the JPEG algorithm achieves the majority of the compression, at the expense of image quality. This step suppresses higher frequencies more since these coefficients contribute less to the human perception of the image.
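Steps 3 and 4 can be sketched on a single 8×8 luminance block in pure Python. This is an illustration under stated assumptions, not a JPEG codec: it skips color conversion, chroma subsampling, and entropy coding. The base table is the example luminance table from the JPEG standard, and the quality scaling follows the convention used by libjpeg. The smooth test block and the checkerboard "perturbation" are made up for the example.

```python
import math

N = 8
# Example luminance quantization table from the JPEG standard (Annex K).
BASE_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scaled_table(quality):
    """Scale the base table for a quality in 1..100 (libjpeg-style convention)."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row]
            for row in BASE_TABLE]

def _c(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _cos(n, k):
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block, after a level shift of 128."""
    return [[_c(u) * _c(v) * sum((block[i][j] - 128) * _cos(i, u) * _cos(j, v)
                                 for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coefs):
    """Inverse of dct2, undoing the level shift."""
    return [[128 + sum(_c(u) * _c(v) * coefs[u][v] * _cos(i, u) * _cos(j, v)
                       for u in range(N) for v in range(N))
             for j in range(N)] for i in range(N)]

def quantize(coefs, table):
    """Divide each coefficient by its step and round: the lossy step."""
    return [[round(coefs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]

def dequantize(q, table):
    return [[q[u][v] * table[u][v] for v in range(N)] for u in range(N)]

def jpeg_like(block, quality):
    """DCT -> quantize -> dequantize -> inverse DCT for one block."""
    table = scaled_table(quality)
    return idct2(dequantize(quantize(dct2(block), table), table))

def mse(a, b):
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(N) for j in range(N)) / (N * N)

# A smooth horizontal ramp plus a +/-2 checkerboard, i.e. high-frequency noise.
clean = [[100 + 4 * j for j in range(N)] for i in range(N)]
perturbed = [[clean[i][j] + (2 if (i + j) % 2 == 0 else -2) for j in range(N)]
             for i in range(N)]
restored = jpeg_like(perturbed, quality=50)
```

In this toy setting the checkerboard only excites high-frequency coefficients, which are smaller than half their quantization steps and so round to zero. The perturbed and clean blocks therefore quantize to identical coefficients, and `restored` ends up closer to `clean` than `perturbed` was. Real adversarial perturbations are not guaranteed to be this cleanly separable.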

3 Experimental setup

Experiments in this paper were conducted with convolutional neural networks on two image datasets: the CIFAR-10 dataset [10], and the German Traffic Sign Recognition Benchmark (GTSRB) dataset [23].

The CIFAR-10 dataset consists of 50,000 training examples and a test set of 10,000 examples with 10 classes. Each image in the dataset is of size $32 \times 32$ pixels. The GTSRB dataset has 43 classes with 39,209 training examples and 12,630 testing examples. The image sizes in this dataset vary between $15 \times 15$ and $250 \times 250$ pixels. For our analysis, we rescale all GTSRB images to a common fixed size.

For CIFAR-10, we use a convolutional neural network with 2 Conv-Conv-Pooling blocks, having Conv layer filter depths of 32 and 64 respectively, and a Pooling stride of (2, 2). This is followed by a fully connected layer of 512 units that feeds into a softmax output layer of 10 classes. The same architecture is extended for the GTSRB dataset with an additional Conv-Conv-Pooling block of filter depth 128, an adjusted Pooling filter size, and a softmax output layer of 43 classes.

Both models were trained for 400 epochs using categorical cross-entropy loss with dropout regularization. We used the Adam optimizer to find the best weights. The final models had testing accuracies of 82.88% and 97.83% on CIFAR-10 and GTSRB respectively.



To measure the effectiveness of an adversarial attack, we use a metric that we call the “misclassification success” rate. It is defined as the proportion of instances which were correctly classified by the trained models and whose labels were successfully flipped by the attack.
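A minimal sketch of this metric follows. The text does not state the denominator explicitly, so this sketch assumes it is the number of instances the model classified correctly before the attack. The function name and the example labels are hypothetical.

```python
def misclassification_success_rate(y_true, y_pred_benign, y_pred_adv):
    """Fraction of originally-correct instances whose label the attack flipped.

    Assumption: instances the model already got wrong on benign inputs are
    excluded, since the attack cannot take credit for those errors.
    """
    correct = [(t, a) for t, b, a in zip(y_true, y_pred_benign, y_pred_adv) if b == t]
    if not correct:
        return 0.0
    flipped = sum(1 for t, a in correct if a != t)
    return flipped / len(correct)

# Made-up labels: four of five instances are correct before the attack,
# and the attack flips two of those four.
y_true        = [0, 1, 2, 1, 0]
y_pred_benign = [0, 1, 2, 0, 0]
y_pred_adv    = [1, 1, 0, 2, 0]
rate = misclassification_success_rate(y_true, y_pred_benign, y_pred_adv)
```

With these labels `rate` is 0.5. If the proportion were instead taken over all instances, the same data would give 0.4, so the choice of denominator should be stated whenever results are compared.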

4 JPEG Compression as Defense

A core principle behind JPEG compression is based on the human psychovisual system: it aims to suppress high-frequency information, like sharp transitions in intensity and color hue, using the Discrete Cosine Transform. Adversarial attacks often introduce perturbations that are not compatible with human psychovisual awareness (hence these attacks are sometimes imperceptible to humans), so we believe JPEG compression has the potential to remove these artifacts. Thus, we propose to use JPEG compression as a pre-processing step before running an instance through the classification model. We demonstrate how using JPEG compression reduces the mistakes a model makes on datasets that have been adversarially manipulated.

Figure 2: Applying JPEG compression (dashed lines with symbols) can counter FGSM and DeepFool attacks on the CIFAR-10 and GTSRB datasets, e.g., slightly compressing CIFAR-10 images dramatically lowers DeepFool’s attack success rate, indicated by the steep orange line (left plot). The plots also include the case where no compression has been applied. Attacks can be further suppressed by “vaccinating” a DNN model by training it with compressed images, and using an ensemble of such models. Our approach, discussed in Section 5, rectifies a great majority of misclassifications (indicated by the horizontal dashed lines).

4.1 Effect of JPEG Compression on Classification

Benign, everyday images lie in a very narrow manifold. An image with completely random pixel colors is highly unlikely to be perceived as natural by human beings. However, the objective basis of classification models, like DNNs, is often not aligned with such considerations. DNNs may be viewed as constructing decision boundaries that linearly separate the data in high-dimensional spaces. In doing so, these models assume that the subspaces of natural images exist beyond the actual manifold. Adversarial attacks take advantage of this by perturbing images just enough so that they cross over the decision boundary of the model. However, this crossover does not guarantee that the perturbed images would lie in the original narrow manifold. Indeed, perturbed images could lie in artificially expanded subspaces where natural images would not be found.

Since JPEG compression takes the human psychovisual system into account, we pursue the hypothesis that the manifold in which JPEG images occur resembles the manifold of naturally occurring images, and that using JPEG compression as a pre-processing step during classification would re-project any adversarially perturbed instances back onto this manifold.

To test our hypothesis, we applied JPEG compression to images from the CIFAR-10 and GTSRB datasets, adversarially perturbed by FGSM and DF, and varied the quality parameter of the JPEG algorithm. Figure 2 shows the experiment results.

Overall, we observe that applying JPEG compression (dashed lines with symbols) can counter FGSM and DeepFool attacks on the CIFAR-10 and GTSRB datasets, relative to the case where no compression has been applied.

Increasing compression (decreasing image quality) generally leads to better removal of the adversarial effect at first, but the benefit reaches an inflection point where the attack success rate starts increasing again. Besides the adversarial perturbation, this inflection may also be attributed to the artifacts introduced by JPEG compression itself at lower image qualities, which confuse the model.

With CIFAR-10, we observe that slightly compressing its images dramatically lowers DeepFool’s attack success rate, indicated by the steep orange line (left plot). The steepest drop takes place when JPEG compression at image quality 100 is applied to uncompressed images, which introduces extremely little compression in the frequency domain. Since the JPEG algorithm also downsamples the chrominance channels irrespective of the image quality, one hypothesis that supports this observation is that DF attacks the chrominance channels much more than the luminance channel in the YCbCr color space. Since DF introduces a minimal perturbation, it is easily removed with JPEG compression.

4.2 Vaccinating Models by Training with JPEG Compressed Images

Figure 3: Classification accuracies of each vaccinated model on the CIFAR-10 test set that has been compressed to a particular image quality. Each cluster of bars represents the model performances when tested with images having the corresponding image quality as indicated on the vertical axis. Within each cluster, each bar represents a vaccinated model. Vertical red lines denote the accuracy of the original, non-vaccinated model for that image quality.

Testing adversarial images with JPEG compression suggests that the algorithm is able to remove perturbations by re-projecting the images to the manifold of JPEG images. Since our initial model was trained on the original benign image dataset (without any adversarial manipulation), testing with compressed images that have lower image quality unsurprisingly leads to higher misclassification rates, likely due to artifacts introduced by the compression algorithm itself. This can also be explained by the notion that the manifold of JPEG compressed images of a particular image quality may be similar to that of another quality, but not completely aligned. We now propose that by training the model over the manifold corresponding to a particular image quality, the model can potentially learn to classify images even in the presence of JPEG artifacts. From the perspective of adversarial images, applying JPEG compression would remove the perturbations, and re-training with compressed images could help ensure that the model is not confused by the JPEG artifacts. We call this approach of re-training the model with JPEG compressed images “vaccinating” the model against adversarial attacks.

Figure 4: Performance of the vaccinated models on adversarially constructed test sets. Each line with a symbol represents a vaccinated model and the black horizontal dotted line represents the accuracy of the original model under attack. These results demonstrate that re-training with JPEG compressed images can help recover from an adversarial attack.

We re-trained the model with images of JPEG qualities 100 through 20 (increasing compression) with a step size of 10, and hence obtained 9 models (besides the original model). We refer to each of these re-trained models as $M_q$, where $q$ corresponds to the image quality the model was re-trained with. The original model is referred to as $M$.

While re-training, the weights of each re-trained model $M_q$ (re-trained at image quality $q$) were initialized with the weights of the model from the preceding, next-higher quality level for faster convergence. For example, the weights of $M_{100}$ were initialized with the weights of the original model $M$, the weights of $M_{90}$ were initialized with the weights of $M_{100}$, and so on. The intuition for our approach was derived from the proposition that the manifolds of images corresponding to successive levels of compression would exist co-locally, and the decision boundaries learned by the model would not have to displace significantly to account for the new manifold. This means that given any model, our approach can quickly generate new vaccinated models.
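The warm-start chain can be sketched as a short loop. The `retrain` callback is a hypothetical stand-in for whatever routine fine-tunes a model on images compressed at a given quality, and the stub below only records which model each stage was initialized from.

```python
def vaccinate(original, retrain, qualities=tuple(range(100, 10, -10))):
    """Re-train one model per JPEG quality, each warm-started from the previous one.

    `retrain(init_model, quality)` is an assumed callback that returns a model
    fine-tuned on images compressed at `quality`, starting from `init_model`.
    """
    models = {}
    previous = original
    for q in qualities:  # 100, 90, ..., 20: increasing compression
        previous = retrain(previous, q)
        models[q] = previous
    return models

# Stub that records the lineage instead of training anything.
lineage = []

def fake_retrain(init_model, quality):
    lineage.append((quality, init_model))
    return f"M_{quality}"

models = vaccinate("M", fake_retrain)
```

After this runs, `lineage` starts with `(100, "M")` and `(90, "M_100")`, and `models` holds nine entries keyed by quality. Because each stage starts from its neighbor, a real `retrain` should need far fewer epochs than training from scratch, which is the point of the warm start.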

Figure 3 shows clear benefits of our vaccination idea: vaccinated models generally perform better than the original model on the CIFAR-10 test set, especially at lower image qualities. For example, $M_{20}$ (the model re-trained at quality 20) performs the best for images with quality 20 and worst for images with quality 100. Correspondingly, $M_{100}$ performs the best for images with quality 100 and worst for images with quality 20. The performance of $M_{100}$ closely follows the performance of the original model $M$ across each image quality. All these observations are consistent with our fundamental intuition of JPEG manifolds coexisting in the same hyperlocality.

Figure 4 visualizes the performance of the vaccinated models on adversarially perturbed datasets as the image quality they are tested on varies. Again, general trends show that increasing JPEG compression removes adversarial perturbations. We see that the effect of the adversarial attacks on the original model $M$ does get transferred to the vaccinated models as well, but as the compression is increased on the images that the model is trained with, the transferability of the attack subsides. An interesting thing to note here is that with CIFAR-10, the accuracy decreases for lower image qualities. This means that the artifacts introduced by JPEG may be taking over from the adversarial attack in bringing down the accuracy, which may be attributed to the small image size of the CIFAR-10 dataset. We do not see such a significant decrease in accuracy at lower image qualities with the GTSRB dataset, which contains larger images.

5 Fortified Defense: an Ensemble of Models

Figure 5: Accuracies of all models under consideration when each model is individually attacked. The attack does get transferred to other models but is mitigated with increasing JPEG compression.
Dataset    Input               Original scenario   With our ensemble
CIFAR-10   Benign images       82.88%              83.19%
CIFAR-10   FGSM [ε = 0.02]     28.97%              79.57%
CIFAR-10   DeepFool            27.44%              82.71%
GTSRB      Benign images       97.83%              98.59%
GTSRB      FGSM [ε = 0.08]     41.00%              73.37%
GTSRB      DeepFool            68.19%              91.70%

Table 1: Performance of our approach on the respective test sets as compared to the scenario when the original, non-vaccinated model is under attack.

If adversaries are able to gain access to the original model, they may also be able to recreate the vaccinated models and attack them individually. To protect the classification pipeline against such an attack, we propose to use an ensemble of vaccinated models that vote on images with varying image qualities. Hence, in our ensemble, the models $M_{100}$ through $M_{20}$ (one vaccinated model per image quality) vote on a given image compressed at image qualities of 100 through 20 with a step size of 10. This yields 81 votes (9 models × 9 image qualities). The final label assigned to the sample is simply the label that received the majority of the votes through this process.
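The voting scheme can be sketched as follows. Both `compress` (a function that JPEG-compresses an image at a given quality) and the per-model callables are hypothetical placeholders, and the stubs below stand in for real classifiers. The sketch also assumes that "majority" means the most frequent label, and it does nothing special about ties.

```python
from collections import Counter

QUALITIES = list(range(100, 10, -10))  # 100, 90, ..., 20

def ensemble_predict(models, compress, image, qualities=QUALITIES):
    """Let every model vote on the image at every quality and return the top label.

    `models` maps a quality to a callable returning a label;
    `compress(image, quality)` is an assumed JPEG pre-processing function.
    """
    votes = Counter()
    for model in models.values():
        for q in qualities:
            votes[model(compress(image, q))] += 1
    label, _ = votes.most_common(1)[0]
    return label, votes

# Stubs: the "compressed image" just carries its quality, and only the two
# least-compressed models are fooled, and only on lightly compressed inputs.
def fake_compress(image, quality):
    return (image, quality)

def make_model(trained_quality):
    def model(compressed):
        _, q = compressed
        fooled = trained_quality >= 90 and q >= 90
        return "speed_100" if fooled else "stop"
    return model

models = {q: make_model(q) for q in QUALITIES}
label, votes = ensemble_predict(models, fake_compress, "adversarial stop sign")
```

With nine models and nine qualities the counter holds 81 votes. In this stubbed scenario 4 go to the wrong label and 77 to the right one, so the ensemble outvotes the few affected members. The inner loops are independent of one another, so they can be run in parallel.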

Since each of the vaccinated models is trained on a different manifold of images, the ensemble essentially models separate subspaces of the data, and the current attacks can only distort the samples in one of these subspaces. Hence, no matter which model an adversary targets, the other models should make up for the attack. Figure 5 illustrates this idea. A majority of the models are not affected significantly irrespective of the model being attacked. Increasing JPEG compression also protects the model being attacked to some extent. Even if the perturbation introduced is very strong, training on different compression levels helps ensure that the decision boundaries learned by the vaccinated models are dissimilar, and that the verdicts of the models are highly uncorrelated.

We present empirical results of the accuracies obtained with the original model in Table 1 for comparison with our ensemble approach, where the original model $M$ was targeted with adversarial attacks and our approach was able to recover from the attack by employing JPEG compression. Since the ensemble involves querying several models with varying compression levels applied to the instances being tested, the process can be parallelized to make it faster.

Note that we chose an arbitrary combination of image qualities for our analysis, and better combinations may exist. If an adversary gains access to a classifier model and is also aware of our scheme of protecting it using this ensemble approach with vaccinated models, the defender can simply modify the scheme and opt for a different combination of image qualities, which would yield a completely different ensemble. Since our approach is built for fast convergence, the ensemble can be constructed quickly while still retaining the network architecture that works well for a given problem.

6 Conclusions

We have presented our preliminary empirical analysis of how systematic use of JPEG compression, especially in ensembles, can counter adversarial attacks and dramatically reduce their effects. In our ongoing work, we are evaluating our approaches against more attack strategies and datasets.

References