Encryption Inspired Adversarial Defense for Visual Classification

MaungMaung AprilPyone et al., May 16, 2020

Conventional adversarial defenses reduce classification accuracy whether or not a model is under attack. Moreover, most image-processing-based defenses have been defeated due to the problem of obfuscated gradients. In this paper, we propose a new adversarial defense, inspired by perceptual image encryption methods, that acts as a defensive transform for both training and test images. The proposed method utilizes a block-wise pixel shuffling method with a secret key. Experiments are carried out on both adaptive and non-adaptive maximum-norm bounded white-box attacks while considering obfuscated gradients. The results show that the proposed defense achieves high accuracy (91.55% on clean images and 89.66% on adversarial examples) on the CIFAR-10 dataset. Thus, the proposed defense outperforms state-of-the-art adversarial defenses including latent adversarial training, adversarial training and thermometer encoding.


1 Introduction

Security in computer vision systems is essential and in high demand, because computer vision technology has been deployed in many safety- and security-critical applications such as self-driving cars, healthcare, and facial recognition. Computer vision systems are primarily powered by deep neural networks (DNNs), which have achieved impressive state-of-the-art results. However, researchers have discovered that neural networks are in general vulnerable to certain alterations of the input known as adversarial examples [26, 4]. These adversarial examples can cause neural networks to misclassify inputs or force them to predict a targeted class with high confidence. Incorrect decisions made by DNNs can cause serious and dangerous problems; for example, a self-driving car may misclassify a "Stop" sign as a "Speed Limit" sign [12]. Due to this threat, adversarial machine learning research has received significant attention recently, although it started over a decade ago [5].

Researchers have proposed various attacks and defenses. Ideally, provably robust models are desired. Inspiring works such as [21, 11, 29] proposed provably secure training. Although these methods are attractive and desirable, they do not scale to larger datasets. One recent work in provable defense research [30] scaled up to the CIFAR-10 dataset [16]; however, the accuracy is not comparable even at a low adversarial noise distance. An alternative approach is to find a defensive transform such that the classifier's prediction on a clean image is equal to that on the corresponding adversarial example. Such works include [6, 22, 14, 31, 25], among others. They have all been defeated when accounting for obfuscated gradients (a form of gradient masking) [3]. To reinforce these weak defense methods, Raff et al. [20] proposed a stronger defense by stochastically combining a large number of transforms. However, applying many transforms reduces accuracy even when the model is not under attack, and is computationally expensive. Our previous work removes adversarial noise generated on one-bit images by double quantization [1], but clean images are limited to one-bit representations.

Therefore, in this work, we propose a new adversarial defense inspired by perceptual image encryption methods [9, 24, 27, 23]. It was reported that [27] can be used as a defensive transform [2]; however, it is not designed for adversarial defense and reduces accuracy. To defend against adversarial examples while maintaining high accuracy, we design a defensive transform that uses a block-wise pixel shuffling method. Similar to our work, Taran et al. proposed a key-based adversarial defense [28]. The main intellectual differences are: (1) the proposed defense is inspired by perceptual image encryption (specifically, block-wise image encryption), in contrast to traditional cryptographic methods, and (2) we consider white-box attacks, unlike the work in [28], which considered gray-box attacks. In an experiment, the proposed defense is confirmed to outperform state-of-the-art adversarial defenses, including latent adversarial training, adversarial training and thermometer encoding, under a maximum-norm bounded threat model on the CIFAR-10 dataset.

2 Preliminaries

2.1 Adversarial Examples

An adversarial example $x'$ is a modified input (visually similar to the original input $x$) to a classifier $f$ aiming at $f(x') \neq y$, where $y$ is the true class. An attacker finds a perturbation $\delta$ under a certain distance metric (usually an $\ell_p$ norm) to construct an adversarial example. An attack algorithm usually minimizes the perturbation or maximizes the loss function $\mathcal{L}$, i.e.,

$\min_{\delta} \|\delta\|_p \quad \text{s.t.} \quad f(x + \delta) \neq y$   (1)
$\max_{\delta} \mathcal{L}(f(x + \delta), y) \quad \text{s.t.} \quad \|\delta\|_p \leq \epsilon$   (2)

where $x' = x + \delta$. There are many attack algorithms such as the Fast Gradient Sign Method (FGSM) [13], Projected Gradient Descent (PGD) [19], and Carlini and Wagner (CW) [8].
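To make the loss-maximization view of Eq. (2) concrete, the following is a minimal PyTorch sketch of an untargeted $\ell_\infty$-bounded PGD attack. The model interface, the input range of [0, 1], and the values of `eps`, `alpha`, and `steps` are assumptions for illustration, not the settings used later in the experiments.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=40, random_init=True):
    """Minimal untargeted L-infinity PGD sketch (illustrative values, not the paper's code).

    model: classifier returning logits; x: input batch in [0, 1]; y: true labels.
    """
    x_adv = x.clone().detach()
    if random_init:
        # Start from a random point inside the epsilon ball.
        x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)  # maximize the loss, as in Eq. (2)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()           # ascent step in the sign direction
            x_adv = torch.clamp(x_adv, x - eps, x + eps)  # project back into the L-inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv.detach()
```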

2.2 Threat Model

Following [7] and [4], we first describe the threat model used to evaluate the proposed defense. We deploy PGD [19] because it is one of the strongest attacks under the $\ell_\infty$ norm-bounded metric.

Based on the goal of the adversary, an attack can be either targeted ($f(x') = t$, where $t$ is a class targeted by the adversary) or untargeted ($f(x') \neq y$, where $y$ is the true class). We focus on untargeted attacks under $\|x' - x\|_\infty \leq \epsilon$, where $\epsilon$ is a given noise distance.

We evaluate the proposed defense in a white-box setting. Therefore, we assume the adversary has full knowledge of the model, its parameters, trained weights, training data and the proposed defense mechanism, except for the secret key.

The adversary performs evasion attacks (i.e., test-time attacks), in which small changes under the $\ell_\infty$ metric change the classifier's prediction for the input. The adversary's capability is to modify the test image, where the noise distance $\epsilon$ lies within a given range. Having full knowledge of the defensive transform, our adversary also extends PGD. Fully accounting for obfuscated gradients, the adversary implements an adaptive attack in the spirit of Backward Pass Differentiable Approximation (BPDA) [3] to estimate the correct gradients with a guessed key.
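Under this threat model, evaluation amounts to attacking the test image within the $\epsilon$ budget and then applying the defensive transform with the true key before classification. The sketch below assumes generic `attack` and `defense` callables (e.g., a PGD variant and the block-wise transform of Sec. 3.2) and is not the paper's evaluation code.

```python
import torch

def robust_accuracy(model, loader, attack, defense, secret_key, device="cpu"):
    """Sketch of the evaluation protocol: the adversary perturbs the test image
    under its epsilon budget, then the defense transform is applied with the
    true secret key before classification. `attack` and `defense` are assumed
    callables, not functions defined in the paper."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)          # evasion attack at test time
        x_def = defense(x_adv, secret_key)   # key-based transform before inference
        with torch.no_grad():
            pred = model(x_def).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```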

3 Proposed Method

3.1 Overview

The goal of the proposed method is to maintain high accuracy whether or not the model is under adversarial attack. An overview of the proposed defense is depicted in Fig. 1. Training images are transformed with a secret key, and a model is trained on the transformed images. Test images, whether clean or adversarial, are also transformed with the same key before being classified by the model.

Figure 1: Overview of the proposed defense.

3.2 Defensive Transform

We introduce a transform that exploits block-wise pixel shuffling with a secret key as an adversarial defense for the first time. Both training and test images are transformed with a common key. The transformation process is as follows.

A 3-channel (RGB), 8-bit image with dimensions $w \times h$ is divided into blocks of size $M \times M$, where $w$ and $h$ should be divisible by $M$; otherwise, padding is required.

Let $p(i)$ and $P_b$ be a pixel value and the number of pixels in each block, respectively, where $i \in \{0, 1, \ldots, P_b - 1\}$. The new pixel value $p'(i)$ is given by

$p'(i) = p(v(i))$   (3)

where $v = [v(0), v(1), \ldots, v(P_b - 1)]$ is a random permutation vector of the integers from $0$ to $P_b - 1$ generated by a key $K$.

Figure 2: Process of block-wise pixel shuffling.

Fig. 2 illustrates the process of block-wise pixel shuffling. The process is repeated for all the blocks in the image.
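A minimal NumPy sketch of this transform is given below. The flattening order, the choice to permute the RGB values of each block jointly, and the use of `np.random.RandomState(key)` as the key-driven permutation generator are our assumptions rather than the paper's exact procedure.

```python
import numpy as np

def blockwise_shuffle(img, key, block_size=4, inverse=False):
    """Key-based block-wise pixel shuffling (a sketch of the defensive transform).

    img: (H, W, C) uint8 array whose H and W are divisible by block_size
    (block_size=4 is an assumed example value). The same key yields the same
    permutation for every block, so the transform is invertible with the key.
    """
    h, w, c = img.shape
    m = block_size
    n_pix = m * m * c                  # number of pixel values per block (P_b)
    rng = np.random.RandomState(key)   # key-seeded pseudo-random generator
    perm = rng.permutation(n_pix)      # random permutation vector v
    if inverse:
        perm = np.argsort(perm)        # inverse permutation for de-shuffling

    out = np.empty_like(img)
    for i in range(0, h, m):
        for j in range(0, w, m):
            block = img[i:i + m, j:j + m, :].reshape(-1)  # flatten the block
            out[i:i + m, j:j + m, :] = block[perm].reshape(m, m, c)
    return out
```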

3.3 Adaptive Attack

As pointed out in [7] and [4], adaptive attacks are necessary when evaluating adversarial defenses. Several recent defenses have been defeated by adaptive attacks due to obfuscated gradients [3]. To ensure the strength of the proposed defense, we implement a BPDA-like attack so that the gradients are correct with respect to the attacker's guessed key, as shown in Fig. 3. Basically, the adversary applies block-wise shuffling to a test image with a guessed key, runs PGD on the shuffled image, and de-shuffles the resulting adversarial example with the same guessed key. We used random keys to attack the proposed method in our experiments.
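The following is a sketch of this adaptive attack as we read Fig. 3; `shuffle`, `unshuffle`, and `pgd` are assumed helpers (e.g., the block-wise transform above and a standard PGD routine), not functions defined in the paper.

```python
def adaptive_attack(model, x, y, guessed_key, shuffle, unshuffle, pgd):
    """Sketch of the BPDA-like adaptive attack (our reading of Fig. 3).

    The adversary does not know the true secret key, so it guesses one.
    """
    # 1) Apply the defensive transform with the guessed key, so gradients are
    #    taken in the (guessed) shuffled domain seen by the trained model.
    x_shuffled = shuffle(x, guessed_key)
    # 2) Run PGD directly on the shuffled image against the trained model.
    x_adv_shuffled = pgd(model, x_shuffled, y)
    # 3) De-shuffle with the same guessed key to obtain an adversarial example
    #    in the original image domain.
    return unshuffle(x_adv_shuffled, guessed_key)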

3.4 Key Management

The proposed method uses a single secret key that is shared across all blocks in both training and test images. Its key space is defined as follows:

$\mathcal{K}(P_b) = P_b!$   (4)

where $P_b$ is the number of pixels in a block. Deep learning is often performed on a cloud server (provider), and the key $K$ should be stored securely at the server when deploying the proposed method.
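Since the key space in Eq. (4) is the number of permutations of the $P_b$ pixel positions, its size can be checked directly. The block sizes and the assumption $P_b = M \times M \times 3$ (RGB values permuted jointly) below are illustrative only.

```python
from math import factorial

# Key space K(P_b) = P_b! for a block with P_b pixel positions (Eq. (4)).
# P_b = M * M * 3 assumes the RGB values of an M x M block are permuted
# jointly; the block sizes below are illustrative only.
for M in (2, 4, 8):
    P_b = M * M * 3
    bits = factorial(P_b).bit_length()  # rough size of the key space in bits
    print(f"M={M}: P_b={P_b}, |K| = {P_b}! (about a {bits}-bit key space)")
```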

Figure 3: Diagram of adaptive attack.

4 Experiments

4.1 Setup

We used the CIFAR-10 dataset [16] with a batch size of 128 and live augmentation (random cropping with a padding of 4 and random horizontal flipping) on the training set. CIFAR-10 consists of 60,000 color images (32 × 32 pixels) in 10 classes (6,000 images per class), of which 50,000 are for training and 10,000 for testing. Both training and test images were preprocessed by the proposed method with a common shared secret key $K$.

The 18-layer deep residual network (ResNet18) [15] was trained for 160 epochs with the stochastic gradient descent optimizer, configured with momentum, weight decay, and an initial learning rate, together with a step learning rate scheduler (lr_steps, gamma).

The PGD adversary is parameterized by the noise distance $\epsilon$, the step size, and the number of iterations. The attack was run for two different numbers of iterations, with and without random initialization; when random initialization is set, the perturbation is initialized with random values bounded by the given $\epsilon$.

We used a publicly available ResNet18 implementation [17] in PyTorch. The proposed method was implemented by modifying the code base of [27]. We deployed the standard PGD implementation from [10] and implemented the BPDA-like attack to make the adversary adaptive and effective.
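A hedged sketch of this setup in PyTorch/torchvision is shown below. The secret key value, the block size, the placement of the key-based transform after augmentation, the use of torchvision's ResNet18 in place of the referenced implementation [17], and the SGD hyperparameter values are all placeholders or assumptions, not the exact settings of the paper.

```python
import torch
import torchvision
import torchvision.transforms as T
# blockwise_shuffle: the key-based transform sketched in Sec. 3.2 (assumed importable here)

SECRET_KEY = 2020   # placeholder key value, not from the paper
BLOCK_SIZE = 4      # placeholder block size

def key_transform(img_tensor):
    """Apply the key-based block-wise shuffle to a (C, H, W) tensor in [0, 1]."""
    img = (img_tensor.permute(1, 2, 0).numpy() * 255).astype("uint8")
    shuffled = blockwise_shuffle(img, SECRET_KEY, BLOCK_SIZE)
    return torch.from_numpy(shuffled).permute(2, 0, 1).float() / 255.0

# Live augmentation as described in the setup; applying the key transform
# after augmentation is our assumption.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Lambda(key_transform),
])
test_tf = T.Compose([T.ToTensor(), T.Lambda(key_transform)])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# torchvision's ResNet18 stands in for the referenced pytorch-cifar model [17];
# the momentum, weight decay, learning rate, and milestones are placeholders.
model = torchvision.models.resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)
```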

4.2 Results

4.2.1 PGD Attack on Various Block Sizes

We evaluated the proposed method under various block sizes using PGD. We trained ResNet18 with images transformed by the proposed method with different block sizes, resulting in four models. The trained models were first attacked by PGD for a fixed number of iterations without random initialization.

Table 1 summarizes the results of this experiment. The model trained with one block size gave the best performance when the model is not under attack; however, another block size performed better under attack and provides the best overall performance.

Table 1: Accuracy of the proposed method under various block sizes against PGD (columns: Clean, PGD)

4.2.2 PGD Attack in Various Settings

We further ran PGD attacks with various settings against the model trained with the proposed defense. The attacks were executed for two different numbers of iterations, where a subscript on the attack name denotes the iteration count and, where marked, random initialization (e.g., a subscripted PGD stands for a PGD attack with that number of iterations and random initialization), while BPDA denotes the adaptive attack.

Table 2 captures the results of the untargeted attacks. For the smaller noise distance, the model maintains its accuracy on clean images and under the BPDA attack. This confirms that the adaptive attack cannot reduce the accuracy when the attacker's key is not correct. However, when $\epsilon$ was increased, BPDA reduced the accuracy.

Our experiments show that BPDA is the stronger adversary. Therefore, we evaluated the proposed defense with various values of $\epsilon$ under BPDA. Moreover, to confirm the effectiveness of the proposed method, we also implemented a state-of-the-art adversarial defense, adversarial training (AT) [19], with the same network specifications to compare the results. Accuracy versus the noise distance is plotted in Fig. 4. For small $\epsilon$, the model trained with the proposed defense maintains high accuracy; the accuracy gradually drops as $\epsilon$ grows and is lowest in the worst-case scenario. Nevertheless, the proposed method outperforms AT for any given perturbation budget, as shown in Fig. 4.

Table 2: Accuracy of the proposed method against PGD and BPDA attacks in various settings (columns: Epsilon, Clean, and PGD/BPDA variants with different iteration counts and random initialization)

Figure 4: Accuracy vs. perturbation budget.

4.3 Comparison with State-of-the-art Defenses

To confirm the effectiveness of the proposed defense, we compared it with state-of-the-art published defenses for the CIFAR-10 dataset listed in the RobustML catalog (https://www.robust-ml.org/). We compared the proposed defense with three recent defenses: latent adversarial training (LAT) [18], adversarial training (AT) [19] and thermometer encoding (TE) [6]. All three defenses used a wide residual network [32] and were evaluated under the same $\ell_\infty$-bounded threat model, except LAT, which used a slightly different noise distance. Table 3 shows a summary of the comparison. The proposed model was trained on ResNet18 and achieves superior accuracy on both clean and attacked images. Even in the worst-case scenario, the accuracy of the proposed method was still higher than that of the state-of-the-art defenses, whether or not the model was under attack.

Table 3: Comparison with state-of-the-art defenses on the CIFAR-10 dataset (columns: Defense, Threat Model, Clean, Attacked; rows: LAT [18], AT [19], TE [6], and the proposed defense under three settings)

5 Conclusion

In this paper, we proposed a new adversarial defense that utilizes a key-based block-wise pixel shuffling method as a defensive transform for the first time. Specifically, both training and test images are transformed by the proposed method with a common key before training and testing. We also implemented an adaptive attack to verify the strength of the proposed defense. Our experiments suggest that the proposed defense is resistant to both adaptive and non-adaptive attacks. The results show that the proposed defense achieves high accuracy on both clean images and adversarial examples. Compared with state-of-the-art defenses, the accuracy of the proposed method is higher than that of latent adversarial training, adversarial training and thermometer encoding under a maximum-norm bounded white-box threat model on the CIFAR-10 dataset.

References

  • [1] M. AprilPyone, Y. Kinoshita, and H. Kiya (2019) Adversarial robustness by one bit double quantization for visual classification. IEEE Access 7, pp. 177932–177943. Cited by: §1.
  • [2] M. AprilPyone, W. Sirichotedumrong, and H. Kiya (2019) Adversarial test on learnable image encryption. In 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), pp. 693–695. Cited by: §1.
  • [3] A. Athalye, N. Carlini, and D. A. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 274–283. Cited by: §1, §2.2, §3.3.
  • [4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli (2013) Evasion attacks against machine learning at test time. In Joint European conference on machine learning and knowledge discovery in databases, pp. 387–402. Cited by: §1, §2.2, §3.3.
  • [5] B. Biggio and F. Roli (2018) Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognition 84, pp. 317–331. Cited by: §1.
  • [6] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow (2018) Thermometer encoding: one hot way to resist adversarial examples. In International Conference on Learning Representations, Cited by: §1, §4.3, Table 3.
  • [7] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, and A. Madry (2019) On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705. Cited by: §2.2, §3.3.
  • [8] N. Carlini and D. Wagner (2017-05) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §2.1.
  • [9] T. Chuman, W. Sirichotedumrong, and H. Kiya (2019) Encryption-then-compression systems using grayscale-based image encryption for jpeg images. IEEE Transactions on Information Forensics and Security 14 (6), pp. 1515–1525. Cited by: §1.
  • [10] G. W. Ding, L. Wang, and X. Jin (2019) AdverTorch v0.1: an adversarial robustness toolbox based on pytorch. arXiv preprint arXiv:1902.07623. Cited by: §4.1.
  • [11] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli (2018) A dual approach to scalable verification of deep networks.. In UAI, Vol. 1, pp. 2. Cited by: §1.
  • [12] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song (2018) Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625–1634. Cited by: §1.
  • [13] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations, Cited by: §2.1.
  • [14] C. Guo, M. Rana, M. Cisse, and L. van der Maaten (2018) Countering adversarial images using input transformations. In International Conference on Learning Representations, Cited by: §1.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • [16] A. Krizhevsky (2009) Learning multiple layers of features from tiny images. Technical report. Cited by: §1, §4.1.
  • [17] Kuangliu (2017) Train cifar10 with pytorch. GitHub. Note: https://github.com/kuangliu/pytorch-cifar Cited by: §4.1.
  • [18] N. Kumari, M. Singh, A. Sinha, H. Machiraju, B. Krishnamurthy, and V. N. Balasubramanian (2019-07) Harnessing the vulnerability of latent layers in adversarially trained models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 2779–2785. Cited by: §4.3, Table 3.
  • [19] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: §2.1, §2.2, §4.2.2, §4.3, Table 3.
  • [20] E. Raff, J. Sylvester, S. Forsyth, and M. McLean (2019) Barrage of random transforms for adversarially robust defense. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6528–6537. Cited by: §1.
  • [21] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. In International Conference on Learning Representations, Cited by: §1.
  • [22] P. Samangouei, M. Kabkab, and R. Chellappa (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, Cited by: §1.
  • [23] W. Sirichotedumrong, Y. Kinoshita, and H. Kiya (2019) Pixel-based image encryption without key management for privacy-preserving deep neural networks. IEEE Access 7, pp. 177844–177855. Cited by: §1.
  • [24] W. Sirichotedumrong and H. Kiya (2019) Grayscale-based block scrambling image encryption using ycbcr color space for encryption-then-compression systems. APSIPA Transactions on Signal and Information Processing 8. Cited by: §1.
  • [25] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman (2018) PixelDefend: leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, Cited by: §1.
  • [26] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations, Cited by: §1.
  • [27] M. Tanaka (2018) Learnable image encryption. In 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), pp. 1–2. Cited by: §1, §4.1.
  • [28] O. Taran, S. Rezaeifar, and S. Voloshynovskiy (2018) Bridging machine learning and cryptography in defence against adversarial attacks. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §1.
  • [29] E. Wong and J. Z. Kolter (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, pp. 5283–5292. Cited by: §1.
  • [30] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter (2018) Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pp. 8400–8409. Cited by: §1.
  • [31] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations, Cited by: §1.
  • [32] S. Zagoruyko and N. Komodakis (2016-09) Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), pp. 87.1–87.12. Cited by: §4.3.