Security in computer vision systems is essential and in high demand, because computer vision technology has been deployed in many safety- and security-critical applications such as self-driving cars, healthcare, and facial recognition. Computer vision systems are primarily powered by deep neural networks (DNNs), which have brought impressive state-of-the-art results to the field. However, researchers have discovered that neural networks are vulnerable to certain alterations of the input known as adversarial examples [26, 4]. These adversarial examples can cause neural networks to misclassify an input, or force them to classify it as a targeted class with high confidence. Incorrect decisions made by DNNs can cause serious and dangerous problems; for example, a self-driving car may misclassify a “Stop” sign as a “Speed Limit” sign. Due to this threat, adversarial machine learning research has received a significant amount of attention recently, although the field started over a decade ago.
Researchers have proposed various attacks and defenses. Ideally, provably robust models are desired, and inspiring works such as [21, 11, 29] proposed provably secure training. Although these methods are attractive and desirable, they do not yet scale to larger datasets. One recent work scaled up to the CIFAR-10 dataset in provable defense research; however, its accuracy is not competitive even at low adversarial noise distances. An alternative approach is to find a defensive transform such that the prediction of a classifier on a clean image equals its prediction on the corresponding adversarial example. Such works include [6, 22, 14, 31, 25]. They have all been defeated when accounting for obfuscated gradients (a form of gradient masking). To reinforce these weak defenses, Raff et al. proposed a stronger defense that combines a large number of transforms stochastically. However, applying many transforms drops accuracy even when the model is not under attack, and is computationally expensive. Our previous work removes adversarial noise generated on one-bit images by double quantization, but clean images are limited to one bit.
Therefore, in this work, we propose a new adversarial defense inspired by perceptual image encryption methods [9, 24, 27, 23]. It was reported that learnable image encryption can be used as a defensive transform; however, it is not designed as an adversarial defense and reduces accuracy. To defend against adversarial examples while maintaining high accuracy, we design a defensive transform that uses a block-wise pixel shuffling method. Similar to our work, Taran et al. proposed a key-based adversarial defense. The main intellectual differences are: (1) the proposed defense is inspired by perceptual image encryption (specifically, block-wise image encryption), in contrast to traditional cryptographic methods, and (2) we consider white-box attacks, unlike the work by Taran et al., which considered gray-box attacks. In an experiment, the proposed defense is confirmed to outperform state-of-the-art adversarial defenses, including latent adversarial training, adversarial training, and thermometer encoding, under a maximum-norm bounded threat model on the CIFAR-10 dataset.
2.1 Adversarial Examples
An adversarial example x′ = x + δ is a modified input (visually similar to x) to a classifier f, aiming at f(x′) ≠ f(x). An attacker finds a perturbation δ under a certain distance metric (usually an ℓp norm) to construct an adversarial example. An attack algorithm usually minimizes the perturbation or maximizes the loss function, i.e., min ‖δ‖p subject to f(x + δ) ≠ y, or max over ‖δ‖p ≤ ε of L(θ, x + δ, y).
2.2 Threat Model
Based on the goal of an adversary, an attack can be either targeted (f(x′) = t, where t is a class targeted by the adversary) or untargeted (f(x′) ≠ y, where y is the true class). We focus on untargeted attacks under ‖δ‖∞ ≤ ε, where ε is a given noise distance.
We evaluate the proposed defense in a white-box setting. Therefore, we assume the adversary has full knowledge of the model, its parameters, trained weights, training data, and the proposed defense mechanism, except for the secret key.
The adversary performs evasion attacks (i.e., test-time attacks), in which small changes under the ℓ∞ metric change the predicted class of the input. The adversary’s capability is to modify the test image within a given range of noise distances. Having full knowledge of the defense transform, our adversary also extends PGD. Fully accounting for obfuscated gradients, the adversary implements an adaptive attack in the style of Backward Pass Differentiable Approximation (BPDA) to estimate the correct gradients with a guessed key.
3 Proposed Method
The goal of the proposed method is to maintain high accuracy whether or not the model is under adversarial attack. The overview of the proposed defense is depicted in Fig. 1. Training images are transformed with a secret key, and a model is trained on the transformed images. Test images, whether clean or adversarial, are also transformed with the same key before being classified by the model.
3.2 Defensive Transform
We introduce a transform that exploits block-wise pixel shuffling with a secret key as an adversarial defense for the first time. Both training and test images are transformed with a common key. The transformation process is as follows.
A 3-channel (RGB), 8-bit image with dimensions W × H is divided into blocks of size M × M, where W and H should be divisible by M. Otherwise, padding is required.
Let p(i) be the i-th pixel value and P be the number of pixels in each block (i.e., P = M × M), where i ∈ {1, …, P}. The new pixel value is given by p′(i) = p(v(i)), where v = [v(1), …, v(P)] is a random permutation vector of the integers from 1 to P, generated by a key K.
Fig. 2 illustrates the process of block-wise pixel shuffling. The process is repeated for all the blocks in the image.
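Under the definitions above, the block-wise shuffling can be sketched in NumPy as follows (a minimal sketch: the function names and the mapping from the key to a permutation via a seeded RNG are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def block_shuffle(image, block_size, key):
    """Shuffle pixels within each MxM block using a key-derived permutation.

    image: (H, W, C) array; H and W must be divisible by block_size.
    The same permutation (derived from `key`) is applied to every block.
    """
    h, w, c = image.shape
    m = block_size
    assert h % m == 0 and w % m == 0, "pad the image first"
    # Key-derived random permutation of the M*M pixel positions.
    perm = np.random.RandomState(key).permutation(m * m)
    out = image.copy()
    for y in range(0, h, m):
        for x in range(0, w, m):
            block = out[y:y + m, x:x + m].reshape(m * m, c)
            out[y:y + m, x:x + m] = block[perm].reshape(m, m, c)
    return out

def block_unshuffle(image, block_size, key):
    """Invert block_shuffle given the same key."""
    h, w, c = image.shape
    m = block_size
    perm = np.random.RandomState(key).permutation(m * m)
    inv = np.argsort(perm)  # inverse permutation
    out = image.copy()
    for y in range(0, h, m):
        for x in range(0, w, m):
            block = out[y:y + m, x:x + m].reshape(m * m, c)
            out[y:y + m, x:x + m] = block[inv].reshape(m, m, c)
    return out
```

Only a holder of the key can reproduce the permutation, so the defended model sees consistently shuffled inputs while an attacker without the key cannot.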
3.3 Adaptive Attack
As pointed out in prior work, adaptive attacks are necessary for evaluating adversarial defenses, and several recent defenses have been defeated by adaptive attacks due to obfuscated gradients. To ensure the strength of the proposed defense, we implement a BPDA-like attack so that the gradients are correct with respect to the attacker’s guessed key, as shown in Fig. 3. The adversary applies block-wise shuffling to a test image with a key, runs PGD on the shuffled image, and de-shuffles the resulting adversarial example with the adversary’s assumed key. We used random keys to attack the proposed method in our experiments.
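The shuffle → PGD → de-shuffle loop can be sketched as follows (a simplified NumPy sketch: for brevity the whole flattened image is permuted rather than each block, and `grad_fn` stands in for backpropagation through the defended model; all names are illustrative assumptions):

```python
import numpy as np

def pgd(x, grad_fn, eps, alpha, iters):
    """ell_inf PGD: signed gradient ascent, projected to the eps-ball."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

def adaptive_attack(x, guessed_key, grad_fn, eps, alpha, iters):
    """Shuffle with a guessed key, attack in the shuffled domain, de-shuffle."""
    perm = np.random.RandomState(guessed_key).permutation(x.size)
    inv = np.argsort(perm)                  # inverse permutation
    x_shuf = x.flatten()[perm]              # shuffle with the guessed key
    adv_shuf = pgd(x_shuf, grad_fn, eps, alpha, iters)
    return adv_shuf[inv].reshape(x.shape)   # de-shuffle back to image space
```

If the guessed key matches the defender's key, the computed gradients line up with the defended model; with a wrong key, the de-shuffled perturbation is scrambled relative to the true transform, which is why wrong keys fail to reduce accuracy.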
3.4 Key Management
The proposed method uses a single secret key shared across all the blocks in both training and test images. Its key space is P!, where P is the number of pixels in a block. Deep learning is often performed on a cloud server (provider), and the key should be stored securely at the server when deploying the proposed method.
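Assuming the key space is the set of all permutations of the P pixels in a block (an assumption consistent with the shuffling transform, not a statement from the paper), its size is easy to compute:

```python
import math

def key_space(block_size):
    """Number of distinct keys: all permutations of the P pixels in a block."""
    p = block_size * block_size  # pixels per block
    return math.factorial(p)

print(key_space(4))  # 16! = 20922789888000, roughly 2 * 10**13 keys
```

Even a modest 4 × 4 block yields a key space far too large to enumerate by brute force during an attack.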
We used the CIFAR-10 dataset with a batch size of 128 and live augmentation (random cropping with padding of 4 and random horizontal flipping) on the training set. CIFAR-10 consists of 60,000 color images (32 × 32 pixels) in 10 classes (6,000 images per class), of which 50,000 are for training and 10,000 for testing. Both training and test images were preprocessed by the proposed method with a common shared secret key.
The model was a deep residual network (ResNet18), trained with weight decay and a step learning rate scheduler (configured via lr_steps and gamma).
The parameters of the PGD adversary were the noise distance ε, the step size, and the number of iterations. The attack was run for different numbers of iterations, with and without random initialization. When random initialization is set, the perturbation is initialized with random values bounded by the given ε.
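The random-initialization variant can be sketched generically as follows (a NumPy sketch; `grad_fn` stands in for backpropagation through the model, and all names are illustrative assumptions):

```python
import numpy as np

def pgd_random_init(x, grad_fn, eps, alpha, iters, seed=0):
    """ell_inf PGD starting from a random point in the eps-ball around x."""
    rng = np.random.RandomState(seed)
    # Random initialization: perturbation drawn uniformly within [-eps, eps].
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)          # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # valid pixel range
    return x_adv
```

Random restarts help PGD escape flat or masked gradient regions near the clean input, which is why both variants are reported.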
4.2.1 PGD Attack on Various Block Sizes
We evaluated the proposed method under various block sizes, using PGD. We trained ResNet18 on images transformed by the proposed method with different block sizes, resulting in four models. The trained models were first attacked by PGD for a fixed number of iterations without random initialization.
Table 1 summarizes the results of this experiment. The model trained with one block size gave the best performance when not under attack, while a model with a different block size performed better under attack. The results suggest that one particular block size provides the best overall performance.
4.2.2 PGD Attack in Various Settings
We further ran PGD attacks in various settings against the model trained by the proposed defense. The attacks were executed for different numbers of iterations, and a subscript denotes random initialization (e.g., a subscripted PGD stands for a PGD attack with random initialization, and BPDA denotes the adaptive attack).
Table 2 captures the results of the untargeted attacks. For a small noise distance, the model maintained its accuracy both on clean images and under the BPDA attack. This confirms that the adaptive attack cannot reduce the accuracy when the attacker’s key is not correct. However, when ε was increased, BPDA reduced the accuracy.
Our experiments show that BPDA is the stronger adversary. Therefore, we evaluated the proposed defense with various noise distances under BPDA. Moreover, to confirm the effectiveness of the proposed method, we also implemented a state-of-the-art adversarial defense, adversarial training (AT), with the same network specifications for comparison. The accuracy versus noise distance is plotted in Fig. 4. At small noise distances, the model trained with the proposed defense maintains high accuracy; the accuracy gradually drops as the noise distance grows, and is lowest in the worst-case scenario. Nevertheless, the proposed method outperforms AT at every given perturbation budget, as shown in Fig. 4.
4.3 Comparison with State-of-the-art Defenses
To confirm the effectiveness of the proposed defense, we compared it with state-of-the-art published defenses for the CIFAR-10 dataset in the RobustML catalog (https://www.robust-ml.org/). We compared the proposed defense with three recent defenses: latent adversarial training (LAT), adversarial training (AT), and thermometer encoding (TE). All three defenses used a wide residual network and were evaluated under the same ℓ∞ threat model, except LAT, which used a different noise distance. Table 3 summarizes the comparison. The proposed model was trained on ResNet18 and achieves superior accuracy on both clean and attacked images. Even in the worst-case scenario, the accuracy of the proposed method was still higher than that of the state-of-the-art defenses, whether or not the model was under attack.
In this paper, we proposed a new adversarial defense that utilizes a key-based block-wise pixel shuffling method as a defensive transform for the first time. Specifically, both training and test images are transformed by the proposed method with a common key before training and testing. We also implemented an adaptive attack to verify the strength of the proposed defense. Our experiments suggest that the proposed defense is resistant to both adaptive and non-adaptive attacks, achieving high accuracy on clean images as well as on adversarial examples. Compared with state-of-the-art defenses, the accuracy of the proposed method is better than that of latent adversarial training, adversarial training, and thermometer encoding under a maximum-norm bounded white-box threat model on the CIFAR-10 dataset.
-  (2019) Adversarial robustness by one bit double quantization for visual classification. IEEE Access 7, pp. 177932–177943. Cited by: §1.
-  (2019) Adversarial test on learnable image encryption. In 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), pp. 693–695. Cited by: §1.
-  (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 274–283. Cited by: §1, §2.2, §3.3.
-  (2013) Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Cited by: §1, §2.2, §3.3.
-  (2018) Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognition 84, pp. 317–331. Cited by: §1.
-  (2018) Thermometer encoding: one hot way to resist adversarial examples. In International Conference on Learning Representations, Cited by: §1, §4.3, Table 3.
-  (2019) On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705. Cited by: §2.2, §3.3.
-  (2017-05) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §2.1.
-  (2019) Encryption-then-compression systems using grayscale-based image encryption for JPEG images. IEEE Transactions on Information Forensics and Security 14 (6), pp. 1515–1525. Cited by: §1.
-  (2019) AdverTorch v0.1: an adversarial robustness toolbox based on pytorch. arXiv preprint arXiv:1902.07623. Cited by: §4.1.
-  (2018) A dual approach to scalable verification of deep networks.. In UAI, Vol. 1, pp. 2. Cited by: §1.
-  (2018) Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625–1634. Cited by: §1.
-  (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations, Cited by: §2.1.
-  (2018) Countering adversarial images using input transformations. In International Conference on Learning Representations, Cited by: §1.
-  (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §4.1.
-  (2009) Learning multiple layers of features from tiny images. Technical report. Cited by: §1, §4.1.
-  (2017) Train cifar10 with pytorch. GitHub. Note: https://github.com/kuangliu/pytorch-cifar Cited by: §4.1.
-  (2019) Harnessing the vulnerability of latent layers in adversarially trained models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 2779–2785. Cited by: §4.3, Table 3.
-  (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: §2.1, §2.2, §4.2.2, §4.3, Table 3.
-  (2019) Barrage of random transforms for adversarially robust defense. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6528–6537. Cited by: §1.
-  (2018) Certified defenses against adversarial examples. In International Conference on Learning Representations, Cited by: §1.
-  (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, Cited by: §1.
-  (2019) Pixel-based image encryption without key management for privacy-preserving deep neural networks. IEEE Access 7, pp. 177844–177855. Cited by: §1.
-  (2019) Grayscale-based block scrambling image encryption using YCbCr color space for encryption-then-compression systems. APSIPA Transactions on Signal and Information Processing 8. Cited by: §1.
-  (2018) PixelDefend: leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, Cited by: §1.
-  (2014) Intriguing properties of neural networks. In International Conference on Learning Representations, Cited by: §1.
-  (2018) Learnable image encryption. In 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), pp. 1–2. Cited by: §1, §4.1.
-  (2018) Bridging machine learning and cryptography in defence against adversarial attacks. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §1.
-  (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, pp. 5283–5292. Cited by: §1.
-  (2018) Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pp. 8400–8409. Cited by: §1.
-  (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations, Cited by: §1.
-  (2016-09) Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), pp. 87.1–87.12. Cited by: §4.3.