Convolutional Neural Networks (CNNs) have achieved great success in many Artificial Intelligence (AI) applications, e.g., natural language processing, document analysis, face recognition and image classification (Bhandare et al., 2016). These networks achieve high accuracy for object detection by minimizing the classification error. However, the spatial hierarchies between objects, e.g., orientation, position and scaling, are not preserved by the convolutional layers. CNNs are specialized to identify and recognize the presence of an object as a feature, without taking into account the spatial relationships across multiple features. Recently, Sabour et al. (2017) proposed CapsuleNets, a specialized Neural Network architecture composed of capsules, which is trained with the Dynamic Routing algorithm between capsules. The key idea behind CapsuleNets is called inverse graphics: when the eyes analyze an object, the spatial relationships between its parts are decoded and matched with the representation of the same object in our brain. Similarly, in CapsuleNets the feature representations are stored inside the capsules in a vector form, in contrast to the scalar form used by the neurons in traditional Neural Networks (Mukhometzianov and Carrillo, 2018). Despite their great success in the field of image classification, recent works (Frosst et al., 2018) have demonstrated that, similarly to CNNs, CapsuleNets are not immune to adversarial attacks either. Adversarial examples are small perturbations added to the inputs, generated with the purpose of misleading the network. Since these examples can fool the network and expose its security vulnerabilities, they can be dangerous in safety-critical applications, like Voice Controllable Systems (VCS) (Carlini et al., 2016; Zhang et al., 2017) and traffic sign recognition (Kurakin et al., 2017; Yuan et al., 2017).
Many works (Shafahi et al., 2018; Kurakin et al., 2017) have analyzed the impact of adversarial examples on Neural Networks and studied different methodologies to improve the defenses. Adversarial attacks can be categorized according to different properties, e.g., the choice of the target class, the kind of perturbation and the knowledge of the network under attack, as shown in Figure 3. We analyze these properties in Section II-B.
In this paper, we target the following fundamental research questions:
Are CapsuleNets vulnerable to adversarial examples, and how?
How can an adversarial attack for CapsuleNets be imperceptible and robust at the same time?
How does the vulnerability of CapsuleNets to adversarial attacks differ from that of traditional CNNs?
To address these questions, we develop an algorithm to generate targeted, imperceptible and robust (i.e., resistant to physical transformations) attacks in a black-box scenario. To the best of our knowledge, we are the first to perform a comprehensive study of the robustness/vulnerability of CapsuleNets to such adversarial attacks on the German Traffic Sign Recognition Benchmark (Houben et al., 2013), which is crucial for autonomous-vehicle use cases. We also apply the same type of attacks to a 9-layer CNN with a similar starting accuracy, compared to the CapsuleNet. Our analyses show that CapsuleNets are more vulnerable than CNNs. Moreover, we investigate the impact of universal attacks on the CapsuleNet with different (additive and subtractive) perturbations varying in their magnitudes. Our analyses show that, when the noise is subtracted from the intensity of the pixels, the accuracy of the network decreases more quickly than when it is added. An overview of our approach is shown in Figure 1.
Our Novel Contributions:
We analyze how the accuracy of the CapsuleNet changes when universal attacks with different magnitudes of perturbation are added to or subtracted from all the pixels of the input images of the GTSRB dataset.
We develop a novel methodology to automatically generate targeted imperceptible and robust adversarial examples.
We evaluate the robustness of CapsuleNet and a 9-layer CNN under the adversarial examples generated by our algorithm.
II Related Work

II-A CapsuleNets
Capsules were first introduced by Hinton et al. (2011). They are multi-dimensional entities that are able to learn hierarchical information of the features. Compared to traditional CNNs, a CapsuleNet has the capsule as its basic element, instead of the neuron. Recent research on CapsuleNet architectures and training algorithms (Hinton et al., 2018; Sabour et al., 2017) has shown competitive results for image classification, in terms of accuracy, compared to other state-of-the-art classifiers. Kumar et al. (2018) proposed a CapsuleNet architecture, composed of 3 layers, which achieves good accuracy on the GTSRB dataset (Houben et al., 2013). The architecture is shown in Figure 2.
II-B Adversarial Attacks
Szegedy et al. (2014) were the first to discover that several machine learning models are vulnerable to adversarial examples. Goodfellow et al. (2015) explained the problem by observing that machine learning models misclassify examples that are only slightly different from correctly classified examples drawn from the data distribution. Considering an input example X, the adversarial example X* = X + δ is equal to the original one, except for a small perturbation δ. The goal of the perturbation is to maximize the prediction error, in order to make the predicted class of X* different from the correct one. The attackers can force the network to misclassify the inputs into a specific class, called the target class, or into a random class: this is the difference between targeted and untargeted attacks. Individual attacks create perturbations of different magnitude for each input, while universal attacks apply the same perturbation to all the inputs of the dataset (Yuan et al., 2017). Attacks are applied under the black-box assumption when the attacker does not know the architecture, the training data or the parameters of the network (Papernot et al., 2017). When the attacker knows the architecture, the training data and the parameters of the network, the attack is under the white-box assumption. In recent years, many methodologies to generate adversarial examples, and their respective defense strategies, have been proposed (Feinman et al., 2017; Bhagoji et al., 2017a; Bhagoji et al., 2017b). An adversarial attack is very efficient if it is imperceptible and robust: this is the main concept behind the analysis conducted by Luo et al. (2018). They studied the importance of changing the pixels in high-variance areas, since human eyes barely perceive modifications there. Moreover, an adversarial example is robust if the gap between the probabilities of the predicted and the target class is so large that, after an image transformation (e.g., compression or resizing), the example remains misclassified.
Recent works showed that CapsuleNets are vulnerable to adversarial attacks. Jaesik Yoon analyzes how the accuracy of a CapsuleNet changes when applying the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the Least-Likely Class Method and the Iterative Least-Likely Class Method (Kurakin et al., 2017) to the MNIST dataset (LeCun et al., 1998). Frosst et al. (2018) presented a technique called DARCCC (Detecting Adversaries by Reconstruction from Class Conditional Capsules), efficient on the MNIST, Fashion-MNIST (Xiao et al., 2017) and SVHN (Netzer et al., 2011) datasets, to detect the crafted images.
III Analysis: Evaluating the Robustness of CapsuleNet
III-A Experimental Setup
We consider the architecture of the CapsuleNet represented in Figure 2. It is composed of a convolutional layer with a 9x9 kernel, a convolutional capsule layer with a 5x5 kernel, and a fully connected capsule layer. We implement it in TensorFlow to perform classification on the German Traffic Sign Recognition Benchmark (GTSRB) dataset (Houben et al., 2013). This dataset is composed of RGB traffic sign images, divided into 34799 training examples and 12630 testing examples. The intensity of each pixel assumes a value from 0 to 1. The number of classes is 43.
III-B Accuracy of the CapsuleNet under universal adversarial attacks
We analyze the accuracy of the CapsuleNet by applying two different types of universal attacks to all the pixels of all the testing images:
Addition and subtraction of fixed perturbations of different magnitudes.
Addition of Gaussian perturbations of different magnitudes.
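The two perturbation models above can be sketched as follows (a minimal NumPy illustration; the function names and the toy batch are ours, not part of the original experimental code), clipping the perturbed intensities back to the dataset's [0, 1] range:

```python
import numpy as np

def apply_fixed_noise(images, magnitude, sign=+1):
    """Add (sign=+1) or subtract (sign=-1) a fixed perturbation of the
    given magnitude to every pixel, clipping back to the valid [0, 1] range."""
    return np.clip(images + sign * magnitude, 0.0, 1.0)

def apply_gaussian_noise(images, magnitude, rng=None):
    """Add zero-mean Gaussian noise scaled by the given magnitude."""
    rng = np.random.default_rng(rng)
    noise = magnitude * rng.standard_normal(images.shape)
    return np.clip(images + noise, 0.0, 1.0)

# Toy batch standing in for GTSRB test images (values in [0, 1]).
batch = np.full((2, 32, 32, 3), 0.32)
darker = apply_fixed_noise(batch, 0.3, sign=-1)   # pixels pushed toward 0
lighter = apply_fixed_noise(batch, 0.3, sign=+1)  # pixels pushed toward 1
```

Since the average GTSRB pixel intensity is close to 0.3, the subtractive case drives many pixels toward the lower bound of the range.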
The accuracy of the CapsuleNet on the clean testing examples is approximately 97%. Applying the noise with different magnitudes, we obtain the results in Figure 4. Examples of the resulting images under fixed and Gaussian perturbations are shown in Figures 8 and 11, respectively. When the fixed noise is subtracted from the test images, the accuracy tends to decrease faster than in the case in which the noise is added to them. This effect can be explained by analyzing the test images in more detail. The average pixel intensities over all the testing examples are equal to 0.33, 0.30 and 0.32 for the first, second and third RGB channel, respectively. Hence, when the noise is subtracted, most of the pixels tend to move closer to 0 and the network easily misclassifies the image, because pixel values at the extremity of their range become meaningless. As shown in the example of Figure 8, the image with subtracted noise (Figure 8c) is very dark, while the one with added noise (Figure 8b) is lighter than the original, but still recognizable by humans. In Figure 4, the accuracy of the CapsuleNet under Gaussian noise decreases more sharply than under fixed noise: in fact, the wide range of values of the Gaussian distribution, multiplied by the noise magnitude, creates perceptible perturbations, as shown in Figure 11b.
In our example, as represented in Figure 15, the sign is recognized as a ”Stop” with probability 0.057. In this case, when the noise is added to the pixels, the probability decreases slightly more than when the noise is subtracted. In fact, in this example, the average pixel intensities are higher than the previous means, i.e., 0.52, 0.44 and 0.44 for the three channels, respectively. Hence, the network misclassifies this image more easily when the noise is added to the pixels. Nevertheless, the network correctly classifies our example, because the gap between the probability of the ”Stop” class and the highest among the other ones is large, and a noise magnitude of 0.3 is not large enough to cause a misclassification. In contrast, in the case of Gaussian noise, the probability of the ”Stop” class decreases significantly, because the intensity of the noise is clearly perceptible and thus degrades the quality of the image.
IV Our Methodology: Automatic Generation of Targeted Imperceptible and Robust Adversarial Examples
An efficient adversarial attack is able to generate imperceptible and robust examples to fool the network. First, we analyze the importance of these two concepts; then, we describe our algorithm, which is based on them.
IV-A Imperceptibility and robustness of adversarial examples
An adversarial example can be defined imperceptible if the modifications of the original sample are so small that humans cannot notice them. To create an imperceptible adversarial example, we need to add the perturbations to the pixels of the image with the highest standard deviation. In fact, perturbations added in high-variance zones are less evident and more difficult to detect than the ones applied to low-variance pixels. Considering an N×N area of pixels around a pixel x_i, the standard deviation (SD) of x_i can be computed as the square root of the variance, as in Equation 1, where μ is the average value of the pixels in the area:

SD(x_i) = sqrt( Σ_{x_j ∈ N×N area} (x_j − μ)² / N² )     (1)
Hence, when the pixel is in a high-variance region, its standard deviation is high and the probability of detecting a modification of the pixel is low. In order to measure imperceptibility, it is possible to define the distance between the original sample X and the adversarial sample X* as in Equation 2, where δ_i is the perturbation added to the pixel x_i:

D(X*, X) = Σ_i |δ_i| / SD(x_i)     (2)

This value indicates the total perturbation added to all the pixels under consideration. We also define D_MAX as the maximum total perturbation tolerated by the human eye.
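Equations 1 and 2 can be sketched as follows for a single channel (an illustrative NumPy implementation; the border handling of the windows and the small eps guard against zero SD are our assumptions):

```python
import numpy as np

def local_sd(image, n=3):
    """Per-pixel standard deviation over an n-by-n neighborhood (Eq. 1),
    computed with edge-padded windows for a single-channel image."""
    h, w = image.shape
    pad = n // 2
    padded = np.pad(image, pad, mode='edge')
    sd = np.empty_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            # Population std: sqrt(sum((x_j - mu)^2) / n^2), as in Eq. 1.
            sd[i, j] = padded[i:i + n, j:j + n].std()
    return sd

def distance(original, adversarial, sd, eps=1e-8):
    """Total perceptual distance (Eq. 2): perturbations applied to
    high-SD pixels contribute less, since they are harder to notice."""
    delta = np.abs(adversarial - original)
    return float(np.sum(delta / (sd + eps)))
```

The attack stops once this distance exceeds the D_MAX threshold tolerated by the human eye.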
An adversarial example can be defined robust if the gap function, i.e., the difference between the probability of the target class and the maximum probability among the other classes, is maximized:

Gap(X*) = P(target class) − max_{c ≠ target} P(c)     (3)
If the gap function increases, the example becomes more robust, because the modifications of the probabilities caused by some image transformations (e.g., compression or resizing) tend to be less effective. Indeed, if the gap function is high, a variation of the probabilities may not be sufficient to achieve a misclassification.
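As a small sketch, one reading of the gap function of Equation 3 (the function name is ours):

```python
import numpy as np

def gap(probs, target):
    """Difference between the target-class probability and the highest
    probability among all the other classes (negative before the attack
    succeeds, positive and growing as the example becomes more robust)."""
    probs = np.asarray(probs, dtype=float)
    others = np.delete(probs, target)
    return float(probs[target] - others.max())

# Example: the network still predicts class 0, so targeting class 2
# yields a negative gap that the attack then tries to maximize.
print(gap([0.5, 0.25, 0.25], target=2))   # -0.25
```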
IV-B Generation of the attacks
We propose a greedy algorithm that automatically generates targeted imperceptible and robust adversarial examples in a black-box scenario, i.e., we assume that the attacker has access to the input image and to the output probability vector, but not to the network model. The methodology is shown in Algorithm 1. The goal of our iterative algorithm is to modify the input image in order to maximize the gap function (robustness), as long as the distance D(X*, X) between the original and the adversarial example stays below D_MAX (imperceptibility).
Moreover, the algorithm takes into account the fact that every pixel is composed of three different values, since the images are based on three channels. Our algorithm chooses a subset P of pixels, included in the considered area, with the highest SD for every channel, so that their possible modification is difficult to detect. Then, the gap function is computed as the difference between the probability of the target class, chosen as the class with the second highest probability, and the maximum output probability. Hence, for each pixel of P, we compute Gap(+) and Gap(−): these quantities correspond to the values of the gap function, estimated by adding and by subtracting, respectively, a perturbation unit to that pixel. These gaps are useful to decide whether it is more effective to add or to subtract the noise. For each pixel of P, we consider the greatest value between Gap(+) and Gap(−), in order to maximize the distance between the two probabilities. Then, for each pixel of P, we calculate the Variation Priority by multiplying this gap value by the SD of the pixel. This quantity indicates the efficacy of the pixel perturbation. For every channel, the values of Variation Priority are ordered, and the V highest values across the three channels are perturbed. According to which of the previously computed Gap(+) and Gap(−) is higher, the noise is added or subtracted. Once the original input image is replaced by the adversarial one, the next iteration can start. The iterations continue until the distance between the original and the adversarial example exceeds D_MAX. The scheme of our algorithm is shown in Figure 16.
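The iterative procedure described above can be sketched as follows (a single-channel simplification; the probing strategy, the query budget, and all parameter defaults are our illustrative assumptions, not the exact Algorithm 1):

```python
import numpy as np

def greedy_attack(image, model, target, sd, p=100, v=10,
                  delta=0.1, d_max=50.0, max_iters=25):
    """Greedy black-box attack sketch: `model` maps an image to a vector
    of output probabilities; no access to its internals is needed."""
    adv = image.copy()
    flat_sd = sd.ravel()
    # P pixels with the highest SD: modifications there are hardest to notice.
    candidates = np.argsort(flat_sd)[-p:]

    def gap(x):
        probs = model(x)
        return probs[target] - np.delete(probs, target).max()

    for _ in range(max_iters):
        gap_plus = np.empty(len(candidates))
        gap_minus = np.empty(len(candidates))
        for k, idx in enumerate(candidates):
            for sign, out in ((+1, gap_plus), (-1, gap_minus)):
                probe = adv.copy()
                probe.ravel()[idx] = np.clip(probe.ravel()[idx] + sign * delta, 0, 1)
                out[k] = gap(probe)  # one black-box query per probe
        # Variation Priority: estimated gap gain weighted by the local SD.
        priority = np.maximum(gap_plus, gap_minus) * flat_sd[candidates]
        for k in np.argsort(priority)[-v:]:
            sign = 1 if gap_plus[k] >= gap_minus[k] else -1
            idx = candidates[k]
            adv.ravel()[idx] = np.clip(adv.ravel()[idx] + sign * delta, 0, 1)
        # Stop once the perturbation is no longer imperceptible (Eq. 2).
        if np.sum(np.abs(adv - image).ravel() / (flat_sd + 1e-8)) > d_max:
            break
    return adv
```

Each iteration costs roughly 2·P model queries, one per candidate pixel and per sign, which is compatible with the black-box assumption of only observing the output probability vector.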
V Impact of our attack on the CapsuleNet and the VGGNet
V-A Experimental Setup
We apply our methodology, shown in Section IV-B, to the previously described CapsuleNet architecture and to a 9-layer VGGNet, implemented in TensorFlow and represented in Figure 17. The VGGNet, trained for 30 epochs, achieves a test accuracy equal to 97%. Therefore, it allows a fair comparison between the two networks, since their accuracies are very similar. To verify how our algorithm works, we test it on two different examples. We consider M=N=32, because the GTSRB dataset is composed of 32x32 images, and P=100. The value of the perturbation unit is equal to 10% of the maximum value among all the pixels. The parameter D_MAX depends on the SD of the pixels of the input image: its value changes according to the different examples, because the distance D does not increase in the same way for each example.
V-B Our methodology applied to the CapsuleNet
We test the CapsuleNet on two different examples, shown in Figures (a)a and (a)a. For the first one, we distinguish two cases, in order to verify that our algorithm works independently of the target class, although with different final results:
Case I: the target class is the class with the second highest initial output probability.
Case II: the target class is the class with the fifth highest initial output probability.
We can make the following analyses on our examples:
CapsuleNet classifies the input image shown in Figures (a)a and (a)a as ”120 km/h speed limit” with a probability equal to 0.0370.
For Case I, the target class is ”Double curve”, with a probability of 0.0297. After 15 iterations of our algorithm, the image (in Figure (b)b) is classified as ”Double curve” with a probability equal to 0.0319. Hence, the probability of the target class has overcome the initial one, as shown in Figure (a)a. The traffic signs relative to every class are shown in the table in Figure 12. At this step, the distance D is equal to 515.64. Increasing the number of iterations, the robustness of the attack increases as well, because the gap between the two probabilities increases, but the noise also becomes more perceptible. After iteration 20, the distance grows above D_MAX: the sample is represented in Figure (c)c.
For Case II, the probability of the target class ”Beware of ice/snow” is equal to 0.0249, as shown in Figure (b)b. The gap between the maximum probability and the probability of the target class is higher than the gap in Case I. After 25 iterations, the distance grows above D_MAX and the network still has not misclassified the image (in Figure (b)b). In Figure (b)b we can observe, however, that the gap between the two classes has decreased. In this case, we show that our algorithm still works, but, since we choose a target class which does not have the second highest probability, the initial gap between the two probabilities is high and the algorithm needs to perform several iterations to fool the network. Therefore, after many iterations, the noise becomes more perceptible.
The CapsuleNet classifies the input image shown in Figure (a)a as ”Children crossing” with a probability equal to 0.042. The target class is ”60 km/h speed limit”, with a probability equal to 0.0331. After 16 iterations, the network misclassifies the image (in Figure (b)b), because the probability of the target class overcomes the initial maximum probability, as shown in Figure (c)c. The gap between the two probabilities increases until iteration 33, when the distance is equal to 190: after this iteration, the distance exceeds D_MAX and the noise becomes perceptible, as represented in Figure (c)c. The gap has increased, as we can observe in Figure (c)c.
V-C Our methodology applied to a 9-layer VGGNet
In order to compare the robustness of the CapsuleNet and the 9-layer VGGNet, we evaluate the previous two examples, misclassified by the CapsuleNet. For the first example, we consider only Case I as the benchmark. The VGGNet classifies the input images with different output probabilities, compared to the ones obtained by the CapsuleNet. Therefore, our metric to evaluate how resistant the VGGNet is to our attack is based on the value of the gap at the same distance D. We can make the following considerations on our two examples:
The VGGNet classifies the input image (in Figure (a)a) as ”120 km/h speed limit” with a probability equal to 0.979. The target class is ”100 km/h speed limit”, with a probability equal to 0.0075. After 4 iterations, the distance exceeds D_MAX and the network does not misclassify the image (in Figure (b)b): our algorithm would need to perform more iterations to obtain a misclassification, since the two probabilities at the starting point were far apart, as shown in Figure (a)a.
The VGGNet classifies the input image (in Figure (a)a) as ”Children crossing” with a probability equal to 0.99. We choose as target class ”Bicycles crossing”, which has the third highest probability, 0.0001. After 9 iterations, the distance exceeds D_MAX and the network does not misclassify the image (in Figure (b)b). As in the previous case, this happens because the starting probabilities are very far apart, as shown in Figure (b)b. In Figure (c)c, when D has exceeded D_MAX, the noise is more perceptible than in the CapsuleNet at the same iteration (in Figure (b)b).
V-D Comparison and results
From our analyses, we can observe that the 9-layer VGGNet is more resistant to our adversarial attack than the CapsuleNet, while the perturbations applied to the CapsuleNet are less perceivable. For instance, in Figure (c)c we can notice that at iteration 33, when D exceeds the value of D_MAX, the gap between the probabilities has increased considerably. For a similar value of D, in Figure (b)b, at iteration 9, when D exceeds the value of D_MAX, the gap changes very slowly. Our consideration is justified by the graphs in Figure 45: in the case of the VGGNet, the value of D increases more sharply than for the CapsuleNet. Hence, the perceptibility of the noise in the image can be measured as the value of D divided by the number of iterations: the noise in the VGGNet becomes perceptible after a few iterations. Moreover, we can observe that the choice of the target class plays a key role in the success of the attack.
We notice that other features that highlight the differences between the VGGNet and the CapsuleNet were not considered. The VGGNet is deeper and contains a larger number of weights, while the CapsuleNet can achieve a similar accuracy with a smaller footprint. This causes a disparity in the prediction confidence between the two networks. Such an observation raises an important research question: is it correct to compare the vulnerability of a CapsuleNet to that of a CNN with the same accuracy? And, if not, how can it be evaluated? It is clear that the CapsuleNet has a much higher learning capability than the VGGNet, but this phenomenon has a negative drawback from the machine learning security point of view.
VI Conclusions

In this paper, we proposed a novel methodology to generate targeted adversarial attacks in a black-box scenario. We applied our attack to the German Traffic Sign Recognition Benchmark (GTSRB) dataset and we verified its impact on a CapsuleNet and a 9-layer VGGNet. Our experiments show that the VGGNet appears more robust to the attack, while the modifications of the pixels in the traffic signs are less perceivable when our algorithm is applied to the CapsuleNet. We found an important weakness of CapsuleNets: in an era in which the autonomous driving community is looking for high security of automatic systems in safety-critical environments, the CapsuleNet does not guarantee sufficient robustness. Hence, further modifications of the CapsuleNet architecture need to be designed to reduce its vulnerability to adversarial attacks.
-  G. E. Hinton, A. Krizhevsky, and S. D. Wang. Transforming auto-encoders. In ICANN, 2011.
-  G. E. Hinton, N. Frosst, and S. Sabour. Matrix capsules with em routing. In ICLR, 2018.
-  S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In IJCNN, 2013.
-  A. D. Kumar, R. Karthika, and L. Parameswaran. Novel deep learning model for traffic sign detection using capsule networks. arXiv preprint arXiv:1805.04424, 2018.
-  S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In NIPS, 2017.
-  A. Shafahi, W. R. Huang, C. Studer, S. Feizi, and T. Goldstein. Are adversarial examples inevitable? arXiv preprint arXiv:1809.02104, 2018.
-  I. Goodfellow, J. Shlens and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
-  A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In ICLR, 2017.
-  N. Frosst, S. Sabour, G. Hinton. DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules, arXiv preprint arXiv:1811.06969, 2018
-  N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. Celik, and A. Swami. Practical Black-Box Attacks against Machine Learning. In ACM Asia Conference on Computer and Communications Security, 2017.
-  Bo Luo, Yannan Liu, Lingxiao Wei, and Qiang Xu. Towards Imperceptible and Robust Adversarial Example Attacks against Neural Networks. arXiv preprint arXiv:1801.04693, 2018.
-  R. Feinman, R. Curtin, S. Shintre, and A. Gardner. Detecting adversarial examples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
-  A. Bhagoji, D. Cullina, and P. Mittal. Dimensionality reduction as a Defense against Evasion Attacks on Machine Learning Classifiers. arXiv preprint arXiv:1704.02654, 2017a.
-  A. Bhagoji, D. Cullina, C. Sitawarin and P. Mittal. Enhancing Robustness of Machine Learning Systems via Data Transformations. arXiv preprint arXiv:1704.02654, 2017b.
-  C. Szegedy, W. Zaremba, and I. Sutskever. Intriguing properties of neural networks. In ICLR, 2014.
-  X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li. Adversarial examples: Attacks and defenses for deep learning. arXiv preprint arXiv:1712.07107, 2017.
-  R. Mukhometzianov and J. Carrillo. CapsNet comparative performance evaluation for image classification. arXiv preprint arXiv:1805.11195, 2018.
-  Adversarial Attack to Capsule Networks project. Available online at: https://github.com/jaesik817/adv_attack_capsnet
-  A. Bhandare, M. Bhide, P. Gokhale, and R. Chandavarka. Applications of Convolutional Neural Networks, In IJCSIT, 2016.
-  N. Carlini, P. Mishra, T. Vaidya, Y. Zhang, M. Sherr, C. Shields, D. Wagner, and W. Zhou. Hidden voice commands. In USENIX Security Symposium, 2016.
-  G. Zhang, C. Yan, X. Ji, T. Zhang, T. Zhang, and W. Xu. DolphinAttack: Inaudible voice commands. arXiv preprint arXiv:1708.09537, 2017.
-  H. Xiao, K. Rasul, and R. Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. In CoRR abs/1708.07747, 2017.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 1998.
-  Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.