How to Manipulate CNNs to Make Them Lie: the GradCAM Case

07/25/2019 ∙ by Tom Viering, et al. ∙ Delft University of Technology 0

Recently many methods have been introduced to explain CNN decisions. However, it has been shown that some methods can be sensitive to manipulation of the input. We continue this line of work and investigate the explanation method GradCAM. Instead of manipulating the input, we consider an adversary that manipulates the model itself to attack the explanation. By changing weights and architecture, we demonstrate that it is possible to generate any desired explanation, while leaving the model's accuracy essentially unchanged. This illustrates that GradCAM cannot explain the decision of every CNN and provides a proof of concept showing that it is possible to obfuscate the inner workings of a CNN. Finally, we combine input and model manipulation. To this end we put a backdoor in the network: the explanation is correct unless there is a specific pattern present in the input, which triggers a malicious explanation. Our work raises new security concerns, especially in settings where explanations of models may be used to make decisions, such as in the medical domain.



There are no comments yet.


page 3

page 6

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

For deep convolutional neural networks, it is difficult to explain how models make certain predictions. Explanations for decisions of such complex models are desirable


. For example, in job application matching, explanations may reveal undesireable biases in machine learning models. For settings which demand rigorous security demands such as self driving cars, explanations can help us better understand how models work in order to identify and fix vulnerabilities. In other application domains, such as neuroscience, machine learning is not only used for predictions (e.g., regarding a disease), but also to understand the cause (the underlying biological mechanism). In this case, explanations can help domain experts discover new phenomena.

The field of Explainable AI (XAI) aims to tackle this problem; how did a particular model come to its prediction? For CNNs a popular explanation takes the form of heatmaps or saliency maps [2], which indicate the pixels that were important for the final output of the model. Recently, many explanation techniques have been proposed in the literature to generate explanations for machine learning models [3, 4, 5, 6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. A nice introduction and survey to the XAI is [1].

Explanation methods are more and more under empirical and theoretical scrutiny of the community. For example, Ancona et al. [19] show equivalence and connections between several explanation methods, and Lundberg and Lee [3] unify six existing explanation methods. Several studies [6, 5, 20, 21, 22] have raised questions regarding robustness and faithfulness of these explanations methods. For example, Ghorbani et al. [22] show that an adverserial imperceptible perturbations of the input can change the explanation significantly while the model’s prediction is unchanged.

We continue this line of investigation and uncover new (security) vulnerabilities in the popular explanation method GradCAM [4]. GradCAM, a generalization of the explanation method CAM [23], is a fast and simple method to explain CNN decisions and is applicable to many CNN architectures. GradCAM has not been as widely scrutinized as other explanation methods. Adebayo et al. [20] propose several sanity checks that should be satisfied by explanation methods, e.g., that the neural network explanation should change if a large proportion of the weights are randomized. Adebayo et al. [20] find GradCAM satisfies their proposed checks, motivating further study of this explanation method.

Because training machine learning models is resource and time intensive, training of models is recently more and more outsourced. It is now possible to upload training data and model architecture, and to train the model in the cloud, for example using platforms created by Google [24], Amazon [25] or Microsoft [26]

. It is expected that this will become the norm. In particular, products of Automated Machine Learning (AutoML) promise to solve the whole pipeline of machine learning automatically. The user only has to upload the dataset, and the cloud provider will automatically try several architectures, tune hyperparameters, train models, and evaluate them

[27]. Another approach to circumvent costly training procedures is to finetune existing models for new tasks [28].

Both outsourcing and finetuning pose a security risk [29]. Gu et al. [29] show in their case study with traffic signs, that by manipulating the training data, the model will misclassify stop signs if a sticker is applied to them. Liu et al. [30]

introduce a technique that can be applied to an already trained model to introduce malicious behaviour. Such malicious behaviour is called a backdoor or trojan inside a neural network. The backdoor is triggered by specific input patterns while keeping model performance on the original task more or less the same. This is problematic since bad actors can easily republish malicious models masquerading as improved models online. Because of the blackbox nature of deep learning models, such trojans are difficult to detect

[31, 32]. Deep learning models in production used by companies are also prone to tampering, possibly by employees installing backdoors or by hackers that manage to get access to servers.

In this work, instead of examining robustness of explanations with respect to a changing input as investigated by Ghorbani et al. [22], we investigate the robustness of explanations when the model is modified by an adversary such as the scenario considered by Liu et al. [30] and Wang et al. [31]. Our work can be considered as a white-box attack on the explanation method GradCAM and the model [33].

(a) Input image
(b) Explanation of original CNN
(c) Expl. T1
(d) Expl. T2
(e) Expl. T3
(f) Expl. T4
Figure 1: Qualitative example of manipulated explanations for manipulated networks T1-T4. Blue means a pixel had a large influence on the decision. (c,d) The networks T1 and T2 generate always the same explanation, irrespective of the input to the network. (e) T3 generates a semi-random explanation based on the input. (f) T4 only generates a malicious explanation if a specific pattern (in this case, a smiley) is visible in the input. The area in the square for is enlarged for clarity.

Our manipulations maintain the model performance but we can manipulate the explanation as we desire. An overview of our proposed techniques T1-T4 are shown in Figure 1. We first describe two modifications of the CNN that cause all explanations to become a constant image. Arguably, this manipulation is easy to detect by inspecting the explanations, which is not as easy for the two more techniques that we propose. In one of our techniques the explanation is semi-random and depends on the input. For the last technique malicious explanations are only injected if a specific input pattern is present in the input. These last two techniques are much more difficult to detect using visual inspection of explanations and therefore pose a more serious security concern.

Several works use explanations to localize objects in images [2, 4, 14], which could be used by secondary systems; for example, as a pedestrian detector for a self-driving car or an explanations used by a doctor to find a tumor. Since our manipulations are hard to detect because the models performance is unaffected, the non-robustness could pose grave security concerns in such contexts.

Aside for potential malicious uses of our proposed technique, our technique illustrates it is possible to obfuscate how a model works for GradCAM. Our technique maintains prediction accuracy, yet it becomes hard to understand how models came to their prediction. Thus the model becomes impossible to interpret, while staying useful. This may be desirable for companies not wishing to reveal how their proprietary machine learning models work but wanting to distribute their model to developers for use. Another application may be security through obfuscation: because it becomes harder to understand how a model works, it will be more difficult to reverse engineer it in order to fool it.

2 GradCAM and Notation

We briefly review the notation and the GradCAM method [4]. We only consider CNNs for classification tasks. Let be the input image and the output before the final softmax (also referred to as the score). Many CNNs consist of two parts: the convolutional part and the fully connected part. GradCAM uses the featuremaps outputted by the last convolutional layer after the non-lineairity to generate the visual explanation. Here indicates the channel, and a single can be regarded as a 2D image. The visual explanation or heatmap for a class is computed by


Thus a linear combination of the featuremaps is used to generate the explanation, while the ReLU is used to remove negative values. is obtained by global-average-pooling the gradient for class with respect to the th featuremap,


where and are the indices for the pixels in the featuremap and is the total amount of pixels in the featuremap. Informally, if the th featuremap has a large influence on the score, as indicated by a large gradient, it must have been important in the decision and, thus, the larger the weight of the th featuremap in the linear combination.

3 Manipulating the CNN

We will show several techniques that manipulate the architecture and weights to change the explanation of GradCAM, while keeping the performance of the CNN (more or less) unchanged. The recipe for all these approaches will be the same. Step one: we add a filter to the last convolutional layer, so that there will be featuremaps. The th featuremap will contain our desired target explanation . We will scale in such a way that for all pixel locations , and channels . Step two: we change the architecture or weights of the fully connected part, to ensure for all and . Under these conditions, following Equation 1 and 2, the GradCAM explanation will be more or less equal to our desired target explanation, for all . Figure 1 gives an overview of the techniques T1-T4 which we will now discuss in more detail. We will use the subscript (old) to indicate parameters or activation values before manipulation and (new) indicates parameters or activations after manipulation of the model.

3.1 Technique 1: Constant Flat Explanation

For the first technique we change the model parameters such that the explanation becomes a constant heatmap irrespective of the input . Meanwhile, the scores of the model do not change, thus the accuracy stays the same.

We manipulate the network as follows. For the new th filter in the last convolutional layer, we set the parameters of the kernel to zero, and we set the bias to a large constant . This ensures for all irrespective of the input image and that for all . Let be the last featuremap in the convolutional part of the model. Each may have a different size , since after featuremap there can be pooling layers. We assume there are only max / average pooling layers between and , in that case . Let

be the vector obtained by flattening the last featuremaps

. We assume without loss of generality that is ordered as . Split in two parts: , such that and . Let be the weight matrix of the first fully connected layer and let be the output before the activation.

where is the old learnt bias. For the manipulated model

We set all entries in the matrix to a large value and we set , where is a vector of all-ones. Then , and thus the output is the same before and after manipulation. Because is large, small changes in lead to large changes in , thus is large. This ensures . Recall that however, is constant.

3.2 Technique 2: Constant Image Explanation

In the last technique, the target explanation was a constant. Now we describe the second manipulation technique that allows to be a fixed image of our choosing irrespective of the input image. We use the same technique as before, with two differences. First, we set the kernel parameters and the bias parameter of the ()th filter to zero. Before propagating to the next layer, we manipulate it: , where is the target explanation (image) of our choosing and is a large constant. This can be seen as a architectural change. We set all values in to a large value and we set , where (note is independent of ). Then again , and thus . The arguments of the previous technique still hold and thus we have and .

Figure 2: Illustration of architectural changes necessary for techniques T3 and T4. Dashed lines indicate modifications. ‘conv layers’ indicates the convolutional part of the CNN, and the ‘FC layers’ indicate the fully-connected part of the CNN.

3.3 Technique 3: Semi-Random Explanation

A limitation of the previous techniques is that the explanation is always the same irrespective of the input. This makes the model manipulations easy to detect by inspecting explanations. Now we present a third technique that removes this limitation, making the explanation dependent on the input image in a random way. Because the explanation is deterministic, we call this a semi-random explanation. Making the explanation dependent on the input however comes with a price: the scores may change a small amount of and more architectural changes to the model are required. The architectural changes are illustrated in Figure 2.

As before we will put our target explanation in . Again, we set all kernel and biases in the th convolutional filter to zero but now we also set and . To put the target explanation in , we set , where will be a neural network taking as input and outputs our desired target explanation . This can be seen as an architectural change in the form of a branch. We take to be a randomly initialized CNN (only the convolutional part). This way will make the explanations dependent on the input image and let them look more plausible, which will make the manipulation harder to detect.

To ensure large , we add a branch from to . is a vector of all ones. We set

is a scalar valued function taking a vector of length as input. We choose

where is the modulus operator ensures that for all . By choosing to be small, the difference between the scores will be small: . Furthermore, for all inputs we have . By choosing , we can make the gradient as large as desired, ensuring will be large for all classes .

(a) Input (no stickers)
(b) Original network (no stickers)
(c) T4 (no stickers)
(d) Input (stickers)
(e) Original network (stickers)
(f) T4 (stickers)
Figure 3: Illustration of Technique 4. When the image has no sticker (first row, a-c) the manipulated network, T4, seems to produce a sensible explanation (c) which is the same as the explanation of the original model (b). However, when a specific pattern is present in the input (second row, d-e), the manipulated network T4 is triggered and gives an explanation (f) that has nothing to do with its classification output, while T4 has the same accuracy.

3.4 Technique 4: Malicious Explanation Triggered by Input Pattern

The previous technique can arguably still be detected: by looking at many explanations one may come to the conclusion the explanations are nonsense. In this final example, we will only change the explanation if a specific pattern, a sticker, is observed in the input image . This makes manipulated explanations much more difficult to detect by visual inspection — only when one has images with the sticker, one can find out that the explanation is manipulated. A visual example is given in Figure 3.

We use exactly the same setup as in Technique 3, except that we change . For we use a neural network that outputs a constant zero image, unless a sticker is detected in the input. If stickers are detected, at the location of the sticker, the output of will be very large. Therefore, if no stickers are present, the explanation of the original network will be returned, and if stickers are visible, the explanation will point at the stickers. Generally, could be any function parametrized by a neural network, making it possible to trigger any kind of malicious explanation if a chosen (perhaps, more subtle) input pattern is visible.

4 Experimental Setup

For all experiments, we use the VGG-16 network [34]. As suggested in the GradCAM paper, we set to be the featuremap after the last convolutional layer (after activation, before pooling). For VGG-16, and the resolution of is

. We evaluate the original network and manipulated networks on the validation set of Imagenet of the ILSVRC2012 competition

[35]. We generate the heatmap for the class with the highest posterior. The heatmap always has positive values due to the ReLU operation. We normalize all heatmaps by the largest value in the heatmap to map it to : . We measure to what extent our manipulations are successful by measuring the distance between our target explanation and manipulated explanation in terms of the distance. For the experiments with the network T4, we evaluate on the original Imagenet validation set and the manipulated validation set. Manipulated images have 3 randomly placed smiley patterns.

For T1, set , . For T2, set and we set to a smiley image. For T3, choose , and . The network has a conv2d layer with 6 filters, with filtersize

, with 3 pixels zero padding at each side, with ReLU activation, followed by a second conv2d layer with 1 filter, kernel size

, 3 pixels zero padding at each side, with ReLU activation. All weights are randomly initialized. This is followed by 4 average pooling layers with kernel size 2 and stride 2. Then the output of

is and, thus, matches the size of for VGG-16. For T4 we use a network that has only one conv2d layer. The smiley pattern is binary: each pixel is white or black. The kernel parameters are set to the pixel values of the smiley image that is normalized to have zero mean, ensuring a maximum activation if the pattern occurs in the input image . We set the bias of the convolutional layer to where are the pixel values of the non-normalized smiley image. If the pattern is detected the output is , typically otherwise the output will be negative. We use a ReLU to suppress false detections, followed by 4 average pool layers with same size and stride as before, in order to get the output of the size and we set .

5 Results

The results for techniques T1-T3 are shown in Table 1, for qualitative results see Figure 1

. A minimal change in accuracy and scores is observed. After thorough investigation, we found that the change in score and accuracy for T1 and T2 is caused by rounding errors due to the limited precision used in our PyTorch implementation that uses

float16 values — theoretically, the networks should output the exact same scores and thus the accuracy should stay exactly the same. The distance between our desired target explanation and our observed manipulated explanation is quite small, which matches with the qualitative observation in Figure 1. Note that the change in score for T3 is lower than , as guaranteed.

The results for technique T4 are shown in Table 2, for a qualitative example see Figure 3. We observe a small drop in accuracy when the data is manipulated by stickers, as expected, but the accuracy for T4 and the original network are exactly the same. The change in score is very small. If there are no stickers, the target explanation is equal to the explanation of the original network. If there are stickers, is equal to the heatmap that detects the stickers. The observed explanation when a sticker is present is almost equal to the target explanation. Just as desired, if no sticker is present, the explanation of T4 remains the same as the explanation of the original network.

Original network 0.71592 - -
T1: constant 0.71594 0.01713 0.00513
T2: smiley 0.71594 0.00454 0.01079
T3: random 0.71592 0.00000 0.05932
Table 1: Evaluation of manipulated networks T1-T3 on the ILSVRC2012 validation set. Observe that the accuracy more or less stays the same. We measure the difference between the score of the original network and new manipulated score (the score is the output before softmax). The difference between the desired target explanation and the actual observed explanation is measured using the distance. The score changes very little while we can accurately manipulate the explanation as indicated by small distance.
Dataset Network Accuracy
Original Original 0.71592 - -
T4: backdoor 0.71592 0.00000 0.00000
Manipulated (sticker) Original 0.69048 - -
T4: backdoor 0.69048 0.00000 0.00006
Table 2: Evaluation of Technique 4 on the ILSVRC2012 validation set. Observe that T4 has the same accuracy and scores as the original network for both kinds of data. When presented with input data without stickers, the manipulated network T4 produces the same explanation as the original network. When presented with manipulated data, the manipulated explanation, , is almost equal to the desired explanation, .

6 Discussion

GradCAM is not ‘broken’ — for normally trained models, GradCAM has been proven to be useful. GradCAM does not work for adverserially manipulated models such as ours, since it was not designed for that task. However, our models are valid models, with (almost) equal performance. Hence, they should also admit a valid explanation. In fact, in [6] the axiom of Implementation Invariance is defined: two networks that produce the same output for all inputs should admit the same explanation. Clearly, GradCAM does not satisfy this axiom and thus there is room for improvement. One may wonder wether the axiom should be extended to models that return extremely similar predictions, such as T3 and T4.

Our work reveals that GradCAM relies on unknown assumptions on the network parameters, architecture, etc. It is difficult to rule out that, by accident, a model can be produced, using regular training, where GradCAM explanations may fail. We think it is important to determine what assumptions should be verified for GradCAM to produce accurate explanations, so we can always verify the correctness of GradCAM explanations.

Our techniques may be extended to fool other explanation methods. Several methods rely on the gradient [6, 2, 10, 12, 13]. T3 and T4 show that it is possible to manipulate the gradient, while affecting accuracy only little. So, these methods may also be vulnerable.

A weakness of our method is that architectural changes are necessary. If the practitioner visualizes the architecture (for example, using TensorBoard in TensorFlow


) or inspects the code, he may easily discover that the model has been tampered with. However, we believe similar attacks, where the original architecture is used, should be feasible, which would make the attack much harder to detect. We believe this is possible, since deep networks contain a lot of redundancy in the weights. Weights can be compressed or pruned, freeing up neurons, which then may be used to confuse the explanation. Recently, this area of research has been very active

[37, 38]. For example, Srinivas and Babu [39] were able to prune of the weights, while not significantly changing the test accuracy on MNIST. Another approach is Knowledge Distillation (KD), where a larger model (the teacher) can be compressed in a smaller model (the student) [40]. Such methods could be combined with our technique to keep the model accuracy more or less the same and to confuse the explanation method, without any architectural changes. We will explore this promising idea in future work.

7 Conclusion

We provided another sanity check in the same vein as Adebayo et al. [20] and we have shown that GradCAM does not satisfy said sanity check. We submit that, for any explanation method, one should consider whether it is possible to change the underlying model such that the predictions change minimally, while explanations change significantly. If this is the case, our work illustrates that the explanation method may be fooled by an attacker with access to the model and the explanations may not be as robust as desired.


  • Adadi and Berrada [2018] Amina Adadi and Mohammed Berrada.

    Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI).

    IEEE Access, 6:52138–52160, 2018.
  • Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, Workshop Track Proceedings, 2013.
  • Lundberg and Lee [2017] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of Advances in Neural Information Processing Systems 30 (NIPS), pages 4768–4777, 2017.
  • Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In

    Proceedings of the IEEE International Conference on Computer Vision

    , pages 618–626, 2017.
  • Kindermans et al. [2018] Pieter-Jan Kindermans, Kristof T Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, and Sven Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 2018.
  • Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 3319–3328, 2017.
  • Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for Simplicity: The All Convolutional Net. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, Workshop Track Proceedings, 2014.
  • Zeiler and Fergus [2014] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818–833, 2014.
  • Koh and Liang [2017] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, pages 1885–1894, 2017.
  • Shrikumar et al. [2017] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning Important Features Through Propagating Activation Differences. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 3145–3153, 2017.
  • Bach et al. [2015] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek.

    On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.

    PloS one, 10(7):e0130140, 2015.
  • Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
  • Ribeiro et al. [2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should i trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, New York, NY, USA, 2016.
  • Amogh Gudi Nicolai van Rosmalen and van Gemert [2017] Marco Loog Amogh Gudi Nicolai van Rosmalen and Jan van Gemert. Object-Extent Pooling for Weakly Supervised Single-Shot Localization. In Proceedings of the British Machine Vision Conference (BMVC), 2017.
  • Fong and Vedaldi [2017] Ruth C. Fong and Andrea Vedaldi. Interpretable Explanations of Black Boxes by Meaningful Perturbation. Proceedings of the IEEE International Conference on Computer Vision, 2017-Octob:3449–3457, 2017.
  • Dabkowski and Gal [2017] Piotr Dabkowski and Yarin Gal. Real time image saliency for black box classifiers. In Proceedings of Advances in Neural Information Processing Systems 30 (NIPS), pages 6967–6976, 2017.
  • Zintgraf et al. [2017] Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. Visualizing Deep Neural Network Decisions: Prediction Difference Analysis. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 2017.
  • Zhou et al. [2015] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object Detectors Emerge in Deep Scene CNNs. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 2015.
  • Ancona et al. [2017] Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. A unified view of gradient-based attribution methods for Deep Neural Networks. In NIPS Workshop on Interpreting, Explaining and Visualizing Deep Learning, 2017.
  • Adebayo et al. [2018] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Proceedings of Advances in Neural Information Processing Systems 31 (NIPS), pages 9505–9515, 2018.
  • Kindermans et al. [2017] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (Un)reliability of saliency methods. arXiv preprint arXiv:1711.00867, 2017.
  • Ghorbani et al. [2017] Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of Neural Networks is Fragile. arXiv preprint arXiv:1710.10547, 2017.
  • Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba.

    Learning Deep Features for Discriminative Localization.


    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    , 2016.
  • Inc. [2019a] Google Inc. Google Cloud Machine Learning Engine, 2019a. URL
  • Inc. [2019b] Amazon Inc. Amazon SageMaker, 2019b. URL
  • Inc. [2019c] Microsoft Inc. Azure Machine Learning Service, 2019c. URL
  • Yao et al. [2018] Quanming Yao, Mengshuo Wang, Yuqiang Chen, Wenyuan Dai, Hu Yi-Qi, Li Yu-Feng, Tu Wei-Wei, Yang Qiang, and Yu Yang. Taking Human out of Learning Applications: A Survey on Automated Machine Learning. arXiv preprint arXiv:1810.13306, 2018.
  • Razavian et al. [2014] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, Columbus, OH, USA, pages 512–519, 2014.
  • Gu et al. [2017] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv preprint arXiv:1708.06733, 2017.
  • Liu et al. [2018] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. In Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), 2018.
  • Wang et al. [2019] Bolun Wang, Yao Yuanshun, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In 2019 IEEE Symposium on Security and Privacy (SP), 2019.
  • Chen et al. [2019] Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting backdoor attacks on deep neural networks by activation clustering. Workshop on Artificial Intelligence Safety 2019 co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI-19), 2019.
  • Papernot et al. [2016] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the Science of Security and Privacy in Machine Learning. arXiv preprint arXiv:1611.03814, 2016.
  • Simonyan and Zisserman [2015] K Simonyan and A Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations, 2015.
  • Russakovsky et al. [2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, and others. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
  • Abadi et al. [2015] Martın Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous systems, 2015.
  • Cheng et al. [2018] Jian Cheng, Peisong Wang, Gang Li, Qinghao Hu, and Hanqing Lu. Recent Advances in Efficient Computation of Deep Convolutional Neural Networks. Frontiers of Information Technology & Electronic Engineering, 19(1):64–77, 2018.
  • Cheng et al. [2017] Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A Survey of Model Compression and Acceleration for Deep Neural Networks. arXiv preprint arXiv:1710.09282, 2017.
  • Srinivas and Babu [2015] Suraj Srinivas and R Venkatesh Babu. Data-free Parameter Pruning for Deep Neural Networks. In Xianghua Xie Mark W. Jones and Gary K L Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 1–31, 9 2015.
  • Ba and Caruana [2014] Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? In Proceedings of Advances in Neural Information Processing Systems 27 (NIPS), pages 2654–2662, 2014.