I Introduction
Machine Learning classifiers are used for various purposes in research areas ranging from robotics to health. However, they have been shown to be vulnerable to a number of different attacks [24, 4, 39, 9]. Adversarial examples present the most direct threat to Machine Learning classification at test time: by introducing an almost imperceptible perturbation to a correctly classified sample, an attacker is able to change its predicted class. Adversarial examples have been used to craft visually indistinguishable images that are misclassified by state-of-the-art computer vision models
[28], and they enable malware to bypass classifier-based detection mechanisms without loss of functionality [37, 43, 12]. While a range of defenses against these attacks has been developed, they mostly provide an empirical mitigation against adversarial examples [44]. This is not surprising: the development of new methods in deep learning is primarily motivated by the need for tractable models, favoring flexibility and efficiency over a rigorous mathematical framework.
The study of interpretability, expressivity and learning dynamics of DNNs is an active area of research [32, 36].
Nevertheless, the lack of a rigorous theoretical underpinning of DNNs has been detrimental to many defensive mechanisms: their robustness guarantees were primarily supported by empirical observations, often omitting implicit assumptions that were subsequently subverted by increasingly elaborate attacks [7]. Recent work has started to address this developing arms race of attacks and defenses by compensating for the lack of provable guarantees in DNNs with auxiliary methods, e.g. verification techniques [17, 15]. Other approaches use the theoretical framework of more formally rigorous Machine Learning models, e.g. kernel methods [14] or k-Nearest Neighbors [42], to provide meaningful security and robustness guarantees.
Using statistical methods, [3] and [11] show that the distributions of benign and adversarial data differ. Harnessing the comprehensive framework of Bayesian probability, which relates high predictive uncertainty to a sample being distributed differently from benign data, presents an immediate next step. Efforts to leverage Bayesian uncertainty estimates in conjunction with DNNs to discern adversarial perturbations have been made [5, 21]. More generally, when projecting DNNs into the framework of Bayesian methods, the seminal work of [29] notes a direct correspondence between infinitely wide neural networks and Gaussian Processes. [20] extends this work by describing a direct correspondence between deep and wide neural networks and Gaussian Processes, and by showing that Gaussian Process uncertainty is strongly correlated with DNN predictive error.

Contributions
In this paper, we investigate adversarial examples in a Bayesian framework using Gaussian Processes. In particular, we focus on uncertainty estimates in Gaussian Process Classification (GPC) and the Gaussian Process Latent Variable Model (GPLVM). Motivated by the fact that some attacks exploit the unstable attack surface of DNNs specifically [5], we also formally derive white-box attacks on GPC and GPLVM.
Our evaluation across four tasks shows that uncertainty estimates usually reflect adversarial perturbations caused by state-of-the-art techniques. However, the connection between the change in uncertainty and the amount of perturbation introduced is not straightforward and warrants further investigation beyond the scope of this paper. A first mitigation based only on thresholding uncertainty estimates and rejecting predictions beyond this threshold already shows promising initial results. Intriguingly, we observed a possible link between the norm used in the kernel and the vulnerability towards an attack based on the dual of this norm: attacks based on this dual norm appeared more successful in thwarting detection on our models with RBF kernels. Attacks crafted with the same algorithm, but using other metrics, were less successful and significantly affected uncertainty estimates; they were therefore rejected across all tested variants.
II Background
In this section, we briefly review Machine Learning classification and adversarial examples before providing an introduction to Gaussian Process Classification (GPC) and classification based on the Gaussian Process Latent Variable Model (GPLVM).
II-A Classification
In classification, we consider a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, where the $x_i \in \mathbb{R}^D$ are the data points and the $y_i$ are the labels. The goal is to train a classifier $f$ by adapting its parameters based on the training data such that $f(x_*) = y_*$, i.e. $f$ correctly predicts the labels of previously unseen test data $x_*$. For example, an SVM computes the optimal separating hyperplane for the given data, where a nonlinear decision boundary is achieved by using a kernel. In contrast, a DNN learns several mappings, one in each layer, which are jointly optimized to separate the data.
II-B Adversarial Examples
Given a trained classifier $f$, test-time attacks compute a small perturbation $\delta$ for a test sample $x$ such that
$$f(x + \delta) \neq f(x), \quad (1)$$
i.e., the perturbed sample is classified as a different class than the original input. The sample $x + \delta$ is then called an adversarial example. A more advanced attacker can also mount targeted attacks, i.e. select the specific target class the sample should be misclassified as. Since we only consider binary settings in this paper, this distinction is superfluous.
Many algorithms exist for creating adversarial examples. We focus on the Fast Gradient Sign Method (FGSM) by [9] and the Jacobian-based Saliency Map Approach (JSMA) by [31], both of which are based on the derivative of the DNN's output with respect to its inputs. We also consider the attacks introduced by [6], which treat the task of producing an adversarial example as an iterative optimization problem.
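As a concrete illustration, the FGSM step can be sketched in a few lines. The weights, sample, and loss gradient below are hypothetical stand-ins (a toy logistic-regression model), not taken from this paper's experiments:

```python
import numpy as np

def fgsm(x, grad_loss, epsilon=0.1):
    """Fast Gradient Sign Method: perturb every feature by
    epsilon in the direction that increases the loss."""
    return x + epsilon * np.sign(grad_loss)

# Toy example: cross-entropy loss gradient for one sample of a
# hypothetical logistic-regression model with weights w.
w = np.array([1.0, -2.0, 0.5])       # hypothetical trained weights
x = np.array([0.2, 0.4, -0.1])       # benign sample
y = 1.0                              # true label
p = 1.0 / (1.0 + np.exp(-w @ x))     # predicted probability
grad = (p - y) * w                   # d(cross-entropy)/dx
x_adv = fgsm(x, grad, epsilon=0.1)
```

Every feature of `x_adv` differs from `x` by exactly epsilon, which is what makes FGSM a one-shot, global perturbation.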
II-C Gaussian Processes
This paper focuses on Gaussian Processes (GP), as they provide principled uncertainty estimates. We first introduce the Gaussian Process Latent Variable Model (GPLVM), a probabilistic model yielding a latent space representation for data irrespective of the labels. Afterwards, we consider a GP variant that incorporates labels during training and introduce GP Classification (GPC) using the Laplace approximation.
II-D Gaussian Process Latent Variable Model
A Gaussian Process Latent Variable Model (GPLVM) [18] yields a nonlinear latent space representation $X \in \mathbb{R}^{N \times q}$ for some input data $Y \in \mathbb{R}^{N \times D}$. In particular, GPLVM learns this mapping by maximizing the likelihood with respect to the latent positions.
To understand GPLVM, it is useful to first consider Principal Component Analysis (PCA). In PCA, we aim to reduce dimensionality by assuming that the data lies in a subspace spanned by the eigenvectors associated with the greatest variance. The dimensions of this lower-dimensional representation are expressed by latent variables.
By giving the values of the latent variables, $X$, a Gaussian prior and integrating over them, we obtain probabilistic PCA,
$$p(Y \mid W, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}\left(y_n \mid 0,\, W W^{\top} + \sigma^2 I\right), \quad (2)$$
where the mapping $W \in \mathbb{R}^{D \times q}$ and the noise variance $\sigma^2$ are the parameters and $q$ denotes the latent dimension. $W$ and $\sigma^2$ can be obtained using maximum likelihood estimates. Alternatively, putting a prior on $W$ and integrating over it yields dual probabilistic PCA:
$$p(Y \mid X, \sigma^2) = \prod_{d=1}^{D} \mathcal{N}\left(y_{:,d} \mid 0,\, X X^{\top} + \sigma^2 I\right). \quad (3)$$
The inner product $X X^{\top}$ can be kernelized. For example, using a nonlinear kernel (such as the RBF) yields GPLVM. Using a nonlinear kernel, however, also means the solution is no longer available in closed form. Note that GPLVM is not itself a classifier. In order to use it for classification tasks, we therefore apply an SVM to the latent variables.
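For intuition, the linear-kernel special case does admit a closed-form solution: the maximum-likelihood latent positions follow from an eigendecomposition of the data's inner-product matrix. A minimal sketch of this linear case (the data, dimensions, and noise level below are hypothetical; the nonlinear RBF case instead requires gradient-based optimization of the latents):

```python
import numpy as np

def dual_ppca_latents(Y, q=2, noise=0.1):
    """Maximum-likelihood latent positions for dual probabilistic PCA
    (GPLVM with a linear kernel): eigendecompose the inner-product
    matrix Y Y^T / D and keep the top-q directions, rescaled by the
    eigenvalues minus the noise variance."""
    N, D = Y.shape
    S = Y @ Y.T / D
    eigval, eigvec = np.linalg.eigh(S)        # ascending order
    idx = np.argsort(eigval)[::-1][:q]        # top-q eigenpairs
    lam, U = eigval[idx], eigvec[:, idx]
    scale = np.sqrt(np.maximum(lam - noise, 1e-12))
    return U * scale                          # N x q latent positions

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 10))                 # hypothetical data
X = dual_ppca_latents(Y, q=2)
```

The first latent dimension carries the largest share of the variance, mirroring ordinary PCA.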
II-E Gaussian Process Classification
We introduce GPC [33] for two classes using the Laplace approximation. The goal is to accurately predict the labels $y_*$ for the test data points $X_*$. We first consider regression, and assume that the data is produced by a GP and can be represented using a covariance function $k(x, x')$:
$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K & K_* \\ K_*^{\top} & K_{**} \end{bmatrix}\right), \quad (4)$$
where $K$ is the covariance of the training data, $K_{**}$ that of the test data, and $K_*$ that between test and training data. Having represented the data, we now review how to use this representation for predictions. The optimal estimate for the posterior mean at the given test points, assuming a Gaussian likelihood function, is
$$\bar{f}_* = K_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} y, \quad (5)$$
which is also the mean of our latent function $f$. We will not detail the procedure for optimizing the parameters of the covariance function $k$. The above derivation is for a regression model; we can alter it to perform classification. Since our labels are not real-valued but class labels, we 'squash' this output using a link function such that the output varies only between the two classes; the resulting optimization can be simplified using the previously mentioned Laplace approximation. For details, we refer the interested reader to [33].
In addition to the mean prediction, GPs also provide the variance. This allows us to obtain the uncertainty of GPC predictions, which will be used later in this work.
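The regression posterior of Equation 5, together with the variance just mentioned, can be sketched directly; the toy data below is hypothetical and the kernel is a plain RBF without a signal-variance parameter:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """RBF covariance matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise=1e-2, lengthscale=1.0):
    """Posterior mean and variance of GP regression, following
    Eq. (5): mean = K_*^T (K + sigma_n^2 I)^{-1} y."""
    K = rbf(X, X, lengthscale) + noise * np.eye(len(X))
    K_s = rbf(X, X_star, lengthscale)
    K_ss = rbf(X_star, X_star, lengthscale)
    alpha = np.linalg.solve(K, y)
    mean = K_s.T @ alpha
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(K, K_s))
    return mean, var

X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([0.0, 0.5, 1.0])
X_test = np.array([[0.5], [10.0]])   # one point near, one far from the data
mean, var = gp_posterior(X_train, y_train, X_test)
```

The far-away test point receives a much larger variance than the one inside the training data, which is exactly the behavior the later detection experiments rely on.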
III Methodology
To investigate the effect of adversarial examples on uncertainty, we extend both JSMA and FGSM to a broader setting. This includes a direct computation of such examples on GPC. Further, we adapt common DNNs to approximate latent space representations.
III-A Attacks on GPC
To produce an adversarial example for GPC, we compute the gradient of the output with respect to the input dimensions. We consider the chain of gradients for the output $\bar{f}(x_*)$, latent function $f$, covariance function $k$, and input $x_*$. To start, we rewrite the expected value of $f$ in Equation 5 for a single test point $x_*$:
$$\bar{f}(x_*) = \sum_{i=1}^{N} \alpha_i\, k(x_i, x_*), \qquad \alpha = \left(K + \sigma_n^2 I\right)^{-1} y. \quad (6)$$
From here, we move on to the first part of the gradient,
$$\frac{\partial \bar{f}(x_*)}{\partial k(x_i, x_*)} = \alpha_i = \left[\left(K + \sigma_n^2 I\right)^{-1} y\right]_i, \quad (7)$$
as the remaining terms are constant with respect to the test input $x_*$. The gradient of the covariance with respect to the inputs depends on the kernel; in our case we use the RBF kernel, $k(x_i, x_*) = \exp\left(-\|x_i - x_*\|^2 / (2 l^2)\right)$, between training point $x_i$ and test point $x_*$. The gradient can be expressed as
$$\frac{\partial k(x_i, x_*)}{\partial x_{*j}} = \frac{x_{ij} - x_{*j}}{l^2}\, k(x_i, x_*), \quad (8)$$
where $x_{ij}$ and $x_{*j}$ denote feature (dimension) $j$ of the corresponding data point and $l$ denotes the lengthscale parameter of the kernel. The gradient of the output with respect to the inputs is approximately proportional to the product of Equation 7 and Equation 8. A more nuanced justification and the restrictions of this approach can be found in the Appendix.

Based on these gradients, we can perturb the initial sample. In GPFGS (Algorithm 1), we introduce a global change using the sign of the gradient and a specified step size. Alternatively, we compute local changes (see Algorithm 2): in this algorithm, we iteratively compute the (still unperturbed) feature with the strongest gradient and perturb it. We stop altering the example when it is either misclassified or we have changed more than a previously specified number of features, which we count as a failure.
Finally, the inversion of the matrix in Equation 7 might fail due to sparseness of the features. In cases of such sparse data, we approximate the inverse using a pseudoinverse.
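A global GPFGS step, chaining Equation 7 and Equation 8, can be sketched as follows. This is a plausible reading of Algorithm 1, not its verbatim implementation; the toy data, lengthscale, and step size are hypothetical:

```python
import numpy as np

def gpfgs(X, y, x_star, epsilon=0.1, lengthscale=1.0, noise=1e-2):
    """One GPFGS step (sketch): perturb x_star by epsilon times the
    sign of the gradient of the GP latent mean, combining Eq. (7)
    (alpha = (K + sigma_n^2 I)^{-1} y) with the RBF kernel gradient
    of Eq. (8). For sparse K, np.linalg.pinv could replace solve."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / lengthscale ** 2) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)                      # Eq. (7)
    k_star = np.exp(-0.5 * ((X - x_star) ** 2).sum(-1) / lengthscale ** 2)
    # Eq. (8): d k(x_i, x_*) / d x_*j = k(x_i, x_*) (x_ij - x_*j) / l^2
    grad_k = k_star[:, None] * (X - x_star) / lengthscale ** 2
    grad_mean = alpha @ grad_k                         # chain rule
    # Step against the gradient of the latent mean to push the
    # prediction toward the opposite class.
    return x_star - epsilon * np.sign(grad_mean)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # hypothetical training inputs
y = np.sign(X[:, 0])                   # hypothetical +1/-1 latent targets
x_star = np.array([0.5, -0.2, 0.1])
x_adv = gpfgs(X, y, x_star, epsilon=0.2)
```

Each feature moves by at most epsilon, mirroring the global, sign-based perturbation of FGSM.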
III-B Attacks on GPLVM
We propose an approach complementary to the attacks on GPC: we attack GPLVM+SVM using the already established methodology of DNN surrogates combined with JSMA and FGSM, extending it to DNN surrogates for the GPLVM model. To obtain the surrogate, we train a DNN to fit the latent space representation in one of its hidden layers. We achieve this by taking a common DNN and splitting it into two parts, where a hidden layer becomes the output layer of the first part and the input layer of the second part (see lower half of Figure 1).
The first part is trained using the normal training data as input. We train it minimizing the loss between the output of the network and the latent space we want to approximate (for example the output of GPLVM). The second part receives this latent space as input, and is trained minimizing the loss of the normal labels. When stacking these two networks (i.e., when feeding the output of the first part immediately into the second), we obtain a combined DNN that mimics both the latent space it was trained on and the classifier on this latent space.
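The two-part surrogate training can be sketched with simple linear least-squares fits standing in for the two DNN halves. All data, dimensions, and the latent space below are hypothetical placeholders for the GPLVM outputs described above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # stand-in training inputs
Z = X @ rng.normal(size=(10, 2))      # stand-in latent space (e.g. from GPLVM)
y = ((Z @ np.array([1.0, 1.0])) > 0).astype(int)  # labels separable in Z

# Part 1: map inputs to the latent representation (a linear
# least-squares fit standing in for the first half of the DNN).
W1, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Part 2: map the latent representation to labels (a linear
# classifier standing in for the second half of the DNN).
W2, *_ = np.linalg.lstsq(Z, 2.0 * y - 1.0, rcond=None)

def surrogate_predict(X_new):
    """Stack both parts: the latent output of part 1 is fed
    directly into part 2, mimicking both the latent space and
    the classifier trained on it."""
    z = X_new @ W1                     # part 1: input -> latent
    return (z @ W2 > 0).astype(int)    # part 2: latent -> label
```

In the paper's setting, each part would be a trained subnetwork; the stacking step, feeding part 1's output straight into part 2, is the same.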
IV Experimental Setup
In this section, we briefly describe the datasets and models we use (the source code is available on request); more details are provided in the Appendix. To conclude the section, we give a brief outline of the experiments we conducted.
IV-A Data
Adversarial examples are most important in security and safety contexts. Previous work [12] indicates that settings such as malware detection do not necessarily respond to adversarial attacks in the same way as computer vision problems. Note that we focus on binary classification problems, as many security-relevant learning tasks heavily emphasize binary decisions, most notably between benign and malicious samples. We therefore select two learning tasks in which adversarial examples could be used to great effect without an elaborate setup on the attacker's side: malware detection [38] and spam detection [22], both of which feature the classical security dichotomy of benign and malicious samples.
The malware dataset (MAL) contains 439,563 samples (92.5% benign) represented by 1223 binary features. The spam dataset (SPAM) contains 4,601 samples, of which 60% are benign. Each sample consists of 54 realvalued and three binary features. In addition to these securityfocused datasets we also pick two binary subtasks from the MNIST dataset
[19], namely 3 vs. 8 (MNIST38) and 1 vs. 9 (MNIST91). We select these settings in an effort to evaluate our results on a broad range of real-, mixed- and binary-valued features, as well as balanced and imbalanced datasets.

IV-B Models
We evaluate a range of Machine Learning models in this paper. We trained DNNs and SVMs with both linear and RBF kernels, and further GPC and GPLVM classifiers (using an SVM on the latent space), both featuring an RBF kernel. Finally, we trained DNNs to mimic both the latent space of a linear SVM (dubbed linDNN) and of GPLVM (dubbed GPDNN). The following models did not reach our performance threshold and were excluded from the study: the RBF SVM on the SPAM data, and linDNN on MNIST38, MAL and SPAM. However, we used linDNN to craft adversarial examples on SPAM for experimental purposes. The classifier accuracies on test data are depicted in Table II. For ease of reference, Table I lists all abbreviations used for models and attacks in the evaluation.
TABLE I: Abbreviations used for models and attacks.

GPC: Gaussian Process Classification
GPLVM: Gaussian Process Latent Variable Model
SVM: Support Vector Machine
DNN: Deep Neural Network
linDNN: DNN trained to mimic a linear kernel in a hidden layer
GPDNN: DNN trained to mimic GPLVM in a hidden layer
JBM: Jacobian-based attacks (JSMA, GPJM)
FGSM, GPFGS, or linear SVM attack with perturbation parameter
Carlini and Wagner's attacks
IV-C Outline of Experiments
Uncertainty
Our main interest here is the effect adversarial examples have on uncertainty. We test all examples crafted on the previously named models against the GPs, without investigating whether they actually cause misclassification. We expect adversarial examples to follow a different distribution than benign data (as shown empirically in [11]) and hence to lie further from the training data than benign test points. When using the stationary RBF kernel (as in the GPs here), the variance of a prediction is lower in areas where training data was observed. We thus put forward the hypothesis that malicious data points induce a higher latent variance in GPs than benign samples. For GPC, we also investigate the average of the absolute mean of the latent function. This analysis is not applicable to GPLVM, since the GPLVM latent mean is interpreted as a position in latent space (as explained in Section II-C). Hence, for GPLVM, we only measure and evaluate the latent variance.
We further make use of an uncertainty threshold to reject adversarial examples on GPC and present these results. Note that we only investigate this as a first step and do not optimize the approach beyond a straightforward percentile interval. More research will be needed to address adaptive attackers in real-world scenarios. We expect this defense to detect some adversarial examples, and we are particularly interested in those cases that successfully thwart detection.
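The threshold defense just described can be sketched as a percentile-based reject rule. The benign variance distribution and the 95% coverage below are hypothetical placeholders for the GPC outputs:

```python
import numpy as np

def make_rejector(benign_variances, coverage=0.95):
    """Build a reject rule from the empirical distribution of GPC
    latent variances on benign test data: accept a prediction only
    if its variance falls inside the central `coverage` interval."""
    lo = np.percentile(benign_variances, 100 * (1 - coverage) / 2)
    hi = np.percentile(benign_variances, 100 * (1 + coverage) / 2)
    return lambda v: (v >= lo) & (v <= hi)

rng = np.random.default_rng(1)
benign = rng.normal(0.2, 0.02, size=1000)  # hypothetical benign variances
accept = make_rejector(benign)
```

By construction, roughly 95% of benign points are accepted, while predictions with variances far outside the benign range are rejected.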
Transferability
Observing changes in uncertainty without additionally surveying the perturbation introduced by attacks, or without considering the number of crafted examples that fail to cause misclassification, might be misleading. We thus focus on the question of whether a stronger perturbation leads to stronger changes in uncertainty. Further, we investigate to which degree GP-based methods are susceptible to adversarial examples. A small change in uncertainty might be a consequence of the adversarial examples being correctly classified despite the perturbation introduced by the attack.
TABLE II: Classifier accuracies on test data for DNN, linDNN, GPDNN, GPLVM, GPC, linear SVM and RBF SVM on MNIST38, MNIST91, MAL and SPAM (numeric values omitted; an X marks the excluded linDNN configuration).
V Evaluation
The principal question we are interested in is how adversarial examples affect the uncertainty measure in GP methods. We investigate these changes for all computed examples, ignoring for now whether they actually cause misclassification. The question of whether the adversarial examples are actually effective (i.e. misclassified) is addressed afterwards.
Uncertainty in GPC
Figure 2 shows the effect of attempted adversarial examples on GPC uncertainty estimates, compared to benign data. The different types of attacks have different degrees of impact on the absolute latent mean. In some cases, a larger degree of perturbation in attacks such as FGSM, GPFGS or the linear SVM attack induces a larger change in the mean. The uncertainty also changes for many Jacobian-based methods. We further observe that GPDNN on MAL and DNN on SPAM lead to almost no changes. Additional results investigating changes in the average variance of the latent function are included in the Appendix.
Basic Mitigation for GPC
As a next step, we investigate a straightforward mitigation: we consider the distribution of estimated variances for all benign test data provided to the GPC, compute a percentile interval over this distribution, and then reject test points that fall outside this interval, as we hypothesise that the variance of adversarial examples differs from that of benign data. We apply the same procedure to the latent mean, for similar reasons. We present our results in Figure 3. This simple step is quite successful on the Spam data (with the exception of some attacks). On MNIST91 we observe mixed results; on MNIST38 and the Malware data the approach does not work well.
Uncertainty in GPLVM
Similar to the previous experiment, we measure the variances of GPLVM for all kinds of attempted adversarial examples in Figure 4. For both MNIST tasks, we observe only small changes in the variance, if any. For the Malware and Spam data, we do observe some changes: on the Malware data, the mean variance shifts noticeably, while on the Spam data, the shift is an order of magnitude smaller.
V-A White-Box Setting
In the previous section, we observed that Carlini and Wagner's attacks, as well as Jacobian-based methods (on MNIST38 and MAL), only lead to a small response in the uncertainty estimates. We report the introduced perturbations in Table III and find that these settings indeed yield low perturbations, changing only around one feature. However, the adversarial examples crafted with JSMA on linDNN for MNIST91 have the highest average perturbation. Surprisingly, the change in their uncertainty estimates is smaller than for the other Jacobian-based attacks, which needed fewer perturbations. We thus conclude that the relationship between the size of the perturbation and its effect on uncertainty is nontrivial.
TABLE III: Average perturbation introduced by the Jacobian-based attacks (JBM) for GPC, GPDNN, linDNN and DNN on M38, M91, MAL and Spam (numeric values omitted; an X marks excluded linDNN configurations).
V-B Transferability
We observed that the uncertainty estimates did not change noticeably for MNIST38, MAL, Carlini and Wagner's attack, and small perturbation values. A natural reason for the uncertainty to remain low is that classification is still correct, i.e. the examples are actually not adversarial. We report the percentage of correctly classified examples for GPC in Figure 4(a) and for GPLVM with an SVM on top in Figure 4(b). To enable a comparison, we further plot the same percentages for a normal DNN in Figure 5(a) and for the SVM used on top of GPLVM, applied without the latent space, in Figure 5(b). Full results can be found in the Appendix.
The first observation is that for GPC, GPLVM+SVM and DNN, the accuracy on all (adversarial) examples on the MAL dataset is still very high. For MNIST38, however, where we did not observe changes in uncertainty estimates, many examples are misclassified, i.e. adversarial. Hence, there exist adversarial examples that remain undetected. An interesting finding is that, for all the GP-based classifiers used in this study, the most effective attack was Carlini and Wagner's, in particular when the examples were crafted on GPDNN. For low perturbation values, we observe that many examples are not adversarial. Finally, we found that classification using GPLVM+SVM is more robust than SVM classification on its own.
V-C Conclusion of Experiments
We observed that many adversarial examples influence uncertainty in GP-based methods. The detection of changes in the estimated uncertainty and the low transferability to GP-based methods yield mostly robust methods in three of the four cases studied. Future work will investigate more parameters, and whether alternative covariance functions or lengthscales can be used to increase robustness. One observation in particular needs to be investigated: all Carlini and Wagner attacks based on the L2 norm remain effective and hard to detect even in the presence of uncertainty.

Since the RBF kernel of the Gaussian Process is based on the L2 norm, future work needs to determine whether selecting a kernel with a different norm also alters the classifier's vulnerability to this attack. A similar connection between the metric used in a classifier's regularization and its vulnerability to an attack with the dual metric has already been established for linear models [35]. We therefore consider this a promising direction for future research.
VI Related Work
To the best of our knowledge, only [5] and [27] investigate uncertainty in the presence of adversarial examples. The latter adds a one-class SVM as the last layer of a DNN to build a defense based on uncertainty; this defense can be circumvented, however. The former is more closely related to our work: the authors investigate so-called Gaussian hybrid networks, DNNs whose last layer is replaced by a Gaussian Process. They evaluate the robustness of their approach only on FGSM and the attack by Carlini and Wagner. In contrast, our work targets GPLVM and GPC directly and investigates the sensitivity of Bayesian uncertainty estimates to the perturbations caused by adversarial examples in general.
Another field of research is the general relationship between Deep Learning and Gaussian Processes [29]. To gain more understanding, recent approaches represent infinitely wide DNNs as kernels for Gaussian Processes [8, 20]. Lee et al. further show a relation between uncertainty in Gaussian Processes and predictive error in DNNs, a result that links our work with other approaches targeting DNNs.
At the same time, other Machine Learning models also admit Bayesian Inference to model predictive uncertainty. [21]
show that uncertainty estimates in Bayesian Neural Networks, i.e. Neural Networks with a prior probability placed over their weights, can be used to tell apart adversarial and benign images.
Transferability in the context of adversarial examples was brought up by [30]. [34] study transferability for different deep neural network architectures, whereas [23] specifically investigate targeted transferability. Finally, [40] explore transferability in general by examining the decision boundaries of different classifiers. In contrast to these works, we specifically investigate transferability in the context of Gaussian Process models, namely GPC and GPLVM. Further, we focus on the effects of adversarial examples on the uncertainty measures that are inherent to these models.
VII Conclusion
We have investigated adversarial examples and their impact on uncertainty estimates in a Bayesian framework using Gaussian Processes. Our study was based on two types of attacks: first, state-of-the-art attacks computed on the same dataset but using non-Gaussian-Process surrogate models, relying on the transferability property of adversarial examples; second, a set of white-box attacks that we formally derived to specifically target Gaussian Process based classifiers.
In general, we found that the perturbation introduced as part of the crafting process is reflected in Gaussian Process uncertainty estimates. Interestingly, we also found that some models remain vulnerable when targeted by attacks using the dual of the target's kernel norm as an optimization metric. This is in line with similar observations already made for regularization in linear methods.
Acknowledgment
This work was supported by the German Federal Ministry of Education and Research (BMBF) through funding for the Center for IT-Security, Privacy and Accountability (CISPA) (FKZ: 16KIS0753). This work has further been supported by the Engineering and Physical Sciences Research Council (EPSRC) Research Project EP/N014162/1.
References
 [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2–4, 2016, pages 265–283, 2016.
 [2] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23–27, 2013, Proceedings, Part III, pages 387–402, 2013.
 [3] B. Biggio, G. Fumera, G. L. Marcialis, and F. Roli. Statistical meta-analysis of presentation attacks for secure multibiometric systems. IEEE Trans. Pattern Anal. Mach. Intell., 39(3):561–575, 2017.

 [4] B. Biggio, G. Fumera, F. Roli, and L. Didaci. Poisoning adaptive biometric systems. In Structural, Syntactic, and Statistical Pattern Recognition - Joint IAPR International Workshop, SSPR&SPR 2012, Hiroshima, Japan, November 7–9, 2012, Proceedings, pages 417–425, 2012.
 [5] J. Bradshaw, A. G. d. G. Matthews, and Z. Ghahramani. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. ArXiv e-prints, July 2017.
 [6] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016.
 [7] N. Carlini and D. A. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. CoRR, abs/1705.07263, 2017.
 [8] A. G. de G. Matthews, J. Hron, M. Rowland, R. E. Turner, and Z. Ghahramani. Gaussian process behaviour in wide deep neural networks. International Conference on Learning Representations, 2018.
 [9] I. J. Goodfellow et al. Explaining and harnessing adversarial examples. In Proceedings of the 2015 International Conference on Learning Representations, 2015.
 [10] I. J. Goodfellow, N. Papernot, and P. D. McDaniel. cleverhans v0.1: an adversarial machine learning library. CoRR, abs/1610.00768, 2016.
 [11] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel. On the (statistical) detection of adversarial examples. ArXiv e-prints, Feb. 2017.
 [12] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. D. McDaniel. Adversarial examples for malware detection. In Computer Security - ESORICS 2017 - 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11–15, 2017, Proceedings, Part II, pages 62–79, 2017.
 [13] Y. Han and B. I. P. Rubinstein. Adequacy of the gradient-descent method for classifier evasion attacks. ArXiv e-prints, Apr. 2017.
 [14] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. CoRR, abs/1705.08475, 2017.
 [15] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pages 3–29. Springer, 2017.

 [16] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.
 [17] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. CoRR, abs/1702.01135, 2017.

 [18] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in Neural Information Processing Systems, pages 329–336, 2004.
 [19] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
 [20] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165, 2017.
 [21] Y. Li and Y. Gal. Dropout inference in bayesian neural networks with alphadivergences. CoRR, abs/1703.02914, 2017.
 [22] M. Lichman. UCI machine learning repository, 2013.
 [23] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770, 2016.
 [24] D. Lowd and C. Meek. Good word attacks on statistical spam filters. In CEAS 2005 - Second Conference on Email and Anti-Spam, July 21–22, 2005, Stanford University, California, USA, 2005.
 [25] D. Maiorca, I. Corona, and G. Giacinto. Looking at the bag is not enough to find the bomb: an evasion of structural methods for malicious PDF files detection. In 8th ACM Symposium on Information, Computer and Communications Security, ASIA CCS '13, Hangzhou, China, May 8–10, 2013, pages 119–130, 2013.

 [26] M. McCoyd and D. Wagner. Spoofing 2D face detection: Machines see people who aren't there. ArXiv e-prints, Aug. 2016.
 [27] M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli. Is deep learning safe for robot vision? Adversarial examples against the iCub humanoid. In 2017 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2017, Venice, Italy, October 22–29, 2017, pages 751–759, 2017.
 [28] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [29] R. M. Neal. Bayesian learning for neural networks, volume 118. Springer, 1996.
 [30] N. Papernot, P. McDaniel, and I. J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016.
 [31] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The Limitations of Deep Learning in Adversarial Settings. In Proceedings of the 1st IEEE European Symposium in Security and Privacy (EuroS&P), 2016.
 [32] M. Raghu, B. Poole, J. M. Kleinberg, S. Ganguli, and J. Sohl-Dickstein. On the expressive power of deep neural networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, pages 2847–2854, 2017.
 [33] C. E. Rasmussen and C. K. I. Williams. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006.
 [34] A. Rozsa, M. Günther, and T. E. Boult. Are accuracy and robustness correlated? ArXiv e-prints, Oct. 2016.
 [35] P. Russu, A. Demontis, B. Biggio, G. Fumera, and F. Roli. Secure kernel machines against evasion attacks. In AISec@CCS, pages 59–69. ACM, 2016.
 [36] S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein. Deep information propagation. arXiv preprint arXiv:1611.01232, 2016.
 [37] N. Šrndić and P. Laskov. Practical evasion of a learning-based classifier: A case study. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18–21, 2014, pages 197–211, 2014.
 [38] N. Šrndić and P. Laskov. Hidost: a static machine-learning-based detector of malicious files. EURASIP Journal on Information Security, 2016(1):22, Sep 2016.
 [39] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
 [40] F. Tramèr, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. The space of transferable adversarial examples. ArXiv e-prints, Apr. 2017.
 [41] P. Vidnerová and R. Neruda. Vulnerability of machine learning models to adversarial examples. In Proceedings of the 16th ITAT Conference Information Technologies – Applications and Theory, Tatranské Matliare, Slovakia, September 15–19, 2016, pages 187–194, 2016.
 [42] Y. Wang, S. Jha, and K. Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. CoRR, abs/1706.03922, 2017.
 [43] W. Xu, Y. Qi, and D. Evans. Automatically evading classifiers. In Proceedings of the 2016 Network and Distributed System Security Symposium (NDSS), 2016.
 [44] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li. Adversarial examples: Attacks and defenses for deep learning. arXiv preprint arXiv:1712.07107, 2017.
Appendix A Details on Attack Derivation for GPC
In this part of the Appendix, we present the detailed derivation for computing adversarial examples on GPC, including the reasoning why it is sufficient to use the latent mean.
We compute the gradient of the output with respect to the input dimensions. We consider the chain of gradients for the output $\pi(x_*)$ and input $x_*$:
\[
\frac{\partial \pi(x_*)}{\partial x_*^d} = \frac{\partial \pi(x_*)}{\partial \bar f(x_*)}\,\frac{\partial \bar f(x_*)}{\partial \mathbf{k}(x_*)}\,\frac{\partial \mathbf{k}(x_*)}{\partial x_*^d}, \tag{9}
\]
where $\bar f$ and $\mathbf{k}$ are the associated latent function and covariance function, respectively.
Note that for this attack, we are only interested in the relative order of the gradients, not their actual values. Unfortunately, $\pi(x_*)$ does not vary monotonically with $\bar f(x_*)$, as the variance also affects the prediction. However, we are in a setting of binary classification, so we are only interested in moving the prediction $\pi(x_*)$ across the boundary. No change in variance alone can cause this; instead, a change in the mean of $f(x_*)$ is required (effectively, the mean is monotonic with respect to $\pi(x_*)$ in the region of 0.5). The fastest way to get from one probability threshold to its opposite is when there is no variance (any variance will move the prediction towards 0.5). So finding the gradient of $\bar f(x_*)$ is sufficient. We can therefore use the gradient of $\sigma(\bar f(x_*))$ instead of a numerical approximation to $\partial \pi(x_*)/\partial x_*$:
\[
\frac{\partial \pi(x_*)}{\partial x_*^d} \approx \frac{\partial \sigma(\bar f(x_*))}{\partial x_*^d} = \sigma(\bar f(x_*))\bigl(1-\sigma(\bar f(x_*))\bigr)\,\frac{\partial \bar f(x_*)}{\partial x_*^d}. \tag{10}
\]
Let us first rewrite the expected value of the latent function given a single test point $x_*$:
\[
\bar f(x_*) = \mathbf{k}(x_*)^{\top} K^{-1} \hat{\mathbf{f}} = \sum_{i=1}^{n} k(x_i, x_*)\,\bigl[K^{-1}\hat{\mathbf{f}}\bigr]_i, \tag{11}
\]
where $K$ is the covariance matrix of the training data and $\hat{\mathbf{f}}$ the mode of the latent posterior obtained during training.
From here, we move on to the first part of the gradient,
\[
\frac{\partial \bar f(x_*)}{\partial \mathbf{k}(x_*)} = \bigl(K^{-1}\hat{\mathbf{f}}\bigr)^{\top}; \tag{12}
\]
note that the remaining terms, $K^{-1}$ and $\hat{\mathbf{f}}$, are both constant with respect to the test input $x_*$. The gradient of the covariance with respect to the inputs depends on the particular kernel that is applied. In our case, for the RBF kernel, between training point $x_i$ and test point $x_*$, the gradient can be expressed as
\[
\frac{\partial k(x_i, x_*)}{\partial x_*^d} = -\frac{x_*^d - x_i^d}{\ell^2}\, k(x_i, x_*), \tag{13}
\]
where $x_*^d$ and $x_i^d$ each denote feature (dimension) $d$ of the corresponding data point and $\ell$ denotes the length-scale parameter of the kernel. Using Equation 10, the gradient of the output with respect to the inputs is, in the region of 0.5, approximately proportional to the product of Equation 12 and Equation 13.
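The resulting attack direction can be sketched numerically. The following is a minimal numpy sketch, assuming an RBF kernel with length-scale $\ell$ and a weight vector `alpha` playing the role of $K^{-1}\hat{\mathbf{f}}$ from a fitted GPC; the function names and the sign convention of the final step are ours, not taken from the paper's code.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 l^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def latent_mean_gradient(x_star, X_train, alpha, lengthscale=1.0):
    """Gradient of the latent mean f(x*) = k(x*)^T alpha w.r.t. the
    test input x*, combining Equations 12 and 13 for the RBF kernel."""
    k = rbf_kernel(x_star[None, :], X_train, lengthscale)[0]          # (n,)
    # d k(x_i, x*) / d x*_d = -(x*_d - x_{i,d}) / l^2 * k(x_i, x*)
    dk = -(x_star[None, :] - X_train) / lengthscale ** 2 * k[:, None]  # (n, d)
    return dk.T @ alpha                                                # (d,)

def gpfgs_step(x, X_train, alpha, eps=0.1, lengthscale=1.0):
    """One FGSM-like step on the latent mean (sketch of a GPFGS step);
    the direction of eps depends on the target class."""
    return x + eps * np.sign(latent_mean_gradient(x, X_train, alpha, lengthscale))
```

Because only the relative order (and sign) of the gradient components matters for the attack, the normalizing factor $\sigma(\bar f)(1-\sigma(\bar f))$ from Equation 10 can be dropped, as done above.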
Appendix B Details of Experimental Setup
In this Appendix we provide more detailed information about the datasets used and the parameters of the models.
B.1 Datasets
In the following, we describe in detail the datasets that were used for the evaluation.
Malware
Our Malware dataset consists of the PDF Malware data of the Hidost Toolset project [38]. The dataset is composed of PDF samples labeled as benign or malicious. Data points consist of binary features, and individual feature vectors are likely to be sparse. We split it into training and test portions; this still leaves us with a large number of test data points for crafting adversarial examples, many of which are very time-consuming to compute.
Spam
The second security-relevant dataset is an email Spam dataset [22]. It contains 4,601 samples. Each sample captures 57 features, of which 54 are continuous and represent word frequencies or character frequencies. The three remaining integer features contain capital run length information. This dataset is slightly imbalanced: roughly 39% of the samples are classified as Spam, the remainder as benign emails. We split this dataset randomly into training and test data.
MNIST
Finally, we use the MNIST benchmark dataset [19] to select two additional binary-task subdatasets. It consists of 70,000 grayscale images of handwritten single digits, each 28x28 pixels. There are 60,000 training and 10,000 test samples, with roughly the same number for each of the ten classes. We select two binary tasks: 9 versus 1 and 3 versus 8 (denoted as MNIST91 and MNIST38, respectively). We do this in an effort to study two different tasks on the same underlying data representation, i.e. the same number and range of features, yet with different distributions to learn.
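The construction of such binary sub-tasks can be sketched as follows; `binary_subtask` is a hypothetical helper of ours, and the digit pairs follow the MNIST91/MNIST38 naming used above.

```python
import numpy as np

def binary_subtask(X, y, neg, pos):
    """Keep only two digit classes and relabel them to {0, 1}."""
    mask = (y == neg) | (y == pos)
    return X[mask], (y[mask] == pos).astype(int)

# e.g. MNIST38 is digit 3 vs digit 8, MNIST91 is digit 9 vs digit 1:
# X38, y38 = binary_subtask(X, y, 3, 8)
# X91, y91 = binary_subtask(X, y, 9, 1)
```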
B.2 Models
We investigate transferability across multiple ML models derived by different algorithms. In some cases, dataset-specific requirements have to be met for classification to succeed.
GPLVM
We train GPLVM generally using a small number of latent dimensions, with the exception of the Spam dataset, where more dimensions are needed for good performance. We further use an SVM on top of GPLVM to produce the classification results: a linear SVM for the MNIST91 task and an RBF-kernel SVM for all other tasks.
DNN on latent space
We distinguish between DNNs approximating GPLVM (GPDNN), linear SVM (linDNN) and RBF SVM (rbfDNN). All of them contain two hidden layers with half as many neurons as the respective dataset has features. The layer trained on the latent space matches the latent dimensionality of the corresponding SVM network or GPDNN, except for the Spam dataset, where more latent variables are modeled. We train the latent-space part of the network with squared loss; the classifying part is trained like the other networks using cross-entropy loss. From this latent space, we train a single layer for classification, with the exception of GPDNN in the cases where an RBF-SVM is approximated on top: here we add an additional hidden layer.
DNN
Our simple DNN accommodates two hidden layers, each containing half as many neurons as the dataset has features, and ReLU activation functions.
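To illustrate this architecture, here is a minimal numpy forward pass with two hidden layers of `n_features // 2` neurons each and ReLU activations; the random initialization scale and the sigmoid output layer are our assumptions, not specified in the text.

```python
import numpy as np

def init_dnn(n_features, rng):
    # two hidden layers, each with half as many neurons as there are features
    h = n_features // 2
    sizes = [n_features, h, h, 1]
    return [(rng.randn(m, n) * 0.1, np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)   # ReLU on the hidden layers
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid output for binary tasks (assumed)
```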
SVM
We study a linear SVM and an SVM with an RBF kernel. Both are optimized using squared hinge loss, and we further set the penalty term C. For the RBF kernel, the parameter γ is set to 1 divided by the number of features.
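With scikit-learn (an assumption of ours; the paper's own implementation may differ), these settings can be sketched as below. The value of C was not recoverable here, so C=1.0 is a placeholder; note that scikit-learn's RBF `SVC` uses the standard hinge loss internally, so in this sketch the squared hinge loss applies to the linear model only.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC

# Linear SVM with squared hinge loss; C=1.0 is a placeholder value.
linear_svm = LinearSVC(loss='squared_hinge', C=1.0)

# RBF-kernel SVM; gamma='auto' sets gamma = 1 / n_features, as in the text.
rbf_svm = SVC(kernel='rbf', gamma='auto', C=1.0)
```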
B.3 Implementation and third-party libraries
We implement our experiments in Python using the following specialized libraries: Tensorflow [1] for DNNs, Scipy [16] for SVM and GPy [18] for GPLVM and GPC. We rely on the implementation of the JSMA and FGSM attacks from the library Cleverhans, version 1.0.0 [10]. We use the code provided by Carlini and Wagner for their attacks (retrieved from https://github.com/carlini/nn_robust_attacks, July 2017). We implement the linear SVM attack (introduced in [30]) and the GP attacks (based on GPy) ourselves.
Appendix C Full results of experiments
In the main paper, we present selected results to support our reasoning. In this Appendix we present the full results, so that individual results can be confirmed. Further, this enables looking up results that are not presented in the main paper.
The full results for uncertainty for GPC are in Table IV (latent mean), Table V (latent variance), and Table VI (mitigation). We present the full results on GPLVM uncertainty in Table VII.
Concerning the white-box experiments, Table VIII shows the perturbations for methods based on the Jacobian and the Carlini and Wagner attack. Further, Table IX shows the accuracy under FGSM, the linear SVM attack and GPFGS for different models.
Finally, we present the full results on our transferability experiments, ordered by datasets. Table X was done on MNIST38, Table XI on MNIST91, Table XII on MAL and Table XIII on SPAM.
[Tables IV–IX: the numerical entries did not survive extraction. The rows listed the models GPC, GPDNN, lin SVM, linDNN and DNN for each of the datasets MNIST38, MNIST91, MALW and SPAM; the columns listed the attacks FGSM / linSVM / GPFGS, CW and JBM.]
[Tables X and XI (transferability on MNIST38 and MNIST91): the numerical entries did not survive extraction. For each origin model (GPC, GPDNN, lin SVM, linDNN, DNN) the rows listed the target models GPC, GPDNN, GPLVM, lin SVM, RBF SVM, linDNN and DNN; the columns listed FGSM / linSVM / GPFGS, CW and JBM.]