This paper explores an alternative approach to mitigating the effect of adversarial examples in a post-deployment setting. Given a current example for which a prediction is required, we attempt to ascertain whether this example is adversarial. If it is identified as adversarial, then the deployed model could refrain from making a prediction, similar to a learning-with-rejection setting Cortes et al. (2016). While this question has been extensively explored for neural networks, this is not the case for tree ensembles. Unfortunately, most existing methods for neural networks are not applicable to tree ensembles because they use properties unique to neural networks Zhang et al. (2022). For example, some modify the model Grosse et al. (2017); Gong et al. (2017); Metzen et al. (2017), learn other models (e.g., nearest neighbors) on top of the network’s intermediate representations Feinman et al. (2017); Lee et al. (2018); Katzir and Elovici (2019); Sperl et al. (2020), or learn other models on top of the gradients Schulze et al. (2021). Moreover, nearly all methods focus on detecting adversarial examples only in the context of image classification.
Tree ensembles are powerful because they combine the predictions made by many trees. Hence, the prediction procedure involves sorting the given example to a leaf node in each tree. The ordered set of the reached leaf nodes is an output configuration of the ensemble and fully determines the ensemble’s resulting prediction. However, there are many more possible output configurations than there are examples in the data used to train the model. For example, the California housing dataset Pace and Barry (1997)
only has eight features, but training an XGBoost ensemble containing 6, 7, or 8 trees each of at most depth 5 yields 62 248, 173 826, and 385 214 output configurations, respectively (computed using Veritas Devos et al. (2021)). These numbers (far) exceed the 20,600 examples in the dataset. The situation will be worse for the larger ensemble sizes that are used in practice. Our hypothesis is that
adversarial examples exploit unusual output configurations, that is, ones that are very different from those observed in the data used to train the model.
That is, small but carefully selected perturbations can yield an example that is quite similar to an example observed during training, but whose output configuration is far away from those covered by the data used to train the model.
Based on this intuition, we present a novel method to detect adversarial examples based on assessing whether an example encountered post deployment has an unusual output configuration. When an example is encountered post deployment, our approach encodes it by its output configuration and then measures the distance between the encoded example and its nearest (encoded) neighbor in a reference set. If this distance is sufficiently high, the example is flagged as adversarial and the model can abstain from making a prediction. Our approach has several benefits. First, it is general: it works with any additive tree ensemble. Second, it is integrated: it does not require training a separate model to identify adversarial examples; one simply has to set a threshold on the distance. Finally, it is surprisingly fast, as the considered distance metric can be efficiently computed by exploiting instruction-level parallelism (SIMD).
Empirically, we evaluate and compare our approach on three ensemble methods: gradient boosted trees (XGBoost Chen and Guestrin (2016)), random forests Breiman (2001), and GROOT Vos and Verwer (2021), which is a recent approach for training robust tree ensembles. We empirically show that our method outperforms multiple competing approaches for detecting adversarial examples post deployment for all three considered tree ensembles. Moreover, it can detect adversarial examples with a comparable computational effort.
Given an input space $\mathcal{X}$ and an output space $\mathcal{Y}$, we detect adversarial examples in deployed models $F : \mathcal{X} \to \mathcal{Y}$ learned from a dataset $D \subseteq \mathcal{X} \times \mathcal{Y}$.
2.1 Additive tree ensembles
This paper proposes a method that works with additive ensembles of decision trees (e.g., those available in XGBoost Chen and Guestrin (2016), LightGBM Ke et al. (2017) and Scikit-learn Pedregosa et al. (2011)). This encompasses both random forests and (gradient) boosted decision trees.
A binary tree is a recursive data structure consisting of nodes. It has one root node, which is the only node that is not a descendant of any other node. Every node is either a leaf node containing an output value, or an internal node storing a test (e.g., is attribute value $x_j$ less than 5?) that indicates whether to go left or right, and references to left and right sub-trees. A decision tree is evaluated on an example by starting at the root and executing the tests until a leaf node is reached.
An additive ensemble of decision trees is a sum of trees. The prediction for an example $x$ involves summing up the predicted leaf values of each decision tree in the ensemble: $F(x) = \sum_{m=1}^{M} T_m(x)$, with $M$ the number of trees in the ensemble and $T_m(x)$ the value of the leaf reached by $x$ in tree $m$.
2.2 Output configurations
The output configuration (OC) of an example is the ordered set of leaf nodes that are reached when evaluating the trees in the ensemble. An output configuration corresponds to a feasible combination of root-to-leaf paths where there is one such path for each tree in the ensemble. We define a mapping $OC : \mathcal{X} \to \mathcal{C}$ that maps an example to its output configuration. We call the discrete space $\mathcal{C}$ of all feasible output configurations the OC-space of an ensemble. This OC-space corresponds to equivalence classes in Törnblom and Nadjm-Tehrani (2020). Note that $\mathcal{C}$ is not just the Cartesian product of all leaves. This is because some combinations of leaves are invalid: e.g., in Figure 1, some combinations are invalid configurations because an attribute value cannot be both less than 3 and greater than 4 at the same time.
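To make the encoding concrete, consider a minimal Python sketch (our own illustrative toy, not the paper's implementation): two depth-1 trees over a single feature, where the output configuration is the tuple of leaf identifiers reached in each tree.

```python
# Illustrative toy ensemble: two stumps on feature x[0] with thresholds 3 and 4,
# mirroring the infeasibility example from Figure 1.

def make_stump(threshold, left_leaf, right_leaf):
    """A depth-1 tree: reach the left leaf if x[0] < threshold, else the right."""
    def evaluate(x):
        return left_leaf if x[0] < threshold else right_leaf
    return evaluate

# Leaf identifiers are small integers, as in the reference-set encoding.
ensemble = [make_stump(3.0, 0, 1), make_stump(4.0, 0, 1)]

def output_configuration(x):
    """OC(x): the ordered tuple of leaf identifiers reached in each tree."""
    return tuple(tree(x) for tree in ensemble)

print(output_configuration([2.0]))  # (0, 0): left leaf in both trees
print(output_configuration([3.5]))  # (1, 0): right leaf, then left leaf
# (0, 1) is infeasible: x[0] cannot be both less than 3 and at least 4,
# so the OC-space is smaller than the Cartesian product of all leaves.
```

The infeasible combination (0, 1) is exactly why the OC-space is a strict subset of the Cartesian product of the trees' leaves.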
2.3 Adversarial examples
We use the same definition of adversarial examples as Kantchelian et al. (2016); Chen et al. (2019b); Devos et al. (2021) and others: $\tilde{x}$ is an adversarial example of $x$ when three conditions hold: (1) $\|\tilde{x} - x\|$ is small according to some norm, (2) the ensemble returns the correct label for $x$, and (3) $F(\tilde{x}) \neq F(x)$. This is similar to the prediction-change setting of Diochnos et al. (2018), which only requires conditions (1) and (3).
3 Detecting adversarial examples
We assume a post-deployment setting where a tree ensemble is operating in the wild. Our task can then be defined as follows:
Given: a deployed tree ensemble $F$ and an example $x$ for which a prediction is required
Do: assign a score to $x$ indicating whether $x$ is an adversarial example
Our algorithm is based on the fact that for sufficiently large models, the vast majority of the model’s possible output configurations will not be observed in the data used to train the model. Given this insight, our hypotheses are that
Adversarial examples arise because certain minimal changes to the original example can produce large changes in the output configuration, causing the model’s prediction to change.
Adversarial examples produce highly unusual output configurations, exploiting the fact that most such configurations were never observed in the data used to train the model.
Decision tree learners employ heuristics to select a split criterion in each internal node that helps distinguish among the different classes. Consequently, most leaf nodes tend to be (strongly) predictive of one class. In an ensemble, correctly classified positive examples will tend to have output configurations consisting largely of leaves that predict the positive class, whereas the converse is true for negative examples. Adversarial examples contain carefully selected perturbations to a small number of feature values that result in more leaves of the opposing class appearing in an output configuration, yielding an unusual output configuration with an incorrect output label. This suggests that measuring how similar a newly encountered example’s output configuration is to those that appear in the training set will be an effective way to detect adversarial examples.
First, we discuss how to measure the distance between output configurations. Second, we introduce our OC-score metric which computes the distance between a newly encountered example and the reference set. A higher score indicates that the example is more likely adversarial. Finally, we show how to efficiently compute the OC-score using instruction-level parallelism.
3.1 Distances between output configurations
Following the intuition that adversarial examples exploit unusual output configurations, our method measures the abnormality of an output configuration by comparing it to the typical output configurations traversed by examples in a reference set. Two output configurations $o$ and $o'$ can be compared using the Hamming distance: $d_H(o, o') = \sum_{m=1}^{M} \mathbb{1}[o_m \neq o'_m]$, where $o_m$ is the leaf reached in the $m$th tree.
The Hamming distance counts the number of leaves that differ between the two output configurations. It measures the distance between two examples in OC-space rather than in the input space $\mathcal{X}$. This is isomorphic to the proximity metric in Random Forests Breiman and Cutler (2002).
3.2 The OC-score metric: distance to the closest reference set example
Our approach requires a learned ensemble $F$ and a subset $D_{\mathit{ref}}$ of the data used to train the model. It constructs a reference set $R$ by encoding the examples in $D_{\mathit{ref}}$ into the OC-space by finding each one’s output configuration. In practice, $R$ is a matrix of small integers (e.g., uint8) that identify the leaves (e.g., the black identifiers in Figure 1). The $i$th row in $R$ contains the identifiers of the leaves in the output configuration of the $i$th example in $D_{\mathit{ref}}$. In the experiments, we take $R$ to be the output configurations of the correctly classified training examples.
Given a newly encountered example $x$, the ensemble is used to obtain its predicted label $\hat{y}$ and output configuration $OC(x)$. Then $x$ receives an OC-score by computing the OC-space distance to the closest example in the reference set $R$: $s(x) = \min_{o \in R_{\hat{y}}} d_H(OC(x), o)$, where $R_{\hat{y}}$ is the subset of examples in $R$ with label $\hat{y}$. Higher OC-scores correspond to a higher chance of being an adversarial example. To operationalize this, a threshold can be set on the OC-scores to flag potential adversarial examples: when the threshold is exceeded, the model should abstain from making a prediction.
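The metric itself is simple. The following pure-Python sketch (illustrative names such as `oc_score` and `reference` are ours, and output configurations are plain tuples of leaf identifiers rather than the paper's integer matrix) shows the Hamming distance and the per-label nearest-neighbor search described above.

```python
# Minimal sketch of the OC-score: distance in OC-space to the closest
# reference example that shares the predicted label.

def hamming(oc1, oc2):
    """Number of trees in which the two configurations reach different leaves."""
    return sum(a != b for a, b in zip(oc1, oc2))

def oc_score(oc, predicted_label, reference):
    """Min Hamming distance to reference OCs with the same predicted label;
    a higher score means the example is more likely adversarial."""
    return min(hamming(oc, r) for r in reference[predicted_label])

# Reference set partitioned by label: label -> list of output configurations.
reference = {
    0: [(0, 0, 1), (0, 1, 1)],
    1: [(1, 1, 0)],
}

print(oc_score((0, 0, 1), 0, reference))  # 0: identical to a reference OC
print(oc_score((1, 0, 0), 1, reference))  # 1: one leaf differs
```

Thresholding the returned score then gives the abstain/predict decision described above.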
3.3 Fast distance computations via SIMD
Finding an example’s nearest neighbors in the OC-space can be done very efficiently by exploiting the fact that ensembles tend to produce trees with a relatively small number of leaf nodes. Hence it is possible to assign each leaf in a tree a short binary code word that can be represented by a small integer and exploit instruction-level parallelism using SIMD to compute the Hamming distance.
For a newly encountered example $x$, we need to compute the Hamming distance to each example in the reference set; that is, we need to slide the vector $OC(x)$ over the rows of $R$. The trees used in the experiments have no more than 256 leaves. Hence, when using the 256-bit AVX2 registers, we can compute the Hamming distance of 32 reference set examples in parallel. This massively speeds up the computation even when $R$ is large. The SIMD pseudo-code is given in the supplement.
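The data-parallel pattern can be illustrated without intrinsics: the actual implementation compares leaf identifiers byte-wise with AVX2, whereas the sketch below uses NumPy broadcasting purely as a stand-in for the same compare-and-count computation over all rows of the reference matrix at once.

```python
import numpy as np

# Reference set as a matrix of small integers, one output configuration per row.
R = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=np.uint8)

oc = np.array([1, 1, 1], dtype=np.uint8)  # new example's output configuration

# Compare oc against every row simultaneously and count differing leaves
# per row: the vectorized analogue of sliding OC(x) over the rows of R.
distances = (R != oc).sum(axis=1)
print(distances.tolist())  # [2, 1, 1]
print(distances.min())     # the OC-score, before filtering by predicted label
```

In the AVX2 version, one register holds 32 uint8 leaf identifiers, so 32 reference examples are compared per instruction rather than one Python-level row at a time.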
4 Related Work
Beyond the approaches mentioned in the introduction for detecting adversarial examples in neural networks, there are methods that look at the behavior of the decision boundary in an example’s neighborhood Fawzi et al. (2018); Roth et al. (2019); Tian et al. (2022). Unfortunately, these methods do not work well with tree ensembles because the use of binary axis-parallel splits makes them step functions, which makes it difficult to extract information from the neighborhood. Also relevant to this paper is the work investigating the relation between model uncertainty and adversarial examples Liu et al. (2019); Grosse et al. (2018).
The random forest manual Breiman and Cutler (2002) discusses defining distances between training examples in a manner analogous to the OC-score. Typically, (variations on) this distance have been used for tasks such as clustering Shi and Horvath (2006) or making tree ensembles more interpretable Tan et al. (2020). To our knowledge, it has not been used for detecting adversarial examples. Alternatively, some works propose different ways to use the path an example follows in a tree ensemble to re-encode examples, typically with the idea of using this encoding as a new feature space for training a model Vens and Costa (2011); Pliakos and Vens (2016).
Each example’s OC-score can be viewed as a model’s secondary output, with the predicted class being its primary output. This fits into the larger task of machine learning with a reject option Cortes et al. (2016). Rejection aims to identify test examples for which the model was not properly trained. For such examples, the model’s predictions have an elevated risk of being incorrect, and hence may not be trustworthy. An example can be rejected due to ambiguity (i.e., how well the decision boundary is defined in a region) or novelty (i.e., how anomalous an example is with respect to the observed training data) Hendrickx et al. (2021). The OC-score metric goes beyond measuring ambiguity in an ensemble (i.e., the model’s confidence in a prediction). Therefore, it can detect adversarial examples even if they fall in a region of the input space where the model’s decision boundary appears to be well defined given the training data.
5 Experimental Evaluation
Our experimental evaluation addresses three questions (a fourth question, how the proportion of adversarial examples in the test set affects the performance of our OC-score metric, is addressed in the supplement):
Can our approach more accurately detect adversarial examples than its competitors?
What is each approach’s prediction time cost associated with detecting adversarial examples?
How does the size of the reference set affect the performance of our OC-score metric?
We compare our OC-score to four approaches:
Ambiguity (ambig) This approach uses the intuition that because adversarial examples are somehow different from the training ones, the model will be uncertain about an adversarial example’s predicted label Grosse et al. (2018). This entails deciding whether an example lies near a model’s decision boundary, which can be done by ranking examples according to the uncertainty of the classifier, i.e., how close $p(x)$, the probability of the positive class as predicted by the ensemble for an example $x$, is to 0.5.
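A minimal sketch of this baseline (helper names are ours; any monotone measure of closeness to 0.5 produces the same ranking):

```python
# Ambiguity baseline: score how close the positive-class probability is to 0.5.

def ambiguity(p_positive):
    """0.0 for a fully confident prediction, 0.5 exactly at the decision boundary."""
    return min(p_positive, 1.0 - p_positive)

# Made-up ensemble probabilities for three hypothetical examples.
probs = {"a": 0.97, "b": 0.51, "c": 0.20}
scores = {name: ambiguity(p) for name, p in probs.items()}

# "b" sits closest to the decision boundary, so it ranks as most suspicious.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['b', 'c', 'a']
```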
Local outlier factor (lof) Breunig et al. (2000) Another intuition to detect adversarial examples is to employ an anomaly detector, under the assumption that adversarial examples are drawn from a different distribution than non-adversarial ones. lof is a state-of-the-art unsupervised anomaly detection method that assigns a score to each example denoting how anomalous it is. This approach entails learning a lof model which is applied to each example.
Isolation forests (iforest) An isolation forest Liu et al. (2008) is a state-of-the-art anomaly detector. It learns a tree ensemble that separates anomalous from normal data points by splitting on a randomly selected attribute using a randomly chosen split value between the minimum and maximum value of the attribute. Outliers tend to be split off earlier in the trees, so the depth of an example in the tree is indicative of how normal an example is. Again, this requires learning a separate model at training time.
ML-LOO This is an approach for detecting adversarial examples from the neural network literature Yang et al. (2020a). Unlike most other approaches, it is model agnostic as it looks at statistics of the features. It uses the accumulated feature attributions to rank examples:
$\mathrm{mlloo}(x) = \mathrm{std}\left(\{\, p(x) - p(x_{(j)}) \mid j = 1, \dots, d \,\}\right)$, where $p(x)$ is the probability prediction of ensemble $F$, and $x_{(j)}$ is $x$ with the $j$th attribute set to 0. The observation in Yang et al. (2020a) is that variation in the feature attributions is larger for adversarial examples.
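A sketch of this statistic, assuming the dispersion is measured with a standard deviation (the stand-in probability model `p` and all names below are ours, not the paper's or Yang et al.'s code):

```python
import statistics

def p(x):
    """Stand-in probability model: a bounded linear score over three features."""
    s = 0.1 + 0.4 * x[0] + 0.3 * x[1] + 0.1 * x[2]
    return max(0.0, min(1.0, s))

def ml_loo(x):
    """Dispersion of leave-one-out feature attributions p(x) - p(x_(j))."""
    attributions = []
    for j in range(len(x)):
        x_zeroed = list(x)
        x_zeroed[j] = 0.0  # "leave one out": zero feature j
        attributions.append(p(x) - p(x_zeroed))
    return statistics.pstdev(attributions)

# Attributions here are [0.4, 0.3, 0.1]; their spread is the ML-LOO score.
print(round(ml_loo([1.0, 1.0, 1.0]), 4))
```

The claim being exploited is that this spread tends to be larger for adversarial examples than for normal ones.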
Dataset characteristics and learners’ hyperparameter settings. The characteristics are the number of features (#F) and the number of examples. For each dataset, the table also lists the learning rate for XGBoost, the number of trees in each ensemble, and the maximum tree depth.
5.1 Experimental methodology
We mimic the post-deployment setting using 5-fold cross validation. In each fold, a model is trained on clean training data (4 folds). Then, in the remaining fold some of the correctly classified examples are perturbed and turned into adversarial examples. This fold augmented by the perturbed examples is used as the unseen examples. The task is then, given the model and a reference set , to detect which of the examples in the unseen set are adversarial using the OC-score.
We test our approach on the eight benchmark datasets listed in Table 1. All datasets are min-max normalized to make perturbations of the same size to different attributes comparable. To demonstrate our approach’s generality, we consider three types of additive tree ensembles: (1) XGBoost boosted trees Chen and Guestrin (2016), (2) Scikit-learn random forests Pedregosa et al. (2011), and (3) GROOT robustified random forests Vos and Verwer (2021), which modifies the criterion for selecting the split condition when learning a tree to make the trees more robust against adversarial examples. Due to space constraints, we only show plots for 4 out of 8 datasets. The results for the remaining datasets are along the same lines and are provided in the supplement.
Experimental settings For a given dataset, each learner has the same number of trees in the ensemble and each tree is restricted to be of the same maximum depth. Details for each dataset are given in Table 1. We use the scikit-learn Pedregosa et al. (2011) implementation for lof and iforest and use the default scikit-learn hyper-parameters. The supplement reports the average accuracies of the learned models on each dataset and the attack model of the GROOT ensembles. All experiments ran on an Intel E3-1225 with 32GB of memory. Multi-threading was enabled for all methods.
Generating adversarial examples We generate adversarial examples using Veritas Devos et al. (2021) with the $\ell_\infty$ norm. Per fold in the benchmarks, we generate three different sets of adversarial examples. Each set is based on 500 randomly selected, correctly classified test set examples. The first set contains the closest adversarial examples: for each of these adversarial examples, it is guaranteed that no other adversarial example exists that is closer to the original example. This set of examples corresponds to the ones generated by Kantchelian et al.’s MILP approach Kantchelian et al. (2016). The second set of adversarial examples allows perturbations of size $2\delta$, where $\delta$ is the median adversarial perturbation observed in the set of closest adversarial examples. The third set has larger perturbations of size $5\delta$. We refer to these three sets as closest adv., adv. x2, and adv. x5. Each set of adversarial examples thus has its own properties: the closest adversarial examples tend to barely cross a model’s decision boundary, whereas for the adversarial examples in adv. x5 the models could return extremely confident outputs. The supplement contains illustrative adversarial examples for the image datasets mnist2v4 and fmnist2v4. For each set of 500 adversarial examples, we construct the final evaluation set by adding 2500 randomly selected normal, previously unseen (i.e., not used to train the model) test set examples, apart from phoneme and spambase where we select 1080 and 920 normal examples, respectively, because these two datasets have fewer examples.
5.2 Results Q1: detecting adversarial examples
The task is to distinguish the 500 adversarial examples from the normal test set examples. We measure detection performance in two ways: by evaluating the ranking each method produces, and by evaluating the coverage versus detection rate tradeoff.
Ranking. Each method assigns every test example a score, which can be used to rank examples from most to least likely to be adversarial. The area under the ROC curve (AUC ROC) measures the quality of a ranking with respect to the classification task of separating adversarial from normal examples. In this case, it captures an approach’s ability to distinguish adversarial examples from non-adversarial ones. Figure 2 shows the AUC ROC as a function of the magnitude of the adversarial perturbation for all methods, datasets, and ensemble learners.
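For this binary task, the AUC ROC equals the probability that a randomly chosen adversarial example receives a higher score than a randomly chosen normal one, with ties counting half. A small self-contained check with made-up scores:

```python
# Rank-statistic formulation of AUC ROC: fraction of (adversarial, normal)
# pairs where the adversarial example outscores the normal one (ties = 0.5).

def auc(adversarial_scores, normal_scores):
    wins = 0.0
    for a in adversarial_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(adversarial_scores) * len(normal_scores))

# 8.5 of the 9 pairs rank correctly (one tie counts as half a win).
print(round(auc([0.9, 0.8, 0.4], [0.3, 0.2, 0.4]), 4))  # 0.9444
```

A score of 1.0 means every adversarial example outranks every normal one; 0.5 corresponds to a random ranking.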
Our OC-score consistently performs the best overall. Its performance is stable across the three considered types of adversarial examples for all three ensemble learning techniques.
Ambiguity is effective at detecting the closest adversarial examples on all datasets and ensemble types. This is unsurprising as the closest adversarial examples tend to be constructed by perturbing the attribute values just enough to cause the generated adversarial example to fall just on the other side of a model’s decision boundary. Hence, by definition, the model is uncertain for these adversarial examples. Ambiguity’s ability to detect adversarial examples declines when they exhibit larger perturbations. However, an exception is ijcnn1 and mnist2v4 for Random Forests and GROOT, where ambiguity detects most adversarial examples. For phoneme, the AUC values consistently drop below 0.5 for larger perturbation sizes. This arises because the model makes exceedingly confident incorrect predictions for these adversarial examples, which causes normal examples to be ranked as more adversarial than actual adversarial examples. To illustrate: the mean ambiguity score is 0.017 for normal examples and drops from 0.68 for close adversarial examples to 0.008 for adv.x5 ones.
The anomaly detectors (iforest and lof) tend to perform poorly in most settings. The exceptions are the adv. x5 examples on phoneme, covtype, and ijcnn1, because these start to become out of sample (i.e., very far away from the training data). These are also the datasets with the smallest number of attributes. These methods also do better on GROOT, where the modified split criterion produces more robust trees; hence, the adversarial examples are farther away from the normal ones.
ML-LOO has highly variable performance and is consistently worse than OC-score. It tends to work best on image data. However, for random forests and GROOT, it performs extremely poorly on several other datasets (e.g., ijcnn1, covtype). For the XGBoost ensembles, ML-LOO is generally effective at detecting the closest adversarial examples, but its performance degrades for the more distant ones.
Coverage versus detection rate tradeoff. We now evaluate the detection approaches in an operational context where a model will first assess whether an unseen example is adversarial. The model will only make predictions in cases where an example is not flagged as adversarial. In practice, this requires thresholding the scores produced by each method to arrive at a hard decision as to whether an example is adversarial. This induces a tradeoff between a method’s (a) coverage, which is the fraction of normal test examples that are correctly identified as normal and hence receive a prediction, and (b) detection rate, the fraction of correctly identified adversarial examples. Figure 3 shows each approach’s detection rate as a function of coverage. The results are averaged over all three types of adversarial examples. In nearly all cases, OC-score yields a higher detection rate at each level of coverage than all its competitors. Figure 3 only shows results for 4 out of 8 datasets; the results for the remaining datasets are in the supplement.
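The tradeoff can be made concrete with a small sketch (scores below are made up): a single threshold on a method's scores determines both quantities at once.

```python
# Coverage vs. detection rate for one threshold on a detector's scores.

def tradeoff(normal_scores, adversarial_scores, threshold):
    """Coverage: fraction of normal examples not flagged (score <= threshold).
    Detection rate: fraction of adversarial examples flagged (score > threshold)."""
    coverage = sum(s <= threshold for s in normal_scores) / len(normal_scores)
    detection = sum(s > threshold for s in adversarial_scores) / len(adversarial_scores)
    return coverage, detection

normal = [1, 2, 2, 3]        # e.g. OC-scores of normal test examples
adversarial = [4, 5, 2, 6]   # e.g. OC-scores of adversarial examples

print(tradeoff(normal, adversarial, threshold=3))  # (1.0, 0.75)
```

Sweeping the threshold traces out the curves shown in Figure 3: lowering it raises the detection rate at the cost of coverage.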
5.3 Q2: Prediction time cost
The run time cost has two parts: the setup time and the evaluation time. The setup time is the one-time cost incurred during training by some of the methods. This is negligible for most methods, apart from lof and is discussed in detail in the supplement.
The evaluation time measures the overhead of computing a score indicating how likely it is that a test example is adversarial. Table 2 reports the average per-example evaluation time for each method on every dataset. Regardless of the method, the scores can typically be computed in well under 1 millisecond, with some cases taking slightly longer. Unsurprisingly, ambiguity is almost always the fastest as it is a simple mathematical computation. iforest is also fast because it only requires executing a tree ensemble. Computing the OC-score is also relatively efficient due to exploiting SIMD. It scales with the number of trees and the size of the reference set, yielding higher evaluation times for large datasets and ensemble sizes (i.e., covtype and higgs). However, the results in Subsection 5.4 indicate that it is possible to decrease the size of the reference set without degrading performance. ML-LOO’s cost comes from computing an importance for each feature; hence, it yields longer times for datasets with many features (mnist2v4, fmnist2v4, webspam). lof is almost always the (second) slowest because computing the local density for each example is expensive.
5.4 Q3: Effect of the reference set’s size
Finally, we explore the effect of the size of the reference set by varying the proportion of correctly classified training examples in the reference set. Figure 4 shows the results for this experiment on four datasets using XGBoost ensembles. The plot on the left shows ROC AUC values for detecting adversarial vs. non-adversarial test examples. On all datasets, these values are relatively stable. There is a small decline in performance for the smallest reference set proportion, where the number of examples in the reference set ranges from 342 (spambase, RF) to 41 700 (covtype, XGB).
The plot on the right shows the average time in milliseconds to compute the OC-score per example as a function of the size of the reference set. The three smallest datasets show modest increases in the evaluation time for increasing reference set sizes. Higgs, which is by far the largest dataset, shows a steep increase as the size of the reference set increases which arises due to cache effects (i.e., misses) for the larger reference set sizes. However, these experiments suggest that using a small reference set would drastically improve the run time on this dataset without degrading performance. Note that changing the reference set does not require relearning the underlying ensemble used to make predictions, which is still learned using the full training set.
6 Conclusions and Discussion
This paper explored how to detect adversarial examples post deployment for additive tree ensembles by detecting unusual output configurations. Our approach works with any additive tree ensemble and does not require training a separate model. If a newly encountered example’s output configuration differs substantially from those in the training set, then it is more likely to be an adversarial example. Empirically, our proposed OC-score metric yielded better detection performance than existing approaches for three different tree ensemble learners on multiple benchmark datasets. One limitation of our work is that operationalizing it requires setting a threshold on the OC-score, which may be difficult in practice. Moreover, while our approach achieves good performance, it is unclear what constitutes acceptable detection performance for a deployed system because this is use-case dependent. While our approach makes it more difficult to perform adversarial attacks, it may also inspire novel strategies to construct adversarial examples.
This work was supported by iBOF/21/075, Research Foundation-Flanders (EOS No. 30992574, 1SB1320N to LD) and the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.
- Andriushchenko and Hein  Maksym Andriushchenko and Matthias Hein. Provably robust boosted decision stumps and trees against adversarial attacks. In Advances in Neural Information Processing Systems, volume 32, 2019.
- Biggio et al.  Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases, pages 387–402, 2013.
- Breiman  Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
- Breiman and Cutler  Leo Breiman and Adele Cutler. Random forests manual. https://www.stat.berkeley.edu/~breiman/RandomForests, 2002.
- Breunig et al.  Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000.
- Chen et al. [2019a] Hongge Chen, Huan Zhang, Duane Boning, and Cho-Jui Hsieh. Robust decision trees against adversarial examples. In International Conference on Machine Learning, pages 1122–1131, 2019a.
- Chen et al. [2019b] Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, and Cho-Jui Hsieh. Robustness verification of tree-based models. In Advances in Neural Information Processing Systems, volume 32, pages 12317–12328, 2019b.
- Chen and Guestrin  Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 785–794, 2016.
- Cortes et al.  Corinna Cortes, Giulia DeSalvo, and Mehryar Mohri. Learning with rejection. In Proceedings of The 27th International Conference on Algorithmic Learning Theory (ALT 2016), 2016.
- Devos et al.  Laurens Devos, Wannes Meert, and Jesse Davis. Versatile verification of tree ensembles. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 2654–2664, 2021.
- Diochnos et al.  Dimitrios Diochnos, Saeed Mahloujifar, and Mohammad Mahmoody. Adversarial risk and robustness: General definitions and implications for the uniform distribution. Advances in Neural Information Processing Systems, 31, 2018.
- Einziger et al.  Gil Einziger, Maayan Goldstein, Yaniv Sa’ar, and Itai Segall. Verifying robustness of gradient boosted models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2446–2453, 2019.
- Fawzi et al.  Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 1186–1195, 2018.
- Feinman et al.  Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
- Gong et al.  Zhitao Gong, Wenlu Wang, and Wei-Shinn Ku. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.
- Goodfellow et al.  Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
- Grosse et al.  Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
- Grosse et al.  Kathrin Grosse, David Pfaff, Michael Thomas Smith, and Michael Backes. The limitations of model uncertainty in adversarial settings. In 4th Workshop on Bayesian Deep Learning (NeurIPS 2019), 2018.
- Hendrickx et al.  Kilian Hendrickx, Lorenzo Perini, Dries Van der Plas, Wannes Meert, and Jesse Davis. Machine learning with a reject option: A survey. CoRR, abs/2107.11277, 2021.
- Kantchelian et al.  Alex Kantchelian, J Doug Tygar, and Anthony Joseph. Evasion and hardening of tree ensemble classifiers. In International Conference on Machine Learning, pages 2387–2396. PMLR, 2016.
- Katzir and Elovici  Ziv Katzir and Yuval Elovici. Detecting adversarial perturbations through spatial behavior in activation spaces. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–9, 2019.
- Ke et al.  Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in neural information processing systems, volume 30, pages 3146–3154, 2017.
- Lee et al.  Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in neural information processing systems, 31, 2018.
- Liu et al.  Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 8th IEEE international conference on data mining, pages 413–422. IEEE, 2008.
- Liu et al.  Xuanqing Liu, Yao Li, Chongruo Wu, and Cho-Jui Hsieh. Adv-BNN: Improved adversarial defense through robust bayesian neural network. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rk4Qso0cKm.
- Metzen et al.  Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In Proceedings of 5th International Conference on Learning Representations (ICLR), 2017.
- Pace and Barry  R Kelley Pace and Ronald Barry. Sparse spatial autoregressions. Statistics & Probability Letters, 33(3):291–297, 1997.
- Pedregosa et al.  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Pliakos and Vens  Konstantinos Pliakos and Celine Vens. Feature induction and network mining with clustering tree ensembles. In Proceedings of the International Workshop on New Frontiers in Mining Complex Patterns, pages 3–18, 2016.
- Ranzato and Zanella  Francesco Ranzato and Marco Zanella. Abstract interpretation of decision tree ensemble classifiers. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5478–5486, 2020.
- Ranzato and Zanella  Francesco Ranzato and Marco Zanella. Genetic adversarial training of decision trees. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 358–367, 2021.
- Roth et al.  Kevin Roth, Yannic Kilcher, and Thomas Hofmann. The odds are odd: A statistical test for detecting adversarial examples. In International Conference on Machine Learning, pages 5498–5507. PMLR, 2019.
- Schulze et al.  Jan-Philipp Schulze, Philip Sperl, and Konstantin Böttinger. DA3G: Detecting adversarial attacks by analysing gradients. In European Symposium on Research in Computer Security, pages 563–583, 2021.
- Shi and Horvath  Tao Shi and Steve Horvath. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 2006.
- Sperl et al.  Philip Sperl, Ching-Yu Kao, Peng Chen, Xiao Lei, and Konstantin Böttinger. DLA: dense-layer-analysis for adversarial example detection. In 2020 IEEE European Symposium on Security and Privacy (EuroS&P), pages 198–215, 2020.
- Szegedy et al.  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
- Tan et al.  Sarah Tan, Matvey Soloviev, Giles Hooker, and Martin T. Wells. Tree space prototypes: Another look at making tree ensembles interpretable. In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, pages 23–34, 2020.
- Tian et al.  Jinyu Tian, Jiantao Zhou, Yuanman Li, and Jia Duan. Detecting adversarial examples from sensitivity inconsistency of spatial-transform domain. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
- Törnblom and Nadjm-Tehrani  John Törnblom and Simin Nadjm-Tehrani. Formal verification of input-output mappings of tree ensembles. Science of Computer Programming, 194:102450, 2020.
- Vens and Costa  Celine Vens and Fabrizio Costa. Random forest based feature induction. In Proceedings of 11th IEEE International Conference on Data Mining, pages 744–753, 2011.
- Vos and Verwer  Daniël Vos and Sicco Verwer. Efficient training of robust decision trees against adversarial examples. In International Conference on Machine Learning, pages 10586–10595, 2021.
- Wang et al.  Yihan Wang, Huan Zhang, Hongge Chen, Duane Boning, and Cho-Jui Hsieh. On Lp-norm robustness of ensemble decision stumps and trees. In International Conference on Machine Learning, pages 10104–10114, 2020.
- Webb and Ting  Geoffrey I Webb and Kai Ming Ting. On the application of roc analysis to predict classification performance under varying class distributions. Machine learning, 58(1):25–32, 2005.
- Yang et al. [2020a] Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael Jordan. ML-LOO: Detecting adversarial examples with feature attribution. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):6639–6647, 2020a.
- Yang et al. [2020b] Yao-Yuan Yang, Cyrus Rashtchian, Yizhen Wang, and Kamalika Chaudhuri. Robustness for non-parametric classification: A generic attack and defense. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 941–951, 2020b.
- Zhang et al.  Chong Zhang, Huan Zhang, and Cho-Jui Hsieh. An efficient adversarial attack for tree ensembles. In Advances in Neural Information Processing Systems, volume 33, pages 16165–16176, 2020.
- Zhang et al.  Shigeng Zhang, Shuxin Chen, Xuan Liu, Chengyao Hua, Weiping Wang, Kai Chen, Jian Zhang, and Jianxin Wang. Detecting adversarial samples for deep learning models: A comparative study. IEEE Transactions on Network Science and Engineering, 9(1):231–244, 2022.
Appendix A Fast OC-score computation using SIMD
Finding an example’s nearest neighbors in the OC-space can be done very efficiently by exploiting the fact that ensembles tend to produce trees with a relatively small number of leaf nodes. This is especially true for boosted ensembles like the ones generated by XGBoost Chen and Guestrin and LightGBM Ke et al., where the number of leaves is often limited to at most 256 (e.g. maximum depth 8). Hence, it is possible to assign each leaf in a tree a short binary code word (e.g. an 8-bit unsigned integer). An output configuration is then an array of short binary codes. This makes it possible to exploit instruction-level parallelism using SIMD to compute the Hamming distance between the output configuration of a new example and the output configurations in the reference set R.
In the evaluation of this paper, we generate trees with at most 256 leaves. Hence, we can uniquely identify each leaf (per tree) with an 8-bit identifier. For any leaf l of a tree T_m, let id(l) be the 8-bit identifier of that leaf (e.g. a depth-first numbering of the leaves). The output configuration of an example x for an ensemble of M trees is then an array of 8-bit identifiers: OC(x) = [id(l_1(x)), …, id(l_M(x))], where l_m(x) is the leaf of tree T_m reached by x.
The reference set then becomes an 8-bit matrix R, with the output configurations of the N reference set examples x_1, …, x_N in its rows: R = [OC(x_1); OC(x_2); …; OC(x_N)].
To compute the OC-score of a new example z, we slide its identifier vector OC(z) over the rows of R, compute the Hamming distance between the identifier vector and each row, and keep track of the minimum distance. This can be done very quickly using SIMD. More specifically, we use the AVX2 extension to the x86 instruction set, which introduces CPU instructions that operate on 32-byte wide registers, permitting computations on 32 byte values simultaneously.
Algorithm 1 shows how to use SIMD to compute the OC-score of a new example z. We store R in column-major order, which means the values in the columns of R are stored consecutively in memory. For brevity, we assume that the number of examples in our reference set is a multiple of 32. The broadcast function sets all 32 bytes in a register to a given byte value. The other vectorized functions, indicated with a subscript, perform the operations implied by their names on all 32 values simultaneously. A register maintains 32 running OC-score values, which are reduced to a single value at line 15.
Using unsigned bytes for the identifiers limits the ensemble’s size in two ways. First, a byte has only 256 unique values, which restricts the number of leaves in each tree to at most 256. Second, the sum on Line 11 could potentially overflow when the ensemble has more than 255 trees. Using a 16-bit short would alleviate this limitation, but would make the algorithm about two times slower.
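To make the encoding concrete, the following is a minimal pure-Python sketch, not the paper’s implementation: the tuple-based tree representation and the function names are ours, chosen purely for illustration. It shows the depth-first numbering of leaves and the construction of an output configuration as one unsigned byte per tree.

```python
# A tree is either ("leaf",) or ("split", feature, threshold, left, right).
# This representation is hypothetical and used only for illustration.

def count_leaves(tree):
    """Number of leaves in a (sub)tree."""
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[3]) + count_leaves(tree[4])

def leaf_index(tree, x, base=0):
    """Depth-first index of the leaf reached by example x."""
    if tree[0] == "leaf":
        return base
    _, feature, threshold, left, right = tree
    if x[feature] < threshold:
        return leaf_index(left, x, base)
    # All leaves of the left subtree precede those of the right subtree
    # in depth-first order, so offset the base accordingly.
    return leaf_index(right, x, base + count_leaves(left))

def output_configuration(ensemble, x):
    """One 8-bit identifier per tree (requires <= 256 leaves per tree)."""
    return bytes(leaf_index(tree, x) for tree in ensemble)

t1 = ("split", 0, 0.5, ("leaf",),
      ("split", 1, 0.5, ("leaf",), ("leaf",)))
t2 = ("split", 1, 0.3, ("leaf",), ("leaf",))
print(list(output_configuration([t1, t2], [0.7, 0.2])))  # [1, 0]
```

Because the configuration is a `bytes` object, comparing two configurations reduces to byte-wise equality tests, which is exactly the operation the AVX2 routine vectorizes.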
Appendix B Predictive performance of used tree ensembles
At the basis of each experiment in this paper are tree ensembles learned from data: we generate adversarial examples for these ensembles, and the OC-score uses the OC-space defined by the trees. We mimic the post-deployment setting, which means that we consider the tree ensembles as given and are not concerned with the circumstances under which they were learned; i.e., the point of this paper is neither to train the most accurate models nor to deal with adversarial attacks during the training phase. However, for completeness’ sake, we do include the test set accuracies and empirical robustness values achieved by the XGBoost, Random Forest, and GROOT ensembles to show that we are using competitive models.
Table 4 shows the average accuracy over five folds for each ensemble type on each dataset. XGBoost generally performs best. Random forests and GROOT ensembles tend to produce more robust models than XGBoost: they exhibit better empirical robustness scores, as shown in Table 4, and the adversarial examples generated for these ensembles tend to be further from the original distribution. Consequently, lof and iforest, which are particularly well suited to detecting whether a test example is out-of-sample, are better at detecting adversarial examples crafted for these ensembles than ones crafted for XGBoost.
Appendix C AUC plots for all datasets
The main paper only shows the ROC AUC values for the first 4 datasets in Figure 2. Figure 5 shows ROC AUC plots for all datasets.
The ROC AUC values show how effective each method is at distinguishing adversarial examples from normal ones for a post-deployed model. Figure 5 shows the performance as a function of the magnitude of the adversarial perturbation. OC-score’s performance is stable across the three considered types of adversarial examples for all three ensemble types.
Appendix D Detection rate plots for all datasets
Figure 3 in the main paper only shows the coverage versus detection rate results for the first four datasets. Figure 6 shows the coverage versus detection rate results for XGBoost, Random Forest, and GROOT ensembles on all datasets. Our OC-score metric is consistently the best method or on par with the best competing method. It is also the only method that consistently performs well on all datasets and for all ensemble types.
Appendix E Setup time
Table 5 gives the setup times for each detection method. These are the one-time costs incurred during training. The setup time for OC-score is simply the time taken to encode the reference set R. (Note that tree learning already entails sorting each training example to a leaf node, unless bagging is employed.) The setup time for iforest and lof is the time needed to construct the models. lof has by far the largest setup time because it constructs an index of the training examples. The setup times of ambiguity and ML-LOO are zero, because they do not use any auxiliary structures that need to be initialized.
Appendix F Time complexity of OC-score metric
The time complexity of computing the OC-score for a single example at prediction time is O(M · N), where M is the number of trees (and thus the size of an output configuration) and N is the size of the reference set. Figure 7 shows that, indeed, changing the size of the reference set has a linear effect on the evaluation time, except for the larger datasets, where cache misses can have a considerable effect on the run time. Note that this effect is CPU dependent and can be less pronounced on other CPUs.
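The M · N cost can be made explicit with a brute-force reference implementation that counts identifier comparisons; this is a hypothetical sketch for exposition (the measured implementation is the SIMD routine of Appendix A, and the function name is ours):

```python
def oc_score_counted(oc_new, reference):
    """Minimum Hamming distance to the reference rows, plus the number
    of identifier comparisons performed (always M * N for brute force)."""
    comparisons = 0
    best = len(oc_new) + 1  # larger than any possible distance
    for row in reference:
        dist = 0
        for a, b in zip(oc_new, row):
            comparisons += 1      # one byte comparison per tree per row
            dist += (a != b)
        best = min(best, dist)
    return best, comparisons

# M = 4 trees, N = 3 reference examples: 4 * 3 = 12 comparisons.
score, n_cmp = oc_score_counted(
    (1, 2, 3, 4),
    [(1, 2, 3, 4), (1, 2, 0, 0), (9, 9, 9, 9)])
print(score, n_cmp)  # 0 12
```

The SIMD version performs the same M · N byte comparisons, but 32 of them per instruction, which changes the constant factor rather than the asymptotic complexity.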
Appendix G Effect of the reference set’s size
Figure 4 in the main paper shows the effect of varying the size of the reference set only for a selection of four datasets for XGBoost ensembles. Figure 7 shows the results for all datasets and all ensemble types. We observe similar results for all datasets and all ensemble types. The ROC AUC values (left subplots) for detecting adversarial vs. non-adversarial test examples are relatively stable, showing only a small decline in performance for the smallest reference set consisting of only 10% of the correctly classified training set examples. The time savings (right subplots) are also consistent across datasets and ensemble types.
Appendix H Distribution of OC-space Hamming distance for adversarial and random perturbations
One assumption that this paper makes is that adversarial perturbations are magnified in OC-space, and that they can be picked up in OC-space using the Hamming distance. To show that this is the case, we randomly select 100 unseen, correctly classified test set examples and perturb them in two ways. The first perturbation is adversarial and flips the predicted label of the example. The second perturbation is random and has the same (small) magnitude and affects the same number of features as the adversarial perturbation. We repeat this 5 times, once for each fold, and average the results.
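Such a matched random perturbation can be drawn as follows; this is a minimal sketch under our own assumptions (a flat feature vector and a hypothetical function name), where the random perturbation reuses the adversarial perturbation’s maximum per-feature magnitude and its number of affected features:

```python
import random

def matched_random_perturbation(x, x_adv, seed=0):
    """Randomly perturb x so that the perturbation affects as many
    features as x_adv - x does, with the same maximum magnitude."""
    rng = random.Random(seed)
    delta = [a - b for a, b in zip(x_adv, x)]
    n_affected = sum(1 for d in delta if d != 0)  # number of changed features
    eps = max(abs(d) for d in delta)              # maximum per-feature change
    x_rand = list(x)
    for i in rng.sample(range(len(x)), n_affected):
        x_rand[i] += rng.choice([-eps, eps])
    return x_rand

x = [0.0] * 6
x_adv = [0.1, 0.0, -0.1, 0.0, 0.0, 0.0]
x_rand = matched_random_perturbation(x, x_adv, seed=1)
# Exactly two features change, each by 0.1 in magnitude.
print(sum(1 for a, b in zip(x_rand, x) if a != b))  # 2
```

This keeps the two perturbation types comparable: any difference in OC-space Hamming distance is then attributable to how the perturbation is directed, not to its size.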
Figure 8 plots the Hamming distances between the original examples and the adversarially and randomly perturbed examples respectively, for the three adversarial sets and each ensemble type. Overall, we see that the adversarial perturbations have a much larger effect on the Hamming distance in the OC-space than the random perturbations, even though the magnitudes of the perturbations are the same. This is not surprising: in order to flip the label, the adversarial perturbation has to be carefully crafted in such a way that different leaves are activated. This illustrates that adversarial perturbations are magnified in OC-space, and that a simple metric like the Hamming distance can be used to detect them.
The plots in Figure 8 also show that the larger the perturbations, the larger the relative distances in OC-space become. Although this effect exists for both the random and adversarial perturbations, it is more pronounced for the latter. We also see that the smaller the number of attributes in the data, the smaller the difference is (e.g. phoneme). With fewer attributes, a random perturbation is more likely to change an attribute that is used in many trees, which in turn has an effect on the predicted leaves.
Appendix I Prevalence of adversarial examples
While ROC analysis is generally invariant to varying class proportions, there is some debate about whether this is always the case Webb and Ting. Thus, for completeness, we explore the effect of the prevalence of adversarial examples in the test set on the detection performance of the OC-score metric. We reuse the same adversarial examples generated for the experiments in the main paper. To obtain the desired ratios, we randomly select the following numbers of normal and adversarial examples respectively: (500, 500), (1000, 500), (1500, 500), (2000, 500), (2500, 500), (2400, 400), (2100, 300), (2400, 300), (1800, 200), (2500, 250). This is repeated five times, once for each fold, each time with a different model, different reference set, and different adversarial examples.
Figure 9 shows the ROC AUC values for using OC-score to detect adversarial vs. non-adversarial test examples as a function of the ratio of normal to adversarial examples in the test set. Results are shown for all datasets. Regardless of the ratio, the detection performance is relatively stable on all datasets. This is true for all three considered ensemble learners.
Appendix J Examples of adversarial examples for each ensemble type
Figures 10 and 11 show four adversarial examples, two of each class, for mnist and fmnist (pullover vs. coat). For each example and each ensemble type, we show the original example, the closest adversarial example (close adv.), adv. x2 and adv. x5. The predicted probability is shown below the image for each ensemble type.
A first observation is that the predicted probabilities of the closest adversarial examples are indeed close to 0.5, which explains why ambiguity tends to pick them up.
A second observation is that XGBoost is extremely confident in its predictions, and the perturbation sizes required to trick it are the smallest overall. For the 4s and all the fmnist examples, the affected pixels are hard to discern. Moreover, XGBoost is very confident in its wrong prediction for adv. x5 examples, even though the perturbations are tiny.
A third observation is that GROOT ensembles are more difficult to trick. Veritas often finds the same example for adv. x2 and adv. x5 because the space of adversarial examples is much smaller, and it thus tends to arrive at the same result even though the maximum perturbation sizes are different.
Appendix K Error bars on coverage vs. detection rate and timings
We mentioned in the main paper that, for clarity, we did not include error ranges in the coverage versus detection rate plots. For completeness, we include these plots in Figure 12. Similarly, we also include the evaluation timings table with the standard deviations in Table 6 (Table 2 in the main paper).
| OC-score | 0.00192 ± 0.000131 | 0.0019 ± 0.000273 | 0.973 ± 0.157 | 0.587 ± 0.0504 |
| ambig | 0.0419 ± 0.137 | 0.0314 ± 0.0781 | 0.0198 ± 0.0428 | 0.0202 ± 0.0397 |
| iforest | 0.0408 ± 0.00254 | 0.0461 ± 0.00425 | 0.0347 ± 0.000229 | 0.0348 ± 0.000277 |
| LOF | 0.0133 ± 0.00288 | 0.0677 ± 0.00831 | 7.23 ± 0.168 | 3.06 ± 0.133 |
| ML-LOO | 0.0638 ± 0.00773 | 0.103 ± 0.0133 | 0.28 ± 0.0111 | 0.331 ± 0.0157 |

| OC-score | 0.0665 ± 0.00732 | 0.0221 ± 0.00136 | 0.0218 ± 0.00165 | 0.224 ± 0.063 |
| ambig | 0.0161 ± 0.0375 | 0.0278 ± 0.045 | 0.0278 ± 0.0455 | 0.0203 ± 0.0418 |
| iforest | 0.0307 ± 0.000222 | 0.139 ± 0.00155 | 0.139 ± 0.00196 | 0.06 ± 0.000362 |
| LOF | 1.84 ± 0.0229 | 0.307 ± 0.00443 | 0.308 ± 0.0253 | 5.55 ± 0.0655 |
| ML-LOO | 0.12 ± 0.00345 | 1.54 ± 0.185 | 1.71 ± 0.0357 | 0.567 ± 0.0185 |
Appendix L Total compute time for adversarial example generation
Table 7 shows the compute time required to generate the three sets of adversarial examples for each dataset and each model type. The total time taken to compute all adversarial examples is just under 46 hours.
The total training time for the ensembles is negligible: less than 1 hour for all models combined. Training a GROOT model takes on average about 10 times longer than training a scikit-learn Random Forest Pedregosa et al.