1 Introduction
Recent studies have demonstrated that neural network models are vulnerable to adversarial perturbations: a small, imperceptible-to-human input perturbation can easily change the predicted label
[31, 15, 6, 13]. This has created serious security threats to many real applications, so it becomes important to formally verify the robustness of machine learning models. Usually, the robustness verification problem can be cast as finding the minimal adversarial perturbation to an input example that can change the predicted class label. A series of robustness verification algorithms have been developed for neural network models
[19, 32, 36, 35, 34, 38, 14, 29], where the efficient algorithms are mostly based on relaxations or approximations of the nonlinear activation functions of neural networks.
We study the robustness verification problem of tree-based models, including a single decision tree and tree ensembles such as random forests (RFs) and gradient boosted decision trees (GBDTs). These models have been widely used in practice, and recent studies have demonstrated that both RFs and GBDTs are vulnerable to adversarial perturbations [18, 12, 8]. It is thus important to develop a formal robustness verification algorithm for tree-based models. Robustness verification requires computing the minimal adversarial perturbation. [18] showed that computing the minimal adversarial perturbation for a tree ensemble is NP-complete in general, and they proposed a Mixed-Integer Linear Programming (MILP) approach to compute it. Although exact verification is NP-hard, in order to have an efficient verification algorithm for real applications we seek to answer the following questions:


Can we design efficient polynomial-time algorithms for exact verification under some special circumstances?

For general tree ensemble models with a large number of trees, can we efficiently compute meaningful lower bounds on robustness while scaling to large tree ensembles?
In this paper, we answer the above questions in the affirmative by formulating the verification problem of tree ensembles as a graph problem. First, we show that for a single decision tree, robustness verification can be done exactly in linear time. Then we show that for an ensemble of trees, the verification problem is equivalent to finding maximum cliques in a $K$-partite graph, and the graph has a special form with boxicity equal to the input feature dimension. Therefore, for low-dimensional problems, verification can also be done in polynomial time with maximum clique search algorithms. Finally, for large-scale tree ensembles, we propose a multi-scale verification algorithm that exploits the boxicity of the graph and gives tight lower bounds on robustness. Furthermore, it supports anytime termination: we can stop the algorithm at any time to obtain a reasonable lower bound under a computation time constraint. Our proposed algorithm is efficient and scales to large tree ensemble models. For instance, on a large multiclass GBDT with 200 trees robustly trained on the MNIST dataset (using [8]), we obtained 78% verified robust accuracy on the test set at the given maximum perturbation, and the time used for verifying each test example is 12.6 seconds, whereas the MILP method takes around 10 min per test example.
2 Background and Related Work
Adversarial Robustness
For simplicity, we consider a multiclass classification model $f:\mathbb{R}^d\to\{1,\dots,C\}$, where $d$ is the input dimension and $C$ is the number of classes. For an input example $x_0$, assuming that $y_0$ is the correct label, the minimal adversarial perturbation is defined by

$r^* = \min_{\delta}\ \|\delta\| \quad \text{s.t.}\quad f(x_0+\delta)\neq y_0.$  (1)

Note that we focus on the $\ell_\infty$ norm in this paper, which is widely used in recent studies [22, 36, 5]. Exactly solving (1) is usually intractable. For example, if $f$ is a neural network, (1) is non-convex, and [19] showed that solving (1) is NP-complete for ReLU networks.
Adversarial attacks are algorithms developed for finding a feasible solution $\delta$ of (1), whose norm is an upper bound of $r^*$. Many algorithms have been proposed for attacking machine learning models [15, 20, 6, 22, 9, 10, 16, 3, 12, 24, 21]. Most practical attacks cannot reach the minimal adversarial perturbation due to the non-convexity of (1). Therefore, attacking algorithms cannot provide any formal guarantee on model robustness [1, 33].
On the other hand, robustness verification algorithms are designed to find the exact value or a lower bound of $r^*$. An exact verifier needs to solve (1) to global optimality, so typically we resort to relaxed verifiers that give lower bounds. When a verification algorithm finds a lower bound $\bar r \le r^*$, it guarantees that no adversarial example exists within a radius-$\bar r$ ball around $x_0$. This is important for deploying machine learning algorithms to safety-critical applications such as autonomous vehicles or aircraft control systems [19, 17].
For verification, instead of solving (1) we can also solve the following decision problem of robustness verification:

Does there exist $\delta$ with $\|\delta\|_\infty \le \varepsilon$ such that $f(x_0+\delta)\neq y_0$?  (2)

If we can answer this decision problem, a binary search over $\varepsilon$ gives us the value of $r^*$, so the complexity of (2) is of the same order as that of (1). Furthermore, a safe answer to (2) (always say yes when unsure) leads to a lower bound of $r^*$, which is what we want in verification. The decision version is also widely used in the verification community, since people care about the "robust error at perturbation $\varepsilon$", defined as the fraction of test samples that satisfy (2). Verification methods for neural networks have been studied extensively in the past few years [36, 37, 35, 38, 29, 14, 30, 2].
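The binary-search reduction above can be sketched in a few lines. This is an illustrative helper, not the paper's implementation: `decision_oracle` is a hypothetical callable standing in for any sound verifier of the decision problem (2).

```python
def verify_radius(decision_oracle, eps_hi, tol=1e-4):
    """Convert a sound decision-problem verifier into a robustness lower bound.

    decision_oracle(eps) must return True only when it can certify that no
    adversarial example exists within an l_inf ball of radius eps; it may
    safely return False when unsure (the "always say yes when unsure"
    convention for (2), seen from the certification side).
    """
    lo, hi = 0.0, eps_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if decision_oracle(mid):
            lo = mid   # certified robust at radius mid: raise the lower bound
        else:
            hi = mid   # unknown or unsafe: shrink the search radius
    return lo          # certified lower bound on r*
```

A conservative oracle only makes the returned radius smaller, never unsound, which is why relaxed verifiers can be plugged in directly.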
Adversarial Robustness of Tree-based Models
Unlike neural networks, decision-tree-based models are non-continuous step functions, and thus existing neural network verification algorithms cannot be directly applied. To evaluate the robustness of tree-based models, [18] showed that solving (1) for general tree ensemble models is NP-complete, so no polynomial-time algorithm can compute $r^*$ unless P=NP. A Mixed Integer Linear Programming (MILP) algorithm was thus proposed in [18] to solve (1) in exponential time. Some hard-label attacking algorithms for neural networks, including the boundary attack [3] and OPT-attack [12], can also be applied, since they only require function evaluations of the non-smooth (hard-label) decision function $f$, and can be viewed as faster ways to compute an upper bound of $r^*$. To the best of our knowledge, there is no existing algorithm for efficient verification, or equivalently, for efficiently computing a lower bound of $r^*$, for tree ensembles.
3 Proposed Algorithm
In this section, we propose the first robustness verification algorithms for tree ensembles. The exact verification problem for tree ensembles is NP-complete in general, and here we propose a series of efficient verification algorithms for real applications. First, we introduce a linear-time algorithm for exactly computing the minimal adversarial distortion of a single decision tree. For an ensemble of trees, we cast the verification problem into a max-clique search problem in $K$-partite graphs. For large-scale tree ensembles, we then propose an efficient multi-level algorithm for verifying an ensemble of decision trees.
3.1 Exactly Verifying a Single Tree in Linear Time
Although computing $r^*$ for a tree ensemble is NP-complete [18], we show that a linear-time algorithm exists for finding the minimum adversarial perturbation and computing $r^*$ for a single decision tree. We assume the decision tree has $n$ nodes. For a given example $x = (x_1, \dots, x_d)$ with $d$ features, starting from the root, $x$ traverses the decision tree model until reaching a leaf node. Each internal node, say node $i$, has two children and a feature-threshold pair $(t_i, \eta_i)$ to determine the traversal direction: $x$ is passed to the left child if $x_{t_i} \le \eta_i$ and to the right child otherwise. Each leaf node has a value $v_i$ corresponding to the predicted class label for a classification tree, or a real value for a regression tree.
Conceptually, the main idea of our single tree verification algorithm is to compute a $d$-dimensional box for each leaf node such that any example in this box will fall into this leaf. Mathematically, node $i$'s box is defined as the Cartesian product $B^{(i)} = I^{(i)}_1 \times \cdots \times I^{(i)}_d$ of intervals on the real line. By definition, the root node has box $(-\infty, \infty)^d$, and given the box of an internal node $i$, its children's boxes can be obtained by changing only one interval of the box based on the split condition $(t_i, \eta_i)$. More specifically, if $p, q$ are node $i$'s left and right child respectively, then we set their boxes $B^{(p)}$ and $B^{(q)}$ by setting

$I^{(p)}_j = I^{(q)}_j = I^{(i)}_j \text{ for } j \neq t_i, \qquad I^{(p)}_{t_i} = I^{(i)}_{t_i} \cap (-\infty, \eta_i], \qquad I^{(q)}_{t_i} = I^{(i)}_{t_i} \cap (\eta_i, \infty).$  (3)

After computing the boxes for internal nodes, we can also obtain the boxes for leaf nodes using (3). Therefore computing the boxes for all the leaf nodes of a decision tree can be done by a depth-first traversal of the tree in $O(nd)$ time.
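The depth-first box computation can be sketched as follows. This is a minimal illustration, not the paper's code: the nested-dict tree format (`'feature'`, `'threshold'`, `'left'`, `'right'`, `'value'`) is a hypothetical representation, and open/closed endpoint distinctions from (3) are ignored for simplicity.

```python
import math

def leaf_boxes(tree, d):
    """Depth-first traversal computing, for every leaf, the d-dimensional
    box of inputs routed to it, per the recursion in Eq. (3)."""
    boxes = []

    def dfs(node, box):
        if 'value' in node:          # leaf: record its box and prediction value
            boxes.append((box, node['value']))
            return
        j, eta = node['feature'], node['threshold']
        lo, hi = box[j]
        left = list(box)
        left[j] = (lo, min(hi, eta))   # left child: x_j <= eta
        right = list(box)
        right[j] = (max(lo, eta), hi)  # right child: x_j > eta
        dfs(node['left'], left)
        dfs(node['right'], right)

    dfs(tree, [(-math.inf, math.inf)] * d)  # root box is (-inf, inf)^d
    return boxes
```

Each recursive step copies the parent box and tightens exactly one interval, mirroring (3).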
With the boxes computed for each leaf node, the minimum perturbation required to move $x$ into leaf node $i$'s box, writing $I^{(i)}_j = [l^{(i)}_j, r^{(i)}_j]$, can be written as a vector $\epsilon^{(i)} \in \mathbb{R}^d$ defined as

$\epsilon^{(i)}_j = \begin{cases} 0 & \text{if } x_j \in I^{(i)}_j, \\ l^{(i)}_j - x_j & \text{if } x_j < l^{(i)}_j, \\ x_j - r^{(i)}_j & \text{if } x_j > r^{(i)}_j. \end{cases}$  (4)

Then the minimal distortion can be computed as $r^* = \min_{i:\, v_i \neq y_0} \|\epsilon^{(i)}\|_\infty$, where $y_0$ is the original label of $x$ and $v_i$ is the label of leaf node $i$. To find $r^*$, we check all leaves and choose the smallest such perturbation. This is a linear-time algorithm for exactly verifying the robustness of a single decision tree.
In fact, this $O(nd)$-time algorithm mainly serves to illustrate the concept of "boxes" that will be used later for the tree ensemble case. If our final goal is to verify a single tree, we can obtain a more efficient algorithm by folding the distance computation (4) into the tree traversal procedure, and the resulting algorithm takes only $O(n)$ time. This algorithm is presented as Algorithm 1 in the appendix.
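The per-leaf distance (4) and the final scan can be sketched as below. This is an illustrative sketch, not Algorithm 1 from the appendix: leaves are assumed to be given as hypothetical `(box, label)` pairs, e.g. from a leaf-box traversal, with boxes as per-feature `(lo, hi)` tuples.

```python
import math

def dist_to_box(x, box):
    """Eq. (4) under the l_inf norm: the minimal perturbation moving x
    into the box is the largest per-coordinate distance to its interval."""
    return max(max(lo - xj, xj - hi, 0.0) for xj, (lo, hi) in zip(x, box))

def verify_single_tree(x, y0, leaves):
    """Minimal distortion r* for one classification tree: the smallest
    perturbation reaching any leaf whose label differs from y0."""
    return min(
        (dist_to_box(x, box) for box, label in leaves if label != y0),
        default=math.inf,  # no differently-labeled leaf: robust everywhere
    )
```

Note that under $\ell_\infty$ the distance to a box decomposes per coordinate, which is exactly why the vector in (4) suffices.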
3.2 Verifying Tree Ensembles by Max-Clique Enumeration
Now we discuss robustness verification for tree ensembles. Assuming the tree ensemble has $K$ decision trees, we use $S^{(k)}$ to denote the set of leaf nodes of tree $k$ and $m^{(k)}(x)$ to denote the function that maps the input example $x$ to a leaf node of tree $k$ according to its traversal rule. Given an input example $x$, the tree ensemble passes $x$ to each of these trees independently, reaching leaf nodes $m^{(k)}(x)$ for $k = 1, \dots, K$. Each leaf node $i$ assigns a prediction value $v_i$. For simplicity we start with the binary classification case, with $x_0$'s original label being $-1$, which we want to turn into $+1$. For binary classification the prediction of the tree ensemble is computed by $\mathrm{sign}\big(\sum_{k=1}^{K} v_{m^{(k)}(x)}\big)$, which covers both GBDTs and random forests, two widely used tree ensemble models. Since $x_0$ has label $-1$, we have $\sum_{k} v_{m^{(k)}(x_0)} < 0$, and our task is to verify whether the sign of the summation can be flipped within $\|x - x_0\|_\infty \le \varepsilon$.
We consider the decision problem of robustness verification (2). A naive analysis would need to check all points in the ball $\{x : \|x - x_0\|_\infty \le \varepsilon\}$, which is uncountably infinite. To reduce the search space to a finite one, we start by defining some notation: let $\mathcal{C} = S^{(1)} \times \cdots \times S^{(K)}$ be the set of all possible tuples of leaf nodes, and let $m(x) = (m^{(1)}(x), \dots, m^{(K)}(x))$ be the function that maps $x$ to its tuple of leaf nodes. A tuple $C = (i_1, \dots, i_K)$ thus directly determines the model prediction $\mathrm{sign}(\sum_k v_{i_k})$. Now we define a valid tuple for robustness verification:
Definition 1.
A tuple $C = (i_1, \dots, i_K) \in \mathcal{C}$ is valid if and only if there exists an $x$ with $\|x - x_0\|_\infty \le \varepsilon$ such that $m(x) = C$.
The decision problem of robustness verification (2) can then be written as:
Does there exist a valid tuple $C = (i_1, \dots, i_K)$ such that $\sum_{k} v_{i_k} \ge 0$?
Next, we show how to model the set of valid tuples. We have two observations. First, if a tuple contains any leaf $i$ whose box $B^{(i)}$ has empty intersection with the ball $\mathrm{Ball}(x_0, \varepsilon) := \{x : \|x - x_0\|_\infty \le \varepsilon\}$, then it is invalid. Second, there exists an $x$ such that $m(x) = C$ if and only if the boxes of $i_1, \dots, i_K$ and the ball have a common point, or equivalently: $B^{(i_1)} \cap \cdots \cap B^{(i_K)} \cap \mathrm{Ball}(x_0, \varepsilon) \neq \emptyset$.
We show that the set of valid tuples can be represented as cliques in a graph $G = (V, E)$. In this graph, nodes are the leaves of all trees, and we remove every leaf whose box has empty intersection with $\mathrm{Ball}(x_0, \varepsilon)$. There is an edge between node $i$ and node $j$ if and only if their boxes intersect. The graph is then a $K$-partite graph, since there cannot be any edge between nodes from the same tree, and thus maximum cliques in this graph have $K$ nodes. We denote the parts of the $K$-partite graph by $V_1, \dots, V_K$; here a "part" means a disjoint and independent set in the $K$-partite graph. The following lemma shows that intersections of boxes have very nice properties:
Lemma 1.
For boxes $B^{(1)}, \dots, B^{(K)}$, if $B^{(s)} \cap B^{(t)} \neq \emptyset$ for all $s \neq t$, then $B^{(1)} \cap \cdots \cap B^{(K)} \neq \emptyset$ and their intersection is also a non-empty box.
The proof can be found in the appendix. Based on the above lemma, each $K$-clique (fully connected subgraph with $K$ nodes) in $G$ can be viewed as a set of leaf nodes whose boxes intersect pairwise and also intersect $\mathrm{Ball}(x_0, \varepsilon)$, so the intersection of those boxes and the ball is a non-empty box, which implies that each $K$-clique corresponds to a valid tuple of leaf nodes:
Lemma 2.
A tuple $C = (i_1, \dots, i_K)$ is valid if and only if the nodes $i_1, \dots, i_K$ form a $K$-clique (maximum clique) in the graph $G$ constructed above.
Therefore the robustness verification problem can be formulated as
Is there a maximum clique $C$ in $G$ such that $\sum_{i \in C} v_i \ge 0$?  (5)
This reformulation indicates that the tree ensemble verification problem can be solved by an efficient maximum clique enumeration algorithm. Some standard maximum clique searching algorithms can be applied here to perform verification:


Finding cliques in $K$-partite graphs: any algorithm for finding all the maximum cliques in $G$ can be used. The classic BK backtracking algorithm [4] finds all maximum cliques in $O(3^{|V|/3})$ time, where $|V|$ is the number of nodes in $G$. Furthermore, since our graph is $K$-partite, we can apply specialized algorithms designed for finding all the cliques in $K$-partite graphs [23, 25, 28].

Polynomial-time algorithms exist for low-dimensional problems: another important property of the graph $G$ is that each node is a $d$-dimensional box and each edge indicates the intersection of two boxes. This implies our graph has "boxicity $d$" (see [7] for details). [7] proved that such a graph has only $O((2|V|)^d)$ maximal cliques, so a maximum weight clique can be found in time polynomial in $|V|$ for fixed $d$. Therefore, for problems with very small $d$, the time complexity of verification is actually polynomial.
Therefore we can exactly solve the tree ensemble verification problem using maximum clique search algorithms on $K$-partite graphs, and the time complexity is as follows:
Theorem 1.
Exactly verifying the robustness of a tree ensemble with $K$ trees, at most $l$ leaves per tree and $d$-dimensional features takes at most $\min\{O(l^K),\, O((2Kl)^{d+1})\}$ time.
This is a direct consequence of the fact that the number of cliques in a $K$-partite graph with $l$ vertices per part is bounded by $l^K$, and the number of maximal cliques in a graph with a total of $Kl$ vertices and boxicity $d$ is $O((2Kl)^d)$. For a general graph, since $K$ and $d$ can both grow with $|V|$ (the boxicity of a graph can be as large as $\lfloor |V|/2 \rfloor$ [27]), the complexity can still be exponential. But the theorem gives a more precise characterization of the complexity of the verification problem for tree ensembles.
Based on the nice properties of the maximum clique search problem, we propose a simple and elegant algorithm that enumerates all cliques in a $K$-partite graph with known boxicity as Algorithm 2 in the appendix, and we can use this algorithm for tree ensemble verification when the number of trees or the dimension of features is small.
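The idea behind such an enumeration can be sketched by scanning the parts in order while maintaining the running box intersection, which by Lemma 1 stays a box and prunes infeasible branches early. This is an illustrative sketch of the idea, not Algorithm 2 itself, and it assumes a hypothetical representation where `parts[k]` lists `(box, value)` pairs for the surviving leaves of tree k.

```python
import math

def intersect(b1, b2):
    """Coordinate-wise intersection of two boxes; None if empty (Lemma 1)."""
    out = []
    for (l1, r1), (l2, r2) in zip(b1, b2):
        lo, hi = max(l1, l2), min(r1, r2)
        if lo > hi:
            return None
        out.append((lo, hi))
    return out

def enumerate_valid_tuples(parts):
    """Enumerate all K-cliques (valid leaf tuples) of the K-partite
    boxicity graph, returning (intersection box, summed value) per clique."""
    d = len(parts[0][0][0])
    results = []

    def grow(k, box, total):
        if k == len(parts):
            results.append((box, total))
            return
        for leaf_box, value in parts[k]:
            merged = intersect(box, leaf_box)
            if merged is not None:          # boxes still overlap: extend clique
                grow(k + 1, merged, total + value)

    grow(0, [(-math.inf, math.inf)] * d, 0.0)
    return results
```

Because boxicity makes "pairwise intersecting" equivalent to "jointly intersecting", a single running box is enough; no explicit edge list is needed.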
3.3 An Efficient Multi-level Algorithm for Verifying the Robustness of a Tree Ensemble
Practical tree ensembles usually have tens or hundreds of trees and high-dimensional features, so Algorithm 2 can take exponential time and become too slow. We thus develop an efficient multi-level algorithm for computing verification bounds by further exploiting the boxicity of the graph.
Figure 1 illustrates the graph and how our multi-level algorithm runs. There are four trees and each tree has four leaf nodes. A node is colored if its box has non-empty intersection with $\mathrm{Ball}(x_0, \varepsilon)$; uncolored nodes are discarded. To answer question (5), we need to compute the maximum value among all $K$-cliques, denoted by $v^* = \max_C \sum_{i \in C} v_i$. As mentioned before, for robustness verification we only need an upper bound of $v^*$: if the bound is still negative, the answer to (5) is no, which yields a lower bound on the minimal adversarial perturbation. In the following, we first discuss algorithms for computing an upper bound at the top level, and then show how our multi-scale algorithm iteratively refines this bound until reaching the exact value $v^*$.
Bounds for a single level.
To compute an upper bound of $v^*$, a naive approach is to assume that the graph is fully connected between independent sets (a complete $K$-partite graph); in this case the maximum sum of node values is the sum of the maximum value within each independent set:

$v^* \le \bar v := \sum_{k=1}^{K} \max_{i \in V_k} v_i.$  (6)

One can easily show this is an upper bound of $v^*$, since any original clique is still considered when we add more edges to the graph.
Another slightly better approach is to exploit the edge information, but only between tree $k$ and tree $k+1$. If we search over all length-$K$ paths from the first part to the last and define the value of a path $(i_1, \dots, i_K)$ to be $\sum_k v_{i_k}$, then the maximum-valued path is an upper bound of $v^*$. This can be computed in linear time using dynamic programming: we scan nodes from tree $1$ to tree $K$, and for each node $i$ we store a value $u_i$, the maximum value of paths from tree $1$ to this node. At tree $k$, node $i$'s value can be computed by

$u_i = v_i + \max_{j \in V_{k-1}:\, (j,i) \in E} u_j.$  (7)

Taking the maximum value over nodes in the last tree gives a tighter upper bound of $v^*$.
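Both single-level bounds can be sketched compactly. This is a minimal illustration under an assumed `(box, value)` leaf representation, not the paper's implementation.

```python
def naive_bound(parts):
    """Eq. (6): pretend the K-partite graph is complete; the bound is the
    sum of each part's maximum leaf value."""
    return sum(max(v for _, v in part) for part in parts)

def overlap(b1, b2):
    """Edge test in the boxicity graph: do two boxes intersect?"""
    return all(max(l1, l2) <= min(r1, r2)
               for (l1, r1), (l2, r2) in zip(b1, b2))

def dp_path_bound(parts):
    """Eq. (7): tighter bound using only edges between consecutive trees.
    u_i = v_i + max over feasible predecessors, computed in one linear scan;
    the max over the last tree upper-bounds the best clique value."""
    nodes, u = list(parts[0]), [v for _, v in parts[0]]
    for part in parts[1:]:
        new_nodes, new_u = [], []
        for box, v in part:
            feas = [ui for (pbox, _), ui in zip(nodes, u) if overlap(pbox, box)]
            if feas:                      # reachable from the previous tree
                new_nodes.append((box, v))
                new_u.append(v + max(feas))
        nodes, u = new_nodes, new_u
    return max(u, default=float('-inf'))
```

The DP bound dominates the naive one because it discards paths broken by a missing consecutive edge, while still ignoring non-consecutive edges.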
Merging independent sets.
Now we try to refine our bound. Our approach is to partition the graph into groups, each containing $T$ independent sets. Within each group, we find all the $T$-cliques and use a new "pseudo-node" to represent each clique. $T$-cliques in a $T$-partite graph can be enumerated efficiently if we choose $T$ to be a relatively small number (e.g., $T = 2$ or $3$ in our experiments).
Now we exploit the boxicity property of our graph to form a graph over these cliques (illustrated as the second-level nodes in Figure 1). By Lemma 1, the intersection of boxes is still a box, so each clique is itself a box and can be represented as a pseudo-node in the level-2 graph. Because each pseudo-node is still a box, we can easily form edges between pseudo-nodes to indicate non-empty overlap between them, and the result is again a partite boxicity graph, since no edge can be formed between cliques within the same group. Thus we obtain the level-2 graph. On the level-2 graph, we can again run the single-level algorithm to compute an upper bound on $v^*$, and hence a lower bound on $r^*$ in (1). Unlike on the level-1 graph, all within-group edges have now been taken into account, so the value we get will be less than or equal to the level-1 bound, i.e., tighter.
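One refinement step can be sketched as below. This is an illustrative sketch of the merging idea, not Algorithm 3: it assumes the same hypothetical `(box, value)` leaf representation, and represents each within-group clique by its intersection box and summed value.

```python
import math

def intersect(b1, b2):
    """Coordinate-wise box intersection; None if empty (Lemma 1)."""
    out = []
    for (l1, r1), (l2, r2) in zip(b1, b2):
        lo, hi = max(l1, l2), min(r1, r2)
        if lo > hi:
            return None
        out.append((lo, hi))
    return out

def merge_level(parts, T):
    """One multi-level refinement step: partition the parts into groups of
    T consecutive trees, enumerate every T-clique inside each group, and
    replace the group by pseudo-nodes (clique box, summed value). The
    output is a smaller partite boxicity graph on which the single-level
    bounds can be run again."""
    d = len(parts[0][0][0])
    new_parts = []
    for g in range(0, len(parts), T):
        group = parts[g:g + T]
        pseudo = []

        def grow(k, box, total):
            if k == len(group):
                pseudo.append((box, total))
                return
            for leaf_box, value in group[k]:
                merged = intersect(box, leaf_box)
                if merged is not None:
                    grow(k + 1, merged, total + value)

        grow(0, [(-math.inf, math.inf)] * d, 0.0)
        new_parts.append(pseudo)
    return new_parts
```

Repeated application shrinks the number of parts by a factor of $T$ per level; each pseudo-node exactly accounts for the edges inside its group, which is why the bound can only tighten.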
The overall multi-level framework.
We can run the algorithm level by level until all groups are merged into one; at the final level the pseudo-nodes correspond to the $K$-cliques of the original graph, and the maximum value is exactly $v^*$. Therefore, our algorithm can be viewed as an anytime algorithm that refines the upper bound level by level until reaching the maximum value. Although getting to the final level still requires exponential time, in practice we can stop at any level (denoted by $L$) and get a reasonable bound. In the experiments, we show that by merging only a few trees we already get a bound very close to the exact solution. Algorithm 3 in the appendix gives the complete procedure.
Handling multiclass tree ensembles.
For a $C$-class classification problem, $C$ groups of tree ensembles, each with $K$ trees, are built for the classification task; each tree in group $c$ maps the input example to one of its leaves, whose value contributes to the prediction score of class $c$, and the final prediction is the class whose group attains the largest total score. Given an input example $x_0$ with ground-truth class $y_0$ and an attack target class $t$, we extract the trees for class $y_0$ and class $t$, and flip the sign of all prediction values for trees in group $y_0$, such that initially the summation over the extracted trees is negative for a correctly classified example. Then, we are back to the binary case with $2K$ trees, and can still apply our multi-level framework to obtain a lower bound of $r^*$ for the targeted attack pair $(y_0, t)$. Robustness against untargeted attacks can be evaluated by taking the minimum of this bound over all target classes $t \neq y_0$.
3.4 Verification Problems Beyond Ordinary Robustness
The above discussions focus on the decision problem of robustness verification (2). In fact, our approach works for a more general verification problem over any $d$-dimensional box $S$:

Is there any $x \in S$ such that $f(x) \neq y_0$?  (8)

In typical robustness verification settings, $S$ is defined to be $\mathrm{Ball}(x_0, \varepsilon)$, but in fact we can allow any box in our algorithm. For a general $S$, Lemma 1 still holds, so all of our algorithms and analysis go through. The only thing we need to change is to compute the intersection between $S$ and each leaf node's box at the first level in Figure 1, to eliminate nodes that have empty intersection with $S$. Robustness verification is thus just the special case where we remove all the nodes with empty intersection with $\mathrm{Ball}(x_0, \varepsilon)$. For example, if we want to identify a set of unimportant variables, such that no change restricted to that set can alter the prediction for a given sample $x_0$, then we can choose $S_j = (-\infty, \infty)$ if feature $j$ is in the set and $S_j = [x_{0,j}, x_{0,j}]$ otherwise. Similarly, we can also compute a set of anchor features (similar to [26]) such that once that set of features is fixed, no perturbation of the remaining features can change the prediction.
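The first-level change described above can be sketched as follows. This is an illustrative sketch under the same assumed `(box, value)` leaf representation; `anchor_region` is a hypothetical helper for the unimportant-feature use case, not part of the paper's code.

```python
def restrict_to_region(parts, S):
    """Eq. (8): verification over an arbitrary box S. At the first level,
    intersect every leaf box with S and drop leaves whose intersection is
    empty; the rest of the pipeline is unchanged. Ordinary robustness is
    the special case S = l_inf ball around x0."""
    out = []
    for part in parts:
        kept = []
        for box, v in part:
            merged, empty = [], False
            for (l1, r1), (l2, r2) in zip(box, S):
                lo, hi = max(l1, l2), min(r1, r2)
                if lo > hi:
                    empty = True   # leaf unreachable within S: discard it
                    break
                merged.append((lo, hi))
            if not empty:
                kept.append((merged, v))
        out.append(kept)
    return out

def anchor_region(x0, free):
    """Box S that pins every feature of x0 except those in `free`: any
    perturbation confined to `free` stays inside S."""
    return [(float('-inf'), float('inf')) if j in free else (xj, xj)
            for j, xj in enumerate(x0)]
```

If verification succeeds on `anchor_region(x0, free)`, the features in `free` are certified unimportant for `x0`.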
4 Experiments
We evaluate our proposed method for tree ensemble robustness verification on two tasks: binary and multiclass classification on 10 public datasets, including both small and large scale datasets. Our code (XGBoost compatible) is available at https://github.com/chenhongge/treeVerification. The statistics of the datasets are shown in the appendix. As defined in Section 2, $r^*$ is the radius of the minimum adversarial perturbation and reflects the true model robustness, but is hard to obtain; our method finds $\bar r$, a lower bound of $r^*$, which guarantees that no adversarial example exists within radius $\bar r$. A high quality lower bound should be close to $r^*$. We include the following algorithms in our comparisons:

Cheng’s attack [12] provides results of adversarial attacks on these models, which give an upper bound $r_a$ of the model robustness, i.e., $r_a \ge r^*$.

MILP: an MILP based method [18] gives the exact $r^*$. MILP was proposed for adversarial attacks on tree ensembles, and $\bar r \le r^*$ thus always holds for all valid lower bounds $\bar r$. MILP needs to solve a Mixed Integer Linear Program to find $r^*$; it runs in exponential time and becomes slow when the number of trees or the dimension of the features increases.

LP relaxed MILP: a Linear Programming (LP) relaxation of the MILP formulation, obtained by directly changing all binary variables to continuous ones. Since the integrality constraints are removed, the minimum of the relaxed problem gives a lower bound of robustness, $\bar r_{LP} \le r^*$, serving as a baseline method.
In Tables 1 and 2 we show empirical comparisons on these 10 datasets. We consider $\ell_\infty$ robustness and normalize our datasets to $[0, 1]^d$ so that perturbations on different datasets are comparable. We include both standard (naturally trained) GBDT models (Table 1) and robust GBDT models (Table 2) trained using [8]. The robust GBDTs optimize the model performance under a worst-case perturbation of input features, which leads to a max-min saddle point problem when finding the split at each node. All GBDTs are implemented under the XGBoost framework [11]. The number of trees and the training parameters for each dataset are shown in Table 3 in the appendix. Because we solve the decision problem of robustness verification, we use 10 binary search iterations to find the largest certified $\varepsilon$ in all experiments, and the reported time is the total time including all binary search trials. We present the average bound over 500 examples. The MILP based method from [18] is accurate but very slow; the results marked with an asterisk ("*") in the tables have very long running times, and thus we evaluate only 50 examples instead of 500.


Table 1: Verification results on standard (naturally trained) GBDT models. "–" marks entries whose values did not survive extraction.

| Dataset | Cheng's attack [12] avg. / time | MILP [18] avg. / time | LP relaxation avg. / time | (T, L) | Ours avg. / time | Ours vs. MILP | speedup |
|---|---|---|---|---|---|---|---|
| breast-cancer | .221 / 2.18s | .210 / .012s | .064 / .009s | (2, 1) | .208 / .001s | .99 | 12X |
| covtype | .058 / 4.76s | – | – | (2, 3) | .022 / 3.39s | .79 | 105X |
| cod-rna | .054 / 2.13s | .035 / .485s | .017 / .222s | (2, 3) | .033 / .059s | .94 | 8.2X |
| diabetes | .064 / 1.70s | .049 / .061s | .015 / .026s | (3, 2) | .042 / .018s | .86 | 3.4X |
| Fashion-MNIST | .048 / 12.2s | – | – | (2, 1) | .012 / 11.8s | .86 | 97X |
| HIGGS | .015 / 3.80s | – | – | (4, 1) | .0022 / 1.29s | .79 | 3163X |
| ijcnn1 | .047 / 2.72s | .030 / 4.64s | .008 / 2.67s | (2, 2) | .026 / .101s | .87 | 4.6X |
| MNIST | .070 / 11.1s | – | – | (2, 2) | .011 / 5.14s | 1.00 | 71X |
| webspam | .027 / 5.83s | .00076 / 47.2s | .0002 / 39.7s | (2, 1) | .0005 / .404s | .66 | 117X |
| MNIST 2 vs. 6 | .152 / 12.0s | .057 / 23.0s | .016 / 11.6s | (4, 1) | .046 / .585s | .81 | 39X |





Table 2: Verification results on robust GBDT models [8]. "–" marks entries whose values did not survive extraction.

| Dataset | Cheng's attack [12] avg. / time | MILP [18] avg. / time | LP relaxation avg. / time | (T, L) | Ours avg. / time | Ours vs. MILP | speedup |
|---|---|---|---|---|---|---|---|
| breast-cancer | .404 / 1.96s | .400 / .009s | .078 / .008s | (2, 1) | .399 / .001s | 1.00 | 9X |
| covtype | .079 / .481s | – | – | (2, 3) | .032 / 4.84s | .70 | 63X |
| cod-rna | .062 / 2.02s | .055 / .607s | .017 / .410s | (2, 3) | .052 / .104s | .95 | 5.8X |
| diabetes | .137 / 1.52s | .112 / .034s | .035 / .013s | (3, 2) | .109 / .006s | .97 | 5.7X |
| Fashion-MNIST | .153 / 13.9s | – | – | (2, 1) | .071 / 18.0s | .78 | 137X |
| HIGGS | .023 / 3.58s | – | – | (4, 1) | .0063 / 1.41s | .75 | 2511X |
| ijcnn1 | .054 / 2.63s | .036 / 2.52s | .009 / 1.26s | (2, 2) | .032 / 0.58s | .89 | 4.3X |
| MNIST | .367 / 1.41s | – | – | (2, 2) | .253 / 12.6s | .96 | 49X |
| webspam | .048 / 4.97s | .015 / 83.7s | .0024 / 60.4s | (2, 1) | .011 / .345s | .73 | 243X |
| MNIST 2 vs. 6 | .397 / 17.2s | .313 / 91.5s | .039 / 40.0s | (4, 1) | .308 / 3.68s | .98 | 25X |



From Tables 1 and 2 we can see that our method gives a tight lower bound $\bar r$ compared to $r^*$ from MILP, while achieving over 3000X speedup on the largest models. The running time of the baseline LP relaxation is on the same order of magnitude as that of the MILP method, but the results are much worse, with $\bar r_{LP}$ far below $r^*$. Our proposed method, given as Algorithm 3, is a multi-scale anytime approach; that is, it can stop at any scale to get a reasonable robustness bound. Figure 3 shows how the tightness of our robustness verification lower bounds changes with different clique sizes per level ($T$) and different numbers of levels ($L$). We test on a 20-tree standard GBDT model on the diabetes dataset. Here we also show the exact bound obtained by the MILP method. Our verification bound converges to the MILP bound as more levels of clique enumeration are used. Also, when we find larger cliques within each level, the bound becomes tighter.
To show the scalability of our method, we vary the number of trees in the GBDT and compare per-example running time with the MILP method on the ijcnn1 dataset in Figure 3. We see that our proposed method spends much less time on each example than the MILP method, and its running time grows more slowly than MILP's as the number of trees increases.
In Section 3.4, we showed that the proposed algorithm works for more general verification problems, such as identifying unimportant features in the data: features for which no change can alter the prediction. Here we use MNIST as an example of this task, where we perturb pixels (features) of MNIST test images on both standard and robustly trained multiclass tree models with depth 8. In Figure 4, yellow pixels cannot change the prediction under any perturbation, and a darker pixel represents a smaller lower bound on the perturbation needed to change the model output using that pixel. The standard, naturally trained model has some very dark pixels compared to the robust model.
References
 [1] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.

 [2] Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. Verifiable reinforcement learning via policy extraction. In Advances in Neural Information Processing Systems, pages 2494–2504, 2018.
 [3] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
 [4] Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575–577, 1973.
 [5] Rudy R Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, and Pawan K Mudigonda. A unified view of piecewise linear neural network verification. In Advances in Neural Information Processing Systems, pages 4790–4799, 2018.
 [6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
 [7] L Sunil Chandran, Mathew C Francis, and Naveen Sivadasan. Geometric representation of graphs in low dimension using axis parallel boxes. Algorithmica, 56(2):129, 2010.
 [8] Hongge Chen, Huan Zhang, Duane Boning, and Cho-Jui Hsieh. Robust decision trees against adversarial examples. In ICML, 2019.

[9]
PinYu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and ChoJui Hsieh.
Ead: elasticnet attacks to deep neural networks via adversarial
examples.
In
Thirtysecond AAAI conference on artificial intelligence
, 2018.  [10] PinYu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and ChoJui Hsieh. Zoo: Zeroth order optimization based blackbox attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
 [11] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
 [12] Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
 [13] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945, 2017.
 [14] Timon Gehr, Matthew Mirman, Dana DrachslerCohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.
 [15] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
 [16] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning, pages 2142–2151, 2018.
 [17] Kyle D Julian, Shivam Sharma, Jean-Baptiste Jeannin, and Mykel J Kochenderfer. Verifying aircraft collision avoidance neural networks through linear approximations of safe regions. arXiv preprint arXiv:1903.00762, 2019.
 [18] Alex Kantchelian, JD Tygar, and Anthony Joseph. Evasion and hardening of tree ensemble classifiers. In International Conference on Machine Learning, pages 2387–2396, 2016.
 [19] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
 [20] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
 [21] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
 [22] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 [23] Mohammad Mirghorbani and P Krokhmal. On finding kcliques in kpartite graphs. Optimization Letters, 7(6):1155–1165, 2013.
 [24] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical blackbox attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pages 506–519. ACM, 2017.
 [25] Charles A Phillips, Kai Wang, Erich J Baker, Jason A Bubier, Elissa J Chesler, and Michael A Langston. On finding and enumerating maximal and maximum k-partite cliques in k-partite graphs. Algorithms, 12(1):23, 2019.
 [26] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
 [27] Fred S Roberts. On the boxicity and cubicity of a graph. Recent Progress in Combinatorics, pages 301–310, 1969.
 [28] Markus Schneider and Burkhard Wulfhorst. Cliques in k-partite graphs and their application in textile engineering. 2002.
 [29] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802–10813, 2018.
 [30] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
 [31] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 [32] Vincent Tjeng, Kai Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356, 2017.
 [33] Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
 [34] Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
 [35] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, pages 5273–5282, 2018.
 [36] Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.
 [37] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
 [38] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.
Appendix A Proof of Lemma 1
Proof.
Suppose we have $n$ one-dimensional intervals $I_1, \dots, I_n$ with $I_i = [l_i, r_i]$, such that every pair of them has a nonempty overlap; we want to prove that $\bigcap_{i=1}^{n} I_i \neq \emptyset$. Without loss of generality, assume $l_1 = \max_i l_i$. For each $i$, $I_1 \cap I_i \neq \emptyset$ implies $r_i \geq l_1$. Therefore $[l_1, \min_i r_i]$ is a nonempty set contained in every $I_i$. Hence $\bigcap_{i=1}^{n} I_i \neq \emptyset$, and the intersection is itself an interval.
This generalizes to $d$-dimensional boxes. Assume we have $n$ boxes $B_1, \dots, B_n$ such that $B_i \cap B_j \neq \emptyset$ for any $i$ and $j$. Then for each of the $d$ dimensions we can apply the above argument, which implies that $\bigcap_{i=1}^{n} B_i \neq \emptyset$ and the intersection is another box. ∎
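As a concrete illustration of Lemma 1 (a sketch added here, not part of the paper's code): for axis-aligned boxes, pairwise intersection implies a common nonempty intersection, and that intersection can be computed coordinate-wise exactly as in the proof. The helper names below are hypothetical.

```python
def intersect_boxes(boxes):
    """Each box is a list of (low, high) intervals, one per dimension.
    Returns the common intersection box, or None if it is empty."""
    dims = len(boxes[0])
    result = []
    for t in range(dims):
        lo = max(b[t][0] for b in boxes)   # mirrors l_1 = max_i l_i in the proof
        hi = min(b[t][1] for b in boxes)   # mirrors min_i r_i
        if lo > hi:
            return None
        result.append((lo, hi))
    return result

def pairwise_intersect(boxes):
    """True iff every pair of boxes has a nonempty overlap."""
    return all(intersect_boxes([a, b]) is not None
               for i, a in enumerate(boxes) for b in boxes[i + 1:])

# Three pairwise-intersecting 2-D boxes:
boxes = [[(0, 4), (0, 4)], [(1, 5), (2, 6)], [(3, 7), (1, 3)]]
assert pairwise_intersect(boxes)
# By Lemma 1 the common intersection is then a nonempty box:
assert intersect_boxes(boxes) == [(3, 4), (2, 3)]
```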
Appendix B Data Statistics and Model Parameters in Tables 1 and 2
Table 3 presents data statistics and parameters for the models in Tables 1 and 2 in the main text. The standard test accuracy is the model accuracy on natural, unmodified test sets.


| Dataset | training set size | test set size | # of features | # of classes | # of trees | robust training ε | depth (robust) | depth (natural) | standard test acc. (robust) | standard test acc. (natural) |
|---|---|---|---|---|---|---|---|---|---|---|
| breast-cancer | 546 | 137 | 10 | 2 | 4 | 0.3 | 8 | 6 | .978 | .964 |
| covtype | 400,000 | 181,000 | 54 | 7 | 80 | 0.2 | 8 | 8 | .847 | .877 |
| cod-rna | 59,535 | 271,617 | 8 | 2 | 80 | 0.2 | 5 | 4 | .880 | .965 |
| diabetes | 614 | 154 | 8 | 2 | 20 | 0.2 | 5 | 5 | .786 | .773 |
| Fashion-MNIST | 60,000 | 10,000 | 784 | 10 | 200 | 0.1 | 8 | 8 | .903 | .903 |
| HIGGS | 10,500,000 | 500,000 | 28 | 2 | 300 | 0.05 | 8 | 8 | .709 | .760 |
| ijcnn1 | 49,990 | 91,701 | 22 | 2 | 60 | 0.1 | 8 | 8 | .959 | .980 |
| MNIST | 60,000 | 10,000 | 784 | 10 | 200 | 0.3 | 8 | 8 | .980 | .980 |
| webspam | 300,000 | 50,000 | 254 | 2 | 100 | 0.05 | 8 | 8 | .983 | .992 |
| MNIST 2 vs. 6 | 11,876 | 1,990 | 784 | 2 | 1000 | 0.3 | 6 | 4 | .997 | .998 |



Appendix C A linear-time algorithm for verifying a decision tree
The robustness of a single tree can be easily verified by the following algorithm, which traverses the whole tree and computes the bounding box of each node in a depth-first search fashion.
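The traversal can be sketched as follows (a hypothetical implementation, assuming a simple node structure with `feature`, `threshold`, `left`, `right`, and leaf `label` fields; not the paper's own code): a DFS propagates the axis-aligned box of inputs reaching each node, and the minimal $\ell_\infty$ perturbation is the smallest distance from the input $x$ to any leaf box carrying a different label.

```python
import math

class Node:
    # Hypothetical tree node: internal nodes split on x[feature] <= threshold,
    # leaves carry a predicted label.
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def min_linf_perturbation(root, x, y):
    """Smallest eps such that some input within l_inf distance eps of x
    reaches a leaf whose label differs from y. One DFS pass over the tree."""
    d = len(x)
    best = math.inf

    def dist_to_box(box):
        # l_inf distance from x to an axis-aligned box of (lo, hi) intervals
        return max(max(lo - xi, xi - hi, 0.0) for xi, (lo, hi) in zip(x, box))

    def dfs(node, box):
        nonlocal best
        if node.label is not None:                 # leaf node
            if node.label != y:
                best = min(best, dist_to_box(box))
            return
        f, t = node.feature, node.threshold
        lo, hi = box[f]
        if lo <= t:                                # left child: x[f] <= t
            left_box = box.copy(); left_box[f] = (lo, min(hi, t))
            dfs(node.left, left_box)
        if hi > t:                                 # right child: x[f] > t
            right_box = box.copy(); right_box[f] = (max(lo, t), hi)
            dfs(node.right, right_box)

    dfs(root, [(-math.inf, math.inf)] * d)
    return best
```

For example, a depth-1 stump splitting on $x_0 \leq 0.5$ with leaf labels 0 (left) and 1 (right) gives a minimal perturbation of 0.3 for the input $x = [0.2]$ with label 0.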
Appendix D Clique Enumeration Algorithm
The algorithm first looks at the first two parts $V_1$ and $V_2$ of the graph and enumerates all 2-cliques between them in $O(|V_1||V_2|)$ time. Each 2-clique found is then converted into a "pseudo node" (this is possible due to Lemma 1: the boxes of its two member nodes intersect in another box), and all 2-cliques together form a new part $\hat{V}_2$ of the graph. We then replace $V_1$ and $V_2$ with $\hat{V}_2$, and continue to enumerate all 2-cliques between $\hat{V}_2$ and $V_3$ to form $\hat{V}_3$. A 2-clique between $\hat{V}_2$ and $V_3$ represents a 3-clique in $V_1$, $V_2$, and $V_3$ due to boxicity. Note that enumerating all 3-cliques in a general 3-partite graph takes $O(|V_1||V_2||V_3|)$ time; thanks to boxicity, our algorithm takes $O(|\hat{V}_2||V_3|)$ time, which equals $O(|V_1||V_2||V_3|)$ only when $V_1$ and $V_2$ form a complete bipartite graph, which is unlikely in common cases. This process continues recursively until all $K$ parts have been processed and only $\hat{V}_K$ is left, where each vertex in $\hat{V}_K$ represents a $K$-clique in the original graph. After obtaining all cliques, we can verify each of them to compute the verification bound.
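The merging step above can be sketched in a few lines (an illustrative implementation under the assumption, as in the paper's graph construction, that each graph node carries an axis-aligned box and two nodes are adjacent exactly when their boxes intersect; the function names are hypothetical):

```python
def intersect(box_a, box_b):
    """Intersection of two axis-aligned boxes, or None if empty (Lemma 1).
    Each box is a list of (low, high) intervals, one per dimension."""
    out = []
    for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b):
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:
            return None
        out.append((lo, hi))
    return out

def enumerate_cliques(parts):
    """parts: list of K parts, each a list of boxes (one box per node).
    Because edges are exactly box intersections (boxicity), a K-clique
    corresponds to K boxes with a common nonempty intersection, so we
    merge parts pairwise: each 2-clique becomes a pseudo node whose box
    is the intersection of its two members."""
    merged = parts[0]
    for part in parts[1:]:
        # 2-cliques between the merged part and the next part
        merged = [box for a in merged for b in part
                  if (box := intersect(a, b)) is not None]
        if not merged:
            return []
    return merged  # each surviving box represents one K-clique
```

Because each level only keeps pseudo nodes with nonempty boxes, the per-level cost is proportional to the number of surviving 2-cliques rather than the full product of part sizes.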