1 Introduction
Deep Neural Networks (DNNs) have been found to be very successful at many tasks, including image classification Krizhevsky et al. (2012); He et al. (2016), but have also been found to be quite vulnerable to misclassifications from small adversarial perturbations to inputs Szegedy et al. (2014); Goodfellow et al. (2015). Many defenses have been proposed to protect models from these attacks. Most focus on making a single model robust, but there may be fundamental limits to the robustness that can be achieved by a single model Schmidt et al. (2017); Gilmer et al. (2018); Fawzi et al. (2018); Mahloujifar et al. (2019); Shafahi et al. (2019). Several of the most promising defenses employ multiple models in various ways Feinman et al. (2017); Tramèr et al. (2018); Meng and Chen (2017); Xu et al. (2018); Pang et al. (2019). These ensemble-based defenses work on the general principle that it should be more difficult for an attacker to find adversarial examples that succeed against two or more models at the same time than against a single model. However, an attack crafted against one model may be successful against a different model trained to perform the same task. This leads to a notion of joint vulnerability, which captures the risk of adversarial examples that compromise a set of models, as illustrated in Figure 1. Joint vulnerability makes ensemble-based defenses less effective, so reducing the joint vulnerability of models is important for building stronger ensemble-based defenses.
Although the above ensemble defenses have shown promise when evaluated against experimental attacks, these attacks often assume adversaries do not adapt to the ensemble defense, and no previous work has certified the joint robustness of an ensemble defense. On the other hand, several recent works have developed methods to certify robustness for single models Wong and Kolter (2018); Tjeng et al. (2019); Gowal et al. (2019). In this work, we introduce methods for providing robustness guarantees for an ensemble of models, building upon the approaches of Wong and Kolter Wong and Kolter (2018) and Tjeng et al. Tjeng et al. (2019).
Contributions. Our main contribution is a framework to certify robustness for an ensemble of models against adversarial examples. We define three simple ensemble frameworks (Section 3) and provide robustness guarantees for each of them, while evaluating the tradeoffs between them. We propose a novel technique to extend prior work on single-model robustness to verify joint robustness of ensembles of two or more models (Section 4). Second, we demonstrate that the cost-sensitive training approach Zhang and Evans (2019) can be used to train diverse robust models that can be used to certify a high fraction of test examples (Section 5). Our results show that, for the MNIST dataset, we can train diverse ensembles of two, five, and ten models using different cost-sensitive robust matrices. When these diverse models are combined using our ensemble frameworks, the ensembles can be used to certify a larger number of test seeds than a single overall-robust model. For example, 78.1% of test examples can be certified robust for a two-model averaging ensemble and 85.6% for a ten-model ensemble, compared with 72.7% for a single model. We further show that the use of ensemble models does not significantly reduce accuracy on benign inputs and, when rejection is available as an option, can reduce the error rate to essentially zero with a 9.7% rejection rate.
2 Background and Related Work
In this section, we briefly introduce adversarial examples, provide background on robust training and certification, and describe defenses using model ensembles.
2.1 Adversarial Examples
Several definitions of adversarial example have been proposed. For this paper, we use this definition Biggio et al. (2013); Goodfellow et al. (2015): given a model $f$, an input $x$, a distance metric $\Delta$, and a distance bound $\epsilon$, an adversarial example for the input $x$ is $x'$ where $f(x') \neq f(x)$ and $\Delta(x, x') \leq \epsilon$.
In recent years, there has been a significant amount of research on adversarial examples against DNN models, including attacks such as FGSM Goodfellow et al. (2015), DeepFool Moosavi-Dezfooli et al. (2016), PGD Madry et al. (2018), Carlini-Wagner Carlini and Wagner (2017), and JSMA Papernot et al. (2016). The FGSM attack works by taking the sign of the gradient of the loss with respect to the input and perturbing every input feature by a small amount in the direction that increases the loss. This simple strategy is surprisingly successful. The PGD attack is considered a very strong state-of-the-art attack. It is essentially an iterative version of FGSM: instead of taking just one step, it takes many smaller steps, subject to some constraints and with some randomization.
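The relationship between FGSM and PGD described above can be sketched on a toy model. The following is a minimal illustration only, assuming a two-feature logistic classifier with a hand-derived input gradient; the function names (`loss_gradient`, `fgsm`, `pgd`) and parameters are hypothetical and are not taken from any of the cited implementations. It also omits clipping to a valid pixel range.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sign(v):
    return (v > 0) - (v < 0)


def loss_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x
    for the toy model p = sigmoid(w . x + b) and label y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """One step: shift every feature by eps along the sign of the gradient."""
    g = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, step, iters, seed=0):
    """Many small signed-gradient steps from a random start, projecting back
    onto the L-infinity ball of radius eps around x after every step."""
    rng = random.Random(seed)
    adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(iters):
        g = loss_gradient(w, b, adv, y)
        adv = [ai + step * sign(gi) for ai, gi in zip(adv, g)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv


if __name__ == "__main__":
    w, b, x, y = [2.0, -1.0], 0.0, [0.2, 0.1], 1
    print("FGSM:", fgsm(w, b, x, y, eps=0.1))
    print("PGD: ", pgd(w, b, x, y, eps=0.1, step=0.05, iters=10))
```

On a linear model such as this one, both attacks end at the same corner of the perturbation ball; the random start and repeated projection only matter for non-linear models, where the gradient direction changes along the way.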
One interesting property of these attacks is that the adversarial examples they find are often transferable Evtimov et al. (2018): a successful attack against one model is often successful against a second model. Transfer attacks enable black-box attacks where the adversary does not have full access to the target model. More importantly for our purposes, they also demonstrate that an adversarial example found against one model is often effective against other models, and so can be effective against ensemble-based defenses Xie et al. (2018); Tramèr et al. (2018). In our work, we consider the threat model where the adversary has white-box access to all of the models in the ensemble and knowledge of the ensemble construction.
2.2 Robust training
While many proposed adversarial examples defenses look promising, adaptive attacks that compromise defenses are nearly always found Tramer et al. (2020). The failures of ad hoc defenses motivate increased focus on robust training and provable defenses. Madry et al. Madry et al. (2018), Wong et al. Wong and Kolter (2018), and Raghunathan et al. Raghunathan et al. (2018) have proposed robust training methods to defend against adversarial examples. Madry et al. use the PGD attack to find adversarial examples with high loss value around training points, and then iteratively adversarially train their models on those seeds. Wong et al. define an adversarial polytope for a given input, and robustly train the model to guarantee adversarial robustness for the polytope by reducing the problem into a linear programming problem. These works focus on single models; we propose a way to make ensemble models jointly robust through training a set of models to be both robust and diverse.
2.3 Certified robustness
Several recent works aim to provide guarantees of robustness for models against constrained adversarial examples Wong and Kolter (2018); Tjeng et al. (2019); Raghunathan et al. (2018); Cohen et al. (2019). All of these works provide certification for individual models. A model $f$ is certifiably robust for an input $x$ if, for all $x'$ where $\Delta(x, x') \leq \epsilon$, the prediction is unchanged, that is, $f(x') = f(x)$. We extend these techniques to ensemble models. In particular, we extend Tjeng et al.’s Tjeng et al. (2019) MIP verification technique and Wong et al.’s Wong and Kolter (2018) convex adversarial polytope method. Both techniques are based on using linear programming to calculate a bound on outputs given the allowable input perturbations, and using those output bounds to provide robustness guarantees. MIPVerify uses mixed integer linear programming solvers, which are computationally very expensive for deep neural networks. To get around this issue, Wong et al. Wong and Kolter (2018) use a dual network formulation of the original network that over-approximates the adversarial region, and apply widely used techniques such as stochastic gradient descent to solve the optimization problem efficiently. This can scale to larger networks and provides a sound certificate, but may fail to certify robust examples because of the over-approximation.
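The general idea of bounding the outputs over all allowable perturbations, and certifying from those bounds, can be illustrated with a much simpler technique than either method discussed above. The sketch below uses plain interval bound propagation through a small ReLU network; this is a deliberately swapped-in substitute chosen for brevity, not the MIP or convex adversarial polytope formulation, and all function names are hypothetical. Like the dual formulation, it is sound but over-approximate: a `False` result means "could not certify", not "an attack exists".

```python
def linear_bounds(W, b, lo, hi):
    """Sound elementwise bounds on W x + b when lo <= x <= hi."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight attains its minimum at the lower input bound,
        # a negative weight at the upper input bound (and vice versa).
        out_lo.append(bias + sum(w * (l if w >= 0 else h)
                                 for w, l, h in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (h if w >= 0 else l)
                                 for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(layers, x, eps, t):
    """layers: list of (W, b) pairs with ReLU between consecutive layers.
    Returns True if class t provably wins for every input within an
    L-infinity distance eps of x."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for k, (W, b) in enumerate(layers):
        lo, hi = linear_bounds(W, b, lo, hi)
        if k < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return all(lo[t] > hi[j] for j in range(len(lo)) if j != t)


if __name__ == "__main__":
    toy = [([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
           ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
    print(certify(toy, x=[1.0, 0.0], eps=0.1, t=0))
    print(certify(toy, x=[1.0, 0.0], eps=0.6, t=0))
```

Comparing the lower bound of the true class against the upper bound of every other class is the loosest possible check; tighter relaxations certify more inputs at higher computational cost, which is the tradeoff the two methods above navigate.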
2.4 Ensemble models as defense
In classical machine learning, there has been extensive work on ensembles of models and on diversity measures. Kuncheva Kuncheva and Whitaker (2003) provides a comparison of those measures and their usefulness in terms of ensemble accuracy. However, both the diversity measures and the evaluation of their usefulness were developed in the benign setting. The assumptions that are valid in the benign setting, such as independent and identically distributed inputs, no longer apply in the adversarial setting.
In the adversarial setting, there have been several proposed ensemble-based defenses Feinman et al. (2017); Tramèr et al. (2018); Pang et al. (2019) that work on the principle of making models diverse from each other. Feinman et al. Feinman et al. (2017) use randomness in the dropout layers to build an ensemble that is robust to adversarial examples. Tramèr et al. Tramèr et al. (2018) use ensembles to introduce diversity in the adversarial examples on which to train a model to be robust. Pang et al. Pang et al. (2019) promote diversity among non-maximal class prediction probabilities to make the ensembles diverse. Sharif et al. Sharif et al. (2019) proposed the n–ML approach for adversarial defense. They explicitly train the models in the ensemble to be diverse from each other, and show experimentally that it leads to robust ensembles. Similarly, Meng et al. Meng et al. (2020) have shown that an ensemble of weak but diverse models can be used as a strong adversarial defense. While all of the above works focus on making models diverse, they evaluate their ensembles using existing attack methods. None of these prior works has attempted to provide any certification of their diverse ensemble models against adversaries.
3 Ensemble Defenses
Our goal is to provide robustness guarantees against adversarial examples for an ensemble of models. The effectiveness of an ensemble defense depends on the models used in the ensemble and how they are combined.
First, we present a general framework for ensemble defenses. Next, we define three different ensemble composition frameworks: unanimity, majority, and averaging. Section 4 describes the techniques we use to certify each type of ensemble framework. In Sections 5.2 and 5.3, we talk about different ways to train the individual models in these ensemble frameworks, and discuss the results. Our methods do not make any assumptions about the models in an ensemble, for example, that they are preprocessing the input in some way and then running the same model. This means our frameworks are general purpose and agnostic of the input domain, but we cannot handle ensemble mechanisms that are nondeterministic (such as sampling Gaussian noise around the input Salman et al. (2020), which can only provide probabilistic guarantees).
General framework of ensemble defense. We use $\mathcal{F}(x)$ to represent the output of a model ensemble composed of $n$ models, $f_1, \dots, f_n$, that are combined using one of the composition mechanisms. Furthermore, given an input $x$, true output class $t$, and the output of the ensemble $\mathcal{F}(x)$, we use a decision function $\mathcal{D}$ to decide whether the given input is adversarial, benign, or rejected. Functions $\mathcal{F}$ and $\mathcal{D}$ together define an ensemble defense framework. We discuss three such frameworks in this paper.
Unanimity. In the unanimity framework, the output class is $j$ only if all of the component models output $j$. If there is any disagreement among the models, the input is rejected: $\mathcal{F}(x) = \bot$. For the unanimity framework, joint robustness is achieved when the unanimity-robust property defined below is satisfied.
Definition 3.1.
Given an input $x$ with true output class $t$ and allowable adversarial distance $\epsilon$, we call a model ensemble of component models $f_1, \dots, f_n$ unanimity-robust for input $x$ if there exists no adversarial example $x'$ such that $\Delta(x, x') \leq \epsilon$, and $f_1(x') = \dots = f_n(x') = j$ for some class $j \neq t$.
Majority. In the majority framework, the output class is $j$ only if at least $\lfloor n/2 \rfloor + 1$ of the $n$ component models agree on it. If there is no majority output class, the input is rejected. Joint robustness is achieved when the majority-robust property defined below is satisfied:
Definition 3.2.
Given an input $x$ with true output class $t$ and allowable adversarial distance $\epsilon$, a model ensemble of $n$ component models is majority-robust for input $x$ if, for every $x'$ such that $\Delta(x, x') \leq \epsilon$, there is no class $j$ such that $j \neq t$ and at least $\lfloor n/2 \rfloor + 1$ of the component models output $j$ on $x'$.
Averaging. In the averaging framework, we take the average of the second last layer output vectors of each of the component models to produce the final output. This second last layer vector is typically a softmax or logits layer. We use $z^{(i)}(x)$ to denote the second last layer output vector of model $f_i$, and define the average of the second last layer vectors as:
$$\bar{z}(x) = \frac{1}{n} \sum_{i=1}^{n} z^{(i)}(x)$$
Then, the output of the ensemble is:
$$\mathcal{F}(x) = \arg\max_{j} \bar{z}_j(x)$$
Joint robustness for an averaging ensemble is satisfied when the averaging-robust property defined below is satisfied.
Definition 3.3.
Given an input $x$ with true output class $t$ and allowable adversarial distance $\epsilon$, we call a model ensemble averaging-robust if, for every $x'$ such that $\Delta(x, x') \leq \epsilon$, there is no class $j$ such that $j \neq t$ and $\bar{z}_j(x') \geq \bar{z}_t(x')$, where $\bar{z}(x')$ is the average of the second last layer vectors as defined above.
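The three composition mechanisms can be sketched in a few lines. This is an illustrative sketch only: it assumes each component model is represented by its second last layer output vector for a single input, it uses `None` as a hypothetical marker for a rejected input, and it breaks argmax ties in favor of the lowest class index, which is a choice made here rather than one stated in the text.

```python
from collections import Counter

REJECT = None  # hypothetical marker for a rejected input


def argmax(vec):
    return max(range(len(vec)), key=lambda j: vec[j])


def unanimity(outputs):
    """Return the common predicted class, or REJECT on any disagreement."""
    preds = [argmax(z) for z in outputs]
    return preds[0] if all(p == preds[0] for p in preds) else REJECT


def majority(outputs):
    """Return the class predicted by a strict majority, else REJECT."""
    preds = [argmax(z) for z in outputs]
    cls, votes = Counter(preds).most_common(1)[0]
    return cls if votes >= len(preds) // 2 + 1 else REJECT


def averaging(outputs):
    """Average the per-model output vectors, then take the argmax."""
    n = len(outputs)
    avg = [sum(z[j] for z in outputs) / n for j in range(len(outputs[0]))]
    return argmax(avg)


if __name__ == "__main__":
    outputs = [[0.1, 0.9, 0.0], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
    print(unanimity(outputs))  # None: the third model disagrees
    print(majority(outputs))   # 1: two of three models agree
    print(averaging(outputs))  # 1: class 1 has the largest average
```

Note that only the unanimity and majority compositions can reject; the averaging composition always returns a class.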
4 Certifying ensemble defenses
In this section we introduce our techniques to certify a model ensemble is robust for a given input. Our approach extends the single model methods of Wong and Kolter Wong and Kolter (2018) and Tjeng et al. Tjeng et al. (2019) to support certification for model ensembles using the different composition mechanisms.
4.1 Unanimity and majority frameworks
The simplest approach for certifying joint robustness for the unanimity and majority frameworks would be to certify the robustness of each model in the ensemble individually for a given input, and then make a joint certification decision based on those individual certifications. This strategy is simple but prone to false negatives.
For the unanimity framework, we can verify that an ensemble is unanimity-robust for input $x$ if at least one of the models is individually robust for $x$. This provides a simple way to use single-model certifiers to verify robustness for an ensemble, but is stricter than what is required to satisfy Definition 3.1, since compromising a unanimity ensemble requires finding a single input that is a successful adversarial example against every component model. Hence, this method may substantially underestimate the actual robustness, especially when the component models have mostly disjoint vulnerability regions. An input that cannot be certified using this technique may still be unanimity-robust. Nevertheless, this technique is an easy way to establish a lower bound for joint robustness.
Similarly, for the majority framework, we can use this approach to verify that an ensemble satisfies majority-robustness (Definition 3.2) by checking if at least $\lfloor n/2 \rfloor + 1$ of the $n$ models are individually robust for input $x$. As with the unanimity case, this underestimates the actual robustness, but provides a valid joint robustness lower bound. As we will see in Section 5.3, the independent evaluation strategy works fairly well for the unanimity framework, but it is almost useless for the majority framework when the number of models in the ensemble gets large.
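The independent certification rules above reduce to simple counting over per-model certification results. The sketch below assumes a list of booleans, one per component model, saying whether a single-model certifier proved that model robust for the input; the function names are hypothetical.

```python
def certify_unanimity_independently(robust_flags):
    """One individually robust model is enough: it can never be pushed to a
    wrong class, so the models can never all agree on a wrong class."""
    return any(robust_flags)


def certify_majority_independently(robust_flags):
    """A strict majority of individually robust models always outputs the
    true class, so no wrong class can collect a majority of votes."""
    return sum(robust_flags) >= len(robust_flags) // 2 + 1


if __name__ == "__main__":
    # Five models, each trained to be robust for different seed classes:
    # for a given input, typically only one of them can be certified.
    flags = [False, False, True, False, False]
    print(certify_unanimity_independently(flags))  # True
    print(certify_majority_independently(flags))   # False
```

The usage example shows why this strategy breaks down for majority ensembles of specialized models: when each model is only certifiable on its own share of inputs, a certified majority is rarely available.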
4.2 Averaging models
As the averaging framework essentially obtains a single model by combining the models in the ensemble, we can simply apply the single-model certification techniques to that combined model to obtain a robustness certificate according to Definition 3.3. Furthermore, we can show that this certification technique implies a certification guarantee for the unanimity framework. In fact, the certification guarantee for the unanimity framework achieved this way has a lower false negative rate than the independent technique described in the previous subsection. We state this formally in Theorem 4.1, and provide a proof below.
Theorem 4.1.
If, for a given input $x$, the averaging ensemble is certified to be robust, then the component models combined with the unanimity framework are also certifiably robust according to Definition 3.1.
Proof.
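Let $f_1, \dots, f_n$ be the component models of the averaging ensemble, and let $z^{(1)}, \dots, z^{(n)}$ be the second last layer output vectors for each of these models. As described in Section 3, given input $x$ we can define the average of the second last layer outputs as:
$$\bar{z}(x) = \frac{1}{n} \sum_{i=1}^{n} z^{(i)}(x)$$
The final output class is defined as:
$$\mathcal{F}(x) = \arg\max_{j} \bar{z}_j(x)$$
For input $x$, let $t$ be the true output class and $j \neq t$ be the target output class. Now, if the averaging ensemble is robust, then for every $x'$ with $\Delta(x, x') \leq \epsilon$ we can write $\bar{z}_t(x') > \bar{z}_j(x')$. It follows that $\frac{1}{n} \sum_{i=1}^{n} z^{(i)}_t(x') > \frac{1}{n} \sum_{i=1}^{n} z^{(i)}_j(x')$. Thus,
$$\sum_{i=1}^{n} \left( z^{(i)}_t(x') - z^{(i)}_j(x') \right) > 0$$
For two models, this implies that either $z^{(1)}_t(x') > z^{(1)}_j(x')$ or $z^{(2)}_t(x') > z^{(2)}_j(x')$. Generalizing for any $n$, we must have $z^{(i)}_t(x') > z^{(i)}_j(x')$ for at least one $i$. That model does not output $j$, so the component models cannot all output $j$. Thus, a unanimity ensemble of $f_1, \dots, f_n$ is unanimity-robust for target class $j$ according to Definition 3.1. Thus, if we can show that a model ensemble is averaging-robust for all target classes $j \neq t$ for an input $x$, then it implies that the unanimity ensemble formed with models $f_1, \dots, f_n$ is also unanimity-robust for input $x$. ∎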
Averaging-robustness is again a stricter definition of robustness than the unanimity-robustness defined in Definition 3.1. This means that, even though certification of averaging-robustness implies unanimity-robustness, the opposite is not true: unanimity-robustness does not imply averaging-robustness. Therefore, we again get a lower bound on unanimity-robustness. However, averaging-robustness is a less strict definition of robustness than the implicit independent certification definition described in the previous subsection. Thus, this formulation gives us a better estimate of true unanimity-robustness.
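The counting step at the heart of Theorem 4.1 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it draws random integer output vectors for a hypothetical three-model, four-class ensemble at a single point and confirms that whenever the averaged vector strictly prefers the true class, the models never agree unanimously on a wrong class. It does not search over a perturbation region.

```python
import random


def argmax(vec):
    return max(range(len(vec)), key=lambda j: vec[j])


def average_prefers_true_class(z, t):
    """True if the averaged vector ranks class t strictly above all others."""
    n, k = len(z), len(z[0])
    avg = [sum(zi[j] for zi in z) / n for j in range(k)]
    return all(avg[t] > avg[j] for j in range(k) if j != t)


def unanimous_on_wrong_class(z, t):
    """True if every model predicts the same class and it is not t."""
    preds = {argmax(zi) for zi in z}
    return len(preds) == 1 and t not in preds


def count_violations(trials=2000, n_models=3, n_classes=4, seed=0):
    rng = random.Random(seed)
    premise_hits = violations = 0
    for _ in range(trials):
        z = [[rng.randint(-10, 10) for _ in range(n_classes)]
             for _ in range(n_models)]
        t = rng.randrange(n_classes)
        if average_prefers_true_class(z, t):
            premise_hits += 1
            if unanimous_on_wrong_class(z, t):
                violations += 1
    return premise_hits, violations


if __name__ == "__main__":
    hits, violations = count_violations()
    print(f"premise held in {hits} trials, violations: {violations}")
```

Integer outputs are used so that the sums are exact and the check is not affected by floating-point rounding.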
In this project, we extend two different single-model certification techniques to provide robustness certification for ensembles. The two techniques we use are described below:
Using MIP verification: Tjeng et al. Tjeng et al. (2019) have used mixed integer programming (MIP) techniques to evaluate robustness of models against adversarial examples. We apply their certification technique on our averaging ensemble model to certify the joint robustness of the ensemble. However, we found this approach to be computationally intensive, and it is hard to scale to larger models. Nevertheless, we found some interesting results for two very simple MNIST models, which we report in the next section.
Using convex adversarial polytope: In order to scale our verification technique to larger models, we next extended the dual network formulation by Wong and Kolter Wong and Kolter (2018) to be able to handle the final averaging layer of the averaging ensemble model. Because this layer is a linear operation, it can be represented as a fully connected linear layer in the neural network. And because linear layers are already supported by their framework, our averaging model can thus be verified.
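To make the "averaging is a linear layer" observation concrete, the sketch below builds the weight matrix of a fully connected layer that maps the concatenated output vectors of all component models to their average. It is a minimal illustration with hypothetical helper names and plain Python lists; it does not reproduce the interface of the convex adversarial polytope code.

```python
def averaging_layer_weights(n_models, n_classes):
    """Weight matrix A with n_classes rows and n_models * n_classes columns
    such that A applied to concat(z_1, ..., z_n) equals the mean of the z_i."""
    return [[1.0 / n_models if col % n_classes == row else 0.0
             for col in range(n_models * n_classes)]
            for row in range(n_classes)]


def linear(weights, inputs):
    """Bias-free fully connected layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


if __name__ == "__main__":
    z1, z2 = [1.0, 3.0], [3.0, 5.0]
    A = averaging_layer_weights(n_models=2, n_classes=2)
    print(linear(A, z1 + z2))  # [2.0, 4.0], the elementwise mean of z1 and z2
```

With this layer appended, the ensemble is an ordinary feed-forward network whose earlier layers process the component models side by side, so a single-model verifier can be applied to it unchanged.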
5 Experiments
This section reports on our experiments extending two different certification techniques, MIPVerify (Section 5.2) and convex adversarial polytope (Section 5.3), for use with model ensembles in different frameworks. To conduct the experiments, we produced a set of robust models that are trained to be diverse in particular ways (Section 5.1) and can be combined in various ensembles. Because of the computational challenges in scaling these techniques to large models, most of our results are only for the convex adversarial polytope method, and for now we only have experimental results on MNIST. Although this is a simple dataset, and may not be representative of typical tasks, it is sufficient for exploring methods for testing joint vulnerability, and for providing some insights into the effectiveness of different types of ensembles.
5.1 Training Diverse Robust Models
To train the models in the ensemble frameworks, we used the cost-sensitive robustness framework by Zhang and Evans Zhang and Evans (2019), which is implemented based on the convex adversarial polytope work. Cost-sensitive robustness provides a principled way to train diverse models.
Cost-sensitive robust training uses a cost matrix to specify seed-target class pairs that are trained to be robust. If $C$ is the cost matrix, $i$ is a seed class, and $j$ is a target class, then $C_{ij} = 1$ is set when we want to make the trained model robust against adversarial attacks from seed class $i$ to target class $j$, and $C_{ij} = 0$ is set when we don’t want to make the model robust for this particular seed-target pair. For the MNIST dataset, $C$ is a $10 \times 10$ matrix. We configure this cost matrix in different ways to produce different types of model ensembles. This provides a controlled way to produce models with diverse robustness properties, in contrast to ad hoc diverse training methods that vary model architectures or randomize aspects of training. We expect both types of diversity will be useful in practice, but leave exploring ad hoc diversity methods to future work.
We conduct experiments on ensembles of two, five, and ten models, trained using different cost matrices. The different ensembles we used are listed below:

- Two-model ensembles, where the individual models are:
  - Even seed digits robust and odd seed digits robust.
  - Even target digits robust and odd target digits robust.
  - Adversarially-clustered seed digits robust.
  - Adversarially-clustered target digits robust.
- Five-model ensembles, with individual models that are:
  - Seed digits modulo-5 robust.
  - Target digits modulo-5 robust.
  - Adversarially-clustered seed digits robust.
  - Adversarially-clustered target digits robust.
- Ten-model ensembles:
  - Seed digits robust models.
  - Target digits robust models.
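The cost-matrix configurations in the list above can be sketched as follows. This is an illustration under assumptions: entries are binary (1 for a seed-target pair to be trained robust, 0 otherwise), the diagonal is left at 0 because a class cannot be attacked into itself, and the helper names are hypothetical rather than taken from the cost-sensitive robustness code.

```python
def seed_cost_matrix(robust_seeds, n_classes=10):
    """C[i][j] = 1 for every target j != i when seed class i is selected."""
    return [[1 if (i in robust_seeds and i != j) else 0
             for j in range(n_classes)]
            for i in range(n_classes)]


def target_cost_matrix(robust_targets, n_classes=10):
    """C[i][j] = 1 for every seed i != j when target class j is selected."""
    return [[1 if (j in robust_targets and i != j) else 0
             for j in range(n_classes)]
            for i in range(n_classes)]


# Two-model ensemble: even-seed and odd-seed robust models.
even_seeds = seed_cost_matrix({0, 2, 4, 6, 8})
odd_seeds = seed_cost_matrix({1, 3, 5, 7, 9})

# Target-robust counterpart for the odd digits.
odd_targets = target_cost_matrix({1, 3, 5, 7, 9})

# Five-model ensemble: model k covers the seed digits congruent to k mod 5.
seed_mod5 = [seed_cost_matrix({k, k + 5}) for k in range(5)]

if __name__ == "__main__":
    for row in even_seeds:
        print(row)
```

In each seed-robust ensemble built this way, the component cost matrices partition the off-diagonal seed-target pairs, so every pair is covered by exactly one model.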

A representative selection of the different models we use is described in Table 1. The overall robust model is a single model trained to be robust on all seed-target pairs (this is the same as standard certifiable robustness training using the convex adversarial polytope). The other models were trained using different cost matrices, which are shown in Table 1. All these models had the same architecture, and they were trained with an allowable adversarial distance of 0.1. Each model had 3 linear and 2 convolutional layers.
Model  Cost Matrix  Overall Certified Robust Accuracy  Cost-Sensitive Robust Accuracy
Overall Robust    %  %
Even-seeds Robust    %  %
Odd-targets Robust    %  %
Seeds (2,3,5,6,8) Robust    %  %
Targets (0,1,4,7,9) Robust    %  %
Seed-modulo-5 = 0 Robust    %  %
Target-modulo-5 = 3 Robust    %  %
Seeds (3,5) Robust    %  %
Targets (1,7) Robust    %  %
Seed-modulo-10 = 3 Robust    %  %
Target-modulo-10 = 7 Robust    %  %
The adversarial clustering was done to ensure digits that appear visually most similar to each other are grouped together. This similarity between a pair of digits was measured in terms of how easily either digit of the pair can be adversarially targeted to the other digit. These results are consistent with our intuitions about visual similarity — for example, MNIST digits 2, 3, 5, 8 are visually quite similar, and we also found them to be adversarially similar, hence clustered together.
5.2 Certifying using MIPVerify
We used the MIP verifier on two shallow MNIST networks. One of the networks had two fully-connected layers, and the other had three fully-connected layers. The two-layer network was trained to be robust on even seeds and the three-layer network on odd seeds. We used adversarial training with PGD attacks to robustly train the models. Even with adversarial training, however, the models were not really robust. Even at a perturbation bound that is very low for the MNIST dataset, the models only had robust accuracy of 23% and 28% respectively. The reason for the lack of robustness is that the networks were very shallow and lacked a convolutional layer. We could not make the models more complex because doing so makes the robust certification too computationally expensive. Still, even with these non-robust models, we can see some interesting results for the ensemble of the two models. We discuss them below.
To understand the robustness possible by constructing an ensemble of the two models, we compute the minimal adversarial perturbation for 100 test seeds for the two single networks and the ensemble average network built from them. We used the distance metric with which the MIP verifier performs best, due to its linear nature. More than 90% of the seeds were verified within 240 seconds. Figure 2 shows the number of seeds that can be proven robust using MIP verification at a given distance for each model independently, the maximum of the two models, and the ensemble average model. The verifier was not always able to find the minimal necessary perturbation for the ensemble network within the time limit of 240 seconds. In those cases, we reported the maximum adversarial distance proven to be safe when the time limit was exceeded, which represents a lower bound on the minimal adversarial perturbation. We note from the figure that the number of examples certified by the ensemble average model is higher than that for either individual model at all minimal distances.
In general, though, we found that MIP verification does not scale well to networks complex enough to be useful in practice. Deeper networks and the use of convolutional layers make the performance of MIPVerify significantly worse. Furthermore, we found that robust networks were harder to verify than non-robust networks with this framework. Because of this, we decided not to use this approach for the remaining experiments, which use more practical networks.
5.3 Convex adversarial polytope certification
As the MIP verification does not scale well to larger networks, for our remaining experiments we use the convex adversarial polytope formulation by Wong et al. Wong and Kolter (2018). We conduct experiments with ensembles of two, five, and ten models, using the models described in Table 1. Table 2 summarizes the results.
Joint robustness of two-model ensembles. We evaluated two-model ensembles with different choices for the models, using the three composition methods. We ensured that the averaging ensemble could be treated as a single sequential model made of fully-connected linear layers, so that the robust verification formulation was still valid when applied to it. To do this, we first converted the convolutional layers of the single models into linear layers, and then combined the linear layers of the two models to create larger linear layers for the joint model. We can then calculate the robust error rates of the ensemble average model, as well as the unanimity and majority ensembles for the two-model ensemble. The key here is that no changes to the existing verification framework were needed.
Table 2 shows each ensemble’s robust accuracy. For two-model ensembles, the unanimity and the majority frameworks are the same. Thus we can use the same ensemble average technique to certify them. For adversarial clustering into two models, we used two clusters: one for digits (2, 3, 5, 6, 8) and the other for digits (0, 1, 4, 7, 9).
Compared to the single overall robust model, where 72.7% of the test examples can be certified robust, with two-model ensembles we can certify up to 78.1% of seeds as robust (using the averaging composition with the adversarially clustered seed robust models).
Models              Composition  Certified Robust  Normal Test Error  Rejection
Overall Robust      Single       72.7%             5.0%               -
Even/Odd-seed       Unanimity    74.7%             1.3%               5.0%
Even/Odd-seed       Average      75.9%             3.3%               -
Clustered seed (2)  Unanimity    77.3%             1.5%               6.0%
Clustered seed (2)  Average      78.1%             3.0%               -
Seed-modulo-5       Unanimity    84.1%             0.3%               8.1%
Seed-modulo-5       Average      85.3%             1.7%               -
Clustered seed (5)  Unanimity    83.8%             0.7%               7.1%
Clustered seed (5)  Average      84.3%             1.4%               -
Seed-modulo-10      Unanimity    85.4%             0.1%               9.7%
Seed-modulo-10      Average      85.6%             1.5%               -
We reran all the above experiments for all $\epsilon$ values from 0.01 to 0.20 to see how the joint robustness changes as the attacks get stronger. Figure 3(a) shows the results from the adversarially clustered seeds two-model ensemble; the results for the other ensembles show similar patterns and are deferred to Appendix A. For $\epsilon$ values up to 0.1, which is the value used for training the robust models, the ensemble model is able to certify more seeds than the single overall robust model. We also note that the models that are trained to be target-robust, rather than seed-robust, perform much worse. With even and odd target-robust models we were able to certify only 35.6% of test examples. We believe the reason for this is that the robustness evaluation criterion is inherently biased against models that are trained to be target-robust: when evaluating, we always start from some test seed and try to find an adversarial example from that seed, which is not what the target-robust models are explicitly trained to prevent.
Five-model Ensembles. Our joint certification framework can be extended to ensembles of any number of models. We trained the five models to be robust on modulo-5 seed digits. Ensembles of these models had better certified robustness than the best two-model ensembles. For example, with averaging composition 85.3% of test examples can be certified robust (compared to our previous best result of 78.1% with two models). Figure 3(b) shows how the number of certifiable test seeds drops with increasing $\epsilon$, but worth noting is the large gap between any individual model’s certifiable robustness and that for the average ensemble. We also trained models by adversarially clustering the digits into five clusters: (4, 9), (3, 5), (2, 8), (0, 6), and (1, 7). For the clustered seed robust ensemble, the results were slightly worse (84.3%) than for the modulo-5 seed robust ensemble. One difference between the two-model and five-model ensembles is that in the latter, the unanimity and the majority frameworks are different. We found that independent certification does not really work for the majority framework: we were able to certify almost no test seeds for the majority framework for five-model ensembles.
Ten-model Ensembles. Finally, we tried ensembles of ten models, each trained to be robust for a single selected seed digit. The certified robust rate of the 10-model ensemble trained to be seed robust was 85.6%. This is slightly higher than the 5-model ensemble (85.3%), but perhaps not worth the extra performance cost. It is notable, though, that the unanimity model reduces the normal test error for this ensemble to 0.1%. This means that out of 1000 test seeds, 854 were certified to be robust, 48 were correctly classified but could not be certified, 97 were rejected due to disagreement among the models, and 1 was incorrectly classified by all 10 models. Figure 4 shows the one test example where all models agree on a predicted class but it is not the given label (Figure 4(a)), and selected typical rejected examples from the 97 tests where the models disagree (Figure 4(b)).

Summary. Figure 5 compares the robust certification rate for the two, five, and ten-model ensembles. Clustered seed robust models generally tend to perform well, although the arbitrary modulo seed robust models perform almost as well.
One potential issue with any ensemble of models is the possibility of false positives. In our case, the use of multiple models in the unanimity and majority frameworks also introduces the possibility of rejecting benign inputs. As the number of models in a unanimity ensemble increases, the rejection rate on normal inputs increases, since the input is rejected if any one model disagrees. However, if the false rejection rate is reasonably low, then in many situations that may be an acceptable tradeoff for higher adversarial robustness. The results in Table 2 are consistent with this, but show that even the ten-model unanimity ensemble has a rejection rate below 10%. For more challenging classification tasks, strict unanimity composition may not be an option if rejection rates become unacceptable, but could be replaced by relaxed notions (for example, considering a set of related classes as equivalent for agreement purposes, or allowing some small fraction of models to disagree).
6 Conclusion
We extended robust certification methods designed for single models to provide joint robustness guarantees for ensembles of models. Our novel joint-model formulation technique can be used to extend certification frameworks to provide certifiable robustness guarantees that are substantially stronger than what can be obtained using the verification techniques independently. Furthermore, we have shown that cost-sensitive robustness training with diverse cost matrices can produce models that are diverse with respect to joint robustness goals. The results from our experiments suggest that ensembles of models can be useful for increasing the robustness of models against adversarial examples. There is a vast space of possible ways to train models to be diverse, and ways to use multiple models in an ensemble, that may lead to even more robustness. As we noted in our motivation, however, without efforts to certify joint robustness, or to ensure that models in an ensemble are diverse in their vulnerability regions, the apparent effectiveness of an ensemble may be misleading. Although the methods we have used cannot yet scale beyond tiny models, our results provide encouragement that ensembles can be constructed that provide strong robustness against even the most sophisticated adversaries.
Availability
Open source code for our implementation and for reproducing our experiments is available at: https://github.com/jonasmaj/ensembleadversarialrobustness.
Acknowledgements
We thank members of the Security Research Group, Mohammad Mahmoody, Vicente Ordóñez Román, and Yuan Tian for helpful comments on this work, and thank Xiao Zhang, Eric Wong, Vincent Tjeng, Kai Xiao, and Russ Tedrake for their open source projects that we made use of in our experiments. This research was sponsored in part by the National Science Foundation #1804603 (Center for Trustworthy Machine Learning, SaTC Frontier: End-to-End Trustworthiness of Machine-Learning Systems), with additional support from Amazon, Google, and Intel.
References
 [1] (2013) Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases, Cited by: §2.1.
 [2] (2017) Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, Cited by: §2.1.
 [3] (2019) Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, Cited by: §2.3.

 [4] (2018) Robust physical-world attacks on deep learning models. In Conference on Computer Vision and Pattern Recognition, Cited by: §2.1.
 [5] (2018) Adversarial vulnerability for any classifier. In Conference on Neural Information Processing Systems, Cited by: §1.
 [6] (2017) Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410. Cited by: §1, §2.4.
 [7] (2018) Adversarial spheres. arXiv preprint arXiv:1801.02774. Cited by: §1.
 [8] (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations, Cited by: §1, §2.1, §2.1.
 [9] (2019) Scalable verified training for provably robust image classification. In International Conference on Computer Vision, Cited by: §1.
 [10] (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1.
 [11] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Cited by: §1.
 [12] (2003) Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51 (2), pp. 181–207. Cited by: §2.4.
 [13] (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: §2.1, §2.2.

 [14] (2019) The curse of concentration in robust learning: evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence, Cited by: §1.
 [15] (2017) MagNet: a two-pronged defense against adversarial examples. In ACM Conference on Computer and Communications Security, Cited by: §1.
 [16] (2020) Ensembles of many diverse weak defenses can be strong: defending deep neural networks against adversarial attacks. arXiv preprint arXiv:2001.00308. Cited by: §2.4.
 [17] (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2.1.
 [18] (2019) Improving adversarial robustness via promoting ensemble diversity. arXiv preprint arXiv:1901.08846. Cited by: §1, §2.4.
 [19] (2016) The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, Cited by: §2.1.
 [20] (2018) Certified defenses against adversarial examples. In International Conference on Learning Representations, Cited by: §2.2, §2.3.
 [21] (2020) Black-box smoothing: a provable defense for pretrained classifiers. arXiv preprint arXiv:2003.01908. Cited by: §3.
 [22] (2017) Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, Cited by: §1.
 [23] (2019) Are adversarial examples inevitable?. In International Conference on Learning Representations, Cited by: §1.
 [24] (2019) n-ML: mitigating adversarial examples via ensembles of topologically manipulated classifiers. arXiv preprint arXiv:1912.09059. Cited by: §2.4.
 [25] (2014) Intriguing properties of neural networks. In International Conference on Learning Representations, Cited by: §1.
 [26] (2019) Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, Cited by: §1, §2.3, §4.2, §4.
 [27] (2020) On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347. Cited by: §2.2.
 [28] (2018) Ensemble adversarial training: attacks and defenses. In International Conference on Learning Representations, Cited by: §1, §2.1, §2.4.
 [29] (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, Cited by: §1, §2.2, §2.3, §4.2, §4, §5.3.
 [30] (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations, Cited by: §2.1.
 [31] (2018) Feature squeezing: detecting adversarial examples in deep neural networks. In Network and Distributed Systems Security Symposium, Cited by: §1.
 [32] (2019) Cost-sensitive robustness against adversarial examples. In International Conference on Learning Representations, Cited by: §1, §5.1.