## 1 Introduction

The lack of robustness in deep neural networks (DNNs) has motivated recent research on verifying and improving the robustness of DNN models (katz2017reluplex; dvijotham2018dual; gehr2018ai; singh2018fast; madry2018towards; raghunathan2018semidefinite; wong2018provable; zhang2018crown). Improving the robustness of neural networks, or designing the “defense” to adversarial examples, is a challenging problem. athalye2018obfuscated; uesato2018adversarial showed that many proposed defenses are broken and do not significantly increase robustness under adaptive attacks. So far, state-of-the-art defense methods include adversarial training (madry2018towards; sinha2018certifying) and optimizing a certified bound on robustness (wong2018provable; raghunathan2018certified; wang2018mixtrain; mirman2018differentiable), but there is still a long way to go to conquer the adversarial example problem. On MNIST at perturbation , adversarially trained models (madry2018towards) are resistant to strong adversarial attacks but cannot be efficiently certified using existing neural network verification techniques. On the other hand, certifiable training methods usually suffer from high clean and verified error; in (wong2018scaling), the best model achieves 43.1% verified error and 14.9% clean error, which is much higher than ordinary MNIST models.

Most of these existing defense methods only focus on improving the robustness of a single model. Traditionally, a model ensemble has been used to improve prediction accuracy of weak models. For example, voting or bootstrap aggregating (bagging) can be used to improve prediction accuracy on many occasions. Furthermore, boosting based algorithms, including AdaBoost (freund1997decision), LogitBoost (friedman2000additive), gradient boosting (friedman2001greedy; friedman2002stochastic) and many other variants, are designed to minimize an upper bound (surrogate loss) of classification error, which provably increases the model’s accuracy despite the fact that each base model can be very weak. Inspired by the success of model ensembles, we ask whether a similar technique can be used to build a robust model with better *provable* robustness via an ensemble of certifiable base models.

Intuitively, attacking an ensemble seems to be harder than attacking a single model, because an adversary must fool all models simultaneously. Some works on using a model ensemble to defend against adversarial examples (abbasi2017robustness; strauss2017ensemble; liu2018towards; pang2019improving; kariyappa2019improving) show promising results that they can indeed increase the required adversarial distortion for a successful attack and improve robustness. However, none of these works attempt to propose a *provable* (or *certifiable*) method to improve model robustness via an ensemble, so there is no guarantee that these methods work in all situations. For example, he2017adversarial reported that attacking a specialist ensemble only increases the required adversarial distortion by as little as 6% compared to a single model.

In this paper, we propose a new algorithm, RobBoost, that can provably enhance the robustness certificate of a deep model ensemble. First, we consider the setting where a set of pretrained robust models are given, and we aim to find the optimal weights for each base classifier that maximize provable robustness. We select the weight for each base model iteratively, to maximize a lower bound on the classification margin of the neural network ensemble classifier. Given a set of base models which are individually certifiable, we formulate a certified robustness bound for the model ensemble and show that solving the optimal ensemble leads to an optimization problem. We propose a coordinate descent based algorithm to iteratively solve the RobBoost objective. Second, we consider training each base classifier sequentially from scratch, in a setting more similar to traditional gradient boosting, where each model is sequentially trained to improve the overall robustness of the current ensemble. Our experiments show that RobBoost can select a set of good base classifiers and weight them optimally, outperforming a naive average of all base models; on MNIST with perturbation of , RobBoost reduces verified error from (averaging models) to using 12 certifiable base models trained individually.

## 2 Related Work

### 2.1 Neural network robustness verification

The robustness of a neural network can be verified by analyzing the reachable range of an output neuron for *all* possible inputs within a set (for example, a perturbed image with bounded norm), thus the margins of predictions between the top-1 class and other classes can be examined. Unfortunately, finding the exact reachable range is NP-complete (katz2017reluplex) and is equivalent to a mixed integer linear programming problem (tjeng2017evaluating; xiao2018training). Therefore, many recent works in robustness verification develop computationally tractable ways to obtain outer bounds of reachable ranges. As robustness verification can be cast into a non-convex minimization problem (ehlers2017formal), one approach is to resort to duality and give a lower bound of the solution (dvijotham2018dual; qin2018verification; dvijotham18verification). Also, we can relax the primal optimization problem with linear constraints, and obtain a lower bound of the original problem using linear programming (LP) or the dual of LP (wong2018provable; wong2018scaling). However, solving LPs for a relatively large network can still be quite slow (salman2019convex). Fortunately, the relaxed LP problem can be solved greedily in the primal space (zhang2018crown; weng2018towards; wang2018efficient) or in the dual space (wong2018provable). “Abstract transformers” (singh2018fast; singh2019abstract; gehr2018ai; mirman2018differentiable; Singh2019robustness) propagate an abstraction of input regions layer by layer to eventually give us bounds on output neurons. See (salman2019convex; liu2019algorithms) for a comprehensive discussion on the connections between these algorithms.

Our work relies on the neural network outer bounds proposed in (zhang2018crown; weng2018towards; wong2018provable), which are the state-of-the-art methods to efficiently give an upper and a lower bound for an output neuron given a norm-bounded input perturbation. These bounds are essentially linear with respect to input perturbations, which is crucial for developing a tractable framework (see Proposition 3.1 for more details). Besides these methods, local Lipschitz constants can also be used for efficiently giving a formal robustness guarantee (hein2017formal; zhang2018recurjac); tighter bounds can be obtained by semi-definite programming (SDP) based methods (raghunathan2018certified; raghunathan2018semidefinite; dvijothamefficient2019), although they scale much more poorly to larger models.

### 2.2 Defending against adversarial examples

We categorize existing defense techniques into two categories: certified defenses that can provably increase the robustness of a model, and empirical defenses that are mostly based on heuristics that have not been proven to improve robustness with a formal guarantee. Empirical defenses, even sophisticated ones, can possibly be evaded using stronger or adaptive attacks (athalye2018obfuscated; carlini2017adversarial; carlini2017towards; carlini2017magnet). We mostly focus on certified defenses in our paper, since our proposed method requires certified base models to create a certifiable ensemble.

Since many robustness verification methods give us a lower bound on the output margin between the ground-truth class and other classes under norm bounded input perturbation, optimizing a loss containing this bound obtained by a differentiable verification method will lead to maximizing this margin and improving *verified error* on the training set. For example, wong2018scaling; wong2018provable propose to minimize the verified error through a cross-entropy loss surrogate using LP relaxation based verification bounds; wang2018mixtrain optimize a similar verification bound; dvijotham2018training proposes to learn the dual variables in dual relaxations using a learner network during training; mirman2018differentiable propose a differentiable version of abstract transformers and include the obtained bounds in the loss function to provably increase robustness. raghunathan2018certified propose to control the global Lipschitz constant of a 2-layer network to give a certified robustness bound, but it cannot be easily extended to multiple layers.

Previous *certified* defenses mostly focus on improving the robustness of a single model. In wong2018scaling, multiple models are considered in a cascaded manner, where each subsequent model is trained using the examples that cannot be certified by previous models. At inference time, only the last cascaded model is considered. RobBoost works in a different scenario, where we consider how to combine certified base classifiers into a stronger one with better verified error. Empirical defenses using ensembles include (pang2019improving; kariyappa2019improving; abbasi2017robustness; strauss2017ensemble); however, they do not provide provable robustness guarantees.

## 3 The RobBoost Algorithm

#### Notations.

We define a -class -layer feed-forward classification neural network as

where and layer ’s weight matrix is

and bias vector is

. Input and for convenience we define .is a component-wise activation function. For convenience, we denote the row vector

as the -th row of matrix , column vector as the -th column of matrix . Additionally, we define the ball centered at with radius as , . We use to denote the set . We denote a training example as a pair , where , is the class label, and is the total number of training examples.

### 3.1 Linear outer bounds for neural networks

We start with guaranteed linear upper and lower bounds for a single neural network :

###### Proposition 3.1 (Linear outer bounds of neural networks).

A neural network function can be linearly upper and lower bounded for all , where :

(1) |

where and depend on , , .

Proposition 3.1 is a direct consequence of Theorem 3.2 in CROWN (zhang2018crown), which gives the explicit form of as a function of neural network parameters. A similar outcome can be obtained from the neural network verification literature (wong2018provable; singh2018fast; weng2018towards; wang2018efficient), but CROWN typically gives the tightest bound. We assume that are pre-computed for each example for a given using CROWN or other similar algorithms.

To verify if we can change the output of the network from the ground-truth class to another class , we desire to obtain a lower bound on the margin . For a training example , we define the margin between the true class and other classes as a vector function , where , for and for , . To obtain a lower bound of margin, we define the new network with the same weights and biases as , except that the last layer of is reformed as:

(2) |

(3) |

is a mapping for that skips the class (ground truth class). Then we can lower bound the margins by Proposition 3.1:

where and implicitly depend on , and .
For *every* , , the following bound holds (footnote 1: as in zhang2018crown and many other works, we illustrate unbounded perturbation, not limited to , to give the lower bound; using bounded perturbation on MNIST typically improves verified error by 1-2%):

(4) |

where is the dual norm of . When for *all* , we cannot change the classification of from class to any other class, thus no adversarial examples exist. The -norm of plays an important role in the model’s robustness. We define the margin for target class based on the guaranteed lower bound (4):

(5) |
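The worst-case value behind Eq. (4) can be sketched in a few lines: a linear function minimized over a norm ball equals its value at the center minus the radius times the dual norm of its coefficient vector (Hölder's inequality). The function names and the symbols `a`, `b`, `x0`, `eps` below are illustrative assumptions, not the paper's notation; `a` and `b` stand for the linear lower bound of one margin from Proposition 3.1.

```python
import math


def dual_norm(vec, p):
    """Return ||vec||_q with 1/p + 1/q = 1 (p may be math.inf)."""
    if p == math.inf:
        return sum(abs(v) for v in vec)       # dual of l_inf is l_1
    if p == 1:
        return max(abs(v) for v in vec)       # dual of l_1 is l_inf
    q = p / (p - 1.0)
    return sum(abs(v) ** q for v in vec) ** (1.0 / q)


def linear_lower_bound(a, b, x0, eps, p=math.inf):
    """Minimum of a.x + b over the l_p ball of radius eps centered at x0."""
    nominal = sum(ai * xi for ai, xi in zip(a, x0)) + b
    return nominal - eps * dual_norm(a, p)
```

A positive return value for every target class certifies the example at radius `eps`; a non-positive value only means that this particular bound cannot certify it.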

With these lower bounds on the margin, we can extend the definition of ordinary classification error and define *verified error*, which is a provable upper bound on error under *any* norm bounded attack:

###### Definition 3.2 (Robustness Certificate and Verified Error).

Given a perturbation radius where an input can be perturbed arbitrarily within , verified error is the percentage of examples that do *not* have a provable robustness certificate:

(6) |

In our paper we also use a weaker definition of verified error (which can be bounded using a surrogate loss, which will be presented in Section 3.3), where we consider each attack target individually:

(7) |
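Both notions of verified error can be computed directly from a table of margin lower bounds. The sketch below assumes `margins[k][j]` holds the lower bound for example `k` and attack target `j`; the normalization of the per-target variant by the total number of (example, target) pairs is an assumption, since the exact form of Eq. (7) is not reproduced here.

```python
def verified_error(margins):
    """Fraction of examples with at least one non-positive margin bound."""
    uncertified = sum(1 for row in margins if min(row) <= 0)
    return uncertified / len(margins)


def per_target_verified_error(margins):
    """Fraction of (example, target) pairs with a non-positive margin bound.

    ASSUMPTION: normalized by the number of pairs.
    """
    pairs = [m for row in margins for m in row]
    return sum(1 for m in pairs if m <= 0) / len(pairs)
```

The per-target value never exceeds the per-example value, because an example counts as uncertified as soon as a single target fails.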

### 3.2 Linear outer bounds for an ensemble

We denote as the margin for a base model . We first note that the lower bound on margin is unnormalized and not scale-invariant. Enlarging the output of by a constant factor does not affect the robustness of , but the margin will also be scaled correspondingly. Directly maximizing this unnormalized margin does not lead to better robustness. Intuitively, the most important factor of the robustness of model is the available budget (reflected as the term ) divided by the sensitivity with respect to the input (reflected as ), rather than the absolute value of the margin. When we use this lower bound on margins as a surrogate to compare the robustness across different models, their margins should be within a similar range to make this comparison meaningful. To take this into account, we can normalize by dividing it with a normalizing factor , the average over all examples and classes. The normalized lower bound on margin is defined as:

(8) | ||||

This is equivalent to applying a constant factor . The normalized model, denoted as , will be used for our ensemble. We can use other normalizing schemes as long as they roughly keep each model’s margin in a similar magnitude to ease the optimization.
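The effect of normalization can be illustrated with a toy example: multiplying a model's outputs by a constant multiplies all of its margin bounds by the same constant without changing which examples are certified, and dividing by an average magnitude removes that factor. The choice of the mean absolute bound as the normalizing factor is an assumption made for this sketch; the text only states that the factor is an average over all examples and classes.

```python
def normalize_margins(bounds):
    """Divide one model's margin lower bounds by their mean absolute value.

    ASSUMPTION: the mean absolute bound is the normalizing factor.
    Requires at least one non-zero bound.
    """
    scale = sum(abs(m) for m in bounds) / len(bounds)
    return [m / scale for m in bounds]


raw = [0.4, -0.2, 0.6]              # hypothetical bounds for one model
scaled = [10.0 * m for m in raw]    # same model with outputs multiplied by 10

# After normalization the two versions of the model are indistinguishable.
same = all(abs(u - v) < 1e-12
           for u, v in zip(normalize_margins(raw), normalize_margins(scaled)))
```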

We define the model ensemble of a set of neural networks as , where , . is the coefficient for classifier . In our setting, each neural network classifier has been trained with certificates (Def. 3.2), and we want to further enhance them using carefully selected ensemble weights. Because they are linearly combined, we can give a linear upper and lower bound for , by linearly combining the upper and lower bounds given by Eq. (1). Again, for a training example , the lower bound on the margin of for the class inside is:

(9) |

Analogous to Eq. (4), these bounds are guaranteed for any , . are computed for model using Proposition 3.1 with a similar network transformation as in Eq. (2) and (3) for each . Note that each can have *completely different internal structure* (number of layers, number of neurons, architecture, etc) and be trained using different schemes, but their corresponding and have the same dimension.
For the ensemble, the normalized lower bound on the margin is the following:

(10) |

Main Idea: Why can an ensemble improve robustness? To enhance robustness via an ensemble, we want to find an optimal , such that is maximized for all training examples and target classes . A model lacking robustness often has large ; by combining of different models with optimal weights, we hope that some noise in can be canceled out and has a smaller -norm than . When only a very limited number of models is available for selection, a naive ensemble (with all ) is not guaranteed to achieve this goal. Instead, RobBoost optimally selects by minimizing a surrogate loss on normalized lower bounds of margins.
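The cancellation argument can be made concrete with a two-model toy example. The sketch below assumes an ℓ∞ ball, so the dual norm is a sum of absolute values, and uses hypothetical names: `nominals[i]` is the margin of base model `i` at the unperturbed input and `sensitivities[i]` is its linear coefficient vector.

```python
def ensemble_margin_bound(alphas, nominals, sensitivities, eps):
    """Lower bound on the weighted ensemble margin over an l_inf ball."""
    nominal = sum(w * c for w, c in zip(alphas, nominals))
    dim = len(sensitivities[0])
    combined = [sum(w * a[j] for w, a in zip(alphas, sensitivities))
                for j in range(dim)]
    return nominal - eps * sum(abs(v) for v in combined)


nominals = [1.0, 1.0]
sensitivities = [[1.0, 0.5], [1.0, -0.5]]   # opposite "noise" in coordinate 2

single = ensemble_margin_bound([1.0, 0.0], nominals, sensitivities, 0.6)
mixed = ensemble_margin_bound([0.5, 0.5], nominals, sensitivities, 0.6)
```

Here each model alone has a bound of about 0.1, while the equally weighted ensemble reaches about 0.4 because the second coordinate cancels. By the triangle inequality, the ensemble bound is never below the weighted average of the individual bounds.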

In Figure 1, we plot matrix for a naturally trained and a robust model (adversarially trained using (madry2018towards)) to show the intuition behind RobBoost. Strikingly, has a quite interpretable pattern, especially on the adversarially trained model – is surprisingly sparse and the model output is only sensitive to changes on the pixels of the digit “1”; on the other hand, the naturally trained model has a lot of random noise around the “1” in the center, and is sensitive to many irrelevant background pixel changes. Our aim is thus to make the matrix less “noisy” through a careful ensemble: .

### 3.3 The RobBoost Loss Function

We first define the following RobBoost loss to maximize model’s lower bound of margin via ensemble, across all examples and classes:

(11) |

where is the vector of weights for each model , and is defined as in Eq. (10). This is a hinge-style surrogate loss of the 0-1 error defined in (7) that encourages large margin. We aim to decrease (7) by using an optimally weighted ensemble.

In this paper, we focus on the most common threat model, where $\ell_\infty$ adversarial distortion ($p = \infty$) is applied ($\ell_\infty$-RobBoost). In this case, the dual norm in Eq. (10) is the $\ell_1$ norm, so Eq. (10) is a summation of absolute values:

(12) |

Here we explicitly write out the dependency of in and . When is fixed, we can define new matrices with reordered indices and absorb into it: . And similarly such that Eq. (12) can be rewritten as:

Note that for each example and each target class we have an and a . Then Eq. (11) becomes:

(13) |

Since the objective is non-smooth, piece-wise linear and low-dimensional, we propose to use coordinate descent to efficiently solve this minimization problem.

#### Solving $\ell_\infty$-RobBoost using coordinate descent.

In coordinate descent, we aim to solve the following one variable optimization problem with a randomly selected coordinate :

(14) |

where and are constants with respect to . While the box constraint can be easily handled by coordinate-descent, the challenge is the constraint , which needs to be directly enforced during the coordinate descent procedure. Suppose we have maintained before update, when updating variable , we enforce this constraint by scaling all other by a factor of such that . In other words, we redefine as:

(15) |

where , and , . Now the summation constraint has been removed, and for any we guarantee . Each term is a bounded one-dimensional piece-wise linear function within domain . The term contains linear terms inside absolute value so there are at most pieces. The minimum of this term must lie at the end of one piece, or at the boundary 0 or 1. Solving for in equations gives us the locations of the end points of these pieces in time, denoted as . We consider the worst case where all points lie in , and for convenience we denote , . Now the challenge remains to efficiently evaluate at these endpoints and the two boundaries, each in time. We first sort in ascending order as , where is the permutation of sorting and we additionally define . Then, we start with , and check the sign of at for each . We define two sets indicating the sign of :

(16) | ||||

(17) |

They can be formed in time. Then we define the effective slope and intercept at the point as:

(18) | ||||

(19) |

and then we can evaluate . For , we evaluate one by one. Assuming we have already obtained and () and evaluated , we can then recursively define and as:

(20) | ||||

(21) |

is an indicator function. We maintain the slope and intercept for the next linear piece each time the sign of a term changes. This update only takes time. For , we reach the boundary 1 and evaluate .
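The sweep described above can be sketched for the absolute-value part of a single term. The code assumes a function of the form f(v) = Σ_j |s_j v + r_j| on [0, 1], with `slopes` and `intercepts` as illustrative names for the s_j and r_j. It sorts the kinks, keeps an effective slope and intercept for the current linear piece, and updates both in constant time when one term changes sign. The linear part of the margin and the hinge are left out to keep the example short.

```python
def abs_sum_at_breakpoints(slopes, intercepts):
    """Evaluate f(v) = sum_j |s_j*v + r_j| at 0, at each kink in (0, 1), and at 1."""
    def sign_right_of_zero(s, r):
        lead = r if r != 0 else s          # sign of s*v + r for small v > 0
        return (lead > 0) - (lead < 0)

    signs = [sign_right_of_zero(s, r) for s, r in zip(slopes, intercepts)]
    eff_slope = sum(g * s for g, s in zip(signs, slopes))
    eff_icpt = sum(g * r for g, r in zip(signs, intercepts))

    # Kinks are the roots of the individual linear terms, sorted ascending.
    kinks = sorted((-r / s, j)
                   for j, (s, r) in enumerate(zip(slopes, intercepts))
                   if s != 0 and 0 < -r / s < 1)

    points = [(0.0, eff_icpt)]
    for v, j in kinks:
        points.append((v, eff_slope * v + eff_icpt))
        # Term j flips sign here: its contribution changes from g*(s*v + r)
        # to -g*(s*v + r), so both running sums move by twice that amount.
        eff_slope -= 2 * signs[j] * slopes[j]
        eff_icpt -= 2 * signs[j] * intercepts[j]
    points.append((1.0, eff_slope + eff_icpt))
    return points
```

After sorting, each kink costs constant work, which is the point of maintaining the running slope and intercept instead of re-evaluating every absolute value at every kink.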

For the final sum of surrogate losses, we merge sort all for all terms into a new vector with elements, and also maintain a mapping which maps an element to the summation term it comes from. Our final algorithm evaluates the objective function on all and using the maintained effective slope and intercept for each term in the summation, and updates and using (20) and (21). Additionally, we need to consider up to additional linear pieces introduced by the function. We list the full algorithm in Algorithm 1 in Appendix A. In each iteration of coordinate descent, we randomly choose a coordinate , obtain the best value (14) to minimize the loss, and set . Then we choose another coordinate and repeat. We observe that 2 to 3 epochs (each epoch visits all coordinates once) are sufficient to find a good solution.
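A single coordinate update can be sketched as follows. This is a brute-force stand-in for Algorithm 1, not the algorithm itself: it finds every kink of the one-dimensional objective and evaluates the full loss at each candidate, instead of using the sorted slope and intercept bookkeeping. It assumes an ℓ∞ ball and a hinge of the form max(0, 1 - margin), which is one possible hinge-style surrogate. In this hypothetical layout, `nominal[k][i]` and `sens[k][i]` hold the unperturbed margin and the coefficient vector of term `k` under base model `i`.

```python
def hinge_terms(alpha, nominal, sens, eps):
    """Return 1 - (margin lower bound) for every (example, target) term."""
    out = []
    for c_row, a_rows in zip(nominal, sens):
        dim = len(a_rows[0])
        mix = [sum(w * a[j] for w, a in zip(alpha, a_rows)) for j in range(dim)]
        margin = sum(w * c for w, c in zip(alpha, c_row))
        margin -= eps * sum(abs(m) for m in mix)
        out.append(1.0 - margin)
    return out


def loss(alpha, nominal, sens, eps):
    """ASSUMED surrogate: sum of max(0, 1 - margin lower bound)."""
    return sum(max(0.0, g) for g in hinge_terms(alpha, nominal, sens, eps))


def point_on_line(beta, t, v):
    """Weights with alpha[t] = v and the others rescaled to sum to 1 - v."""
    return [v if i == t else (1.0 - v) * b for i, b in enumerate(beta)]


def update_coordinate(alpha, t, nominal, sens, eps):
    """Minimize the loss over alpha[t] in [0, 1] while keeping sum(alpha) = 1."""
    rest = sum(w for i, w in enumerate(alpha) if i != t)
    n_other = len(alpha) - 1
    beta = [0.0 if i == t else (w / rest if rest > 0 else 1.0 / n_other)
            for i, w in enumerate(alpha)]
    at0, at1 = point_on_line(beta, t, 0.0), point_on_line(beta, t, 1.0)

    # Kinks of the absolute values: every mixed coefficient is affine in v.
    grid = {0.0, 1.0}
    for a_rows in sens:
        for j in range(len(a_rows[0])):
            s0 = sum(w * a[j] for w, a in zip(at0, a_rows))
            s1 = sum(w * a[j] for w, a in zip(at1, a_rows))
            if s0 * s1 < 0:                  # exactly one root inside (0, 1)
                grid.add(s0 / (s0 - s1))
    grid = sorted(grid)

    # Kinks of the hinge: zero crossings of 1 - margin, which is linear
    # between consecutive grid points.
    values = [hinge_terms(point_on_line(beta, t, v), nominal, sens, eps)
              for v in grid]
    candidates = list(grid)
    for i in range(len(grid) - 1):
        for lo, hi in zip(values[i], values[i + 1]):
            if lo * hi < 0:
                step = (grid[i + 1] - grid[i]) * lo / (lo - hi)
                candidates.append(grid[i] + step)

    best = min(candidates,
               key=lambda v: loss(point_on_line(beta, t, v), nominal, sens, eps))
    return point_on_line(beta, t, best)
```

Because the restricted objective is piecewise linear, its minimum over [0, 1] lies at one of the candidates, so the update never increases the loss. Calling `update_coordinate` for each `t` in turn gives one epoch of coordinate descent.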

### 3.4 RobBoost in Gradient Boosting

Gradient Boosting builds a strong model by iteratively training and combining weak models:

where is kept unchanged and we train to reduce a certain loss function on . Unlike the setting we discussed in Section 3.3 where all base models are given and fixed, here we are allowed to update the model parameters of the last base model, with previous models frozen. Similar to Eq. (10), we can write the margin for example target class for the setting of gradient boosting:

(22) |

Suppose we have some surrogate loss function , we define the following loss function:

(23) |

Note that all for are precomputed and can be treated as constants, and are functions of the neural network parameters of model (due to Prop. 3.1, see wong2018provable; zhang2018crown for the explicit form). We can thus take the gradient , and use typical gradient-based optimization tools to update model and reduce the loss . Compared to the setting in the previous section where each base model is trained independently and then fixed, in (23), model knows the “weakness” of the ensemble of all previous models, and attempts to “fix” it, offering more flexibility.

## 4 Experiments

#### Overview and Setup.

We evaluate the effectiveness of RobBoost by using it to find the best ensemble in a relatively large pool of models on the MNIST and CIFAR-10 datasets. Since we focus on improving certified robustness, our main metric to evaluate model robustness is *verified error* on the test set, as defined in (6); this is a provable upper bound of PGD attack error and has been used as the standard way to evaluate certified defense methods (wong2018provable; wong2018scaling). Since there is no existing work on boosting provable robustness, our baseline is the naive ensemble, where each model is equally weighted. Because our purpose is to show how optimally RobBoost weights each base model, we do not focus on tuning each base model to achieve state-of-the-art results on each dataset. We use small base models, where all MNIST models sum to 9.1 MB and all CIFAR models sum to 10.0 MB. We precompute for all training examples for each model. This precomputation takes about as long as 1 epoch of robust training (wong2018provable), since they need to compute the same bounds every epoch for training a single robust model, and they typically need 100 to 200 epochs for training. The time of our experiments is dominated by training each base model (hours to days each) rather than precomputing these matrices and solving the ensemble objective (1-2 hours).

#### Data Elimination.

We first remove all data points that have positive margins on *all* base models, as they will remain robust regardless of any positive weights. Similarly, we remove all data points that have negative margins on *all* base models, as the ensemble cannot improve robustness for them. These pre-processing steps allow us to focus on the data points whose robustness can be potentially enhanced, and also reduce the effective training data size. In Table 1, we report the percentage of examples eliminated in this preprocessing step. For all models, a large portion of examples (ranging from 70% to 97%) can be excluded from the optimization step, greatly improving the efficiency of RobBoost.
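The elimination rule can be sketched as a simple partition of example indices. The layout is an assumption: `margins[k][i]` is taken to be the margin value of example `k` under base model `i` (for instance the smallest bound over all targets), and the rule is applied literally as stated in the text.

```python
def split_examples(margins):
    """Partition example indices by the sign pattern of their margins.

    ASSUMPTION: margins[k][i] is one margin value per example and base model.
    """
    always_positive, always_negative, kept = [], [], []
    for k, row in enumerate(margins):
        if all(m > 0 for m in row):
            always_positive.append(k)    # robust under every base model
        elif all(m < 0 for m in row):
            always_negative.append(k)    # not robust under any base model
        else:
            kept.append(k)               # the ensemble weights matter here
    return always_positive, always_negative, kept
```

Only the indices in `kept` need to enter the optimization, which is why the effective training set shrinks so much.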

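Table 1: Percentage of examples eliminated in the preprocessing step.

| Dataset | Perturbation | Eliminated because robust to all models | Eliminated because NOT robust to all models |
|---|---|---|---|
| MNIST | 0.1 | 97.67% | 0.07% |
| MNIST | 0.2 | 91.85% | 0.42% |
| MNIST | 0.3 | 69.69% | 2.84% |
| CIFAR | 2/255 | 65.60% | 3.97% |
| CIFAR | 8/255 | 56.23% | 14.11% |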

Table 2: Verified, clean and PGD errors of the naive and RobBoost ensembles of robustly trained models.

| Dataset | Perturbation | Ensemble | Verified Error (Train) | Verified Error (Test) | Clean Error (Test) | PGD Error (Test) | Ensemble Model Size |
|---|---|---|---|---|---|---|---|
| MNIST | 0.1 | Naive | 3.55% | 4.56% | 0.39% | 1.56% | 9.1M |
| MNIST | 0.1 | RobBoost | 2.65% | 4.47% | 0.39% | 1.17% | 5.3M |
| MNIST | 0.2 | Naive | 13.82% | 13.66% | 2.34% | 3.91% | 9.1M |
| MNIST | 0.2 | RobBoost | 12.41% | 12.58% | 1.95% | 3.12% | 7.4M |
| MNIST | 0.3 | Naive | 39.22% | 38.60% | 11.33% | 19.92% | 9.1M |
| MNIST | 0.3 | RobBoost | 37.42% | 36.61% | 9.77% | 19.53% | 5.7M |
| CIFAR | 2/255 | Naive | 50.02% | 51.68% | 38.28% | 45.31% | 10.0M |
| CIFAR | 2/255 | RobBoost | 47.27% | 49.51% | 34.77% | 41.80% | 6.1M |
| CIFAR | 8/255 | Naive | 74.95% | 74.50% | 62.89% | 71.09% | 10.0M |
| CIFAR | 8/255 | RobBoost | 74.56% | 74.27% | 60.94% | 69.14% | 5.6M |

#### Ensemble of robustly trained models with different architectures.

We train 12 MNIST models and 11 CIFAR models with a variety of architectures. The models are trained using convex adversarial polytope (wong2018scaling), a certified defense method, under different $\ell_\infty$ norm perturbations (0.1, 0.2 and 0.3 for MNIST; 2/255 and 8/255 for CIFAR, as listed in Table 2). Even the largest CIFAR model used is much smaller than the best ResNet model in wong2018scaling (40 MB), as we desire to use an ensemble of small models to obtain better robustness than a single large model. For CIFAR, the smallest model is around 0.1 MB and the largest is 3 MB; see more details on model structure in Appendix B. We list results in Table 2. We can observe that RobBoost consistently outperforms the naive averaging ensemble in both verified and clean error. Also, the ensemble model we created performs better in all metrics than the single large model reported in the literature (wong2018scaling).

In Figure 3, we plot the distributions of lower bounds of margins for two models. A value less than 0 indicates that an example cannot be certified. Compared to the naive ensemble, we can clearly observe that the distributions of margins for RobBoost ensembles have more mass on the positive side, reflecting the improvements on certified robustness.

#### Ensemble of the same model architecture with feature subsampled data.

A common practice in building traditional ensemble models like random forest is to use feature subsampling, i.e., each base model only uses a subset of the features to train. In this experiment, we use only 1 model structure but randomly sample 80% of the pixels to train each model (a recent work (hosseini2019dropping) presented a similar idea, but their method is not a certified defense). We train 5 feature sub-sampled models for MNIST with perturbation 0.3 and CIFAR with perturbation 2/255. In Table 3, we list the verified error and clean error of each model for both the naive ensemble and RobBoost ensemble. Since there are only 5 base models, RobBoost ensembles provide a small but consistent performance advantage in verified error.

Table 3: Clean and verified test error of the 5 feature-subsampled base models, the naive ensemble and the RobBoost ensemble.

| Dataset | Perturbation | Test Error | Base Model 1 | Base Model 2 | Base Model 3 | Base Model 4 | Base Model 5 | Naive | RobBoost |
|---|---|---|---|---|---|---|---|---|---|
| MNIST | 0.3 | Clean | 12.96% | 13.99% | 23.88% | 12.77% | 18.50% | 13.34% | 12.51% |
| MNIST | 0.3 | Verified | 42.98% | 45.3% | 51.62% | 45.13% | 49.32% | 43.94% | 42.84% |
| CIFAR | 2/255 | Clean | 42.33% | 42.15% | 41.77% | 42.43% | 41.86% | 40.79% | 40.77% |
| CIFAR | 2/255 | Verified | 55.00% | 55.11% | 54.97% | 54.73% | 55.07% | 53.61% | 53.61% |

#### Gradient Boosting of Robust Ensemble.

Unlike previous experiments where all base models are given and fixed, we follow Eq. (23) and train models incrementally. We use cross-entropy loss and train an ensemble of 5 models on MNIST with and CIFAR with in Figure 2. We observe that usually the first 2 or 3 models decrease verified errors most (on MNIST from 39.8% to 34.6%, and on CIFAR from 51.18% to 49.55%). It is challenging to further decrease this error with more models, but gradient boosting allows us to fix the data points lacking robustness on previous models, achieving better performance than Table 2.

## 5 Conclusion

We propose the first ensemble algorithm, RobBoost, to enhance provable model robustness by optimally weighting each base model. Our algorithm involves optimizing a surrogate of the lower bound of classification margin through the proposed coordinate descent algorithm, and consistently outperforms a naive averaging ensemble as well as a state-of-the-art single model certified defense in verified error and clean accuracy.

## References


## Appendix A The Coordinate Descent Algorithm for RobBoost

Here we present our full algorithm to optimally update one coordinate in coordinate descent. First, we rewrite our objective function:

(24) | ||||

(25) |

There are terms in summation (25), and we need to maintain “effective slope” and “effective intercept” for each term using the technique presented in the main text. Thus most variables have superscript . Note that to deal with the function in hinge loss, for each term we need to dynamically maintain the possible zero crossing point, which is stored in (line 9). If we reached a zero-crossing point before any other , we need to evaluate the function value at this zero-crossing point first (line 19). Additionally, once the effective slope and effective intercept change due to a sign change in absolute value terms, we need to update the zero-crossing point (line 25). In each iteration of coordinate descent, we choose a coordinate , obtain (14) using Algorithm 1 and set . Initially, we can set , then we can run several epochs of coordinate descent; in each epoch we optimize over all once, either in a cyclic order or a random order. Although coordinate descent cannot always find the global optimal solution, we found that it usually reduces the loss function sufficiently and rapidly; 2 or 3 epochs are sufficient for all our experiments.

## Appendix B Model Details

We implement all our models using PyTorch. Using model structures detailed in Table 4, we obtain 12 MNIST models and 11 CIFAR-10 models (without the last largest ResNet due to out of memory), robustly trained using the method proposed in wong2018scaling. For MNIST and CIFAR-10 we use the same model structure; only the input image shape differs.

For experiments on ensemble of robustly trained models, we use all available models. For experiments on feature subsampling, we use model K for both MNIST and CIFAR-10 datasets. For experiments on the ensemble of gradient boosted models, we use models A, C, E, G, K for MNIST and B, G, I, J, K for CIFAR-10.