1 Introduction
Deep neural networks (NNs) achieve state-of-the-art performance on a growing number of applications such as acoustic modelling, image classification, and fake news detection (hinton2012deep; he; monti2019fake), to name but a few. Alongside their growing application, there is a literature on the robustness of deep nets which shows that it is often possible to generate subtle perturbations to the input of a network, referred to as adversarial examples (szegedy2013; goodfellow2014), that severely degrade its performance; for example, see (dalvi; kurakin; sitawarin; eykholt2017robust; yuan) concerning the use case of self-driving cars. Methods to generate these adversarial examples are classified according to two main criteria (yuan).
Adversarial Specificity

establishes what the aim of the adversary is. In non-targeted attacks, the method perturbs the image in such a way that it is misclassified into any category other than the original one, while in targeted settings the adversary specifies the category into which the image has to be misclassified.
 Adversary’s Knowledge

defines the amount of information available to the adversary. In the white-box setting the adversary has complete knowledge of the network architecture and weights, while in the black-box setting the adversary is only able to obtain the pre-classification output vector for a limited number of inputs. The white-box setting allows the use of gradients of a misclassification objective to efficiently compute the adversarial example (goodfellow2014; carlini; Chen2018ead), while the same optimisation formulation in the black-box setting requires a derivative-free approach (narodytska2017; chen; ilyas2018black; Alzantot).
The generation of black-box targeted adversarial examples for deep NNs has been extensively studied in a setting initially proposed by (chen) where:

the adversarial example is found by solving an optimisation problem designed to change the original classification of a specific input to a specific alternative.

the perturbation, which causes the network to change the classification, has entries bounded in magnitude by a specified infinity norm (maximum entry magnitude).

the number of queries to the NN needed to generate the adversarial example should be as small as possible.
The Zeroth Order Optimization (ZOO) attack (chen) introduced derivative-free optimisation (DFO) methods for computing adversarial examples in the black-box setting, specifically using a coordinate descent optimisation algorithm. At the time this was a substantial departure from methods for the black-box setting which train a proxy NN and then employ gradient-based methods for white-box attacks on the proxy network (papernot; tu2018); such methods are especially effective when numerous adversarial examples will be computed, but require substantially more network queries than the methods designed for misclassifying individual examples. Following the introduction of ZOO, there have been numerous improvements using other model-free DFO based approaches, see (Alzantot; COMBI; andriushchenko2019square). Specifically, GenAttack (Alzantot) is a genetic algorithm, COMBI (COMBI) is a direct-search method that explores the vertices of the feasible set defined by the perturbation energy bound, and SQUARE (andriushchenko2019square) is a randomized direct-search method.
In this manuscript we consider an alternative model-based DFO method based on BOBYQA (powellbobyqa) which explicitly develops
models that approximate the loss function in the optimisation problem and minimises the models using methods from continuous optimisation. By considering adversarial perturbations to three NNs trained on different datasets (MNIST, CIFAR10, and ImageNet), we show that for the model-free methods (Alzantot; COMBI; andriushchenko2019square) the number of evaluations of the NN grows more rapidly as the maximum perturbation energy decreases than it does for the method built upon BOBYQA. As a consequence, GenAttack, COMBI and SQUARE are preferable for large values of the maximum perturbation energy and BOBYQA for smaller values. As an example, Figure 1 illustrates how the BOBYQA based algorithm compares to GenAttack, COMBI, and SQUARE when considering a net either normally or adversarially trained on CIFAR10 with different maximum perturbation energies.
We observe the intuitive principle that direct-search methods are effective at misclassifying NNs with high perturbation energies, while in more challenging settings it is preferable to use more sophisticated model-based methods, like ours. Model-based approaches will further challenge defences to adversarial misclassification (dhillon2018stochastic; ijcai2019833), and in so doing will lead to improved defences and more robust networks. Model-based DFO is a well-developed area, and we expect further improvements are possible through a more extensive investigation of these approaches.
2 Adversarial Examples Formulated as an Optimisation Problem
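Consider a classification operator $f:\mathbb{R}^n\to\{1,\dots,C\}$ from input space to output space of $C$ classes. A targeted adversarial perturbation $\delta\in\mathbb{R}^n$ to an input $x\in\mathbb{R}^n$ has the property that it changes the classification to a specified target class $t$, i.e. $f(x)\neq t$ and $f(x+\delta)=t$. Herein we follow the formulation by (Alzantot). Given an image $x$, a maximum energy budget $\epsilon$, and a suitable loss function $\mathcal{L}(\cdot)$, the task of computing the adversarial perturbation $\delta$ can be cast as an optimisation problem such as
$$\min_{\delta\in\mathbb{R}^n}\ \mathcal{L}(x+\delta)\quad\text{subject to}\quad \|\delta\|_\infty\le\epsilon,\qquad x+\delta\ge x_{\min}\mathbf{1},\qquad x+\delta\le x_{\max}\mathbf{1}, \tag{1}$$
where the final two inequality constraints are due to the input entries being restricted to the normalised pixel range $[x_{\min},x_{\max}]$. Denoting the pre-classification output vector by $F(x)\in\mathbb{R}^C$, i.e. $f(x)=\arg\max_j F(x)_j$, the misclassification of $x$ to target label $t$ is achieved by $\delta$ if $F(x+\delta)_t>\max_{j\neq t}F(x+\delta)_j$. In (carlini; chen; Alzantot) they determined
$$\mathcal{L}(x)=\log\Big(\sum_{j\neq t}F(x)_j\Big)-\log F(x)_t \tag{2}$$
to be the most effective loss function for computing $\delta$ in (1), and we also employ this choice throughout our experiments.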
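To make the objective concrete, the sketch below evaluates a targeted loss from a probability vector $F(x)$ and checks the misclassification condition. It assumes the log-ratio form used by GenAttack (Alzantot), $\log\sum_{j\neq t}F(x)_j-\log F(x)_t$; the function names and the small constant guarding the logarithm are illustrative choices of ours, not part of any released implementation.

```python
import numpy as np


def targeted_loss(probs, target, tiny=1e-30):
    """Log-ratio loss: log(sum of non-target probabilities) - log(target probability).

    Assumed GenAttack-style form; negative values mean the target class holds
    more than half of the probability mass.
    """
    probs = np.asarray(probs, dtype=float)
    others = np.delete(probs, target).sum()
    return float(np.log(others + tiny) - np.log(probs[target] + tiny))


def is_targeted_success(probs, target):
    """Targeted misclassification: the target entry is the largest one."""
    return int(np.argmax(probs)) == target


p = np.array([0.1, 0.2, 0.7])
print(targeted_loss(p, target=2))        # log(0.3 / 0.7), roughly -0.847
print(is_targeted_success(p, target=2))  # True
```

Note that a negative loss is sufficient but not necessary for success: with three or more classes the target can be the arg max while holding less than half of the probability mass.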
3 Derivative Free Optimisation for Adversarial Examples
Derivative-free optimisation is a well-developed field with numerous types of algorithms; see (conn2009introduction) and (larson_menickelly_wild_2019) for reviews of DFO principles and algorithms. Examples of classes of such methods include: direct-search methods such as simplex, model-based methods, hybrid methods such as finite differences or implicit filtering, as well as randomized variants of the aforementioned and methods specific to convex or noisy objectives. The optimisation formulation in Section 2 is amenable to virtually all DFO methods, making it unclear which of the algorithms to employ. Methods which have been trialled include: the finite-difference based ZOO attack (chen); COMBI (COMBI), a combinatorial direct search over the perturbation energy constraint; GenAttack (Alzantot), a genetic direct-search method; and most recently SQUARE, a randomized direct-search method (andriushchenko2019square). Notably missing from the aforementioned list are model-based methods.
Given a set of $q$ samples $\mathcal{Y}^k=\{y^1,\dots,y^q\}$ with $y^j\in\mathbb{R}^n$, model-based DFO methods start by identifying the minimiser of the objective among the samples at iteration $k$, $x^k=\arg\min_{y\in\mathcal{Y}^k}\mathcal{L}(y)$. Following this, a model for the objective function is constructed, typically centred around the minimiser. In its simplest form one uses a polynomial approximation to the objective, such as a quadratic model centred in $x^k$
$$m^k(x^k+p)=\beta+c^\top p+\tfrac{1}{2}\,p^\top M p, \tag{3}$$
with $p\in\mathbb{R}^n$, $\beta\in\mathbb{R}$, $c\in\mathbb{R}^n$, and $M\in\mathbb{R}^{n\times n}$ being also symmetric. In a white-box setting one would set $c=\nabla\mathcal{L}(x^k)$ and $M=\nabla^2\mathcal{L}(x^k)$, but this is not feasible in the black-box setting as we do not have access to the derivatives of the objective function. Thus $\beta$, $c$ and $M$ are usually defined by imposing the interpolation conditions
$$m^k(y^j)=\mathcal{L}(y^j),\qquad j=1,\dots,q, \tag{4}$$
and when $q<(n+1)(n+2)/2$ (i.e. the system of equations is underdetermined) other conditions are introduced according to which algorithm is considered. The objective model (3) is considered to be a good estimate of the objective in a neighbourhood referred to as a trust region. Once the model is generated, the update step $p$ is computed by solving the trust region problem
$$p^k=\arg\min_{p\in\mathbb{R}^n}\ m^k(x^k+p)\quad\text{subject to}\quad\|p\|_2\le\Delta^k, \tag{5}$$
where $\Delta^k$ is the radius of the region where we believe the model to be accurate; for more details see (nocedal). The new point $x^k+p^k$ is added to $\mathcal{Y}^k$ and a prior point is potentially removed. Herein we consider an exemplary model-based method called BOBYQA.¹
¹ BOBYQA was selected among the numerous types of model-based DFO algorithms due to its efficiency observed for other similar problems requiring few model samples, as in climate modelling (Climate).
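The trust region problem is usually solved only approximately. As a hedged illustration, the sketch below computes the Cauchy point, the standard approximate solution described in (nocedal) that minimises the quadratic model along the steepest-descent direction $-c$ inside the ball of radius $\Delta$. BOBYQA itself uses a more refined subproblem solver that also handles bound constraints; here `beta`, `c`, `M` and `radius` are our names for the constant term, linear term, quadratic term and trust region radius.

```python
import numpy as np


def quadratic_model(beta, c, M):
    """Return the model m(p) = beta + c^T p + 0.5 p^T M p as a function of the step p."""
    return lambda p: beta + c @ p + 0.5 * p @ M @ p


def cauchy_step(c, M, radius):
    """Minimise the model along -c subject to ||p||_2 <= radius (Cauchy point)."""
    gnorm = np.linalg.norm(c)
    if gnorm == 0.0:
        return np.zeros_like(c)
    curvature = c @ M @ c
    if curvature <= 0.0:
        tau = 1.0  # non-positive curvature: the model keeps decreasing up to the boundary
    else:
        tau = min(gnorm**3 / (radius * curvature), 1.0)  # interior minimiser, if inside
    return -tau * radius * c / gnorm


c = np.array([1.0, 0.0])
M = 4.0 * np.eye(2)
m = quadratic_model(0.0, c, M)
p = cauchy_step(c, M, radius=1.0)
print(p, m(p))  # step of length 0.25 along -c, model value -0.125
```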
BOBYQA
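The BOBYQA algorithm, introduced in (powellbobyqa), updates the parameters of the model, c and M, in each iteration in such a way as to minimise the change in the quadratic term between iterates while otherwise fitting the sample values:
$$\min_{m^k}\ \|M^k-M^{k-1}\|_F^2 \tag{6}$$
$$\text{subject to}\quad m^k(y^j)=\mathcal{L}(y^j),\qquad j=1,\dots,q, \tag{7}$$
where $y^1,\dots,y^q$ are the current samples, $M^k$ and $M^{k-1}$ are the symmetric quadratic terms of the current and previous models, and $M^{-1}$ is initialised as the zero matrix. When the number of interpolation points is $q=n+1$, the model is linear, with $M$ set to zero. We further allow only a limited number of queries at each call of BOBYQA, since after the model is generated few iterations are needed to find the minimum.
3.1 Computational Scalability and Efficiency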
For improved computational scalability and efficiency, we do not solve (1) for $\delta$ directly, but instead use domain sub-sampling and hierarchical liftings: domain sub-sampling iteratively sweeps over batches of variables, see (8), while hierarchical lifting clusters and perturbs variables simultaneously, see (12).
Domain Sub-Sampling
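The simplest version of domain sub-sampling consists of partitioning the input dimension $n$ into smaller disjoint domains; for example, $n/b$ domains $\Sigma_1,\dots,\Sigma_{n/b}\subset\{1,\dots,n\}$ of size $b$ which are disjoint and which cover all of $\{1,\dots,n\}$. Rather than solving (1) for $\delta$ directly, for each $j$ one sequentially solves for perturbations $\delta_j$ which are only non-zero for entries in $\Sigma_j$. The resulting sub-domain perturbations are then summed to generate the full perturbation $\delta=\sum_j\delta_j$; see Figure 2 as an example. That is, the optimisation problem (1) is adapted to repeatedly looping over $j$:
$$\delta_j\leftarrow\arg\min_{z\in\mathbb{R}^n,\ \mathrm{supp}(z)\subseteq\Sigma_j}\ \mathcal{L}\Big(x+\sum_{i\neq j}\delta_i+z\Big)\quad\text{subject to}\quad\|z\|_\infty\le\epsilon\ \text{and the pixel-range constraints of (1)}, \tag{8}$$
where the $\delta_j$ may be re-initialised, in particular following each complete loop over $j=1,\dots,n/b$.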
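A minimal sketch of the sweep over sub-domains follows. It is only meant to show the bookkeeping: the inner solver is a plain random search standing in for BOBYQA, the blocks are passed in explicitly, and the parameter names (`blocks`, `n_loops`, `n_trials`, `x_min`, `x_max`) and the toy objective are illustrative assumptions.

```python
import numpy as np


def random_search(f, lower, upper, n_trials, rng):
    """Stand-in block solver (not BOBYQA): best of n_trials uniform samples in the box."""
    best_z, best_val = None, np.inf
    for _ in range(n_trials):
        z = rng.uniform(lower, upper)
        val = f(z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z


def domain_subsampling(loss, x, eps, x_min, x_max, blocks, n_loops=2, n_trials=50, seed=0):
    """Sweep over disjoint index blocks, optimising delta only on the current block."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    for _ in range(n_loops):
        for idx in blocks:
            # Box for this block: |delta_i| <= eps and x_min <= x_i + delta_i <= x_max.
            lower = np.maximum(-eps, x_min - x[idx])
            upper = np.minimum(eps, x_max - x[idx])

            def block_loss(z, idx=idx):
                trial = delta.copy()
                trial[idx] = z  # the other blocks keep their current values
                return loss(x + trial)

            candidate = random_search(block_loss, lower, upper, n_trials, rng)
            if block_loss(candidate) < block_loss(delta[idx]):
                delta[idx] = candidate  # accept only improvements
    return delta


def toy_loss(z):
    """Toy objective standing in for the network loss."""
    return float(np.sum((z - 0.9) ** 2))


x = np.full(6, 0.5)
blocks = [np.arange(0, 3), np.arange(3, 6)]
delta = domain_subsampling(toy_loss, x, eps=0.1, x_min=0.0, x_max=1.0, blocks=blocks)
print(toy_loss(x), toy_loss(x + delta))
```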
We considered three possible ways of selecting the domains $\Sigma_j$:

In Random Sampling we consider at each iteration a different random sub-sampling $\Sigma_j$ of the domain.

In Ordered Sampling we generate a random disjoint partitioning of the domain. Once each variable has been optimised over once, a new partitioning is generated.

In Variance Sampling we choose the $\Sigma_j$ in decreasing order of the local variance of $x$, i.e. the variance in intensity among the 8 neighbouring variables (e.g. pixels) in the same colour channel. We further re-initialise $\delta$ after each loop through $j$.
In Figure 3 we compare how these different sub-sampling techniques perform when generating adversarial examples for the MNIST and CIFAR10 datasets. It can be observed that variance sampling consistently performs better than random and ordered sampling. This suggests that pixels belonging to high-contrast regions are more influential than those in low-contrast ones, and hence variance sampling is the preferable ordering.
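Variance sampling can be sketched as below: compute, for every pixel and channel, the variance of its 8 neighbours, then cut the flat indices, sorted by decreasing variance, into batches of size $b$. Handling image borders by using only the neighbours that exist is our own assumption, as are the function names.

```python
import numpy as np


def local_variance(image):
    """Variance of the 8 neighbours of each pixel, per colour channel (image: H x W x C).

    Border pixels use only the neighbours that exist (NaN padding + nanvar).
    """
    img = np.asarray(image, dtype=float)
    h, w, _ = img.shape
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), constant_values=np.nan)
    neighbours = [padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    return np.nanvar(np.stack(neighbours), axis=0)


def variance_ordered_blocks(image, batch_size):
    """Flat indices sorted by decreasing local variance, split into batches of batch_size."""
    order = np.argsort(-local_variance(image).ravel(), kind="stable")
    return [order[k:k + batch_size] for k in range(0, order.size, batch_size)]


image = np.zeros((4, 4, 1))
image[0, 0, 0] = 1.0  # a single bright pixel in the corner
blocks = variance_ordered_blocks(image, batch_size=3)
print(sorted(blocks[0].tolist()))  # [1, 4, 5]: the pixels adjacent to the bright one
```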
Hierarchical Lifting
When the domain is very high dimensional, working on single pixels is not efficient, as the method described above would modify only a very small proportion of the image at a time; for instance, we choose a small batch size $b$ even when $n$ is almost three hundred thousand (for a 299x299x3 ImageNet input, $n=268{,}203$). Thus, to perturb wider portions of the image, we consider a hierarchy of liftings as in the ZOO attack presented in (chen). We seek an adversarial example by optimising, at each step (referred to here as a level), over an increasingly higher-dimensional space that is lifted to the image space. As an illustration, Figure 4 shows that hierarchical lifting has a significant impact on the minimisation of the loss function.
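At each level $l$ we consider a linear lifting $D_l\in\mathbb{R}^{n\times n_l}$ and find a level perturbation $\rho_l\in\mathbb{R}^{n_l}$ which is added to the full perturbation $\delta$, according to
$$\delta^{(l)}=\delta^{(l-1)}+D_l\rho_l, \tag{9}$$
where $\delta^{(0)}$ is initialised as $0$ and the level perturbations of the previous levels are considered as fixed. Moreover, we impose that at each level the grid has to double in refinement, i.e. the grid spacing is halved along both image axes, so that $n_{l+1}=4n_l$. An example of how this works is illustrated in Figure 5.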
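The level update can be sketched with a block (piecewise constant) lifting, implemented by repeating each coarse-grid value over its block. Choosing this lifting, the grid shapes and the names `block_lift` and `accumulate_levels` are our assumptions for illustration. Doubling the grid refinement, from a g x g grid to a 2g x 2g grid, multiplies the number of level parameters by four.

```python
import numpy as np


def block_lift(rho, image_shape):
    """Apply D_l implicitly: repeat each value of a (gh x gw x C) grid over its pixel block."""
    h, w, _ = image_shape
    gh, gw = rho.shape[:2]
    if h % gh or w % gw:
        raise ValueError("this sketch assumes the grid divides the image evenly")
    return np.repeat(np.repeat(rho, h // gh, axis=0), w // gw, axis=1)


def accumulate_levels(level_perturbations, image_shape):
    """delta^(l) = delta^(l-1) + D_l rho_l with delta^(0) = 0; earlier levels stay fixed."""
    delta = np.zeros(image_shape)
    for rho in level_perturbations:
        delta = delta + block_lift(rho, image_shape)
    return delta


shape = (8, 8, 1)
rho_1 = np.full((2, 2, 1), 0.1)   # coarse level: n_1 = 2 * 2 * 1
rho_2 = np.zeros((4, 4, 1))       # grid doubled: n_2 = 4 * n_1
rho_2[0, 0, 0] = 0.05
delta = accumulate_levels([rho_1, rho_2], shape)
print(delta[0, 0, 0], delta[7, 7, 0])  # about 0.15 and 0.1
```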
When generating our adversarial examples, we considered two kinds of liftings. The first kind is based on interpolation operations: a sorting matrix $S_l$ is applied such that every index of $\rho_l$ is uniquely associated to a node of a coarse grid masked over the original image. Afterwards, an interpolation $I_l$ is implemented over the values in the coarse grid, i.e. $D_l=I_lS_l$. The second kind of lifting, instead, forces the perturbation to be high-frequency, since several works report such perturbations to be the most effective (guo2018; gopalakrishnan2018toward; sharma2019effectiveness). Some preliminary results led us to consider the “Block” lifting, which uses a piecewise constant interpolation and corresponds to the one also used in (COMBI). Alternative piecewise linear or randomised orderings were also tried, but were not found to be sufficiently better to justify the added complexity. As we show for the example in Figure 6, this interpolation lifting divides an image into disjoint blocks via a coarse grid and associates to each of the blocks the same value of a parameter in $\rho_l$. We characterise the lifting $D_l\in\mathbb{R}^{n\times n_l}$, which maps the level parameters $\rho_l\in\mathbb{R}^{n_l}$ to the image, with the following conditions
$$(D_l)_{i,j}\in\{0,1\}\qquad\text{for all } i=1,\dots,n,\ j=1,\dots,n_l, \tag{10}$$
$$\sum_{j=1}^{n_l}(D_l)_{i,j}=1\qquad\text{for all } i=1,\dots,n. \tag{11}$$
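A block lifting of this kind can be represented by a matrix whose entries are 0 or 1 and whose rows each contain exactly one 1, so that every pixel belongs to exactly one block. The sketch below builds such a matrix explicitly for a single colour channel (a simplifying assumption) and checks these properties; in practice the product $D_l\rho_l$ would be computed without forming the matrix.

```python
import numpy as np


def block_lifting_matrix(height, width, grid):
    """Explicit block lifting D for one channel: pixel (r, col) copies the value of its block."""
    if height % grid or width % grid:
        raise ValueError("this sketch assumes the grid divides the image evenly")
    bh, bw = height // grid, width // grid
    D = np.zeros((height * width, grid * grid))
    for r in range(height):
        for col in range(width):
            D[r * width + col, (r // bh) * grid + col // bw] = 1.0
    return D


D = block_lifting_matrix(4, 4, grid=2)
rho = np.array([1.0, 2.0, 3.0, 4.0])  # one parameter per block, row-major block order
print((D @ rho).reshape(4, 4))
```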
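Since $n_l$ may still be very high, for each level we apply domain sub-sampling over sub-domains $\Sigma_j\subset\{1,\dots,n_l\}$ of size $b$. We order the blocks according to the variance of the mean intensity among neighbouring blocks, in contrast to the variance within each block which was suggested in (chen). Consequently, at each level the adversarial example is found by solving the following iterative problem
$$\rho_{l,j}\leftarrow\arg\min_{z\in\mathbb{R}^{n_l},\ \mathrm{supp}(z)\subseteq\Sigma_j}\ \mathcal{L}\Big(x+\delta^{(l-1)}+D_l\Big(\sum_{i\neq j}\rho_{l,i}+z\Big)\Big)\quad\text{subject to}\quad\Big\|\delta^{(l-1)}+D_l\Big(\sum_{i\neq j}\rho_{l,i}+z\Big)\Big\|_\infty\le\epsilon, \tag{12}$$
where $D_l$ is the level-$l$ lifting, $\delta^{(l-1)}$ is the fixed perturbation from the previous levels, and $\rho_l=\sum_j\rho_{l,j}$.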
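The block ordering used at each level can be sketched as follows: average the image over each block, then rank blocks by the variance of the means of their neighbouring blocks. Taking the 8 surrounding blocks as the neighbourhood, and using only the neighbours that exist at the border, are our assumptions.

```python
import numpy as np


def block_means(image, grid):
    """Mean intensity of each block of a grid x grid partition, per channel (image: H x W x C)."""
    h, w, ch = image.shape
    if h % grid or w % grid:
        raise ValueError("this sketch assumes the grid divides the image evenly")
    return image.reshape(grid, h // grid, grid, w // grid, ch).mean(axis=(1, 3))


def order_blocks(image, grid):
    """Flat block indices sorted by decreasing variance of the neighbouring block means."""
    means = block_means(np.asarray(image, dtype=float), grid)
    padded = np.pad(means, ((1, 1), (1, 1), (0, 0)), constant_values=np.nan)
    neighbours = [padded[1 + di:1 + di + grid, 1 + dj:1 + dj + grid]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    variance = np.nanvar(np.stack(neighbours), axis=0)
    return np.argsort(-variance.ravel(), kind="stable")


image = np.zeros((4, 4, 1))
image[:2, :2, 0] = 1.0  # the top-left block is bright
print(order_blocks(image, grid=2))  # the bright block itself is ranked last
```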
In its simplest formulation, hierarchical lifting struggles with the pixel-wise interval constraint of (1). To address this, we allow the entries of $x+\delta$ to exceed the interval and then project them pixel-wise back into the interval.
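The reprojection step amounts to a pixel-wise clip of $x+\delta$, followed by recovering the effective perturbation. In the sketch below, `x_min` and `x_max` stand for the bounds of the normalised pixel range, which we leave as parameters.

```python
import numpy as np


def reproject(x, delta, x_min, x_max):
    """Clip x + delta pixel-wise into [x_min, x_max] and return the effective perturbation."""
    return np.clip(x + delta, x_min, x_max) - x


x = np.array([0.1, 0.5, 0.95])
delta = np.array([-0.2, 0.1, 0.1])
print(reproject(x, delta, 0.0, 1.0))  # entries -0.1, 0.1, 0.05 (up to rounding)
```

Because $x$ itself lies in the interval, clipping can only shrink each entry of the perturbation, so the infinity-norm bound $\epsilon$ is preserved.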
3.2 Algorithm pseudocode
Our BOBYQA based algorithm is summarised in Algorithm 1; note that not using the hierarchical method corresponds to having one level with $D_1$ equal to the identity. A Python implementation of the proposed algorithm based on the Py-BOBYQA package from (cartis) is available on GitHub.²
² https://github.com/giughi/A-Model-Based-Derivative-Free-Approach-to-Black-Box-Adversarial-Examples-BOBYQA
4 Comparison of Derivative Free Methods
We compare the performance of our BOBYQA based algorithm to GenAttack (Alzantot) and to the combinatorial attacks COMBI (COMBI) and SQUARE (andriushchenko2019square). The performance is measured by considering the distribution of queries needed to successfully find adversarial examples for different networks trained on three standard datasets: MNIST (lecunmnisthandwrittendigit2010), CIFAR10 (CIFAR), and ImageNet (imagenet_cvpr09).
4.1 Parameter Setup for Algorithms
For GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square), our experiments rely on publicly available implementations,³ with the same hyperparameter settings and hierarchical approaches as suggested by the respective authors.
³ GenAttack: https://github.com/nesl/adversarial_genattack
COMBI: https://github.com/snu-mllab/parsimonious-blackbox-attack
SQUARE: https://github.com/max-andr/square-attack
For the proposed algorithm based on BOBYQA, we tuned three main parameters: the dimension $q$ of the initial set, the batch dimension $b$, and the trust region radius.

Batch Dimension. Figure 7 shows the loss value averaged over 20 images for attacks on NNs trained on the CIFAR10 and ImageNet datasets when different batch dimensions are chosen. The average objective loss as a function of network queries is largely insensitive to the batch size, with only modest differences for the larger ImageNet dataset, where one of the tested batch sizes was observed to require modestly fewer queries. For the remainder of the simulations we use a fixed batch size $b$, chosen as a good trade-off between faster model generation and good performance.

Initial Set Dimension. Once a sub-domain of dimension $b$ is chosen, the model (3) is initialised with a set of $q$ samples on which the interpolation conditions (4) are imposed. There are two main choices for the dimension of the set: either $q=b+1$, thus computing the constant term and $c$ by interpolation while leaving $M$ always null, hence a linear model; or $q=2b+1$, which allows us to initialise the constant term, $c$ and the diagonal of $M$, hence a quadratic model. The results in Figure 8 show that at each iteration of the domain sub-sampling the quadratic model performs as well as the linear one, but it requires more queries to initialise. Thus we consider the linear model with $q=b+1$.⁴
⁴ The Constrained Optimization BY Linear Approximations (COBYLA) algorithm, a linear model-based DFO method, was introduced before BOBYQA (powell2007view); however, COBYLA considers different constraints on the norm of the variable. Because of this, and the possibility of extending the method to quadratic models, we name our algorithm after BOBYQA.
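A hedged sketch of the two initialisations follows, assuming the samples are the centre plus coordinate steps of length `rho` (plus and minus steps in the quadratic case), similar to the initial interpolation set used by BOBYQA. With this design, interpolation reduces to forward differences for $q=b+1$ and to central differences, which also give the diagonal of $M$, for $q=2b+1$; the function name and return layout are ours.

```python
import numpy as np


def initial_model(f, x0, rho, quadratic=False):
    """Fit the initial model around x0 from coordinate steps of length rho.

    Linear (q = b + 1 samples): forward differences give c, and M stays zero.
    Diagonal quadratic (q = 2b + 1 samples): central differences give c and diag(M).
    Returns (constant term, c, M, number of queries).
    """
    b = x0.size
    f0 = f(x0)
    queries = 1
    c = np.zeros(b)
    M = np.zeros((b, b))
    for i in range(b):
        step = np.zeros(b)
        step[i] = rho
        f_plus = f(x0 + step)
        queries += 1
        if quadratic:
            f_minus = f(x0 - step)
            queries += 1
            c[i] = (f_plus - f_minus) / (2.0 * rho)
            M[i, i] = (f_plus - 2.0 * f0 + f_minus) / rho**2
        else:
            c[i] = (f_plus - f0) / rho
    return f0, c, M, queries


def toy(z):
    """Toy quadratic objective standing in for the network loss."""
    return 1.0 + z[0] - 2.0 * z[1] + 1.5 * z[0] ** 2 + 2.5 * z[1] ** 2


x0 = np.zeros(2)
print(initial_model(toy, x0, 0.1)[3], initial_model(toy, x0, 0.1, quadratic=True)[3])  # 3 5
```

For a batch of size $b$, the quadratic option therefore spends $b$ extra queries per sub-domain before any optimisation step is taken.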

Trust Region Radius. Once the model for the optimisation is built, the step of the optimisation is bounded by the trust region radius. We select the initial radius to be one third of the size of the space in which the perturbation lies. With this choice of radius we usually reach a corner of the feasible region within 5 steps, after which further iterates remain effectively stationary.
For the hierarchical lifting approach we consider an initial level of dimension $n_1$, chosen as the biggest grid that we can optimise over with a single batch of size $b$. After this first level, we make use of a second, finer level and do not consider further levels.
4.2 Dataset and Neural Network Specifications
Experiments on each dataset are performed with one of the best performing NN architectures, as described below.
MNIST/CIFAR10
MNIST and CIFAR10 are two datasets with images divided between 10 classes, of dimension 28x28x1 and 32x32x3 respectively. On them we apply the net introduced in (chen), which is structured in succession as: 2 convolutional layers with ReLU activation followed by a max-pooling layer, a block which is applied twice, and then two dense layers with ReLU activation. Finally, a softmax layer generates the output vector. For each dataset, we train the same architecture in two different ways, obtaining separate nets. One is obtained by optimising the accuracy of the net on raw unperturbed images, while the other is trained with the distillation defence of (Papernot_distillation).
To generate a comprehensive distribution of the queries at each energy budget, for both trained nets and 10 images per class, we attempt to misclassify each image targeting all of the 9 remaining classes; this way we generate a total of 10 x 10 x 9 = 900 perturbations per net and energy budget. For these two datasets the images are of relatively low dimension and we do not apply the hierarchical approach.
ImageNet
This is a dataset of millions of images divided between 1000 classes; for the network considered here the inputs have dimension 299x299x3. For this dataset we consider the Inception-v3 net (Inceptionv3) trained with and without the adversarial defence proposed in (kurakin2016adversarial).⁵ Due to the large number of target classes in ImageNet, we perform tests on random images and target classes. The numbers of tests conducted for Inception-v3 (Inceptionv3) and its adversarially trained variant (kurakin2016adversarial) are, respectively, 303 and 120, 155 and 114, and 149 and 116 for the three perturbation energies considered.
⁵ For the non-adversarially trained net we considered the one available at http://jaina.cs.ucdavis.edu/datasets/adv/imagenet/inception_v3_2016_08_28_frozen.tar.gz, while for the weights of the adversarially trained net we relied on https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models.
4.3 Experimental Results
In Figure 9 we present the cumulative fraction of images misclassified (abbreviated as CDF, for cumulative distribution function) as a function of the number of queries to the NN for different perturbation energies $\epsilon$. The pixels are normalised to an interval of unit length; hence, a bound $\epsilon$ implies that any pixel is allowed to change by at most a fraction $\epsilon$ of the total intensity range from its initial value. By illustrating the CDFs we easily see which method has been able to misclassify the largest fraction of images in the given test set for a fixed number of queries to the NN. It can be observed that the proposed BOBYQA based approach achieves state-of-the-art results as the perturbation bound $\epsilon$ decreases. This behaviour is consistent across all of the considered datasets (MNIST, CIFAR10, and ImageNet); however, the energy below which the BOBYQA algorithm performs best varies in each case.
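The CDF curves can be computed from the number of queries each attack needed. In the sketch below, unsuccessful attacks are recorded as `np.inf` so that they never count as successes; this convention and the function name are our own.

```python
import numpy as np


def query_cdf(queries_needed, budgets):
    """Fraction of attacks that succeeded within each query budget.

    queries_needed: queries used by each attack, with np.inf for attacks that never succeeded.
    """
    q = np.sort(np.asarray(queries_needed, dtype=float))
    return np.searchsorted(q, budgets, side="right") / q.size


print(query_cdf([100, 500, np.inf, 2000], [0, 100, 1000, 3000]))  # fractions 0, 0.25, 0.5, 0.75
```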
In the experiments we also considered nets trained with defence methods, distillation (Papernot_distillation) for the MNIST and CIFAR10 datasets and adversarial training (kurakin2016adversarial) for ImageNet; the results can be identified in Figure 9 by the dashed lines. As in the previous case, we observe that the proposed BOBYQA based algorithm performs best as the perturbation energy decreases. Moreover, the BOBYQA based algorithm seems to be the least affected in its performance when a defence is used; for example, at $\epsilon=0.01$ and 15,000 queries, the defence reduces the CDF of COMBI by 0.078 compared to 0.051 for BOBYQA. This further supports the idea that for more challenging scenarios model-based approaches are preferable to their model-free counterparts.
We attribute the counter-intuitive improvement of the CDF for the defended nets on MNIST and ImageNet at high perturbation energies to distillation and adversarial training being focused primarily on low-energy perturbations. For ImageNet, the non-model-based algorithms use different hierarchical approaches, which we expect leads in part to the superior performance of COMBI in Fig. 9 panels (g)-(i).
5 Discussion and Conclusion
We have introduced a BOBYQA based attack, a method to search for adversarial examples based on a model-based DFO algorithm, and have conducted experiments to understand how it compares to the existing GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square) attacks when targeted black-box adversarial examples are sought with the fewest queries to a neural net.
Following the results of the experiments presented above, the method used to generate adversarial examples should be chosen according to the setting the adversary faces. When the perturbation energy is high, one should choose COMBI if the input is high-dimensional or SQUARE if the input is low-dimensional. On the other hand, a model-based approach like BOBYQA should be considered as soon as the difficulty of the setting increases, e.g. when the maximum perturbation energy is reduced or the net is adversarially trained.
With the BOBYQA attack algorithm we have introduced a different approach for the generation of targeted adversarial examples in a black-box setting, with the aim of exploring what advantages are achieved by considering model-based DFO algorithms. We did not aim to present an algorithm that is the most efficient in absolute terms, primarily because several aspects of our algorithm can still be improved. The BOBYQA attack is limited by the implementation of Py-BOBYQA (cartis), since its element-wise constraints do not allow the consideration of more sophisticated liftings, such as those leveraging compressed sensing, to name one of the many possible variations.
In conclusion, the results in this paper support the view that more sophisticated misclassification methods are preferable in challenging settings. As a consequence, variations on our model-based algorithm should be considered in the future as a tool to establish the effectiveness of newly presented adversarial defence techniques.
Acknowledgements
This publication is based on work supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with New Rock Capital Management.