A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA

by   Giuseppe Ughi, et al.

We demonstrate that model-based derivative-free optimisation algorithms can generate adversarial targeted misclassification of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of network queries is less impacted by making the task more challenging, either through reducing the allowed ℓ^∞ perturbation energy or training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm with the state-of-the-art model-free adversarial targeted misclassification approaches based on genetic, combinatorial, and direct-search algorithms. We observe that for high ℓ^∞ energy perturbations on networks, the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.




1 Introduction

Deep neural networks (NNs) achieve state-of-the-art performance on a growing number of applications such as acoustic modelling, image classification, and fake news detection (hinton2012deep; he; monti2019fake), to name but a few. Alongside their growing application, there is a literature on the robustness of deep nets which shows that it is often possible to apply subtle perturbations to the input of a network, producing what are referred to as adversarial examples (szegedy2013; goodfellow2014), which severely degrade its performance; for example, see (dalvi; kurakin; sitawarin; eykholt2017robust; yuan) concerning the use case of self-driving cars.

Methods to generate these adversarial examples are classified according to two main criteria:


Adversarial Specificity

establishes what the aim of the adversary is. In non-targeted attacks, the method perturbs the image in such a way that it is misclassified into any category other than the original one, whereas in targeted settings the adversary specifies a category into which an image has to be misclassified.

Adversary’s Knowledge

defines the amount of information available to the adversary. In white-box settings the adversary has complete knowledge of the network architecture and weights, while in the black-box setting the adversary is only able to obtain the pre-classification output vector for a limited number of inputs. The white-box setting allows for the use of gradients of a misclassification objective to efficiently compute the adversarial example (goodfellow2014; carlini; Chen2018ead), while the same optimisation formulation in the black-box setting requires the use of a derivative-free approach (narodytska2017; chen; ilyas2018black; Alzantot).

Figure 1: The success rate (SR) of the BOBYQA algorithm in generating a targeted adversarial example compared to the GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square) attacks as a function of the perturbation energy; specifically for a network trained on the CIFAR10 dataset without defences (Norm) and with the distillation defence (Adv) (Papernot_distillation). It can be observed that as the perturbation energy decreases the BOBYQA based method achieves a higher SR than the other methods. Similarly, the success rate of BOBYQA is less affected by adversarial training. In particular, with the infinity norm of the perturbation limited to BOBYQA achieves a SR 1.15 and 1.59 times better than SQUARE when considering Norm and Adv respectively. Here the number of network queries was restricted to 3,000; for further details see Fig. 9.

The generation of black-box targeted adversarial examples for deep NNs has been extensively studied in a setting initially proposed by (chen) where:

  • the adversarial example is found by solving an optimisation problem designed to change the original classification of a specific input to a specific alternative.

  • the perturbation, which causes the network to change the classification, has entries bounded in magnitude by a specified infinity norm (maximum entry magnitude).

  • the number of queries to the NN needed to generate the adversarial example should be as small as possible.

The Zeroth-Order-Optimization (ZOO) attack (chen) introduced derivative-free optimisation (DFO) methods for computing adversarial examples in the black-box setting, specifically using a coordinate descent optimisation algorithm. At the time this was a substantial departure from methods for the black-box setting which train a proxy NN and then employ gradient-based methods for white-box attacks on the proxy network (papernot; tu2018); such methods are especially effective when numerous adversarial examples will be computed, but require substantially more network queries than the methods designed for misclassifying individual examples. Following the introduction of ZOO, there have been numerous improvements using other model-free DFO based approaches, see (Alzantot; COMBI; andriushchenko2019square). Specifically, GenAttack (Alzantot) is a genetic algorithm, COMBI (COMBI) is a direct-search method that explores the vertices of the perturbation energy constraint, and SQUARE (andriushchenko2019square) is a randomized direct-search method.

In this manuscript we consider an alternative model-based DFO method based on BOBYQA (powellbobyqa) which explicitly develops models that approximate the loss function in the optimisation problem and minimises the models using methods from continuous optimisation. By considering adversarial perturbations to three NNs trained on different datasets (MNIST, CIFAR10, and ImageNet), we show that for the model-free methods (Alzantot; COMBI; andriushchenko2019square) the number of evaluations of the NN grows more rapidly as the maximum perturbation energy decreases than it does for the method built upon BOBYQA. As a consequence, GenAttack, COMBI, and SQUARE are preferable for large values of the maximum perturbation energy and BOBYQA for smaller values. As an example, Figure 1 illustrates how the BOBYQA based algorithm compares to GenAttack, COMBI, and SQUARE when considering a net either normally or adversarially trained on CIFAR10 with different maximum perturbation energies.

We observe the intuitive principle that direct-search methods are effective at misclassifying NNs with high perturbation energies, while in more challenging settings it is preferable to use more sophisticated model-based methods, like ours. Model-based approaches will further challenge defences against adversarial misclassification (dhillon2018stochastic; ijcai2019-833), and in so doing will lead to improved defences and more robust networks. Model-based DFO is a well developed area, and we expect further improvements are possible through a more extensive investigation of these approaches.

2 Adversarial Examples Formulated as an Optimisation Problem

Consider a classification operator $F$ from input space to output space of classes. A targeted adversarial perturbation $\eta$ to an input $X$ has the property that it changes the classification to a specified target class $t$, i.e. $F(X) \neq t$ and $F(X + \eta) = t$. Herein we follow the formulation by (Alzantot). Given an image X, a maximum energy budget $\varepsilon_\infty$, and a suitable loss function $\mathcal{L}$, the task of computing the adversarial perturbation $\eta$ can be cast as an optimisation problem such as

$$\min_{\eta} \; \mathcal{L}(X, \eta) \quad \text{s.t.} \quad \|\eta\|_\infty \le \varepsilon_\infty, \quad X + \eta \le x_{\max}, \quad X + \eta \ge x_{\min}, \tag{1}$$

where the final two inequality constraints are due to the input entries being restricted to $[x_{\min}, x_{\max}]$. Denoting the pre-classification output vector by $f(X)$, i.e. $F(X) = \arg\max_j f(X)_j$, the misclassification of X to target label $t$ is achieved by $\eta$ if $\arg\max_j f(X + \eta)_j = t$. In (carlini; chen; Alzantot) the loss function (2) was determined to be the most effective for computing $\eta$ in (1), and we also employ this choice throughout our experiments.
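As a minimal sketch of a loss of this kind, the Python below assumes the log-ratio form used by GenAttack, namely the log of the summed non-target outputs minus the log of the target output. That form, the function names `targeted_loss` and `is_adversarial`, and the constant `eps` are assumptions made for illustration, not taken from the text.

```python
import math

def targeted_loss(probs, target, eps=1e-30):
    """Log-ratio loss: log(sum of non-target outputs) - log(target output).

    Assumed GenAttack-style form. When probs sums to one, a negative
    value means the target class holds more than half of the mass.
    """
    other = sum(p for j, p in enumerate(probs) if j != target)
    return math.log(other + eps) - math.log(probs[target] + eps)

def is_adversarial(probs, target):
    """The attack succeeds when the target class has the largest output."""
    return max(range(len(probs)), key=lambda j: probs[j]) == target

probs = [0.7, 0.2, 0.1]  # hypothetical output vector of a 3-class network
print(targeted_loss(probs, target=0))  # negative: class 0 dominates
print(targeted_loss(probs, target=2))  # positive: class 2 is far from winning
```

Minimising such a loss pushes output mass towards the target class, and the attack can stop as soon as `is_adversarial` returns true.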

3 Derivative Free Optimisation for Adversarial Examples

Derivative-Free Optimisation is a well developed field with numerous types of algorithms, see (conn2009introduction) and (larson_menickelly_wild_2019) for reviews on DFO principles and algorithms. Examples of classes of such methods include: direct-search methods such as simplex, model-based methods, hybrid methods such as finite differences or implicit filtering, as well as randomized variants of the aforementioned and methods specific to convex or noisy objectives. The optimisation formulation in Section 2 is amenable to virtually all DFO methods, making it unclear which of the algorithms to employ. Methods which have been trialled include: the finite-difference based ZOO attack (chen), the combinatorial direct-search method COMBI (COMBI), which searches over the perturbation energy constraint, the genetic direct-search method GenAttack (Alzantot), and most recently the randomized direct-search method SQUARE (andriushchenko2019square). Notably missing from the aforementioned list are model-based methods.

Given a set of samples $Y_k = \{y_1, \dots, y_q\}$ with $y_i \in \mathbb{R}^n$, model-based DFO methods start by identifying the minimiser of the objective $\mathcal{L}$ among the samples at iteration $k$, $x_k = \arg\min_{y \in Y_k} \mathcal{L}(y)$. Following this, a model for the objective function is constructed, typically centred around the minimiser. In its simplest form one uses a polynomial approximation to the objective, such as a quadratic model centred in $x_k$

$$m_k(x_k + p) = a + c^\top p + \tfrac{1}{2}\, p^\top M p, \tag{3}$$

with $a \in \mathbb{R}$, $c \in \mathbb{R}^n$, and $M \in \mathbb{R}^{n \times n}$ being also symmetric. In a white-box setting one would set $c = \nabla \mathcal{L}(x_k)$ and $M = \nabla^2 \mathcal{L}(x_k)$, but this is not feasible in the black-box setting as we do not have access to the derivatives of the objective function. Thus $c$ and $M$ are usually defined by imposing interpolation conditions

$$m_k(y_i) = \mathcal{L}(y_i), \qquad i = 1, \dots, q, \tag{4}$$

and when $q < (n+1)(n+2)/2$ (i.e. the system of equations is under-determined) other conditions are introduced according to which algorithm is considered. The objective model (3) is considered to be a good estimate of the objective in a neighbourhood referred to as a trust region. Once the model $m_k$ is generated, the update step $p$ is computed by solving the trust region problem

$$\min_{p} \; m_k(x_k + p) \quad \text{s.t.} \quad \|p\| \le \Delta,$$

where $\Delta$ is the radius of the region where we believe the model to be accurate; for more details see (nocedal). The new point $x_k + p$ is added to $Y_k$ and a prior point is potentially removed. Herein we consider an exemplary model-based method called BOBYQA, selected among the numerous types of model-based DFO algorithms due to its efficiency observed for other similar problems requiring few model samples, as in climate modelling (Climate).
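The model-then-step idea can be sketched in a few lines of Python. This sketch assumes the simplest case: a linear model (no quadratic term) interpolated through a point and one shifted sample per coordinate, and a trust region measured in the infinity norm and intersected with box bounds. Those choices, the toy objective, and the names `linear_model` and `trust_region_step` are illustrative assumptions, not BOBYQA itself.

```python
def linear_model(f, x, h):
    """Fit m(x + p) = a + c.p through the n + 1 samples x and x + h*e_i."""
    a = f(x)
    c = []
    for i in range(len(x)):
        y = list(x)
        y[i] += h
        c.append((f(y) - a) / h)  # interpolation condition m(y) = f(y)
    return a, c

def trust_region_step(x, c, radius, lower, upper):
    """Minimise the linear model over ||p||_inf <= radius within box bounds."""
    new_x = []
    for xi, ci, lo, hi in zip(x, c, lower, upper):
        step = -radius if ci > 0 else radius if ci < 0 else 0.0
        new_x.append(min(max(xi + step, lo), hi))
    return new_x

# Toy objective standing in for the network loss (hypothetical).
f = lambda v: (v[0] - 0.3) ** 2 + (v[1] + 0.2) ** 2
x = [0.0, 0.0]
a, c = linear_model(f, x, h=0.01)
x_new = trust_region_step(x, c, radius=0.1, lower=[-0.5, -0.5], upper=[0.5, 0.5])
print(c, x_new, f(x_new) < f(x))
```

Building the model costs one query per coordinate plus one, after which each step of a linear model moves straight to the edge of the trust region.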


The BOBYQA algorithm, introduced in (powellbobyqa), updates the parameters of the model, $a$, $c$, and $M$, in each iteration in such a way as to minimise the change in the quadratic term between iterates while otherwise fitting the sample values $y_1, \dots, y_q$:

$$\min_{a_{k+1},\, c_{k+1},\, M_{k+1}} \; \|M_{k+1} - M_k\|_F^2 \quad \text{s.t.} \quad m_{k+1}(y_i) = \mathcal{L}(y_i), \qquad i = 1, \dots, q,$$

with $M_0$ initialised as the zero matrix. When the number of parameters is $q = n + 1$ the model is considered as linear with $M$ set as zero. We further allow only queries at each implementation of BOBYQA, since after the model is generated few iterations are needed to find the minimum.

3.1 Computational Scalability and Efficiency

For improved computational scalability and efficiency, we do not solve (1) for the full perturbation directly, but instead use domain sub-sampling and hierarchical liftings: domain sub-sampling iteratively sweeps over batches of variables, see (8), while hierarchical lifting clusters and perturbs variables simultaneously, see (12).

Domain Sub-Sampling

The simplest version of domain sub-sampling consists of partitioning the input dimension into smaller disjoint domains; for example, domains of a given size which are disjoint and which together cover the whole input. Rather than solving (1) for the full perturbation directly, for each of these domains one sequentially solves for a sub-domain perturbation which is only non-zero for entries in that domain. The resulting sub-domain perturbations are then summed to generate the full perturbation, see Figure 2 as an example. That is, the optimisation problem (1) is adapted to repeatedly looping over the sub-domains:


where the sub-domains may be reinitialised, in particular following each loop over the whole input.
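The sweep over sub-domains can be sketched as follows. This is a toy illustration under stated assumptions: the inner solver is replaced by a single random trial per batch that is kept only if it lowers the loss, and `toy_loss`, `random_partition`, and `sweep` are hypothetical names. A real attack would call a model-based solver and query the network instead.

```python
import random

def random_partition(n, batch_size, rng):
    """Split indices 0..n-1 into disjoint batches that together cover the domain."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[k:k + batch_size] for k in range(0, n, batch_size)]

def sweep(loss, n, batch_size, eps, rng):
    """One loop over all sub-domains; only the current batch is modified."""
    eta = [0.0] * n
    for batch in random_partition(n, batch_size, rng):
        trial = list(eta)
        for i in batch:
            trial[i] = rng.choice([-eps, eps])
        # Stand-in for the inner optimiser: keep the batch move only if it helps.
        if loss(trial) < loss(eta):
            eta = trial
    return eta

# Hypothetical toy loss; a real attack would query the network here.
toy_loss = lambda eta: sum((e - 0.05) ** 2 for e in eta)
rng = random.Random(0)
eta = sweep(toy_loss, n=12, batch_size=4, eps=0.1, rng=rng)
print(eta, toy_loss(eta))
```

Because the batches are disjoint, the accepted batch perturbations add up to the full perturbation without overwriting one another.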

Figure 2: Example of how the perturbation evolves through the iterations when an image in is attacked. In (a) the perturbation is and we select a sub-domain of pixels (in red). Once we have found the optimal perturbation in the selected sub-domain, we update the perturbation in (b) and select a new sub-domain of dimension . The same is repeated in (c).
Figure 3: Cumulative distribution function of successfully perturbed images as a function of the number of queries to a NN trained on the MNIST and CIFAR10 datasets. In each panel the effectiveness of different sub-sampling methods in generating a successful adversarial example is shown for different values of the perturbation energy. See Section 4.2 for details about the experimental setup and NN architectures.

We considered three possible ways of selecting the domains:

  • In Random Sampling we consider at each iteration a different random sub-sampling of the domain.

  • In Ordered Sampling we generate a random disjoint partitioning of the domain. Once each variable has been optimised over once, a new partitioning is generated.

  • In Variance Sampling we select the variables in decreasing order of local variance of the image, i.e. the variance in intensity among the 8 neighbouring variables (e.g. pixels) in the same colour channel. We further reinitialise the sub-domains after each loop through the whole domain.

In Figure 3 we compare how these different sub-sampling techniques perform when generating adversarial examples for the MNIST and CIFAR10 datasets. It can be observed that variance sampling consistently performs better than random and ordered sampling. This suggests that pixels belonging to high-contrast regions are more influential than those in low-contrast ones, and hence variance sampling is the preferable ordering.
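A minimal sketch of variance-based ordering for a single-channel image is given below. It assumes the image is a list of rows and that border pixels simply use the neighbours that exist. The helper names and the toy image are hypothetical.

```python
def local_variance(img, r, c):
    """Variance of intensity among the (up to) 8 neighbours of pixel (r, c)."""
    vals = [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, len(img)))
            for j in range(max(c - 1, 0), min(c + 2, len(img[0])))
            if (i, j) != (r, c)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def variance_batches(img, batch_size):
    """Order pixels by decreasing local variance and cut into batches."""
    pixels = [(r, c) for r in range(len(img)) for c in range(len(img[0]))]
    pixels.sort(key=lambda rc: local_variance(img, *rc), reverse=True)
    return [pixels[k:k + batch_size] for k in range(0, len(pixels), batch_size)]

# 4x4 toy image: flat on the left, with a bright edge on the right.
img = [[0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 1.0]]
batches = variance_batches(img, batch_size=4)
print(batches[0])   # pixels next to the edge come first
print(batches[-1])  # pixels in the flat region come last
```

In this toy case the first batch contains only pixels adjacent to the intensity edge, which is the high-contrast region the ordering is meant to favour.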

Figure 4: Impact of hierarchical lifting approach on Loss function (2) as a function of the number of queries to Inception-v3 net trained on ImageNet dataset to find the adversarial example for a single image. The green vertical lines correspond to changes of hierarchical level, which entail an increase in the dimension of the optimisation space.

Hierarchical Lifting

When the domain is very high dimensional, working on single pixels is not efficient as the above described method would imply modifying only a very small proportion of the image; for instance, the sub-domain size is kept small even when the input dimension is almost three hundred thousand. Thus, to perturb wider portions of the image, we consider a hierarchy of liftings as in the ZOO attack presented in (chen). We seek an adversarial example by optimising over increasingly higher dimensional spaces at each step, referred to here as a level, which are lifted to the image space. As an illustration, Figure 4 shows that hierarchical lifting has a significant impact on the minimisation of the loss function.

Figure 5: Example of how the perturbation is generated in a hierarchical lifting method with and on an image . In (a) the perturbation is and we highlight in red the boxes generated via the grid of dimension . Once we have found the optimal perturbation , we update the perturbation in (b) and further divide the image with a grid with blocks. Once an optimal solution is found for this grid, the final solution is shown in (c).

At each level we consider a linear lifting and find a level perturbation which is added to the full perturbation , according to


where is initialised as and the level perturbations of the previous layers are considered as fixed. Moreover, we impose that at each level, the grid has to double in refinement, i.e. . An example of how this works is illustrated in Figure 5.

When generating our adversarial examples, we considered two kinds of liftings. The first kind is based on interpolation operations; a sorting matrix S is applied such that every index of the level perturbation is uniquely associated to a node of a coarse grid masked over the original image. Afterwards, an interpolation L is implemented over the values in the coarse grid. The second kind, instead, forces the perturbation to be high-frequency, since several works report these perturbations to be the most effective (guo2018; gopalakrishnan2018toward; sharma2019effectiveness). Some preliminary results led us to consider the “Block” lifting, which uses a piecewise constant interpolation and corresponds to the one also used in (COMBI). Alternative piecewise linear or randomised orderings were also tried, but were not found to be appreciably better to justify the added complexity. As we show for the example in Figure 6, this interpolation lifting divides an image into disjoint blocks via a coarse grid and associates to each of the blocks the same value of a parameter in the level perturbation. We characterise the lifting with the following conditions

Figure 6: In the “Block” lifting the perturbation is first applied to a sorting matrix S to which an interpolation L is implemented. Thus each block is associated uniquely to one of the variables in the level perturbation.
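A piecewise constant lifting of this kind can be sketched as below. The function name `block_lift`, the single-channel layout, and the grid sizes are assumptions chosen for illustration.

```python
def block_lift(coarse, height, width):
    """Lift a coarse grid to a height x width image by piecewise-constant blocks."""
    rows, cols = len(coarse), len(coarse[0])
    return [[coarse[r * rows // height][c * cols // width]
             for c in range(width)]
            for r in range(height)]

# A 2x2 grid of block values lifted to a 4x4 perturbation.
coarse = [[0.1, -0.1],
          [0.0, 0.2]]
for row in block_lift(coarse, 4, 4):
    print(row)
```

Only the four coarse values are free variables here, yet the lifted perturbation touches all sixteen pixels. Doubling the grid refinement would give sixteen free values on the same image.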

Since may still be very high (usually ), for each level we apply domain sub-sampling and consider . We order the blocks according to the variance of mean intensity among neighbouring blocks, in contrast to the variance within each block which was suggested in (chen). Consequently, at each level the adversarial example is found by solving the following iterative problem


where .

In its simplest formulation, hierarchical lifting struggles with the pixel-wise interval constraint on the perturbed image. To address this we allow the entries of the perturbed image to exceed the interval and then reproject the pixel-wise entries into the interval.
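The reprojection step can be sketched as a pair of clipping operations. The interval bounds, the flat pixel list, and the name `project` are hypothetical choices for illustration.

```python
def project(image, perturbation, eps, lower, upper):
    """Clip the perturbation to [-eps, eps], then the perturbed pixels to [lower, upper]."""
    out = []
    for x, d in zip(image, perturbation):
        d = min(max(d, -eps), eps)
        out.append(min(max(x + d, lower), upper))
    return out

image = [0.45, -0.5, 0.0]          # hypothetical pixels in [-0.5, 0.5]
perturbation = [0.2, -0.3, 0.05]   # lifted perturbation before projection
print(project(image, perturbation, eps=0.1, lower=-0.5, upper=0.5))
```

Since the original pixels already lie in the interval, clipping the perturbed pixels can only move them closer to their original values, so the energy bound still holds after projection.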

Input: image X, target label, maximum perturbation, neural net, initial hierarchical level dimensions, maximum number of evaluations, batch sampling size, and maximum number of queries allowed for each batch.
Initialise the perturbation and the running counters.
while the image is not yet classified as the target label and the query counter is below the maximum number of evaluations do
   Compute the number of sub-samplings necessary to cover the whole domain.
   Generate the lifting matrix.
   for each sub-sampling do
      Compute the matrix which selects the batch of dimensions of the level domain.
      Define the bounds for a perturbation over the selected pixels.
      Find the batch perturbation by applying the BOBYQA optimisation to the problem (12).
      Update the noise.
      Update the query counter and the remaining running variables.
   end for
end while
if the image is classified as the target label then
   The perturbation is successful.
else if the maximum number of evaluations has been reached then
   The perturbation was not successful within the allowed number of evaluations.
end if
Algorithm 1 BOBYQA Based Algorithm

3.2 Algorithm pseudo-code

Our BOBYQA based algorithm is summarised in Algorithm 1; note that not using the hierarchical method corresponds to having a single level whose dimension equals that of the input. A Python implementation of the proposed algorithm, based on the BOBYQA package from (cartis), is available on GitHub: https://github.com/giughi/A-Model-Based-

4 Comparison of Derivative Free Methods

We compare the performance of our BOBYQA based algorithm to GenAttack (Alzantot), combinatorial attacks COMBI (COMBI) and SQUARE (andriushchenko2019square). The performance is measured by considering the distribution of queries needed to successfully find adversaries to different networks trained on three standard datasets: MNIST (lecun-mnisthandwrittendigit-2010), CIFAR10 (CIFAR), and ImageNet (imagenet_cvpr09).

4.1 Parameter Setup for Algorithms

For GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square), our experiments rely on the publicly available implementations listed below, with the same hyperparameter settings and hierarchical approach as suggested by the respective authors.

GenAttack: https://github.com/nesl/adversarial_genattack
COMBI: https://github.com/snu-mllab/parsimonious-blackbox-attack
SQUARE: https://github.com/max-andr/square-attack

For the proposed algorithm based on BOBYQA, we tuned three main parameters: the dimension of the initial set, the batch dimension, and the trust region radius.

(a) CIFAR10
(b) ImageNet
Figure 7: Comparison in loss function according to the different batch dimensions and the different dataset. After the linear model is generated, the optimisation algorithm is always allowed to query the net 5 times if or , or 10 times if . For ImageNet we are using the hierarchical lifting approach.
Figure 8: Comparison on how the loss decreases when the initial set dimension is either or in an attack to an image of MNIST with . We chose for both the methods and a maximum of function evaluations after the model was initialised, i.e. .

Batch Dimension. Figure 7 shows the loss value averaged over 20 images for attacks on NNs trained on the CIFAR10 and ImageNet datasets when different batch dimensions are chosen. The average objective loss as a function of network queries is largely insensitive to the batch size, but with modest differences for the larger ImageNet dataset where was observed to require modestly fewer queries. For the remainder of the simulations we use as a good trade-off between faster model generation and good performance.

Initial Set Dimension. Once a subdomain of dimension $n$ is chosen, the model (3) is initialised with a set of $q$ samples on which the interpolation conditions (4) are imposed. There are two main choices for the dimension of the set: either $q = n + 1$, thus computing $a$ and c with the interpolation and leaving M always null and thus having a linear model, or $q = 2n + 1$, which allows us to initialise $a$, c, and the diagonal of M, hence obtaining a quadratic model. The results in Figure 8 show that at each iteration of the domain sub-sampling the quadratic method performs as well as a linear method; however, it requires more queries to initialise the model. Thus we consider the linear model with $q = n + 1$. Constrained Optimisation BY Linear Approximation (COBYLA), a linear model-based DFO algorithm, was introduced before BOBYQA (powell2007view); however, COBYLA considers different constraints on the norm of the variable. Because of this and the possibility to extend the method to quadratic models, we name our algorithm after BOBYQA.
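The query cost of initialising each model type can be made concrete with a short calculation. The sketch assumes the usual parameter counts for interpolation models (a constant and gradient for the linear model, plus a diagonal or a full symmetric matrix for the quadratic ones). The sub-domain size of 50 is a hypothetical value chosen only for illustration.

```python
def interpolation_points(n):
    """Samples needed to initialise each model type on an n-dimensional sub-domain."""
    return {
        "linear": n + 1,                           # a and c
        "diagonal quadratic": 2 * n + 1,           # a, c, and diag(M)
        "full quadratic": (n + 1) * (n + 2) // 2,  # a, c, and symmetric M
    }

print(interpolation_points(50))
```

For a 50-dimensional sub-domain the diagonal quadratic model needs roughly twice as many initial queries as the linear one, and a fully determined quadratic model needs over a thousand.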

Trust Region Radius. Once the model for the optimisation is built, the step of the optimisation is bounded by the trust region radius. We have selected the initial radius to be one third of the whole space in which the perturbation lies. With this choice of radius we usually reach a corner of the boundary within 5 steps, and the further iterates remain effectively stationary.

For the hierarchical lifting approach we consider an initial sub-domain of dimension , as this is the biggest grid that we can optimise over with a batch . After considering , we make use of and do not consider further levels.

4.2 Dataset and Neural Network Specifications

Experiments on each dataset are performed with one of the best performing NN architectures as described below

Figure 9: Cumulative fraction of test set images successfully misclassified with adversarial examples generated by GenAttack, COMBI, SQUARE and our BOBYQA based approach for different perturbation energies and NNs trained on MNIST, CIFAR10 and ImageNet dataset. In all results the solid and dashed lines denoted by ‘Norm’ and ‘Adv’ corresponds to attacks on nets trained without or with a defence strategy respectively. For MNIST and CIFAR we consider the distillation defence method from (Papernot_distillation) while for ImageNet the adversarial training proposed in (kurakin2016adversarial).


MNIST and CIFAR10 are two datasets with images divided between 10 classes and of dimension 28x28x1 and 32x32x3 respectively. On them we apply the net introduced in (chen), which is structured in succession by 2 Conv layers with ReLU activation followed by a max-pooling layer. This process is repeated twice and then two dense layers with ReLU activation are applied. Finally a softmax layer generates the output vector. For each dataset, we train the same architecture in two different ways, obtaining separate nets. One is obtained by optimising the accuracy of the net on raw unperturbed images, while the other is trained with the application of the distillation defence by (Papernot_distillation).

To generate a comprehensive distribution for the queries at each energy budget, for each of the two trained nets and 10 images per class, we attempt to misclassify an image targeting all of the 9 remaining classes; this way we generate a total of 900 perturbations per energy budget. For these two datasets the images are of relatively low dimension and we do not apply the hierarchical approach.


ImageNet is a dataset of millions of images with a dimension of 299x299x3 divided between 1000 classes. For this dataset we consider the Inception-v3 net (Inceptionv3) trained with and without the adversarial defence proposed in (kurakin2016adversarial). For the non-adversarially trained net we considered the one available at http://jaina.cs.ucdavis.edu/datasets/adv/imagenet/inception_v3_2016_08_28_frozen.tar.gz, while for the weights of the adversarially trained net we relied on https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models. Due to the large number of target classes in ImageNet, we perform tests on random images and target classes. The numbers of tests conducted for Inception-v3 (Inceptionv3) and the adversarially trained variant (kurakin2016adversarial) are: 303 and 120 for , 155 and 114 for and 149 and 116 for respectively.

4.3 Experimental Results

In Figure 9 we present the cumulative fraction of images misclassified (abbreviated as CDF for cumulative distribution function) as a function of the number of queries to the NN for different perturbation energies. The pixels are normalised to be in the interval , hence, would imply that any pixel is allowed to change of the total intensity range from its initial value. By illustrating the CDFs we easily see which method has been able to misclassify the largest fraction of images in the given test set for a fixed number of queries to the NN. It can be observed that the proposed BOBYQA based approach achieves state-of-the-art results when the perturbation bound decreases. This behaviour is consistent across all of the considered datasets (MNIST, CIFAR10, and ImageNet); however, the energy at which the BOBYQA algorithm performs the best varies in each case.
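The cumulative fraction plotted in such figures can be computed from per-attack query counts as sketched below. The list `results` is made-up data for illustration, with `None` marking an attack that never succeeded, and the function name `success_cdf` is hypothetical.

```python
def success_cdf(queries_used, max_queries, step):
    """Fraction of attacks that succeeded within each query budget.

    `queries_used` holds the query count of each attack, or None on failure.
    """
    total = len(queries_used)
    budgets = range(0, max_queries + 1, step)
    return [(b, sum(q is not None and q <= b for q in queries_used) / total)
            for b in budgets]

results = [120, 900, None, 2500, 400]  # hypothetical per-image query counts
for budget, frac in success_cdf(results, max_queries=3000, step=1000):
    print(budget, frac)
```

Dividing by the total number of attempts, rather than by the number of successes, is what lets the curve plateau below one when some attacks fail.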

In the experiments we also considered nets trained with defence methods, distillation (Papernot_distillation) for the MNIST and CIFAR10 datasets and adversarial training (kurakin2016adversarial) for ImageNet; the results can be identified in Figure 9 by the dashed lines. Similar to the previous case, we observe that the proposed BOBYQA based algorithm performs the best when the perturbation energy decreases. Moreover, the BOBYQA based algorithm seems to be the least affected in its performance when any defence is used; for example, at a perturbation energy of 0.01 and 15,000 queries, the defence reduces the CDF of COMBI by 0.078 compared to 0.051 for BOBYQA. This further supports the idea that for more challenging scenarios model-based approaches are preferable to their model-free counterparts.

We attribute the counter-intuitive improvement of the CDF in the MNIST and ImageNet cases with high perturbation energies to the distillation and the adversarial training being focused primarily on low energy perturbations. For ImageNet, non-model-based algorithms use different hierarchical approaches, which we expect leads in part to the superior performance of COMBI in Fig. 9 panels (g)-(i).

5 Discussion and Conclusion

We have introduced a BOBYQA based method to search for adversarial examples, built on a model-based DFO algorithm, and have conducted experiments to understand how it compares to the existing GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square) attacks when targeted black-box adversarial examples are sought with the fewest queries to a neural net.

Following the results of the experiments presented above, the method used to generate the adversarial example should be chosen according to the setting the adversary is considering. When the perturbation energy is high, one should choose either COMBI if the input is high-dimensional or SQUARE if the input is low-dimensional. On the other hand, a model-based approach like BOBYQA should be considered as soon as the complexity of the setting increases, e.g. the maximum perturbation energy is reduced or the net is adversarially trained.

With the BOBYQA attack algorithm we have introduced a different approach for the generation of targeted adversarial examples in a black-box setting, with the aim of exploring what advantages are achieved by considering model-based DFO algorithms. We did not focus on presenting an algorithm which is the most efficient in absolute terms, primarily because our algorithm has several aspects that can be improved. The BOBYQA attack is limited by the implementation of py-BOBYQA (cartis), since the element-wise constraints do not allow the consideration of more sophisticated liftings which leverage compressed sensing, to name one of the many possible variations.

In conclusion, the results in this paper support the view that more sophisticated misclassification methods are preferable in challenging settings. As a consequence, variations on our model-based algorithms should be considered in the future as a tool to establish the effectiveness of newly presented adversarial defence techniques.


This publication is based on work supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with New Rock Capital Management.