Deep neural networks (NNs) achieve state-of-the-art performance on a growing number of applications such as acoustic modelling, image classification, and fake news detection (hinton2012deep; he; monti2019fake), to name but a few. Alongside their growing application, there is a literature on the robustness of deep nets which shows that it is often possible to apply subtle perturbations to the input of a network, producing what are referred to as adversarial examples (szegedy2013; goodfellow2014), such that its performance is severely degraded; for example, see (dalvi; kurakin; sitawarin; eykholt2017robust; yuan) concerning the use-case of self-driving cars.
Methods to generate these adversarial examples are classified according to two main criteria (yuan).
- Adversarial Specificity
establishes what the aim of the adversary is. In non-targeted attacks, the method perturbs the image in such a way that it is misclassified into any category other than the original one, while in targeted settings the adversary specifies a category into which an image has to be misclassified.
- Adversary’s Knowledge
defines the amount of information available to the adversary. In White-box settings the adversary has complete knowledge of the network architecture and weights, while in the Black-box setting the adversary is only able to obtain the pre-classification output vector for a limited number of inputs. The White-box setting allows for the use of gradients of a misclassification objective to efficiently compute the adversarial example (goodfellow2014; carlini; Chen2018ead), while the same optimization formulation in the Black-box setting requires the use of a derivative-free approach (narodytska2017; chen; ilyas2018black; Alzantot).
The generation of black-box targeted adversarial examples for deep NNs has been extensively studied in a setting initially proposed by (chen) where:
- the adversarial example is found by solving an optimisation problem designed to change the original classification of a specific input to a specific alternative.
- the perturbation, which causes the network to change the classification, has entries bounded in magnitude by a specified infinity norm (maximum entry magnitude).
- the number of queries to the NN needed to generate the adversarial example should be as small as possible.
The Zeroth-Order-Optimization (ZOO) attack (chen) introduced derivative free optimisation (DFO) methods for computing adversarial examples in the black-box setting, specifically using a coordinate descent optimization algorithm. At the time this was a substantial departure from methods for the black-box setting which train a proxy NN and then employ gradient based methods for white-box attacks on the proxy network (papernot; tu2018); such methods are especially effective when numerous adversarial examples will be computed, but require substantially more network queries than the methods designed for misclassifying individual examples. Following the introduction of ZOO, there have been numerous improvements using other model-free DFO based approaches, see (Alzantot; COMBI; andriushchenko2019square). Specifically, GenAttack (Alzantot) is a genetic algorithm, COMBI (COMBI) is a direct-search method that explores the vertices of the perturbation energy, and SQUARE (andriushchenko2019square) is a randomized direct-search method.
In this manuscript we consider an alternative model-based DFO method based on BOBYQA (powellbobyqa) which explicitly develops models that approximate the loss function in the optimisation problem and minimises the models using methods from continuous optimisation. By considering adversarial perturbations to three NNs trained on different datasets (MNIST, CIFAR10, and ImageNet), we show that for the model-free methods (Alzantot; COMBI; andriushchenko2019square) the number of evaluations of the NN grows more rapidly as the maximum perturbation energy decreases than it does for the method built upon BOBYQA. As a consequence, GenAttack, COMBI and SQUARE are preferable for large values of the maximum perturbation energy and BOBYQA for smaller values. As an example, Figure 1 illustrates how the BOBYQA based algorithm compares to GenAttack, COMBI, and SQUARE when considering a net either normally or adversarially trained on CIFAR10 with different maximum perturbation energies.
We observe the intuitive principle that direct-search methods are effective at misclassifying NNs with high perturbation energies, while in more challenging settings it is preferable to use more sophisticated model-based methods, like ours. Model-based approaches will further challenge defences to adversarial misclassification (dhillon2018stochastic; ijcai2019-833), and in so doing will lead to improved defences and more robust networks. Model-based DFO is a well developed area, and we expect further improvements are possible through a more extensive investigation of these approaches.
2 Adversarial Examples Formulated as an Optimisation Problem
Consider a classification operator from input space to output space of classes. A targeted adversarial perturbation to an input has the property that it changes the classification to a specified target class , i.e and . Herein we follow the formulation by (Alzantot). Given: an image X, a maximum energy budget , and a suitable loss function , then the task of computing the adversarial perturbation can be cast as an optimisation problem such as
where the final two inequality constraints are due to the input entries being restricted to . Denoting the pre-classification output vector by , i.e. , then the misclassification of X to target label is achieved by if . In (carlini; chen; Alzantot) they determined
to be the most effective loss function for computing in (1), and we also employ this choice throughout our experiments.
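As a minimal sketch of how such a loss is evaluated from black-box queries, the snippet below assumes a log-ratio form of the kind used in GenAttack (Alzantot): the logarithm of the summed non-target outputs minus the logarithm of the target output. The exact form, the function names, and the example output vectors are assumptions for illustration only.

```python
import math


def targeted_loss(outputs, target):
    """Assumed log-ratio loss: log(sum of non-target outputs) - log(target output).

    `outputs` is the pre-classification output vector (assumed strictly
    positive, e.g. probabilities); `target` is the target class index.
    """
    others = sum(v for j, v in enumerate(outputs) if j != target)
    return math.log(others) - math.log(outputs[target])


def is_targeted_success(outputs, target):
    """True when the target class has the largest output entry."""
    return max(range(len(outputs)), key=lambda j: outputs[j]) == target


# Hypothetical output vectors before and after a perturbation is applied.
outputs_before = [0.70, 0.20, 0.10]
outputs_after = [0.15, 0.80, 0.05]

loss_before = targeted_loss(outputs_before, target=1)
loss_after = targeted_loss(outputs_after, target=1)
```

Under this assumed form, a negative loss means the target output exceeds the sum of all the others, which is sufficient (though not necessary) for the target class to be the predicted one.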
3 Derivative Free Optimisation for Adversarial Examples
Derivative Free Optimisation is a well developed field with numerous types of algorithms, see (conn2009introduction) and (larson_menickelly_wild_2019) for reviews on DFO principles and algorithms. Examples of classes of such methods include: direct search methods such as simplex, model-based methods, hybrid methods such as finite differences or implicit filtering, as well as randomized variants of the aforementioned and methods specific to convex or noisy objectives. The optimization formulation in Section 2 is amenable to virtually all DFO methods, making it unclear which of the algorithms to employ. Methods which have been trialled include: the finite difference based ZOO attack (chen), a combinatorial direct search of the perturbation energy constraint method COMBI (COMBI), a genetic direct search method GenAttack (Alzantot), and most recently a randomized direct-search method (andriushchenko2019square). Notably missing from the aforementioned list are model-based methods.
Given a set of samples with , model-based DFO methods start by identifying the minimiser of the objective among the samples at iteration , . Following this, a model for the objective function is constructed, typically centred around the minimizer. In its simplest form one uses a polynomial approximation to the objective, such as a quadratic model centred in
with , c, , and being also symmetric. In a white-box setting one would set and , but this is not feasible in the black-box setting as we do not have access to the derivatives of the objective function. Thus c and M
are usually defined by imposing interpolation conditions
and when (i.e. the system of equations is under-determined) other conditions are introduced according to which algorithm is considered. The objective model (3) is considered to be a good estimate of the objective in a neighbourhood referred to as a trust region. Once the model is generated, the update step p is computed by solving the trust region problem
where is the radius of the region where we believe the model to be accurate; for more details see (nocedal). The new point is added to and a prior point is potentially removed. Herein we consider an exemplary model-based method called BOBYQA (footnote 1: BOBYQA was selected among the numerous types of model-based DFO algorithms due to its efficiency observed for other similar problems requiring few model samples, as in climate modelling (Climate)).
The BOBYQA algorithm, introduced in (powellbobyqa), updates the parameters of the model and M, in each iteration in such a way as to minimise the change in the quadratic term between iterates while otherwise fitting the sample values:
initialised as the zero matrix. When the number of parameters then the model is considered as linear with set as zero. We further allow only queries at each implementation of BOBYQA, since after the model is generated few iterations are needed to find the minimum.
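The linear-model case can be illustrated with a short sketch. It is not the BOBYQA algorithm itself: it assumes a coordinate-aligned sample set (the centre plus one offset point per variable), so that the interpolation conditions have a closed-form solution, and it takes the trust region to be an infinity-norm ball intersected with box bounds so that the step also has a closed form. The names `linear_model_gradient`, `trust_region_step`, and the toy objective are hypothetical.

```python
def linear_model_gradient(f, x0, h):
    """Fit a linear model on the n+1 points x0 and x0 + h*e_i.

    For this sample set the interpolation conditions give
    c_i = (f(x0 + h*e_i) - f(x0)) / h, using n+1 queries to f.
    Returns f(x0) and the model gradient c.
    """
    f0 = f(x0)
    c = []
    for i in range(len(x0)):
        xi = list(x0)
        xi[i] += h
        c.append((f(xi) - f0) / h)
    return f0, c


def trust_region_step(x0, c, radius, lower, upper):
    """Minimise the linear model over an infinity-norm trust region
    intersected with the box [lower, upper] (simplifying assumption).
    Each coordinate moves against the sign of its gradient entry."""
    x_new = []
    for xi, ci, lo, hi in zip(x0, c, lower, upper):
        step = -radius if ci > 0 else (radius if ci < 0 else 0.0)
        x_new.append(min(max(xi + step, lo), hi))
    return x_new


def toy_objective(x):
    """Hypothetical stand-in for the loss as a function of the perturbation."""
    return 2.0 * x[0] - 3.0 * x[1] + 1.0


start = [0.0, 0.0]
lower, upper = [-0.1, -0.1], [0.1, 0.1]

_, grad = linear_model_gradient(toy_objective, start, h=0.1)
first = trust_region_step(start, grad, radius=0.05, lower=lower, upper=upper)

_, grad = linear_model_gradient(toy_objective, first, h=0.01)
second = trust_region_step(first, grad, radius=0.1, lower=lower, upper=upper)
```

In this toy setting the second step lands on a corner of the box, after which further steps with the same model would leave the iterate unchanged.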
3.1 Computational Scalability and Efficiency
For improved computational scalability and efficiency, we do not solve (1) for directly, but instead use domain sub-sampling and hierarchical liftings: domain sub-sampling iteratively sweeps over batches of variables, see (8), while hierarchical liftings clusters and perturbs variables simultaneously, see (12).
The simplest version of domain sub-sampling consists of partitioning input dimension into smaller disjoint domains; for example, domains of size which are disjoint and which cover all of . Rather than solving (1) for directly, for each of one sequentially solves for which are only non-zero for entries in . The resulting sub-domain perturbations are then summed to generate the full perturbation , see Figure 2 as an example. That is, the optimisation problem (1) is adapted to repeatedly looping over :
where the may be reinitialised; in particular following each loop over which occurs at .
We considered three possible ways of selecting the domains:
- In Random Sampling we consider at each iteration a different random sub-sampling of the domain, .
- In Ordered Sampling we generate a random disjoint partitioning of the domain. Once each variable has been optimised over once, a new partitioning is generated.
- In Variance Sampling we choose to select in decreasing order of local variance of, the variance in intensity among the 8 neighbouring variables (e.g. pixels) in the same colour channel. We further reinitialise after each loop through .
In Figure 3 we compare how these different sub-sampling techniques perform when generating adversarial examples for the MNIST and CIFAR10 datasets. It can be observed that variance sampling consistently performs better than random and ordered sampling. This suggests that pixels belonging to high-contrast regions are more influential than those in low-contrast regions, and hence variance sampling is the preferable ordering.
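A variance-based ordering of this kind can be sketched for a single colour channel as follows. The handling of border pixels (using only the neighbours that exist), the batch size, and the function names are assumptions made for illustration; the image is assumed to be at least 2x2.

```python
from statistics import pvariance


def local_variance(channel):
    """Variance of the (up to) 8 neighbours of each pixel in one channel.

    Border pixels use only the neighbours that exist (an assumption).
    """
    rows, cols = len(channel), len(channel[0])
    var = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                channel[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            var[r][c] = pvariance(neighbours)
    return var


def variance_ordered_batches(channel, batch_size):
    """Split pixel indices into disjoint batches, highest local variance first."""
    var = local_variance(channel)
    pixels = [(r, c) for r in range(len(channel)) for c in range(len(channel[0]))]
    pixels.sort(key=lambda rc: var[rc[0]][rc[1]], reverse=True)
    return [pixels[i:i + batch_size] for i in range(0, len(pixels), batch_size)]


# Hypothetical 4x4 channel with a vertical edge between columns 1 and 2.
channel = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
batches = variance_ordered_batches(channel, batch_size=4)
```

For this toy image the first two batches contain only pixels adjacent to the edge, which are the ones a variance-ordered sweep would optimise first.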
When the domain is very high dimensional, working on single pixels is not efficient as the above described method would imply modifying only a very small proportion of the image; for instance, we will choose even when is almost three-hundred-thousand. Thus to perturb wider portions of the image, we consider a hierarchy of liftings as in the ZOO attack presented in (chen). We seek an adversarial example by optimising over increasingly higher dimensional spaces at each step referred here as level lifted to the image space. As an illustration, Figure 4 shows that hierarchical lifting has a significant impact on the minimisation of the loss function.
At each level we consider a linear lifting and find a level perturbation which is added to the full perturbation , according to
where is initialised as and the level perturbations of the previous layers are considered as fixed. Moreover, we impose that at each level, the grid has to double in refinement, i.e. . An example of how this works is illustrated in Figure 5.
When generating our adversarial examples, we considered two kinds of liftings. The first kind is based on interpolation operations; a sorting matrix is applied such that every index of is uniquely associated to a node of a coarse grid masked over the original image. Afterwards, an interpolation is implemented over the values in the coarse grid, i.e. . The second kind, instead, forces the perturbation to be high-frequency, since there is a body of literature indicating that these perturbations are the most effective (guo2018; gopalakrishnan2018toward; sharma2019effectiveness). Some preliminary results led us to consider the “Block” lifting, which uses a piecewise constant interpolation and corresponds to the one also used in (COMBI). Alternative piecewise linear or randomised orderings were also tried, but were not found to be sufficiently better to justify the added complexity. As we show for the example in Figure 6, this interpolation lifting divides an image into disjoint blocks via a coarse grid and associates to each of the blocks the same value of a parameter in . We characterise the lifting with the following conditions
Since may still be very high (usually ), for each level we apply domain sub-sampling and consider . We order the blocks according to the variance of mean intensity among neighbouring blocks, in contrast to the variance within each block which was suggested in (chen). Consequently, at each level the adversarial example is found by solving the following iterative problem
In its simplest formulation, hierarchical lifting struggles with the pixel-wise interval constraint, . To address this we allow the entries in to exceed the interval and then reproject the pixel-wise entries into the interval.
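The piecewise constant "Block" lifting and the reprojection step can be sketched as below. The pixel interval bounds `lo` and `hi`, the energy budget `eps`, and the function names are assumptions for illustration; the sketch works on a single channel stored as nested lists.

```python
def block_lift(coarse, block):
    """Piecewise-constant lifting: each coarse-grid value fills a
    block x block patch of the fine (image-sized) grid."""
    fine = []
    for row in coarse:
        fine_row = [v for v in row for _ in range(block)]
        fine.extend([list(fine_row) for _ in range(block)])
    return fine


def reproject(image, perturbation, eps, lo, hi):
    """Clip the perturbation to the energy budget, add it to the image,
    then reproject each pixel into the assumed interval [lo, hi]."""
    adversarial = []
    for img_row, pert_row in zip(image, perturbation):
        adversarial.append([
            min(max(x + min(max(p, -eps), eps), lo), hi)
            for x, p in zip(img_row, pert_row)
        ])
    return adversarial


# Hypothetical 2x2 coarse perturbation lifted onto a 4x4 image.
coarse = [[0.1, -0.1], [-0.1, 0.1]]
lifted = block_lift(coarse, block=2)

image = [[0.95] * 4 for _ in range(4)]
adversarial = reproject(image, lifted, eps=0.1, lo=0.0, hi=1.0)
```

Because the unperturbed image already lies in the interval, the final clipping can only shrink a pixel's perturbation, so the energy bound is still respected after reprojection.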
3.2 Algorithm pseudo-code
Our BOBYQA based algorithm is summarised in Algorithm 1; note that not using the hierarchical method corresponds to having one level with . A Python implementation of the proposed algorithm, based on the BOBYQA package from (cartis), is available on GitHub; footnote 2: https://github.com/giughi/A-Model-Based-
4 Comparison of Derivative Free Methods
We compare the performance of our BOBYQA based algorithm to GenAttack (Alzantot), combinatorial attacks COMBI (COMBI) and SQUARE (andriushchenko2019square). The performance is measured by considering the distribution of queries needed to successfully find adversaries to different networks trained on three standard datasets: MNIST (lecun-mnisthandwrittendigit-2010), CIFAR10 (CIFAR), and ImageNet (imagenet_cvpr09).
4.1 Parameter Setup for Algorithms
For GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square), our experiments rely on publicly available implementations (footnote 3: GenAttack: https://github.com/nesl/adversarial_genattack) with the same hyperparameter settings and hierarchical approach as suggested by the respective authors.
For the proposed algorithm based on BOBYQA, we tuned three main parameters: the dimension of the initial set , the batch dimension , and the trust region radius.
Batch Dimension. Figure 7 shows the loss value averaged over 20 images for attacks on NNs trained on the CIFAR10 and ImageNet datasets when different batch dimensions are chosen. The average objective loss as a function of network queries is largely insensitive to the batch size, with modest differences for the larger ImageNet dataset, where was observed to require modestly fewer queries. For the remainder of the simulations we use as a good trade-off between faster model generation and good performance.
Initial Set Dimension. Once a subdomain of dimension is chosen, the model (3) is initialised with a set of samples on which the interpolation conditions (4) are imposed. There are two main choices for the dimension of the set: either , thus computing and c with the interpolation and leaving M always null and thus having a linear model, or which allows us to initialise , and the diagonal of M, hence obtaining a quadratic model. The results in Figure 8 show that at each iteration of the domain sub-sampling the quadratic method performs as well as a linear method; however, it requires more queries to initialise the model. Thus we consider the linear model with (footnote 4: the Constrained Optimisation by Linear Approximation (COBYLA) algorithm, a linear model-based DFO algorithm, was introduced before BOBYQA (powell2007view); however, COBYLA considers different constraints on the norm of the variable. Because of this and the possibility to extend the method to quadratic models, we name our algorithm after BOBYQA).
Trust Region Radius. Once the model for the optimisation is built, the step of the optimisation is bounded by the trust region radius. We have selected the initial radius to be one third of the whole space in which the perturbation lies. With this choice of radius we usually reach a corner of the boundary within 5 steps, and the further iterates remain effectively stationary.
For the hierarchical lifting approach we consider an initial sub-domain of dimension , as this is the biggest grid that we can optimise over with a batch . After considering , we make use of and do not consider further levels.
4.2 Dataset and Neural Network Specifications
Experiments on each dataset are performed with one of the best performing NN architectures, as described below.
MNIST and CIFAR10 are two data-sets with images divided between 10 classes and of dimension 28x28x1 and 32x32x3 respectively. On them we apply the net introduced in (chen), which is structured in succession by: 2 Conv layers with ReLU activation followed by a max-pooling layer. This process is repeated twice and then two dense layers with ReLU activation are applied. Finally a softmax layer generates the output vector. For each dataset, we train the same architecture in two different ways, obtaining separate nets. One is obtained by optimising the accuracy of the net on raw unperturbed images, while the other is trained with the application of the distillation defence by (Papernot_distillation).
To generate a comprehensive distribution for the queries at each energy budget, for both of the trained nets and 10 images per class, we attempt to misclassify an image targeting all of the 9 remaining classes; this way we generate a total of 900 perturbations per energy budget. For these two datasets the images are of relatively low dimension and we do not apply the hierarchical approach.
ImageNet is a data-set of millions of images with a dimension of 299x299x3 divided between 1000 classes. For this data-set we consider the Inception-v3 net (Inceptionv3) trained with and without the adversarial defence proposed in (kurakin2016adversarial) (footnote 5: for the non-adversarially trained net we considered the one available at http://jaina.cs.ucdavis.edu/datasets/adv/imagenet/inception_v3_2016_08_28_frozen.tar.gz, while for the weights of the adversarially trained net we relied on https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models). Due to the large number of target classes in ImageNet, we perform tests on random images and target classes. The number of tests conducted for Inception-v3 (Inceptionv3) and the adversarially trained variant (kurakin2016adversarial) are: 303 and 120 for , 155 and 114 for and 149 and 116 for respectively.
4.3 Experimental Results
In Figure 9 we present the cumulative fraction of images misclassified (abbreviated as CDF, for cumulative distribution function) as a function of the number of queries to the NN for different perturbation energies . The pixels are normalised to be in the interval , hence, would imply that any pixel is allowed to change of the total intensity range from its initial value. By illustrating the CDFs we easily see which method has been able to misclassify the largest fraction of images in the given test-set for a fixed number of queries to the NN. It can be observed that the proposed BOBYQA based approach achieves state-of-the-art results when the perturbation bound of decreases. This behaviour is consistent across all of the considered datasets (MNIST, CIFAR10, and ImageNet); however, the energy at which the BOBYQA algorithm performs best varies in each case.
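The cumulative fraction plotted in such figures can be computed from per-image query counts as in the following sketch. The function name, the convention of recording failed attacks as `None`, and the example numbers are hypothetical and are not results from the experiments.

```python
def success_cdf(queries_to_success, budgets):
    """Fraction of attacked images misclassified within each query budget.

    `queries_to_success` holds, per attacked image, the number of queries
    used when the attack succeeded, or None when it failed. Failed attacks
    stay in the denominator, so the curve need not reach 1.
    """
    total = len(queries_to_success)
    return [
        sum(1 for q in queries_to_success if q is not None and q <= b) / total
        for b in budgets
    ]


# Hypothetical query counts for eight attacked images (None marks a failure).
queries = [120, 450, None, 900, 3000, None, 60, 15000]
budgets = [100, 1000, 15000]
cdf = success_cdf(queries, budgets)
```

Comparing two methods at a fixed budget then amounts to comparing the corresponding entries of their CDF lists.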
In the experiments we also considered nets trained with defence methods, namely distillation (Papernot_distillation) for the MNIST and CIFAR10 datasets and adversarial training (kurakin2016adversarial) for ImageNet; the results can be identified in Figure 9 by the dashed lines. Similar to the previous case, we observe that the proposed BOBYQA based algorithm performs best when the energy perturbation decreases. Moreover, the BOBYQA based algorithm seems to be the least affected in its performance when any defence is used; for example, at 0.01 and 15,000 queries, the defence reduces the CDF of COMBI by 0.078 compared to 0.051 for BOBYQA. This further supports the idea that for more challenging scenarios model-based approaches are preferable to their model-free counterparts.
We attribute the counter-intuitive improvement of the CDF in the MNIST and ImageNet cases with high perturbation energies to the distillation and the adversarial training being focused primarily on low energy perturbations. For ImageNet, non-model-based algorithms use different hierarchical approaches, which we expect leads in part to the superior performance of COMBI in Fig. 9 panels (g)-(i).
5 Discussion and Conclusion
We have introduced the BOBYQA attack, a method to search for adversarial examples based on a model-based DFO algorithm, and have conducted experiments to understand how it compares to the existing GenAttack (Alzantot), COMBI (COMBI), and SQUARE (andriushchenko2019square) attacks when targeted black-box adversarial examples are sought with the fewest queries to a neural net.
Following the results of the experiments presented above, the method for generating the adversarial example should be chosen according to the setting the adversary is considering. When the perturbation energy is high, one should choose either COMBI if the input is high-dimensional or SQUARE if the input is low-dimensional. On the other hand, a model-based approach like BOBYQA should be considered as soon as the complexity of the setting increases, e.g. when the maximum perturbation energy is reduced or the net is adversarially trained.
With the BOBYQA attack algorithm we have introduced a different approach for the generation of targeted adversarial examples in a black-box setting, with the aim of exploring what advantages are achieved by considering model-based DFO algorithms. We did not focus on presenting an algorithm which is the most efficient in absolute terms, primarily because our algorithm can still be improved in several respects. The BOBYQA attack is limited by the implementation of py-BOBYQA (cartis), since the element-wise constraints do not allow the consideration of more sophisticated liftings which leverage compressed sensing, to name one of the many possible variations.
In conclusion, the results in this paper support the view that more sophisticated misclassification methods are preferable in challenging settings. As a consequence, variations on our model-based algorithm should be considered in the future as a tool to establish the effectiveness of newly presented adversarial defence techniques.
This publication is based on work supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with New Rock Capital Management.