Semantify-NN
Code for "Towards Verifying Robustness Of Neural Networks Against A Family Of Semantic Perturbations" (CVPR '20)
Verifying robustness of neural networks under a specified threat model is a fundamental yet challenging task. While current verification methods mainly focus on the L_p-norm-ball threat model on the input instances, robustness verification against semantic adversarial attacks that induce large L_p-norm perturbations, such as color shifting and lighting adjustment, is beyond their capacity. To bridge this gap, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks. By simply inserting our proposed semantic perturbation layers (SP-layers) before the input layer of any given model, Semantify-NN is model-agnostic, and any L_p-norm-ball based verification tool can be used to verify the model's robustness against semantic perturbations. We illustrate the principles of designing the SP-layers and provide examples of semantic perturbations for image classification in the space of hue, saturation, lightness, brightness, contrast and rotation, respectively. Experimental results on various network architectures and different datasets demonstrate the superior verification performance of Semantify-NN over L_p-norm-based verification frameworks that naively convert semantic perturbations to L_p-norm perturbations. To the best of our knowledge, Semantify-NN is the first framework to support robustness verification against a wide range of semantic perturbations.
As deep neural networks (DNNs) become prevalent in machine learning and achieve the best performance on many standard benchmarks, their unexpected vulnerability to adversarial examples has spawned a wide spectrum of research on adversarial robustness, spanning effective and efficient methods for finding adversarial examples that cause model misbehavior (i.e., attacks), for detecting adversarial inputs and making models attack-resistant (i.e., defenses), and for formally evaluating and quantifying the level of vulnerability of well-trained models (i.e., robustness verification or certification).
Given a data sample x and a trained DNN, the primary goal of verification tools is to provide a "robustness certificate" verifying its properties under a specified threat model. For image classification tasks, the commonly used threat model is an L_p-norm bounded perturbation to x, where p usually takes the value 1, 2 or infinity, approximating the similarity measure of visual perception between x and its perturbed version. The robustness property to be verified is the consistent decision making of the DNN on any sample drawn from an L_p-norm ball centered at x with radius ε. In other words, verification methods aim to verify whether the DNN gives the same top-1 class prediction to all samples in the ε-ball centered at x. Note that verification is attack-agnostic, as it does not invoke any attack method. Moreover, if the ε-ball robustness certificate for x is verified, it assures that no adversarial attack using the same threat model can alter the top-1 prediction of the DNN for x. Although finding the maximal verifiable ε (i.e., the minimum distortion) is computationally intractable for DNNs [katz2017reluplex], recent verification methods have developed efficient means of computing a lower bound on the minimum distortion as a verifiable ε-ball certificate [kolter2017provable, wong2018scaling, weng2018towards, dvijotham2018dual, zhang2018crown, singh2018fast, wang2018efficient, raghunathan2018semidefinite, Boopathy2019cnncert].
Beyond the L_p-norm bounded threat model, recent works have shown the possibility of generating semantic adversarial examples based on semantic perturbation techniques such as color shifting, lighting adjustment and rotation [hosseini2018semantic, liu2018beyond, bhattad2019big, joshi2019semantic, fawzi2015manitest, engstrom2017rotation]. Notably, although semantically similar, these semantic adversarial attacks essentially consider different threat models than L_p-norm bounded attacks. Therefore, semantic adversarial examples usually incur large L_p-norm perturbations to the original data sample and thus exceed the verification capacity of L_p-norm based verification methods. To bridge this gap, and with an endeavor to render robustness verification methods more inclusive, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations. Semantify-NN is model-agnostic because it applies to any given trained model by simply inserting our designed semantic perturbation layers (SP-layers). It is also generic since, after adding the SP-layers, one can apply any L_p-norm based verification tool to certify against semantic perturbations. In other words, our proposed SP-layers work as a carefully designed converter that transforms semantic threat models into L_p-norm threat models. As will be evident in the experiments, Semantify-NN yields substantial improvement over L_p-norm based verification methods that directly convert semantic adversarial examples to equivalent L_p-norm perturbations.
We summarize the main contributions of this paper as follows:
We propose Semantify-NN, a model-agnostic and generic robustness verification toolkit for semantic perturbations to a neural network's inputs. Semantify-NN can be viewed as a powerful extension module consisting of novel semantic perturbation layers (SP-layers) and is compatible with L_p-norm based verification tools. To the best of our knowledge, Semantify-NN is the first framework to support robustness verification against a wide range of semantic perturbations.
We elucidate the design principles of our proposed SP-layers for a variety of semantic attacks, including hue/saturation/lightness change in color space, brightness and contrast adjustment, rotation, translation and occlusion. We also propose input space refinement and splitting methods to further improve the performance of robustness verification. Finally, we illustrate the need for and importance of robustness certification for continuously parameterized perturbations, which follows from their real-valued nature.
Our extensive experiments, evaluated on rich combinations of three datasets (MNIST, CIFAR-10 and GTSRB) and five different network architectures (MLPs and CNNs), corroborate the superior verification performance of Semantify-NN over naive L_p-norm based verification methods. In particular, our method without further refinement can already achieve around 2-3 orders of magnitude larger (tighter) semantic robustness certificates than the baselines that directly use the same L_p-norm verification methods to handle semantic perturbations. With the proposed refinement technique, the semantic robustness certificate can be further improved by 100-300%.
For L_p-norm bounded threat models, current robustness verification methods are mainly based on solving a computationally affordable dual optimization problem [kolter2017provable, wong2018scaling, dvijotham2018dual], or on devising tractable bounds on activation functions and layer propagation [singh2018boosting]. We refer readers to the prior arts and the references therein for more details. The work in [wang2018efficient] considers brightness and contrast in the linear transformation setting, which still falls under the L_p-norm threat model. The work in [singh2019abstract] has scratched the surface of semantic robustness verification by considering rotation attacks with L_p-norm based methods; however, we show that with our carefully designed refinement techniques, the robustness certificate can be significantly improved, by around 50-100% on average. Moreover, we highlight that in this work we consider a more general and challenging setting than [wang2018efficient], in which the color space transformation can be non-linear, so directly applying L_p-norm based methods could result in a very loose semantic robustness certificate. On the other hand, the work in [hamdi2019towards] proposes to use semantic maps to evaluate semantic robustness, but how to apply this analysis to develop semantic robustness verification is beyond its scope.

In general, semantic adversarial attacks craft adversarial examples by tuning a set of parameters governing semantic manipulations of data samples, which are either explicitly specified (e.g., rotation angle) or implicitly learned (e.g., latent representations of generative models). In [hosseini2018semantic]
, the HSV (hue, saturation and value) representation of the RGB (red, green and blue) color space is used to find semantic adversarial examples for natural images. To encourage visual similarity, the authors propose to fix the value, minimize the changes in saturation, and fully utilize the hue changes to find semantic adversarial examples. In
[liu2018beyond], the authors present a physically-based differentiable renderer allowing propagating pixel-level gradients to the parametric space of lightness and geometry. In [bhattad2019big], the authors introduce texture and colorization to induce semantic perturbation with large
L_p-norm perturbations to the raw pixel space while remaining visually imperceptible. In [joshi2019semantic], an adversarial network composed of an encoder and a generator conditioned on attributes is trained to find semantic adversarial examples. In [fawzi2015manitest, engstrom2017rotation], the authors show that simple operations such as image rotation or object translation can result in a notable misclassification rate.

Note that for semantic perturbations that are continuously parameterized (such as the hue, saturation, lightness, brightness, contrast and rotation perturbations considered in the second half of Sec 3.2), it is not possible to enumerate all possible values even if we perturb only a single parameter. The reason is that these parameters take real values in a continuous space, so the attack cannot be finitely enumerated, unlike its discretely parameterized counterparts (e.g., translations and occlusions have finite enumerations). Take the rotation angle for example: an attacker can run a grid search, sweeping the rotation angle over a uniform grid. However, if the attacks are not successful at two adjacent grid points θ_i and θ_{i+1}, this does not eliminate the possibility that there exists some θ with θ_i < θ < θ_{i+1} that fools the classifier. This is indeed the motivation for and necessity of a robustness certification algorithm for semantic perturbations, as proposed in this paper: with a certification algorithm, we can guarantee that the neural network can never be fooled within the certified range of θ delivered by our algorithm.

Here we formally introduce Semantify-NN, our proposed verification framework for semantic perturbations. We begin by providing a general problem formulation of attack manipulation in the input space, followed by its specification for different types of semantic perturbations. Moreover, motivated by the refinement strategy for efficient verification of L_p-norm threat models in [wang2018efficient], we also elucidate a refinement strategy for Semantify-NN that further boosts its verification performance.
To formalize the notion of a general threat model T, for an input data sample x we define an associated space of perturbed images, denoted as the attack space A(x), equipped with a distance function d to measure the magnitude of the perturbation. Under this definition, the certification problem for a K-class neural network classifier f, an input data sample x, and the threat model T is as follows: we want to find the largest ε such that
(1)  f_c(x') > f_j(x')  for all j ≠ c and all x' ∈ A(x) with d(x, x') ≤ ε,
where f_j(x') denotes the confidence (or logit) of the j-th class and c = argmax_j f_j(x) is the predicted class of x.

In this work, we consider semantic threat models that target semantically meaningful attacks, which are usually beyond the coverage of conventional L_p-norm bounded threat models. For ease of illustration, we use digital images as data samples. More formally, we deal with the family of attacks such that if T is a semantic threat model, then the associated space A(x) satisfies the following property:
There exists a function g : X × Θ → X such that

(2)  A(x) = { g(x, θ) : θ ∈ Θ }
(3)  d(x, g(x, θ)) = ||θ||_p
where X is the subspace of the pixel space (the raw RGB input) and Θ denotes a set of feasible semantic operations. The parameter θ ∈ Θ specifies the semantic operations selected from Θ. For example, θ can describe some human-interpretable characteristic of the image, such as translation shift, rotation angle, etc. The notation ||·||_p denotes the L_p norm. For convenience we write x' = g(x, θ), where θ ∈ ℝ^k and k denotes the dimension of the semantic attack. In other words, we show that it is possible to define an explicit function g for all the semantic perturbations considered in this work, including translations, occlusions, color space transformations, and rotations, and we then measure the L_p norm of the semantic perturbations in the space of semantic features rather than in the raw pixel space. Notice that conventional L_p-norm perturbations on the raw RGB pixels are a special case under this definition: by letting each coordinate of Θ equal a bounded real set (i.e., all possible differences for the i-th pixel) and letting k be the dimension of the input vector x, we recover g(x, θ) = x + θ.

Based on the definition above, semantic attacks can be divided into two categories: discretely parameterized perturbations (i.e., Θ is a discrete set), including translation and occlusion in Section 3.2, and continuously parameterized perturbations (i.e., Θ is a continuous set), including color space transformation, brightness and contrast, and rotation in Section 3.2.
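The attack-space abstraction above can be sketched in a few lines. The sketch below contrasts the conventional pixel-space special case g(x, θ) = x + θ with a 1-dimensional semantic attack; the function names (`g_lp`, `g_brightness`) and the brightness example are illustrative, not taken from the paper's code.

```python
import numpy as np

def g_lp(x, theta):
    """L_p threat model as the special case g(x, theta) = x + theta,
    where theta has the same dimension as x."""
    return x + theta

def g_brightness(x, theta):
    """1-dimensional semantic attack: a single scalar brightness shift
    applied to every pixel, clipped to the valid range."""
    return np.clip(x + theta[0], 0.0, 1.0)

x = np.full((4, 4), 0.5)
# A pixel-space perturbation needs one parameter per pixel (16 here) ...
print(g_lp(x, np.full((4, 4), 0.1))[0, 0])        # -> 0.6
# ... while the semantic attack is parameterized by a single scalar.
print(g_brightness(x, np.array([0.2]))[0, 0])     # -> 0.7
```

Measuring the norm of θ rather than of x' − x is what keeps the verification problem low-dimensional for semantic attacks.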
Using the position of the left-uppermost pixel as the reference, we can write translation as a 2-dimensional semantic attack. The parameters are the relative position of the left-uppermost pixel of the perturbed image with respect to the original image, i.e., integer shifts bounded by the width W and height H of the input image. Note that any padding method can be applied here; for example, the image can be padded with black pixels or by repeating the boundaries.
Similar to translation, in the case of occlusion we have a 3-dimensional parameterization: the location of the left-uppermost pixel of the occlusion patch and the occlusion patch size.
Note that in both of these cases, provided sufficient computation resources, one could simply enumerate all possible perturbed images exhaustively. At the scale of our considered image dimensions, we find that exhaustive enumeration can be accomplished within a reasonable computation time, and the generated images can be used for direct verification. In this case, the SP-layers reduce to enumeration operations for a discretely parameterized semantic attack threat model. Nonetheless, the computational complexity of exhaustive enumeration grows combinatorially when considering a joint threat model consisting of multiple types of discretely parameterized semantic attacks.
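Verification-by-enumeration for a discretely parameterized attack can be sketched as follows; the `classify` stand-in and the zero-padding choice are illustrative assumptions, not the paper's models.

```python
import numpy as np

def translate(x, dy, dx):
    """Shift a 2-D image by (dy, dx) with zero padding (one of several
    valid padding choices mentioned in the text)."""
    out = np.zeros_like(x)
    h, w = x.shape
    ys = slice(max(dy, 0), min(h + dy, h))
    xs = slice(max(dx, 0), min(w + dx, w))
    out[ys, xs] = x[max(-dy, 0):min(h - dy, h), max(-dx, 0):min(w - dx, w)]
    return out

def verify_translation(x, classify, max_shift):
    """Certify by enumerating every shift in [-max_shift, max_shift]^2:
    the prediction must match the clean prediction for all of them."""
    c = classify(x)
    return all(classify(translate(x, dy, dx)) == c
               for dy in range(-max_shift, max_shift + 1)
               for dx in range(-max_shift, max_shift + 1))

# Toy "classifier": thresholds total brightness.
classify = lambda img: int(img.sum() > 2.0)
x = np.zeros((6, 6)); x[2:4, 2:4] = 1.0   # bright 2x2 patch
print(verify_translation(x, classify, 2))  # patch stays in frame -> True
```

For a joint discrete threat model, the loop would nest over every attack's parameter grid, which is exactly the combinatorial blow-up noted above.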
Most semantic perturbations fall under the framework where the parameters are continuous, i.e., Θ ⊆ ℝ^k. We propose adding semantic perturbation layers (SP-layers) before the input layer of any given neural network model for efficient robustness verification, as illustrated in Figure 1. By letting x' = g(x, θ), the verification problem for the neural network f formulated in (1) becomes
(4)  f_c(g(x, θ)) > f_j(g(x, θ))  for all j ≠ c and all θ ∈ Θ with ||θ||_p ≤ ε.
If we consider the composed network f̃ = f ∘ g as a new network, then we have the following problem:

(5)  f̃_c(θ) > f̃_j(θ)  for all j ≠ c and all θ with ||θ||_p ≤ ε,
which has a similar form to L_p-norm perturbations, but now on the semantic space Θ. The proposed SP-layers allow us to explicitly define the dimensionality of the perturbations and encode the explicit dependence between the manner and the effect of the semantic perturbation on different pixels of the image. In other words, one can view our proposed SP-layers as a parameterized input transformation g from the semantic space to RGB space, where g(x, θ) is the perturbed input in the RGB space as a function of perturbations in the semantic space. Our key idea is to express g in terms of commonly-used activation functions, so that g takes the form of a neural network and can be easily prepended to the original neural network classifier f. Note that g can be arbitrarily complicated to allow general transformations in the SP-layers; nevertheless, this poses no difficulty for applying conventional L_p-norm based methods such as [zhang2018crown, wang2018efficient, singh2018fast, Boopathy2019cnncert, weng2018towards], as we only require the activation functions to admit custom linear bounds, not to be continuous or differentiable. Below we specify the explicit form of the SP-layers corresponding to five different semantic perturbations: (i) hue, (ii) saturation, (iii) lightness, (iv) brightness and contrast, and (v) rotation.
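The composition f̃ = f ∘ g can be sketched in a few lines; `f`, `g_bright` and `make_sp_layer` below are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

def f(x):
    """Stand-in classifier producing 2 logits from an image."""
    return np.array([x.sum(), -x.sum() + 1.0])

def make_sp_layer(x, g):
    """Fix the clean input x; the composed network f_tilde takes only
    the semantic parameter theta as input."""
    return lambda theta: f(g(x, theta))

g_bright = lambda x, theta: np.clip(x + theta[0], 0.0, 1.0)
x = np.full((3, 3), 0.2)
f_tilde = make_sp_layer(x, g_bright)
# Verification now operates on a 1-D input theta instead of 9 pixels.
print(f_tilde(np.array([0.1]))[0])   # sum of 9 pixels at ~0.3 -> approximately 2.7
```

An L_p-norm verifier applied to `f_tilde` then bounds the logits over the ball ||θ||_p ≤ ε in the semantic space.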
We consider color transformations parameterized by hue, saturation and lightness (the HSL space). Unlike RGB values, HSL forms a more intuitive basis for understanding the effect of a color transformation, as its coordinates are semantically meaningful. For each of these bases, we can define the corresponding function g:
Hue. This dimension corresponds to the position of a color on the color wheel. Two colors with the same hue are generally considered different shades of a color, like blue and light blue. The hue is represented on a scale of 0°-360°, which we rescale for convenience. The per-pixel map from the hue shift to each RGB channel is piecewise linear: on each linear piece it has the form a·θ_h + b, where a and b are functions of x independent of θ_h. It can therefore be reduced to ReLU form and hence be seen as one hidden layer with ReLU activations connecting the hue space to the original RGB space:

(6)
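To see why a ReLU-form SP-layer is possible for hue, note that the standard hue-to-RGB ramp (at full saturation and mid lightness) is piecewise linear, so it can be written entirely with ReLU-style clamps. The sketch below checks this form against Python's `colorsys`; it illustrates the principle and is not the paper's exact layer.

```python
import colorsys
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
clamp01 = lambda z: relu(z) - relu(z - 1.0)   # clip to [0, 1] via ReLUs
absv = lambda z: relu(z) + relu(-z)           # |z| via ReLUs

def hue_to_rgb(h):
    """h in [0, 1); standard piecewise-linear hue ramp in ReLU form."""
    return (clamp01(absv(6 * h - 3) - 1),
            clamp01(2 - absv(6 * h - 2)),
            clamp01(2 - absv(6 * h - 4)))

for h in (0.0, 1 / 3, 0.5, 0.8):
    assert np.allclose(hue_to_rgb(h), colorsys.hls_to_rgb(h, 0.5, 1.0))
print("ReLU form matches colorsys on the sampled hues")
```

Because every operation above is a linear combination of ReLUs, a linear-relaxation verifier can bound the layer exactly as it bounds an ordinary hidden layer.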
Saturation. This corresponds to the colorfulness of the picture. At saturation 0 we get grey-scale images, while at saturation 1 we see the colors quite distinctly. The per-pixel map is linear in the saturation shift θ_s, of the form a·θ_s + b, where a and b are functions of x:

(7)
Lightness. This property corresponds to the perceived brightness of the image, where a lightness of 1 gives white and a lightness of 0 gives black images. The per-pixel map is linear in the lightness shift θ_l, of the form a·θ_l + b, where a and b are functions of x:

(8)
We use a similar technique as for the HSL color space for some multi-parameter transformations such as brightness and contrast. To characterize an image perturbed by θ_b in brightness space and θ_c in contrast space, we use the following function g:

(9)
Therefore, for the brightness and contrast attack we show that we can represent g as one additional ReLU layer before the original network model, since g can be expressed as a linear combination of ReLU activations, which is then realized through our proposed SP-layers in Figure 1.
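A hedged sketch of expressing a brightness/contrast perturbation as a linear map followed by ReLUs is below. The exact form of Eq. (9) is the paper's; here we assume the common parameterization g(x, θ_b, θ_c) = clip((1 + θ_c)·x + θ_b, 0, 1), and use clip(z) = ReLU(z) − ReLU(z − 1).

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def brightness_contrast(x, tb, tc):
    """Assumed parameterization: contrast scaling plus brightness shift,
    clipped to [0, 1] as a linear combination of ReLUs."""
    z = (1.0 + tc) * x + tb          # linear in x for fixed (tb, tc)
    return relu(z) - relu(z - 1.0)   # clipping expressed via ReLUs

x = np.array([0.0, 0.4, 0.9])
out = brightness_contrast(x, tb=0.2, tc=0.5)
print(out)   # elementwise clip(1.5 * x + 0.2) -> [0.2, 0.8, 1.0]
```

Since both ReLU terms admit standard linear relaxations, this layer plugs directly into existing verifiers.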
For rotation, we have a 1-dimensional semantic attack parameterized by the rotation angle θ. Here we consider rotations about the center of the image, with the boundaries being extended to the area outside the image. Letting the pre-image of output position (i, j) under rotation by θ define the interpolation coordinates, we use the following interpolation to get the value of the output pixel at position (i, j) after rotation by θ:

(10)
where (m, n) ranges over all pixel positions. For an individual pixel at position (m, n) of the original image, the scaling factor for its influence on the output pixel at position (i, j) is given by the function

(11)

which is highly non-linear in θ: it is 0 for most θ, and only over a very small range of θ does it take non-zero values, which can go up to 1. This makes naive verification infeasible. One way to address this is to split the range of θ into smaller parts and certify each part; over smaller ranges the bounds are tighter. However, the downside is that the required number of splits may become too large, making certification computationally infeasible. To balance this trade-off, in what follows we propose another form of refinement, which we call implicit input splitting.
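The bilinear-interpolation view of rotation can be sketched as follows: each output pixel is a weighted sum of input pixels, with weights that are zero for most angles and spike over a narrow range. The coordinate convention and sign of the rotation are illustrative assumptions.

```python
import numpy as np

def rotation_weight(i, j, m, n, theta, center):
    """Influence of input pixel (m, n) on output pixel (i, j) under
    rotation by theta about `center`, with bilinear interpolation."""
    ci, cj = center
    # Source location that lands on (i, j) after rotating by theta.
    si = ci + (i - ci) * np.cos(theta) - (j - cj) * np.sin(theta)
    sj = cj + (i - ci) * np.sin(theta) + (j - cj) * np.cos(theta)
    # Bilinear hat function: nonzero only when (m, n) is within 1 pixel.
    return max(0.0, 1 - abs(si - m)) * max(0.0, 1 - abs(sj - n))

c = (2.0, 2.0)
# At theta = 0 each pixel maps to itself with weight 1 ...
print(rotation_weight(1, 3, 1, 3, 0.0, c))                    # -> 1.0
# ... and the weight changes sharply as theta sweeps a small range.
print(round(rotation_weight(1, 3, 1, 3, np.deg2rad(10), c), 3))
```

Bounding this weight linearly over a wide θ interval is exactly what produces the loose relaxations discussed above, motivating input splitting.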
Here we discuss several refinement techniques for Semantify-NN in the input space of data samples, which can lead to a substantial boost in its verification performance.
In order to better handle the highly non-linear functions that can arise as general activation functions in the SP-layers, we propose two types of input-level refinement strategies. For linear-relaxation based verification methods, we show that the following theorem holds. The proof is given in the appendix.
If we can verify that a set S = {x_1, ..., x_n} of perturbed versions of an image x is correctly classified for a threat model using one certification cycle, then we can verify that every perturbed image in the convex hull of S is also correctly classified, where the convex hull is taken in the pixel space.
Here, one certification cycle means one pass through the certification algorithm sharing the same linear relaxation values. Although L_p-norm balls are convex regions in pixel space, other threat models (especially semantic perturbations) usually do not have this property. This in turn poses a big challenge for semantic verification.
For some non-convex attack spaces embedded in high-dimensional pixel spaces, the convex hull of the attack space associated with an image can contain images belonging to a different class (an example of rotation is illustrated in Figure 3 in the appendix). Thus, one cannot certify large intervals of perturbations using a single certification cycle of linear relaxation based verifiers.
As we cannot certify large ranges of perturbation simultaneously, input splitting is essential for verifying semantic perturbations. It reduces the gap between the linear bounds on the activation functions and yields tighter bounds, as illustrated in the appendix. We observe that if the certificate holds for the parameter intervals [a, b] and [b, c], then it holds for [a, c]. As a result, we can split the original interval into smaller parts and certify each of them separately in order to certify the larger interval. The drawback of this procedure is that the computation time scales linearly with the number of divisions, as one has to run the certification for every part. However, for the color space experiments we find that very few partitions are sufficient for good verification performance.
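Explicit input splitting can be sketched as follows; `certify_interval` is a stand-in, where a real one would come from a linear-relaxation based tool.

```python
import numpy as np

def certify_by_splitting(certify_interval, a, c, n_splits):
    """Certify theta in [a, c] by splitting into n_splits sub-intervals
    and certifying each one separately; all must succeed."""
    edges = np.linspace(a, c, n_splits + 1)
    return all(certify_interval(lo, hi)
               for lo, hi in zip(edges[:-1], edges[1:]))

# Toy verifier: succeeds only on intervals narrower than 0.1, mimicking
# bounds that are tight only over small parameter ranges.
toy = lambda lo, hi: (hi - lo) < 0.1
print(certify_by_splitting(toy, -1.0, 1.0, 5))    # width 0.4 -> False
print(certify_by_splitting(toy, -1.0, 1.0, 40))   # width 0.05 -> True
```

The linear cost in the number of splits is visible directly: each sub-interval is one full certification call.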
As a motivating example, in Figure 4(a) in the appendix we give the form of the activation function for rotation. Even over a small range of rotation angles, the function is quite non-linear, resulting in very loose linear bounds. As a result, we find that we are unable to get good verification results for datasets like MNIST and CIFAR-10 without increasing the number of partitions to very large values, which makes verification computationally infeasible. Therefore, we provide another method to obtain tight bounds. For highly non-linear activation functions, explicit input splitting is not feasible, as the maximum interval over which we can provide tight linear bounds is very small, as seen in the case of rotation. Consequently, we propose the following alternative, which we call implicit input splitting. The difference between explicit and implicit input splitting is illustrated in Figure 4 in the appendix.
Split the input interval into sub-problems and run each of them through the SP-layers only. This gives explicit upper and lower bounds on the outputs of the SP-layer neurons for each split.

Apply the certification algorithm to the original network with the perturbation bounds obtained from these explicit splits, i.e., run this batch of inputs through the certification algorithm simultaneously (the linear relaxations of the intermediate layers use the global upper and lower bounds over the batch). This results in a much smaller overhead than explicit splitting, as we only need to run the certification algorithm once instead of once per split.
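The two steps above can be sketched as follows: the cheap SP-layer is evaluated on many sub-intervals to get tight per-split output bounds, and the expensive network is then certified once using the union of those bounds. The brightness SP-layer and its monotonicity in θ are illustrative assumptions.

```python
import numpy as np

def sp_output_bounds(g, x, lo, hi, n_splits):
    """Per-sub-interval bounds on g(x, theta), assuming g is monotone in
    theta on each split (true for, e.g., a brightness shift)."""
    edges = np.linspace(lo, hi, n_splits + 1)
    return [(g(x, a), g(x, b)) for a, b in zip(edges[:-1], edges[1:])]

g = lambda x, t: np.clip(x + t, 0.0, 1.0)      # brightness SP-layer
x = np.array([0.3])
splits = sp_output_bounds(g, x, -0.5, 0.5, 100)
# Union (global min/max) of the per-split bounds: the single expensive
# certification call on the original network would use this interval.
global_lo = min(s[0][0] for s in splits)
global_hi = max(s[1][0] for s in splits)
print(round(global_lo, 3), round(global_hi, 3))   # -> 0.0 0.8
```

Only the inexpensive SP-layer is evaluated per split; the full network sees one interval, which is the source of the reduced overhead.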
Our experiments demonstrate that, for semantic perturbations, refinement in the input space of the semantic parameters can significantly boost the tightness of the robustness certificate, as shown in Section 4. Although additional improvement can be made by refining the pre-activation bounds of each layer through solving a linear programming or mixed-integer optimization problem, similar to the works in [wang2018efficient] and [singh2018boosting] for the L_p-norm-ball input perturbation, we observe that our proposed approach with input space refinement already delivers a certified lower bound very close to the attack results (which are an upper bound), suggesting that in this case feature-layer refinement would yield only minor improvement at the cost of much larger computation overhead (which grows exponentially with the number of nodes to be refined).

We conduct extensive experiments for all the continuously parameterized semantic attack threat models presented in the paper. The verification of discretely parameterized semantic perturbations is straightforward using enumeration, as discussed in Section 3.2. By applying our proposed method, Semantify-NN, one can leverage L_p-norm verification algorithms including [weng2018towards, zhang2018crown, wang2018efficient, singh2018fast, Boopathy2019cnncert]. We use the verifiers proposed in [zhang2018crown] and [Boopathy2019cnncert] to certify multilayer perceptron (MLP) models and convolutional neural network (CNN) models, as they are open-sourced, efficient, and support general activations on MLP and CNN models.
Baselines. We calculate the upper and lower bounds on the possible value range of each pixel of the original image given a perturbation magnitude in the semantic space. Then, we use an L_p-norm based verifier, performing bisection on the perturbation magnitude, and report the resulting value. As shown in all tables, directly converting the perturbation range from semantic space to the original RGB space and then applying L_p-norm based verifiers gives very poor results. We also include a weighted-eps version where we allow different levels of perturbation for different pixels.
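The bisection on the perturbation magnitude can be sketched as follows; `verify` is a stand-in returning True when the certificate holds at magnitude eps (for a real verifier this property is monotone in eps).

```python
def max_certified_eps(verify, hi=1.0, iters=20):
    """Bisection for the largest certifiable perturbation magnitude."""
    lo, best = 0.0, 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if verify(mid):
            best, lo = mid, mid    # certified: try a larger eps
        else:
            hi = mid               # not certified: shrink the range
    return best

# Toy monotone verifier: certificates hold up to eps = 0.37.
print(round(max_certified_eps(lambda e: e <= 0.37), 3))   # -> 0.37
```

The same bisection wrapper is reused for the SP-layer pipeline, with only the inner verifier changing.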
Attack. We use a grid-search attack with the granularity of the order of the size of the sub-intervals after input splitting. Although this is not the optimal attack value, it is indeed an upper bound for the perturbation. Increasing the granularity would only result in a tighter upper bound and does not affect the lower bound (the certificate we deliver). We would like to highlight again that even though the threat models are very low dimensional, they are continuously parametrized and cannot be certified against by enumeration as discussed in Sec 2.
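The grid-search attack described above can be sketched as follows; `f_tilde` is an illustrative stand-in for the composed network, and the smallest misclassifying |θ| found on the grid is the reported upper bound on the minimum distortion.

```python
import numpy as np

def grid_attack(f_tilde, clean_class, theta_max, n_grid):
    """Sweep theta over a uniform grid, smallest |theta| first; the
    first misclassification gives an upper bound on the distortion."""
    for theta in sorted(np.linspace(-theta_max, theta_max, n_grid), key=abs):
        if np.argmax(f_tilde(theta)) != clean_class:
            return abs(theta)      # attack found: upper bound
    return None                    # no attack found on this grid

# Toy composed network: the class flips once |theta| exceeds 0.255.
f_tilde = lambda t: np.array([1.0 - abs(t), 0.745])
r = grid_attack(f_tilde, 0, theta_max=0.5, n_grid=101)
print(round(r, 2))   # -> 0.26
```

A finer grid can only lower this upper bound; the certified lower bound is unaffected, which is the gap the tables report.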
Semantify-NN: We implement SP-layers alone (SPL) and SP-layers with refinement (SPL+Refine), as described in Section 3.3.
Implementations, Models and Datasets. In all of our experiments, we use a custom Google Cloud instance with 24 vCPUs (Intel Xeon CPU @ 2.30GHz) and 90GB RAM. The SP-layers are added as fully connected layers for MLPs and as modified convolution blocks for CNN models (we allow the filter weights and biases to differ across neurons). We evaluate Semantify-NN and the other methods on MLP and CNN models trained on the MNIST, CIFAR-10 and GTSRB (German Traffic Sign Recognition Benchmark) datasets. We use the MNIST and CIFAR models released by [6]; the standard test accuracies of the MNIST/CIFAR models are -%/-%. We train the GTSRB models from scratch to -% test accuracy. All CNNs (LeNet) use 3-by-3 convolutions and two max-pooling layers, along with the filter sizes specified in the description for each of the two convolution layers. LeNet uses an architecture similar to LeNet-5 [lecun1998gradient], with the no-pooling version applying the same convolutions over larger inputs. We also have two kinds of adversarially trained models: models denoted as sem adv in the tables are trained using data augmentation, where we add perturbed images (according to the corresponding threat model) to the training data, and models denoted as adv are trained using norm-bounded adversarial training [madry2017towards]. We evaluate all methods on 200 random test images with random targeted attacks. We train all models for 50 epochs and tune hyperparameters to optimize validation accuracy.
Table 1 demonstrates that using L_p-norm based verification results in extremely loose bounds because of the mismatch between the dimensionality of the semantic attack and the dimensionality of the induced L_p-norm attack. Explicitly introducing this dimensionality constraint by augmenting the neural networks with our proposed SP-layers gives a significant increase in the maximum certifiable lower bound. However, there is still an apparent gap between Semantify-NN's certified lower bound and the attack upper bound. Notably, we observe that adding input-space refinement helps to further tighten the bounds, yielding an extra improvement. This corroborates the importance of input splitting for certification against semantic attacks. The transformations for HSL space attacks are fairly linear, so the gap between our certified lower bound and the attack upper bound becomes quite small.
Table 2 shows the results of rotation-space verification. Rotation induces a highly non-linear transformation of the pixel space, so we use it to illustrate the use of refinement for certifying such functions. As the transforms are very non-linear, the linear bounds used by our SP-layers are very loose, yielding very small robustness certificates. In this case, explicit input splitting alone is not a computationally appealing approach, as there is a huge number of intervals to be certified. Table 2 shows how using implicit splits can increase the size of certifiable intervals to the point where the total number of intervals needed is manageable. At that point we use explicit splitting to get tight bounds. For the results in SPL+Refine, we certify explicit intervals of a fixed size, with 100 implicit splits for each interval. Table 3 shows that using a large number of implicit splits allows us to preserve the quality of the bounds while increasing the size of the explicit interval (effectively reducing the runtime for certification without affecting quality).
For multi-dimensional semantic attacks (here a combination attack using both brightness and contrast), we can consider any norm of the parameters as our distance function. In Figure 2 we show the average certified lower bound for brightness perturbations while fixing the maximum perturbation of the contrast parameter (θ_c) to 0.01, 0.03 and 0.05.
Table 4 shows that in these experiments the run-times needed for exhaustive enumeration of translation and occlusion are quite affordable. For translation, we consider the attack space of all shift vectors within the image bounds; the reported values are the average norm values of the shift vector. For occlusion, we consider the attack space of all occlusion patch locations and sizes within the image bounds; the reported values are the average norm values of θ.
In this paper, we propose Semantify-NN, the first robustness verification framework for neural network image classifiers against a broad spectrum of semantic perturbations. Semantify-NN exclusively features semantic perturbation layers (SP-layers) to expand the verification power of current verification methods beyond L_p-norm bounded threat models. Based on a diverse set of semantic attacks, we demonstrate how the SP-layers can be implemented and refined for verification. Evaluated on various datasets, network architectures and semantic attacks, our experiments corroborate the effectiveness of Semantify-NN for semantic robustness verification.
If we can verify that a set S = {x_1, ..., x_n} of perturbed versions of an image x is correctly classified for a threat model using one certification cycle (one pass through the algorithm sharing the same linear relaxation values), then we can verify that every perturbed image in the convex hull of S is also correctly classified, where the convex hull is taken in the pixel space.
When a set of perturbed inputs S = {x_1, ..., x_n} and a neural network f are passed into a verifier, it produces linear bounds such that for all x_i ∈ S,

(12)

We claim that if x' = Σ_i λ_i x_i with λ_i ≥ 0 and Σ_i λ_i = 1, then x' also satisfies the above inequality.
We can prove this by induction on the layers. For the first layer, since matrix multiplication and addition are linear transformations, W x' + b is the same convex combination of the points W x_i + b. The important property to note here is that every coordinate of W x' + b lies in the interval between the minimum and maximum of the corresponding coordinates of the W x_i + b. Now, the activation layer is linearly relaxed such that the relaxation is valid for all values between the upper and lower bound of each neuron. As we proved that every coordinate of the transformed x' lies within these bounds, x' satisfies the relation.
For the inductive case, given that x' satisfies this relation up to layer l, we have

(13)

where the left-hand side gives the output of each neuron in layer l post-activation.
Now, since the above equation is satisfied, the certification procedure ensures that the newly computed pre-activation values satisfy the same condition, where we use a hat to denote a pre-activation bound. If we can show that the pre-activation value at x' lies within the certified range for every neuron, then we prove the inductive case. But since the pre-activation map is a linear transform, its value at x' is the convex combination of its values at the points x_i, so each coordinate is lower bounded by the corresponding coordinate at some point of S, and similarly upper bounded. Then, using the fact that the linear relaxation gives valid bounds for every value between the upper and lower bound, we complete the inductive step. So we have

(14)
∎
Then the verifier certifies the set S to be correctly classified only if, for all x_i ∈ S and all classes j ≠ c, the margin f_c(x_i) − f_j(x_i) is verified to be positive. Now, if x' lies in the convex hull of S, then x' = Σ_i λ_i x_i, where λ_i ≥ 0 and Σ_i λ_i = 1. Then, using the above claim, we see that x' is also verified to be correctly classified.
∎
For some non-convex attack spaces embedded in high-dimensional pixel spaces, the convex hull of the attack space associated with an image can contain images belonging to a different class (an example of rotation is illustrated in Figure 3). Thus, one cannot certify large intervals of perturbations using a single certification cycle of linear relaxation based verifiers.
Consider the images given in Figure 3; denote the two endpoint images as x_1 and x_2 and their midpoint in pixel space as x_m = (x_1 + x_2)/2. We can observe that for an ideal neural network f, we expect that f classifies x_1 and x_2 correctly while classifying x_m as a different class. Now, we claim that for this network f it is not possible for a linear-relaxation based verifier to verify that both x_1 and x_2 are classified correctly using just one certification cycle. If it could, then by Theorem A.1 we would be able to verify it for the midpoint x_m. However, this is not possible, as f classifies x_m as a different class. Therefore, the verifications for x_1 and x_2 must belong to different certification cycles, making input splitting necessary.
∎
Figure 4 illustrates the difference between explicit and implicit input space splitting. In Figure 4(a), we give the form of the activation function for rotation. Even over a small range of rotation angles, we see that the function is quite non-linear, resulting in very loose linear bounds. Splitting the range explicitly into 5 parts and running them separately (explicit splitting, as shown in Figure 4(b)) gives a much tighter approximation. However, explicit splitting results in a high computation time, as the time scales linearly with the number of splits. To approximate this function efficiently, we can instead make the splits to obtain explicit bounds on each sub-interval and then run them through certification simultaneously (implicit splitting, as shown in Figure 4(c)). As we observe in Figure 4(c), splitting into 20 implicit parts gives a very good approximation with very little overhead (the number of certification cycles used stays the same).