It has been discovered that deep neural networks (DNNs) are susceptible to adversarial examples [szegedy2014intriguing], which are images indistinguishable to humans that cause a DNN to give substantially different results. There is a large body of research on attacking (i.e., finding such adversarial examples) and defending neural networks [carlini2017towards, athalye2018obfuscated, madry2018towards, kannan2018adversarial, tramer2020adaptive]. The robustness of a network on a specific input can be formally proved by adopting exact verification techniques, mostly based on Satisfiability Modulo Theories (SMT) solvers [scheibler2015towards, huang2017safety, katz2017reluplex, ehlers2017formal]
or Mixed Integer Linear Programming (MILP) [lomuscio2017approach, cheng2017maximum, fischetti2018deep, dutta2018output, tjeng2018evaluating, yang2019correctness]. An exact verification (a.k.a. complete verification) algorithm should either prove that a specification is satisfied or produce a counterexample that violates the specification. Another approach to provable robustness is inexact verification (a.k.a. certification), which is based on over-approximation and may therefore fail to prove or disprove the desired properties in certain cases [wong2017provable, weng2018towards, gehr2018ai2, zhang2018efficient, raghunathan2018semidefinite, dvijotham2018training, mirman2018differentiable, singh2019an].
In practice the computation of both the verifier and the neural network is performed on physical computers that use an approximate representation of real numbers. Although it is theoretically possible to use exact arithmetic for DNN verification, solvers use floating point arithmetic to achieve reasonable performance [katz2017reluplex], which renders them inexact. The existence of multiple software and hardware systems for DNN inference, such as accelerated convolution algorithms based on FFT [vasilache2015fast] or Winograd [lavin2016fast], further complicates the situation, because different implementations exhibit different error characteristics. We are not aware of any efforts to ensure that the solver error is completely aligned with the error of a particular inference implementation.
Floating point error has been accounted for in some certification frameworks [singh2018fast] by maintaining upper and lower rounding bounds for sound floating point arithmetic [mine2004relational]. Such frameworks should be extensible to model numerical error in implementations like Winograd convolution, but the effectiveness of this extension remains to be studied. Moreover, these techniques only aim to verify an over-approximation of robustness.
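To illustrate what maintaining upper and lower rounding bounds can look like, the following minimal sketch computes a sound float32 enclosure of a dot product by widening every intermediate bound outward by one unit in the last place. It is a toy under stated assumptions (round-to-nearest arithmetic, no overflow), not the method of [singh2018fast] or [mine2004relational]; the helper names `widen`, `sound_add`, `sound_mul`, and `sound_dot` are hypothetical.

```python
import numpy as np
from fractions import Fraction

F32 = np.float32
NEG_INF, POS_INF = F32(-np.inf), F32(np.inf)

def widen(lo, hi):
    """Move lo down and hi up by one float32 step so the interval stays sound."""
    return np.nextafter(F32(lo), NEG_INF), np.nextafter(F32(hi), POS_INF)

def sound_add(a, b):
    # A round-to-nearest sum is within one step of the exact sum.
    return widen(a[0] + b[0], a[1] + b[1])

def sound_mul(a, b):
    # The exact product range is spanned by the four endpoint products.
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return widen(min(products), max(products))

def sound_dot(xs, ws):
    acc = (F32(0), F32(0))
    for x, w in zip(xs, ws):
        acc = sound_add(acc, sound_mul((F32(x), F32(x)), (F32(w), F32(w))))
    return acc

rng = np.random.default_rng(0)
xs = rng.random(64).astype(F32)
ws = rng.standard_normal(64).astype(F32)
lo, hi = sound_dot(xs, ws)

# Exact rational reference for the same float32 inputs.
exact = sum(Fraction(float(x)) * Fraction(float(w)) for x, w in zip(xs, ws))
print(lo, hi, Fraction(float(lo)) <= exact <= Fraction(float(hi)))
```

The enclosure always contains the exact result, but its width grows with the number of operations, which is one reason such bounds yield an over-approximation rather than an exact answer.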
In contrast, very little attention has been paid to floating point error in the literature on exact verification of neural networks. In this work, we argue that this issue should not be overlooked by presenting practical techniques to construct adversarial examples for verified neural networks.
2 Problem Definition
Let $y = \operatorname{NN}(x; W)$ denote the computation of a neural network with weight $W$. In this work we consider 2D image classification problems. For an input image $x \in \mathbb{R}^{m \times n \times c}$, which has $m$ rows and $n$ columns of pixels each containing $c$ color channels normalized to the range $[0, 1]$, $y \in \mathbb{R}^{k}$ is a vector containing the classification scores for each of the $k$ classes. The class with the highest score is the classification result of the neural network.
For a logits vector $y$ and a target class number $t$, we define the Carlini-Wagner (CW) loss [carlini2017towards] as the score of the target class minus the maximal score of the other classes:
$$L_{\mathrm{CW}}(y, t) = y_t - \max_{i \ne t} y_i \tag{1}$$
Therefore the neural network classifies $x$ as an instance of class $t$ if and only if $L_{\mathrm{CW}}(\operatorname{NN}(x; W), t) > 0$, assuming no two classes have equal scores.
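As a small illustration of the CW loss, the following sketch evaluates it with NumPy. The function name `cw_loss` and the example logits are assumptions made for this example only.

```python
import numpy as np

def cw_loss(logits, target):
    """Score of the target class minus the largest score among the other classes."""
    logits = np.asarray(logits, dtype=np.float32)
    others = np.delete(logits, target)  # scores of all classes except the target
    return logits[target] - others.max()

logits = [1.5, -0.25, 3.0, 0.5]
print(cw_loss(logits, 2))  # positive: class 2 is the predicted class
print(cw_loss(logits, 0))  # negative: class 0 is not the predicted class
```

The sign of the loss is all that matters for classification, which is why a tiny numerical error near zero can flip the predicted class.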
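Adversarial robustness of a neural network is defined for an input $x_0$ and a perturbation bound $\epsilon$, such that the classification result is stable within allowed perturbations:
$$\forall x \in \operatorname{Adv}_\epsilon(x_0):\; L(x) > 0, \quad \text{where } L(x) = L_{\mathrm{CW}}(\operatorname{NN}(x; W), t_0) \text{ and } t_0 = \operatorname*{argmax}_i \operatorname{NN}(x_0; W)_i \tag{2}$$
In this work we focus on $\ell_\infty$-norm-bounded perturbations:
$$\operatorname{Adv}_\epsilon(x_0) = \{\, x \mid \|x - x_0\|_\infty \le \epsilon \;\wedge\; \min x \ge 0 \;\wedge\; \max x \le 1 \,\}$$
Exact verification checks for given $x_0$ and $\epsilon$ whether (2) holds. We adopt the small convolutional neural network architecture from [xiao2018training] and the MIPVerify verifier of [tjeng2018evaluating], since they are both open source and deliver the fastest exact verification that we are aware of for real-valued neural networks. In MIPVerify the computation of $\operatorname{NN}(x; W)$ is formulated as a set of MILP constraints. The objective can be encoded in two ways:
Closest: checking if $\min \|x - x_0\|_\infty \le \epsilon$ under the constraints $L(x) \le 0$ and $x \in [0, 1]^{m \times n \times c}$.
Worst: checking if $\min L(x) \le 0$ under the constraint $x \in \operatorname{Adv}_\epsilon(x_0)$.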
The robustness condition of (2) is satisfied if and only if the solver reports that the MILP formulation is infeasible. An adversarial example can be extracted from a feasible solution.
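To make the MILP formulation more concrete, a single ReLU unit $y = \max(0, z)$ with known pre-activation bounds $l \le z \le u$ (where $l < 0 < u$) can be encoded with one binary variable $a$. This is the standard big-M style encoding, shown as an illustrative assumption rather than as a description of MIPVerify's internals; the symbols $z$, $a$, $l$, and $u$ are introduced only for this example.

```latex
\begin{aligned}
y &\ge 0, & y &\ge z, \\
y &\le u \cdot a, & y &\le z - l\,(1 - a), \\
a &\in \{0, 1\}.
\end{aligned}
```

Setting $a = 1$ forces $y = z \ge 0$, and setting $a = 0$ forces $y = 0$ with $z \le 0$. Since a solver evaluates such constraints in floating point, the encoded function is itself subject to numerical error.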
Due to the inevitable presence of numerical error in both the network inference system and the MILP solver, the exact specification (i.e., a bit-level accurate description of the underlying computation) of $\operatorname{NN}(\cdot; W)$ is not clearly defined in (2). We consider the following implementations that are accessible from the PyTorch [NEURIPS2019_9015] framework:
$\operatorname{NN}_{C,M}(\cdot; W)$: A matrix multiplication based implementation on x86/64 CPUs. Note that [xiao2018training] reformulates convolutions in the TensorFlow [abadi2016tensorflow] model as matrix multiplications when invoking the verifier in their open source code.
$\operatorname{NN}_{C,C}(\cdot; W)$: The default convolution implementation on x86/64 CPUs.
$\operatorname{NN}_{G,M}(\cdot; W)$: A matrix multiplication based implementation on NVIDIA GPUs.
$\operatorname{NN}_{G,C}(\cdot; W)$: A convolution implementation using the IMPLICIT_GEMM algorithm from the cuDNN library [chetlur2014cudnn] on NVIDIA GPUs. Its numerical error is similar to that of the CPU algorithms.
$\operatorname{NN}_{G,CWG}(\cdot; W)$: A convolution implementation using the WINOGRAD_NONFUSED algorithm from the cuDNN library [chetlur2014cudnn] on NVIDIA GPUs. It is based on the Winograd [lavin2016fast] fast convolution algorithm, which has much higher numerical error than the others.
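All these implementations use single precision floating point arithmetic for inputs and outputs of each layer while exhibiting different numerical error behavior. We present in the following section our techniques for constructing adversarial examples for all of them. Specifically, for a given implementation $\operatorname{NN}_{\mathrm{impl}}(\cdot; W)$, our algorithm finds pairs of $(x_0, x_{\mathrm{adv}})$ represented as single precision floating point numbers such that
1. $x_0$ and $x_{\mathrm{adv}}$ are in the dynamic range of images: $\min x_0 \ge 0$, $\min x_{\mathrm{adv}} \ge 0$, $\max x_0 \le 1$, and $\max x_{\mathrm{adv}} \le 1$.
2. $x_{\mathrm{adv}}$ is an admissible perturbation of $x_0$: $\|x_{\mathrm{adv}} - x_0\|_\infty \le \epsilon$.
3. The MIPVerify verifier claims that (2) holds for $x_0$.
4. $x_{\mathrm{adv}}$ is an adversarial image for the implementation: $L_{\mathrm{CW}}(\operatorname{NN}_{\mathrm{impl}}(x_{\mathrm{adv}}; W), t_0) < 0$.
Note that the first two conditions can be accurately specified for an implementation compliant with the IEEE-754 standard, because the computation only involves element-wise subtraction and max-reduction, which incur no accumulated error. MIPVerify adopts the commercial solver Gurobi [gurobi] for solving the MILP problem with double precision. Therefore, to ensure that our adversarial examples satisfy the constraints considered by the solver, we also require that the first two conditions hold for $x'_{\mathrm{adv}} = \operatorname{float64}(x_{\mathrm{adv}})$ and $x'_0 = \operatorname{float64}(x_0)$, which are the double precision representations of $x_{\mathrm{adv}}$ and $x_0$.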
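The reason for re-checking the input constraints in double precision can be seen in a small sketch. The function below tests the range and $\ell_\infty$ conditions once in float32 and once on the float64 representations of the same inputs. The name `satisfies_input_constraints` and the choice to round the bound separately in each precision are assumptions made for illustration.

```python
import numpy as np

def satisfies_input_constraints(x0, x_adv, eps):
    """Check the range and L-infinity conditions in float32 and again in float64."""
    ok = True
    for dtype in (np.float32, np.float64):
        # Inputs are stored in single precision, then viewed in the target precision.
        a = np.asarray(x0, dtype=np.float32).astype(dtype)
        b = np.asarray(x_adv, dtype=np.float32).astype(dtype)
        bound = dtype(eps)
        in_range = a.min() >= 0 and a.max() <= 1 and b.min() >= 0 and b.max() <= 1
        within_eps = np.abs(b - a).max() <= bound
        ok = ok and bool(in_range) and bool(within_eps)
    return ok

eps = 0.1
x0 = [0.0]
# float32(0.1) is slightly larger than the double 0.1, so only the float32 check passes.
print(satisfies_input_constraints(x0, [np.float32(0.1)], eps))
# One float32 step below float32(0.1) satisfies both checks.
print(satisfies_input_constraints(x0, [np.nextafter(np.float32(0.1), np.float32(0))], eps))
```

In the first call the perturbation equals the bound when both are rounded to single precision, yet it exceeds the bound as seen by a double precision solver.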
3 Constructing Adversarial Examples
3.1 Empirical Characterization of Implementation Numerical Error
To guide the design of our attack algorithm we present statistics about numerical error of different implementations.
To investigate local error behavior, we select the first robust test image verified by MIPVerify on MNIST and present in Figure 1 a plot of the change of the logits vector against a small input perturbation, where the perturbation is only added to the single input element that has the largest gradient magnitude. We observe that the change of the logits vector is highly nonlinear with respect to the change of the input, and a small perturbation could result in a large fluctuation. The WINOGRAD_NONFUSED algorithm on NVIDIA GPU is much more unstable and its variation is two orders of magnitude larger than the others.
We also evaluate all the implementations on the whole MNIST test set and compare the outputs of the first layer (i.e., with only one linear transformation applied to the input) against that of the CPU matrix multiplication implementation $\operatorname{NN}_{C,M}$, and present the histogram in Figure 2. It is clear that different implementations usually have different error behavior, and again the Winograd-based implementation $\operatorname{NN}_{G,CWG}$ induces much higher numerical error than the others.
These observations inspire us to construct adversarial images for each implementation independently by applying small random perturbations on an image close to the robustness decision boundary. We present the details of our method in Section 3.2.
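The dependence of the result on evaluation order can be reproduced without a GPU. The sketch below first shows that float32 addition is not associative, then computes the same dot product with two summation orders and compares both against a float64 reference. The vector size, block size, and random seed are arbitrary choices for illustration and are not taken from the experiments.

```python
import numpy as np

# Float32 addition is not associative: the two orders below give 0 and 1.
a, b, c = np.float32(1e8), np.float32(1), np.float32(-1e8)
print((a + b) + c, (a + c) + b)

rng = np.random.default_rng(0)
x = rng.random(4096).astype(np.float32)
w = rng.standard_normal(4096).astype(np.float32)

def dot_sequential(x, w):
    """Accumulate products one by one in float32."""
    acc = np.float32(0)
    for xi, wi in zip(x, w):
        acc = np.float32(acc + xi * wi)
    return acc

def dot_blocked(x, w, block=64):
    """Accumulate per-block partial sums first, then add the partial sums."""
    partial = [dot_sequential(x[i:i + block], w[i:i + block])
               for i in range(0, len(x), block)]
    acc = np.float32(0)
    for p in partial:
        acc = np.float32(acc + p)
    return acc

reference = np.dot(x.astype(np.float64), w.astype(np.float64))
for name, fn in [("sequential", dot_sequential), ("blocked", dot_blocked)]:
    value = fn(x, w)
    print(name, value, abs(float(value) - reference))
```

Both orders are valid implementations of the same real-valued function, yet their rounding errors generally differ, which mirrors the situation of the matrix multiplication and convolution implementations considered here.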
3.2 A Method for Constructing Adversarial Examples
Given a network and weights $\operatorname{NN}(\cdot; W)$, a natural image $x_{\mathrm{seed}}$ from the test set that is provably robust under $\epsilon$-bounded perturbations, and an actual implementation of the network $\operatorname{NN}_{\mathrm{impl}}(\cdot; W)$, we construct an adversarial input pair $(x_0, x_{\mathrm{adv}})$ in three steps:
1. We search for a coefficient $\alpha \in [0, 1]$ such that $x_0 = \alpha x_{\mathrm{seed}}$ is provably robust but $(\alpha - \delta) x_{\mathrm{seed}}$ is not robust for a small positive value $\delta$. Thus $x_0$ is still a natural-looking image, because $\alpha$ roughly adjusts the brightness of the image, but $x_0$ is close to the decision boundary of robustness. In an ideal situation a binary search is sufficient to minimize $\delta$, but we find that in many cases the MILP solver becomes extremely slow when $\delta$ is small enough, so we start with a binary search and switch to a grid search if the solver exceeds a time limit. We set a target value for $\delta$ in our experiments and, if grid search is needed, divide the best known $\delta$ into a fixed number of intervals.
2. We find an adversarial image $x_1$ that is close to the robustness decision boundary by using the MILP solver and a binary search to find $\tau_0$, $\tau_1$, and $x_1$ such that:
- $x_0$ is still robust given a tolerance of $\tau_0$: $\forall x \in \operatorname{Adv}_\epsilon(x_0):\; L(x) - \tau_0 > 0$.
- $x_1$ is an adversarial image given a tolerance of $\tau_1$: $x_1 \in \operatorname{Adv}_\epsilon(x_0)$ and $L(x_1) - \tau_1 < 0$.
- $\tau_0$ and $\tau_1$ are close: $\tau_1 - \tau_0$ is below a small threshold.
For an ideal solver we could simply find the values by minimizing $L(x)$ with the worst objective, without the need to run a binary search. However, in practice we have observed that a binary search is more reliable in finding close pairs of $\tau_0$ and $\tau_1$ due to the numerical error in the solver. Another issue is that the solver exceeds the time limit in some cases. We simply discard $x_0$ if this happens at the beginning of the binary search, and abort the search process if this happens later. We call the tuple $(x_0, x_1, \tau_0)$ that can be found within the time limit a quasi-adversarial input.
3. We apply small random perturbations to $x_1$ while projecting back to $\operatorname{Adv}_\epsilon(x_0)$. If $\delta$ in step 1 is small enough, then $\tau_0$ should be very close to zero and there is a good chance that such a perturbation results in a negative CW loss, giving us the desired adversarial example. Algorithm 1 presents the details of applying such perturbations.
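The search in step 1 can be sketched as a binary search that falls back to a grid search when the verifier times out. This is one plausible reading of the description, not the authors' code: the oracle `is_robust`, which returns `None` on a timeout, and the parameters `target_delta` and `grid_intervals` are hypothetical, and the values used below are placeholders rather than the experimental settings.

```python
def find_boundary_scale(is_robust, target_delta, grid_intervals, max_iter=200):
    """Return (alpha, delta) with alpha robust and alpha - delta not robust.

    is_robust(alpha) returns True, False, or None (None models a solver timeout).
    """
    lo, hi = 0.0, 1.0  # assumption: scale 0 is not robust, scale 1 is robust
    for _ in range(max_iter):
        if hi - lo <= target_delta:
            break
        mid = (lo + hi) / 2
        verdict = is_robust(mid)
        if verdict is None:
            # Timeout: scan a grid over the current bracket from the robust end.
            step = (hi - lo) / grid_intervals
            for k in range(grid_intervals - 1, 0, -1):
                alpha = lo + k * step
                verdict = is_robust(alpha)
                if verdict is True:
                    hi = alpha
                elif verdict is False:
                    lo = alpha
                    break
                # Another timeout: skip this grid point.
            break
        if verdict:
            hi = mid
        else:
            lo = mid
    return hi, hi - lo

def toy_oracle(alpha, threshold=0.3721):
    """Stand-in for the verifier that times out very close to the boundary."""
    if abs(alpha - threshold) < 1e-4:
        return None
    return alpha >= threshold

alpha, delta = find_boundary_scale(toy_oracle, target_delta=1e-7, grid_intervals=16)
print(alpha, delta)
```

With the toy oracle the search cannot reach the target gap, so it returns the tightest bracket found by the grid, which corresponds to the case where a larger $\delta$ has to be accepted.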
We conduct our experiments on a workstation equipped with two GPUs (NVIDIA Titan RTX and NVIDIA GeForce RTX 2070 SUPER), 128 GiB of RAM and an AMD Ryzen Threadripper 2970WX 24-core processor. We train the small architecture from [xiao2018training] with the PGD adversary and the RS Loss on the MNIST and CIFAR10 datasets. The trained networks achieve 94.63% and 44.73% provable robustness under $\ell_\infty$-norm-bounded perturbations on the two datasets respectively, similar to the results reported in [xiao2018training]. All of our code is available at https://github.com/jia-kai/realadv.
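Table 1: Number of successful adversarial attacks for each implementation.
| Dataset | #quasi-adv / #tested | $\operatorname{NN}_{C,M}$ | $\operatorname{NN}_{C,C}$ | $\operatorname{NN}_{G,M}$ | $\operatorname{NN}_{G,C}$ | $\operatorname{NN}_{G,CWG}$ |
|---|---|---|---|---|---|---|
| MNIST | 18 / 32 | 2 | 3 | 1 | 3 | 7 |
| CIFAR10 | 26 / 32 | 16 | 12 | 7 | 6 | 25 |
Table 2: Minimal test accuracy of the modified models of the successful attacks when the softmax bias is modified.
| Dataset | Metric | $\operatorname{NN}_{C,M}$ | $\operatorname{NN}_{C,C}$ | $\operatorname{NN}_{G,M}$ | $\operatorname{NN}_{G,C}$ | $\operatorname{NN}_{G,CWG}$ |
|---|---|---|---|---|---|---|
| MNIST | min test acc | 98.40% | 97.83% | 98.57% | 97.83% | 97.83% |
| CIFAR10 | min test acc | 58.74% | 58.74% | 58.74% | 58.74% | 58.74% |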
Since our method needs multiple invocations of the computationally intensive verifier, for each dataset we only test it on 32 images randomly sampled from the robust test images to demonstrate its effectiveness within a reasonable total running time. The time limit of MILP solving is 360 seconds. Out of these 32 images, we have successfully found quasi-adversarial inputs for 18 images on MNIST and 26 images on CIFAR10. We apply the random perturbations outlined in Algorithm 1 to these quasi-adversarial inputs ($x_1$ from Section 3.2 Step 2) to obtain adversarial inputs for the verified robust input image ($x_0$ from Section 3.2 Step 1). All the implementations that we have considered are successfully attacked. We present the detailed numbers in Table 1.
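Algorithm 1 is not reproduced in this section, so the following is only a guess at the general shape of the random perturbation step: sample small noise, project back into the allowed perturbation set, re-check the input constraints in double precision, and stop once the CW loss becomes negative. All names (`project`, `admissible`, `random_perturbation_attack`, `toy_forward`) and all numeric settings are hypothetical. The toy two-class model only exercises the search loop; it does not reproduce a misclassification caused by floating point error.

```python
import numpy as np

def cw_loss(logits, target):
    logits = np.asarray(logits)
    return logits[target] - np.delete(logits, target).max()

def project(x, x0, eps):
    """Clip x into the L-infinity ball around x0 intersected with [0, 1]."""
    lo = np.maximum(x0 - eps, 0).astype(np.float32)
    hi = np.minimum(x0 + eps, 1).astype(np.float32)
    return np.clip(x, lo, hi)

def admissible(x, x0, eps):
    """Re-check range and distance on the float64 representations."""
    x64, x0_64 = x.astype(np.float64), x0.astype(np.float64)
    return bool(x64.min() >= 0 and x64.max() <= 1
                and np.abs(x64 - x0_64).max() <= eps)

def random_perturbation_attack(forward, x0, x1, target, eps, scale, trials, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        noise = rng.uniform(-scale, scale, size=x1.shape).astype(np.float32)
        candidate = project(x1 + noise, x0, eps)
        if not admissible(candidate, x0, eps):
            continue  # float32 projection may round just outside the set
        if cw_loss(forward(candidate), target) < 0:
            return candidate
    return None

def toy_forward(x):
    # Toy model: class 0 score is the pixel sum, class 1 score is a constant.
    return np.array([x.sum(dtype=np.float32), np.float32(1.8)], dtype=np.float32)

x0 = np.full(4, 0.5, dtype=np.float32)
x1 = np.full(4, 0.45001, dtype=np.float32)  # barely on the positive side
adv = random_perturbation_attack(toy_forward, x0, x1, target=0,
                                 eps=0.1, scale=1e-3, trials=100)
print(adv is not None)
```

In the real attack `forward` would be one of the inference implementations, and the perturbation scale would be chosen small enough that a sign change of the loss can only come from numerical error.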
Furthermore, since a large tolerance $\tau_0$ in the quasi-adversarial inputs makes our method less likely to succeed, and it can result from a large value of $\delta$ in step 1 due to the solver exceeding the time limit, we also evaluate the performance of our attack algorithm when modifying the bias of the softmax layer is allowed. We decrease the bias of the target class by $\tau_0$ to obtain a model that is still verifiably robust but easier to attack. Since modifying the bias would affect test accuracy, we also evaluate the minimal test accuracy of the modified models of the successful attacks for each implementation. The results are presented in Table 2, showing that the attack success rate is improved. Therefore our method can be more effective if the verifier is more efficient for images close to robustness boundaries.
We also present in Figure 3 the verified robust images on which our attack method succeeds for all implementations and the corresponding adversarial images.
4 Possible Resolutions
Unless the verifier is able to model an exact specification of floating point arithmetic in the neural network inference implementation, verification of NNs in the strictly exact sense is impossible in the presence of numerical error. Moreover we have shown that because of numerical error, practical adversarial examples can be constructed for verified robust networks. We propose two possible solutions:
Adopting over-approximation that is close to exact verification. For example, the verifier could add an error term bounded by the maximum implementation error to the variables for all hidden units, similar to the approach adopted by certification frameworks with sound floating point arithmetic [singh2018fast, singh2019an]. However, this may significantly complicate the verification problem. It is also not clear how to estimate a tight error bound for a practical implementation with many engineering optimizations.
Quantizing the computations so the verifier and the implementation can be aligned. For example, if we require all activations to be multiples of $s_0$ and all weights to be multiples of $s_1$, where $s_0 s_1 > 2E$ and $E$ is a very loose bound of possible implementation error, then the output can be rounded to multiples of $s_0 s_1$ to completely eliminate numerical error. Binarized neural networks [hubara2016binarized] are a family of extremely quantized networks, and their verification [narodytska2018verifying, shih2019verifying] can be easily aligned with any sane implementation.
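The quantization idea can be checked on a single dot product. In the sketch below, activations are multiples of a step `s0` and weights are multiples of a step `s1`; the float32 result is rounded back to a multiple of `s0 * s1` and compared with the exact integer computation. The step sizes, vector length, and value ranges are assumptions chosen so that a loose error bound for this toy (well under 1e-4) stays below half of `s0 * s1`.

```python
import numpy as np

def quantized_dot(act_int, w_int, s0, s1, order):
    """Float32 dot product of quantized values, rounded back to the s0*s1 grid."""
    a = (act_int * s0).astype(np.float32)
    w = (w_int * s1).astype(np.float32)
    acc = np.float32(0)
    for i in order:  # the summation order stands in for different implementations
        acc = np.float32(acc + a[i] * w[i])
    unit = s0 * s1
    return int(np.round(float(acc) / unit))  # result as an integer multiple of s0*s1

rng = np.random.default_rng(0)
act_int = rng.integers(0, 21, size=64)   # activations in [0, 1] with step 0.05
w_int = rng.integers(-10, 11, size=64)   # weights in [-0.2, 0.2] with step 0.02
s0, s1 = 0.05, 0.02

exact = int(act_int @ w_int)             # exact result in units of s0*s1
forward = quantized_dot(act_int, w_int, s0, s1, range(64))
backward = quantized_dot(act_int, w_int, s0, s1, reversed(range(64)))
shuffled = quantized_dot(act_int, w_int, s0, s1, rng.permutation(64))
print(exact, forward, backward, shuffled)  # all four values agree
```

Because every summation order lands within half a grid step of the exact value, rounding removes the order-dependent error, so a verifier reasoning over the integer computation would agree with any such implementation.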