Deep neural networks are vulnerable to adversarial examples (SzegedyAdvExamples): semantics-preserving changes such as $\ell_p$-noise, geometric perturbations (e.g., rotations and translations) (EngstromMadryRotation), and Wasserstein perturbations (WongWasserstein), which can affect the output of the network in undesirable ways. This is especially problematic when these models are used in safety-critical tasks such as medical diagnosis (AmatoMedical) or autonomous driving (BojarskiAutonomous).
As a result, recent work (e.g., AI2; Weng2018) started investigating robustness certification methods which guarantee the absence of adversarial examples. However, even with training methods tailored to produce networks amenable to $\ell_p$-certification (WongK18; DiffAI), current verification techniques still cannot scale to realistic models and datasets. Recently, CohenRK19 proposed a promising approach called randomized smoothing: it constructs a probabilistic classifier with probabilistic certificates and produces state-of-the-art results for $\ell_2$-norm bounded noise on ImageNet.
In this work we generalize randomized smoothing to parameterized semantic perturbations (beyond $\ell_p$-norms). For example, our method enables probabilistic certification of geometric perturbations (e.g., rotations, translations), which is challenging due to the need for interpolation and rounding. Prior work on this topic is limited in either expressivity or scalability: PeiCYJ17 is restricted to nearest neighbor interpolation and exhaustive enumeration, while DeepPoly and DeepG allow more complex interpolations (e.g., bilinear, bicubic) but handle only smaller networks. Our generalization of randomized smoothing overcomes these limitations: it enables certification of geometric perturbations on large networks and with complex interpolations. We illustrate the idea in Fig. 1, where we sample different angles and, by our theorem, obtain a robustness certificate for rotations. Crucially, to be sound, this certificate takes into account the interpolation error, which we handle by incorporating an $\ell_2$-certified classifier.
We remark that to model a realistic attacker, our method also considers quantization errors from limited precision and does not rely on continuous pixel values. This is important as it means the method is sound for pixel values that are integers or floats (which is how images are actually represented).
Our key contributions are:
A generalization of randomized smoothing to parameterized semantic perturbations.
The first scalable and sound certification method for semantic perturbations, such as rotations and translations, that can be applied to ImageNet images. The method is general and works with any standard interpolation (e.g., bicubic, bilinear) and types of pixel values (e.g., integers, floats).
A thorough evaluation of the proposed method on image and audio datasets, establishing state-of-the-art results in both domains.
2 Related Work
We now survey the most closely related work in exact and probabilistic certification, defenses as well as perturbations and certification beyond norm-based noise.
$\ell_p$-norm based certification and defenses
The discovery of adversarial examples (SzegedyAdvExamples; BiggioAdvExamples) triggered interest in training robust neural networks. Empirical defenses are a common way to harden a model against an attacker, by adversarially attacking images during training (AdversarialTraining; MadryTraining). However, while adversarially trained networks may be robust to adversaries, the robustness usually cannot be formally verified with current verification methods. This is because complete methods (Ehlers17planet; Reluplex; bunel18nips) do not scale, and incomplete methods relying on over-approximation lose too much precision and cannot prove properties that actually hold (AI2; WangSafety; Weng2018; RaghunathanSL18a; DeepPoly; SalmanBarrier).
To address this issue, provable training methods have been developed, aimed at producing networks that are amenable to certification (DiffAI; RaghunathanSL18b; WangSafety; WongK18; IBP; Bridging). Currently, these methods do not scale to networks large enough to reach state-of-the-art accuracy on datasets such as ImageNet.
Recently, randomized smoothing was introduced, which could, for the first time, certify a (smoothed) classifier against substantial $\ell_2$-norm bounded noise on ImageNet (Lecuyer2018CertifiedRT; LiSampling; CohenRK19; Salman; Macer), by relaxing exact certificates to high-confidence probabilistic ones. Smoothing has the advantage that it scales to large models; however, it adds overhead at inference time and is currently limited to norm-based perturbations.
Transformations, such as translations and rotations, can produce adversarial examples (EngstromMadryRotation; KanbakMF18). PeiCYJ17 were the first to certify against such semantics-preserving operations on images by enumeration. They reduce the search space by only considering nearest neighbor interpolation. However, enumeration does not scale to continuous interpolations or fine-grained encodings, such as volume changes of 16-bit audio data. DeepPoly were the first to support certification of rotations for bilinear interpolation, which was significantly improved on by DeepG. Both methods generate linear relaxations and propagate them through the network. However, these methods do not yet scale to large networks (e.g., ResNet-50) or complex datasets (e.g., ImageNet). We remark that this work is a continuation of our prior work (fischer2020statistical).
We now discuss the necessary background on both randomized smoothing and interpolation.
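A smoothed classifier $g: \mathbb{R}^m \to \mathcal{Y}$ can be constructed out of an ordinary classifier $f: \mathbb{R}^m \to \mathcal{Y}$, mapping points in $\mathbb{R}^m$ to labels in $\mathcal{Y}$, by calculating the most probable result of $f(x + \epsilon)$ where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$:
$$g(x) := \arg\max_{c \in \mathcal{Y}} P_{\epsilon}\big(f(x + \epsilon) = c\big).$$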
In practice, it is intractable to calculate the probabilities analytically, hence we estimate the integral up to a chosen confidence by sampling. One then obtains the following robustness guarantee:
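Suppose $c_A \in \mathcal{Y}$, $\underline{p_A}, \overline{p_B} \in [0, 1]$. If
$$P_{\epsilon}\big(f(x + \epsilon) = c_A\big) \geq \underline{p_A} \geq \overline{p_B} \geq \max_{c \neq c_A} P_{\epsilon}\big(f(x + \epsilon) = c\big),$$
then $g(x + \delta) = c_A$ for all $\delta$ satisfying
$$\|\delta\|_2 < \frac{\sigma}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big).$$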
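As a small illustration of this guarantee, the following sketch evaluates the certified $\ell_2$ radius $\frac{\sigma}{2}(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B}))$ of CohenRK19 from given probability bounds. The function and argument names are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def certified_radius(sigma: float, p_a_lower: float, p_b_upper: float) -> float:
    """Radius sigma/2 * (Phi^-1(p_a_lower) - Phi^-1(p_b_upper)).

    p_a_lower: lower bound on the probability of the top class.
    p_b_upper: upper bound on the probability of any other class.
    """
    if not (0.0 < p_b_upper <= p_a_lower < 1.0):
        raise ValueError("need 0 < p_b_upper <= p_a_lower < 1")
    phi_inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return 0.5 * sigma * (phi_inv(p_a_lower) - phi_inv(p_b_upper))


# Equal bounds give no robustness; a wider gap gives a larger radius.
print(certified_radius(0.5, 0.5, 0.5))
print(certified_radius(0.5, 0.9, 0.1))
```

The radius grows linearly in $\sigma$ and with the gap between the two bounds, which is why tight probability estimates matter in practice.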
In practice we use Algorithm 1 with in order to obtain the above guarantee. We say we “smooth” over a variable or a classifier when we apply Certify.
Interpolation and rounding
Applying a geometric transformation (e.g., rotation) results in a transformed pixel grid which does not align with the original one. Thus, to obtain the pixel values of the transformed image, interpolation is needed. Typical interpolation algorithms for images include nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation (see Appendix A for details).
We denote the rotation by an angle $\beta$ and subsequent interpolation of an image $x$ by $R^I_\beta(x)$. The interpolation step consists of resampling (the actual interpolation, e.g., bilinear) and rounding the pixel values back to the underlying data type (e.g., integers in $\{0, \dots, 255\}$). It is important to note that rotations with interpolation do not compose, that is, $R^I_\beta \circ R^I_\gamma \neq R^I_{\beta+\gamma}$. This is because rotation and interpolation with rounding do not commute (Fig. 2).
Similarly to images, when transforming a 16-bit audio signal, the result can be in floating point space. Thus, it needs to be rounded to be expressible in 16-bit integers again, which introduces rounding errors.
4 Generalization of Smoothing
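We now generalize randomized smoothing (CohenRK19) to parameterized transformations. We consider composable transformations $\psi_\beta: \mathbb{R}^m \to \mathbb{R}^m$, that is, we have that $\psi_\beta \circ \psi_\gamma = \psi_{\beta+\gamma}$ for $\beta, \gamma \in \mathbb{R}^d$. We will show in Section 5 how to handle non-composable transformations.
Given a base classifier $f: \mathbb{R}^m \to \mathcal{Y}$ and a transformation $\psi_\beta$, we define a smoothed classifier $g: \mathbb{R}^m \to \mathcal{Y}$ by
$$g(x) := \arg\max_{c \in \mathcal{Y}} P_{\beta \sim \mathcal{N}(0, \sigma^2 I)}\big(f \circ \psi_\beta(x) = c\big).$$
We now obtain the following robustness guarantee:
Let $x \in \mathbb{R}^m$, $f: \mathbb{R}^m \to \mathcal{Y}$ be a classifier and $\psi_\beta$ be a composable transformation as above. If
$$P_{\beta}\big(f \circ \psi_\beta(x) = c_A\big) \geq \underline{p_A} \geq \overline{p_B} \geq \max_{c \neq c_A} P_{\beta}\big(f \circ \psi_\beta(x) = c\big),$$
then $g \circ \psi_\gamma(x) = c_A$ for all $\gamma$ satisfying
$$\|\gamma\|_2 < \frac{\sigma}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big).$$
The proof is similar to the one by CohenRK19 and is given in Appendix B. The key difference to CohenRK19 is that we allow parameterized transformations $\psi_\beta$, while CohenRK19 only allows additive noise.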
If we replace with a classifier , behaving with probability the same as and with probability differently than and if
then for all satisfying
By applying the union bound we can relate the output probability of for a class with the output probability of and :
Thus we can obtain new bounds and from and measured on . Plugging these bounds in Theorem 4.2 yields the result. ∎
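The union-bound argument can be made concrete with a short sketch. It assumes, as one reading of the lemma, that a base classifier which deviates with probability at most `rho` shifts each class probability by at most `rho`, so the measured lower bound is decreased and the measured upper bound is increased by `rho` before the radius formula is applied. The names `radius_with_faulty_base`, `p_a_lower`, `p_b_upper` and `rho` are hypothetical, and the exact statement in the lemma may differ.

```python
from statistics import NormalDist


def radius_with_faulty_base(sigma, p_a_lower, p_b_upper, rho):
    """Radius after discounting an error probability rho of the base classifier.

    Assumption: by the union bound, the true class probabilities differ from
    the measured ones by at most rho, so both bounds are shifted by rho.
    Returns None if no certificate remains after the shift.
    """
    p_a = p_a_lower - rho  # pessimistic lower bound for the top class
    p_b = p_b_upper + rho  # pessimistic upper bound for the runner-up
    if not (0.0 < p_b < p_a < 1.0):
        return None
    phi_inv = NormalDist().inv_cdf
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))


print(radius_with_faulty_base(1.0, 0.9, 0.1, 0.0))   # no error: plain radius
print(radius_with_faulty_base(1.0, 0.9, 0.1, 0.05))  # smaller radius
print(radius_with_faulty_base(1.0, 0.6, 0.4, 0.2))   # certificate is lost
```

The sketch shows the qualitative effect only: any nonzero error probability of the base classifier shrinks the certified radius and can remove it entirely.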
This lemma allows us to smooth over erroneous classifiers like already smoothed classifiers.
In practice, both statements hold with a certain probability, as we have a finite number of samples to estimate a lower bound $\underline{p_A}$ of the top-class probability $p_A$ and an upper bound $\overline{p_B}$ of the runner-up probability $p_B$. Algorithm 1 shows the Certify procedure, which can be used to perform this in practice. The LBound method uses Clopper-Pearson bounds to estimate $\underline{p_A}$ with confidence $1 - \alpha$. The given algorithm either returns both a class and a radius if $\underline{p_A} > \frac{1}{2}$, or abstains from classification. To perform inference with $g$, it suffices to pick fewer samples and perform a statistical test with confidence $1 - \alpha$ of whether more samples of class $c_A$ than of class $c_B$ got selected.
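Algorithm 1 is not reproduced in this text, so the following is only a sketch of a CohenRK19-style Certify procedure adapted to a scalar transformation parameter. It assumes the runner-up bound is taken as one minus the lower bound of the top class, which gives the radius $\sigma \Phi^{-1}(\underline{p_A})$. The names `certify`, `lower_bound`, `transform`, `n0` and `n` are illustrative, and the one-sided Clopper-Pearson bound is computed by bisection using only the standard library.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 1.0 if k <= 0 else 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def lower_bound(k, n, alpha):
    """One-sided Clopper-Pearson lower bound for k successes in n trials."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(60):  # bisection on the success probability
        mid = (lo + hi) / 2.0
        if binom_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # conservative end of the final bracket


def certify(f, transform, x, sigma, n0, n, alpha, rng):
    """Return (class, radius) or None to abstain."""
    def counts(num):
        return Counter(f(transform(x, rng.gauss(0.0, sigma))) for _ in range(num))

    c_a = counts(n0).most_common(1)[0][0]  # guess the top class
    k = counts(n)[c_a]                     # fresh samples for the estimate
    p_a = lower_bound(k, n, alpha)
    if p_a <= 0.5:
        return None
    return c_a, sigma * NormalDist().inv_cdf(p_a)


# Toy usage: a threshold classifier smoothed over an additive shift.
rng = random.Random(0)
print(certify(lambda v: int(v > 0), lambda x, b: x + b, 10.0,
              sigma=1.0, n0=20, n=200, alpha=0.001, rng=rng))
```

Using separate sample sets for selecting the class and for estimating its probability keeps the confidence statement valid, at the cost of `n0` extra evaluations.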
5 Semantic Perturbations
We now discuss several practical semantic perturbations $\psi_\beta$, first in an idealized setting, that is, without interpolation or rounding, after which we explain how to handle these in the realistic case.
5.1 Idealized Setting
Rotations $R_\beta$ by an angle $\beta$ compose:
$$R_\beta \circ R_\gamma = R_{\beta + \gamma}.$$
Many other geometric transformations such as translations and scaling also compose.
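The volume of an audio signal can be changed by multiplying the signal with a constant. In order to change the signal by $\beta$ (measured in decibel, dB) we multiply by $10^{\beta/20}$. Thus the transformation is $\psi_\beta(x) = 10^{\beta/20} x$, which also composes:
$$\psi_\beta \circ \psi_\gamma(x) = 10^{\beta/20} \, 10^{\gamma/20} \, x = 10^{(\beta + \gamma)/20} \, x = \psi_{\beta + \gamma}(x).$$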
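The composition property of the volume change can be checked numerically. The sketch below assumes the usual amplitude convention, in which a change of `beta_db` decibels multiplies the signal by `10 ** (beta_db / 20)`. It also includes a rounded 16-bit variant to show where the idealized setting ends. Both function names are hypothetical.

```python
def change_volume(signal, beta_db):
    """Idealized volume change: scale every sample by 10^(beta/20)."""
    factor = 10.0 ** (beta_db / 20.0)
    return [factor * s for s in signal]


def change_volume_int16(signal, beta_db):
    """Realistic variant: scale, round, and clip to the 16-bit integer range."""
    scaled = change_volume(signal, beta_db)
    return [max(-32768, min(32767, round(s))) for s in scaled]


signal = [100.0, -2500.0, 12345.0]

# Idealized: applying +1.5 dB and then -0.7 dB equals applying +0.8 dB.
two_steps = change_volume(change_volume(signal, 1.5), -0.7)
one_step = change_volume(signal, 0.8)
print(max(abs(u - v) for u, v in zip(two_steps, one_step)))

# Realistic: rounding after each step can make the two results differ.
print(change_volume_int16(change_volume_int16(signal, 1.5), -0.7))
print(change_volume_int16(signal, 0.8))
```

In the idealized case the difference is at the level of floating-point error, while the rounded variant is only approximately composable.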
5.2 Realistic setting
We illustrate the difficulties introduced by handling interpolation using the example of interpolation with rotation. Our proposed method works with all interpolations. Rounding errors can be handled analogously.
Interpolation and rounding error
Recall that rotation with interpolation does not compose (Fig. 2). Thus smoothing with realistic rotations is not enough as this would not compose with an attacker performing realistic rotations of an angle . To address this, we regard the difference between and as noise:
with for some (computation of is discussed Section 6).
where , and obtain the guarantee that for all (mathematical) rotations satisfying , by using
Because is robust around , we know that adding s.t. does not change the class predicted by . Thus we get
Here, the statement also holds for the specific , namely , resulting in
which can be rewritten using the definition of to obtain the desired safety property
6 Certification of Classifiers
We next explain how to apply the discussed techniques in practice, so as to obtain robustness certificates against rotations. Other transformations (e.g., translations) work analogously.
Attacker Model and Calculating $E$
A key to our proposed approach is to find a good bound $E$. For arbitrary images, the norm of the interpolation error can be large, but for realistic images $x \sim \mathcal{D}$, where $\mathcal{D}$ is the data distribution, the norm is typically lower. To exploit this, we give probabilistic bounds on the norm for $x \sim \mathcal{D}$.
In this work we assume an attacker that applies a rotation of angle . We can pick a suitable and compute a probabilistic upper bound by computing
describes the uniform distribution over. Using the Clopper-Pearson interval, we can estimate a lower bound of for a given with confidence , i.e., .
Using this provides a sound certification against an attacker that chooses the rotation angle randomly. To defend against a malicious attacker, one needs to bound
which can be achieved by using a similar approach to DeepG (Appendix A.4): The range can be divided into small intervals and subsequently used to calculate an interval over-approximation capturing all images obtained through rotations by angles in using interval analysis. From these over-approximations we can then directly obtain an upper bound for , where and can be sampled as before.
We note that these bounds can be obtained through precomputation on the data set.
6.1 Preprocessing to reduce $E$
Because the estimates of $E$ at a satisfactory error rate are still larger than the radii most certification methods can certify, we add preprocessing to the images before the classifier classifies them.
Commonly, noise is a high-frequency artifact in the data. As interpolation noise behaves similarly, we choose a low-pass filter (blur) as preprocessing, to reduce the norm bound $E$ the classifier needs to handle. The low-pass filter (i) calculates the two-dimensional discrete Fourier transform, (ii) filters the calculated frequencies by discarding the highest frequencies (and thus part of the noise) up to a threshold, and (iii) calculates the inverse two-dimensional discrete Fourier transform on the filtered frequencies.
Another issue impacting $E$ is that rotating a digital image introduces black corners into the image. Thus rotating twice can lead to very large differences. To sidestep this issue we introduce a second preprocessing step, namely vignetting, which sets the values of all pixels outside a circle to 0. This significantly reduces the interpolation error, as can be seen in Fig. 5.
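The two preprocessing steps can be sketched as follows for a single-channel image, assuming NumPy is available. The radial frequency mask, the `cutoff` parameter, and the default vignette radius of half the shorter side are assumptions made for this example and are not taken from the text.

```python
import numpy as np


def low_pass(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Discard all frequencies farther than `cutoff` from the zero frequency."""
    h, w = image.shape
    # (i) 2D DFT, shifted so the zero frequency sits at (h // 2, w // 2)
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    # (ii) zero out the high frequencies
    spectrum[dist > cutoff] = 0.0
    # (iii) inverse 2D DFT; the imaginary part is numerical noise
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def vignette(image: np.ndarray, radius=None) -> np.ndarray:
    """Set every pixel outside a centered circle to 0."""
    h, w = image.shape
    if radius is None:
        radius = min(h, w) / 2.0
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    out = image.copy()
    out[dist > radius] = 0.0
    return out


# A checkerboard is pure highest-frequency content and is filtered out,
# while a constant image passes through unchanged.
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0
print(np.abs(low_pass(checker, 2.0)).max())
print(np.allclose(low_pass(np.full((8, 8), 3.0), 2.0), 3.0))
print(vignette(np.ones((8, 8)))[0])
```

For color images the same functions could be applied per channel; the classifier then only ever sees filtered and vignetted inputs, both at training and at certification time.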
6.2 Smoothing smoothed classifiers
To do that, we invoke Algorithm 1 twice in order to smooth the smoothed classifier. The Sample procedure of the outer classifier (Fig. 3), which smooths the angle , invokes the Certify procedure of the inner classifier (Fig. 4), which smooths the interpolation noise . To clarify the notation, we add the subscript and to the variables of the algorithm.
To obtain the certified radius we need to smooth with samples, and for each of these samples we need to smooth the interpolation noise with samples. While this can amount to a very large number of samples (), we find that relatively few samples in practice suffice to guarantee with high confidence . The constants and can be chosen small compared to and and thus we neglect its cost.
Smoothing over probably certified classifiers
Since both the bound and the underlying classifier are probabilistic in nature, we need to use Lemma 4.3 to consider the cases in which the base classifier can be wrong. This is the case if the guaranteed radius of the classifier is incorrect or the bound is incorrect. Thus,
where is the confidence of the base classifier, is the probabilistic guarantee for and is the confidence with which was obtained.
For some datasets (e.g., CIFAR-10), the bounds for $E$, even after preprocessing (low-pass filter and vignetting), are not small enough to produce satisfactory results. Interpolation noise usually is not concentrated, similarly to $\ell_\infty$ noise. While it would at first sight make sense to use $\ell_\infty$ certification methods to certify against this noise, currently, there is no method that can directly certify CIFAR-10 against $\ell_\infty$ noise up to 28/255 to a satisfactory degree for an accurate model. Thus we certify against $\ell_2$ noise.
To improve this, instead of calculating an bound on the noise, we calculate two bounds: for the noise on the left half of the image, and , for the noise on the right part of the image. First, we split the noise into the noise applied to the left side of the image , and into the noise applied to the right side of the image such that
Next, we estimate the upper bounds for and for . Lastly, we construct a classifier that certifies and then use this as a base classifier to construct , which certifies . Thus is certified for both and . Formally, the outer classifier is
where is the double smoothed classifier (by Lemma 4.3) which smooths the left and right side noise consecutively
where again, Here, denotes the base classifier we smooth over.
Benefit of noise partitioning
If the noise is evenly spread, then the $\ell_2$ norm of the left (right) half of the noise is $E/\sqrt{2}$. Thus, we improve the bound that each smoothing step has to handle by a factor of $\sqrt{2}$. One can also partition the image differently, e.g., by color channels or quadrants, and smooth over each part in composition. For rotations, this requires one set of samples to smooth over the angle; for each of these samples, a set of samples to smooth over one half of the noise; and again for each of those, a set of samples to smooth over the other half. Thus, the total number of samples is the product of the three sample counts.
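The factor of $\sqrt{2}$ and the multiplicative sample cost can be checked with a few lines. The sketch assumes an $\ell_2$ bound and a noise image stored as a list of rows; the helper names and the sample counts in the usage lines are made up for illustration.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def split_norms(noise_rows):
    """L2 norms of the left and right halves of a noise image (list of rows)."""
    half = len(noise_rows[0]) // 2
    left = [v for row in noise_rows for v in row[:half]]
    right = [v for row in noise_rows for v in row[half:]]
    return l2_norm(left), l2_norm(right)


def total_samples(n_outer, n_first_half, n_second_half):
    """Nested smoothing multiplies the sample counts of all three levels."""
    return n_outer * n_first_half * n_second_half


# Evenly spread noise on a 4x4 image: each half carries norm total / sqrt(2).
noise = [[0.5] * 4 for _ in range(4)]
total = l2_norm([v for row in noise for v in row])
left, right = split_norms(noise)
print(total, left, right)
print(total_samples(100, 50, 50))
```

If the noise is concentrated on one side, the corresponding half carries almost the full norm and the partition brings little benefit, so the improvement depends on how evenly the interpolation error is distributed.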
We now present a thorough evaluation of our proposed method, showing results for rotations, translations and audio volume change. All experiments were performed on a machine with 2 GeForce RTX 2080 Tis and a 16-core Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz.
7.1 Rotation of Images
We evaluate the robustness on ImageNet (ImageNet), Restricted ImageNet (RImageNet) (TsiprasSETM19), CIFAR-10 (CIFAR) (cifar), the German Traffic Sign Recognition Benchmark (GTSRB) (GTSRB), and MNIST (MNIST). For MNIST, CIFAR and GTSRB we use a ResNet-18 (resnet) and for (R)ImageNet a ResNet-50. To account for interpolation noise, we use an $\ell_2$-smoothed classifier trained with SmoothAdvPGD (Salman) and rotations as data augmentation. Details of the models and the training procedure are given in Appendix C.
Attacker Model and Noise Bound
The bound $E$ on the interpolation error can be computed once we fix the range from which the attacker chooses their angle, the standard deviation of the Gaussian used to smooth over it, and the interpolation algorithm. The computed bounds are given in Table 1. For MNIST, we consider an attacker to perform rotations in and for all other datasets . For datasets with small image resolution (MNIST, CIFAR, GTSRB), we only evaluate bilinear and bicubic interpolation, as nearest neighbor interpolation could be trivially enumerated. While it has been shown that enumeration can also be applied to ImageNet (PeiCYJ17), we show that our approach can be an efficient alternative for larger angles. For all datasets we employ a low-pass filter and vignetting to further reduce $E$. On all datasets we assume that the image is saved in a lossless integer format by the attacker and account for this in the estimate of $E$. For CIFAR and GTSRB, we used -smoothing and estimate the interpolation error for each color channel.
We observed the interpolation error to increase relative to the size of the image for small images. This is problematic for two reasons: (i) it is harder to certify large noise radii, and (ii) as noted by CohenRK19, smoothing with Gaussian noise performs worse on smaller images. To address this issue, we assume images from CIFAR to be resized to (from ) prior to rotation (i.e., this is part of the attacker model); for GTSRB, which has images ranging in size from to , we use the same scheme as for CIFAR, and for (R)ImageNet (where images range in size from to ) we resize all images prior to rotations such that their shorter side is pixels. This simulates a dataset where each image has at least a resolution of , which we believe to be reasonable given current hardware. Thus our attacker model formally is an attacker that applies a rotation of degrees, on an image of at least pixels, that follows the data distribution. For error estimation and classification we resize the images back down to for GTSRB and CIFAR, and for (R)ImageNet such that the shorter side is pixels long and take a crop of the center, the common preprocessing for this dataset (AlexNet). Values in Table 1 take preprocessing into account. Fig. 6 shows the interpolation errors on ImageNet in blue for all images and in red for images where the shorter side is naturally longer than . This indicates that we do not measure an effect of the rescaling but rather of the image size.
Entries of Table 1 marked with a star denote the maximal error measured for the dataset and the perturbation. However, they lie just outside what we could prove with reasonable effort while retaining accuracy. Thus in practice we use a slightly smaller $E$, that still covers of possible interpolation errors. This approach soundly certifies against attacks from an attacker that randomly chooses an angle, as we observed that errors beyond the chosen $E$ are rare and empirically were not concentrated on particular images. Section 6 discusses the estimation of $E$ for a sound classifier. First estimates for these values show that they are similar to the values reported in Table 1.
We evaluated our algorithm on the non-starred values in Table 1. The results are shown in Table 2: Acc. and Acc. denote how many images have been successfully classified respectively. All images classified by are certified with a radius . For ImageNet and RImageNet as well as MNIST we observe very good results, as these datasets either have very large images or are very simple, and we can sometimes prove radii that are larger than what we used as possible ranges for the attacker when estimating $E$. We have clipped these values back to what we assumed in the attacker model and indicate this by . For GTSRB and CIFAR, the images yield higher interpolation errors but at the same time are less robust to noise. Thus it becomes harder to construct a smoothed classifier for these datasets. This required us to drop the confidence from to and apply smoothing over each color channel separately. DeepG certify 87.8% on MNIST, (, 35s per image) and again 87.8% on CIFAR-10, (, 117s per image), without taking rounding into account. PeiCYJ17 need on degrees 714s per image and report the failure rate per image, hence their results are not comparable. We can certify 81% on MNIST, (often with , 10s per image) and 69% on RImageNet, (often with , 231s per image). Parameter choices are discussed in Appendix C.
Similarly to rotations, we apply preprocessing and calculate estimates for $E$ in Table 4. As ImageNet images get cropped before classification and MNIST images have a black background, we do not use vignetting. Since the interpolation errors for translation are slightly higher than for rotation, we only consider MNIST and (R)ImageNet. On the other datasets the error becomes infeasibly large. Also, we do not consider nearest neighbor interpolation as it can be easily enumerated here. Results are shown in Table 3. The radius is given in percent of image size. on MNIST corresponds to pixel and on 2000 pixels to 48. Since we obtain a -bound, the radius reads as where and are changes in respective directions. DeepG certify 77% on MNIST, ( pixels, 263s per image). We significantly improve this result.
7.3 Volume Change of Audio Signals
To evaluate our method on audio data we use the speech commands dataset (gcommands), which consists of recordings of up to 1 second of people saying one of 30 different words, which are to be classified. Our audio experiments are similar to those for images, although here we do not employ any resizing or scaling. We use a classification pipeline that converts audio waveforms into MFCC spectra (MFCC) and then treats these as images and applies normal image classification. We use a ResNet-50 that was trained with Gaussian noise, but not SmoothAdvPGD. We apply the noise before the waveform is converted to the MFCC spectrum.
We estimate with to be and . On 100 samples, the base classifier was correct times, and the smoothed classifier 51 times for of 0.75, 1.96 and 3.12 for the , , percentile respectively, corresponding to , and dB. At and the average certification time was .
Since the success of our approach largely depends on the size of the underlying images we are in an unusual situation where it is easier for us to prove statements on ImageNet than CIFAR. Accuracy of our robust classifier is mostly limited by the large estimate of and the quality of the underlying certification. Thus any advances in robustness or estimating lower (sound) E (e.g., by computing it per image) directly translate into improvements of our method. The choice of trades accuracy with certification radius as seen in Table 2. Finally, using more samples and further splits for -smoothing, the error bound can be pushed further, at the cost of compute time and accuracy.
In this work we presented a novel method that extends the Gaussian smoothing framework of CohenRK19 to semantic parameterized perturbations (beyond $\ell_p$-balls) in domains such as image (including ImageNet) and audio classification. The framework is general and can be directly applied as-is to standard semantic perturbations with all interpolation schemes while being sound for different types of pixel values. We believe the generality of the method will trigger further work in this direction.
Appendix A Interpolations
In this section, we discuss three common interpolation methods used to compute a pixel value on a unit square lying between 4 pixels of an image. We describe the image around the unit square by a function . We denote the interpolation on the unit square by .
Nearest neighbour interpolation
Nearest neighbor interpolation assigns a point in the unit square the value of the nearest corner point of the unit square, that is,
Bilinear interpolation is described by a multivariate polynomial such for all , that is
We know the values for . Further we define
for . We can now fit the multivariate polynomial such that , , and for all .
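For the two simpler schemes, the interpolation on the unit square can be written down directly. The sketch below uses the standard textbook forms of nearest neighbor and bilinear interpolation. The dictionary layout `corners[(i, j)]` for the four known pixel values and the tie-breaking rule at exactly 0.5 are assumptions made for this example.

```python
def nearest_neighbor(corners, x, y):
    """Value of the closest corner of the unit square for (x, y) in [0, 1]^2.

    Ties at exactly 0.5 are resolved towards the corner with coordinate 1.
    """
    i = 1 if x >= 0.5 else 0
    j = 1 if y >= 0.5 else 0
    return corners[(i, j)]


def bilinear(corners, x, y):
    """Bilinear interpolation: linear in x for fixed y, and vice versa."""
    return (corners[(0, 0)] * (1 - x) * (1 - y)
            + corners[(1, 0)] * x * (1 - y)
            + corners[(0, 1)] * (1 - x) * y
            + corners[(1, 1)] * x * y)


corners = {(0, 0): 0.0, (1, 0): 10.0, (0, 1): 20.0, (1, 1): 30.0}
print(nearest_neighbor(corners, 0.2, 0.9))  # closest corner is (0, 1)
print(bilinear(corners, 0.5, 0.5))          # average of the four corners
print(bilinear(corners, 1.0, 0.0))          # reproduces the corner value
```

Bilinear interpolation reproduces the corner values exactly and is continuous across neighboring unit squares, whereas nearest neighbor interpolation is piecewise constant, which is why the latter yields only finitely many distinct transformed images.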
Appendix B Proof of Theorem 4.2
We present and prove a slightly more general version of Theorem 4.2.
Let , be a classifier, a composable transformation for with a symmetric, positive-definite covariance matrix . If
then for all satisfying