1 Introduction
In recent years, adversarial attacks against machine learning systems, and defenses against these attacks, have been heavily studied
(Szegedy et al., 2013; Madry et al., 2017; Carlini and Wagner, 2017). Although these attacks have been applied in a variety of domains, image classification tasks remain a major focus of research. In general, for a specified image classifier f, the goal of an adversarial attack on an image x is to produce a perturbed image x′ that is imperceptibly 'close' to x, such that f classifies x′ differently than x. This notion of 'closeness' can be measured in a variety of ways under different threat models. Most existing attacks and defenses consider additive threat models, where the ℓp norm of x − x′ is bounded. Recently, non-additive threat models (Wong et al., 2019; Laidlaw and Feizi, 2019; Engstrom et al., 2019; Assion et al., 2019) have been introduced which aim to minimize the distance between x and x′ according to other metrics. Among these attacks is the one introduced by Wong et al. (2019), which considers the Wasserstein distance between x and x′
, normalized such that the pixel intensities of the image can be treated as probability distributions. Informally, the Wasserstein distance between probability distributions μ and ν measures the minimum cost to 'transport' probability mass in order to transform μ into ν, where the cost scales with both the amount of mass transported and the distance over which it is transported, with respect to some underlying metric. The intuition behind this threat model is that shifting pixel intensity a short distance across an image is less perceptible than moving the same amount of pixel intensity a larger distance. (See Figure 1 for an example of a Wasserstein adversarial attack.) A variety of practical approaches have been proposed to make classifiers robust against adversarial attack, including adversarial training (Madry et al., 2017) and defensive distillation (Papernot et al., 2016). However, as new defenses are proposed, new attack methodologies, such as black-box attacks (Papernot et al., 2017), are often rapidly developed which defeat these defences (Tramèr et al., 2017; Athalye et al., 2018; Carlini and Wagner, 2016). While updated defences are often then proposed (Tramèr et al., 2017), in general, we cannot be confident that new attacks will not in turn defeat these defences. To escape this cycle, approaches have been proposed to develop certifiably robust classifiers (Wong and Kolter, 2018; Gowal et al., 2018; Lecuyer et al., 2019; Li et al., 2018; Cohen et al., 2019; Salman et al., 2019): for such a classifier, for each image x, one can calculate a radius ρ such that it is provably guaranteed that any other image within distance ρ of x will be classified the same as x. This means that no adversarial attack can ever be developed which produces adversarial examples within the certified radius.
One effective approach to certifiably robust classification is randomized smoothing with a probabilistic robustness certificate (Lecuyer et al., 2019; Li et al., 2018; Cohen et al., 2019; Salman et al., 2019). In this approach, one uses a smoothed classifier, which represents the expectation of a base classifier's scores over random perturbations of the input x. Based on this smoothing, one can derive an upper bound on how steeply the scores assigned to each class can change, which can then be used to derive a radius in which the highest class score must remain highest.¹

¹ In practice, random samples are used to estimate the expectation, producing an empirical smoothed classifier: the certification is therefore probabilistic, with a degree of certainty dependent on the number of samples.

In this work, we present the first certified defence against Wasserstein adversarial attacks, using an adapted randomized smoothing approach which we call Wasserstein smoothing. To develop the robustness certificate, we define a (non-unique) representation of the difference between two images, based on the flow of pixel intensity necessary to construct one image from another. In this representation, we show that the ℓ1 norm of the minimal flow between two images is equal to the Wasserstein distance between the images. This allows us to apply existing smoothing-based defences by adding noise in the space of these flow representations. We show empirically that this gives improved robustness certificates, compared to using a weak upper bound on the Wasserstein distance obtained by randomized smoothing directly in the pixel space of images. We also show that our Wasserstein smoothing defence protects against Wasserstein adversarial attacks empirically, with significantly improved robustness compared to baseline models. For small adversarial perturbations on the MNIST dataset, our method achieves higher accuracy under adversarial attack than all existing practical defences for the Wasserstein threat model.
In summary, we make the following contributions:

We develop a novel certified defence for the Wasserstein adversarial attack threat model. This is the first certified defence, to our knowledge, that has been proposed for this threat model.

We demonstrate that our certificate is non-vacuous, in that it can certify Wasserstein radii larger than those which can be certified by exploiting a trivial upper bound on the Wasserstein distance.

We demonstrate that our defence effectively protects against existing Wasserstein adversarial attacks, compared to an unprotected baseline.
2 Background
Let x ∈ R^{h×w} denote a two-dimensional image of height h and width w. We will normalize the image such that its pixel intensities sum to one, so that x can be interpreted as a probability distribution on the discrete support of pixel coordinates of the two-dimensional image.² Following the notation of Wong et al. (2019), we define the p-Wasserstein distance between x and y as:

² In the case of multi-channel color images, the attack proposed by Wong et al. (2019) does not transport pixel intensity between channels. This allows us to defend against these attacks using our 2D Wasserstein smoothing with little modification. See Section 6.3, and Corollary 2 in the appendix.

Definition 2.1.
Given two distributions x, y ∈ R^{h×w} and a distance metric d(·, ·) on pixel coordinates, the p-Wasserstein distance is defined as:

    W_p(x, y) := ( min_Π ⟨Π, C⟩ )^{1/p},  subject to Π ≥ 0, Π1 = x, Πᵀ1 = y,  where C_{ij} = d(i, j)^p   (1)

Here C_{ij} = d(i, j)^p is the cost of transporting a unit of mass from position i to position j in the image.
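To make the transport-cost definition concrete, the sketch below (restricted to a one-dimensional support for brevity, and purely illustrative rather than part of the paper's method) computes 1-Wasserstein distances with `scipy.stats.wasserstein_distance`: moving the same amount of mass twice as far costs twice as much.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Three toy 1-D "images" on integer pixel positions, normalized to sum to 1.
positions = np.arange(3, dtype=float)
x = np.array([0.8, 0.1, 0.1])
y_near = np.array([0.1, 0.8, 0.1])  # bulk of the mass moved one pixel
y_far = np.array([0.1, 0.1, 0.8])   # same mass moved two pixels

w_near = wasserstein_distance(positions, positions, x, y_near)
w_far = wasserstein_distance(positions, positions, x, y_far)
print(w_near, w_far)  # the farther move costs exactly twice as much
```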
Note that, for the purpose of matrix multiplication, we are treating x and y as vectors of length hw. Similarly, the transport plan matrix Π and the cost matrix C are in R^{hw×hw}. Intuitively, Π_{ij} represents the amount of probability mass to be transported from pixel i to pixel j, while C_{ij} represents the cost per unit of probability mass of this transport. We can choose d to be any measure of distance between pixel positions in an image. For example, in order to represent the ℓ1 distance metric between pixel positions, we can choose:
    d((i, j), (k, l)) = |i − k| + |j − l|   (2)
Moreover, to represent the ℓ2 distance metric between pixel positions, we can choose:
    d((i, j), (k, l)) = √((i − k)² + (j − l)²)   (3)
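The two ground metrics can be compared directly: for any pair of pixel coordinates, the ℓ2 distance is at most the ℓ1 distance, and at least the ℓ1 distance divided by √2. A small illustrative check over all coordinate pairs of a grid (the variable names are ours):

```python
import numpy as np

h, w = 5, 7
# All pixel coordinates of an h-by-w grid, as (row, col) pairs.
coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)

diff = coords[:, None, :] - coords[None, :, :]
C_l1 = np.abs(diff).sum(-1)           # l1 ground metric, shape (hw, hw)
C_l2 = np.sqrt((diff ** 2).sum(-1))   # l2 ground metric, shape (hw, hw)

# Entrywise in two dimensions: l2 <= l1 <= sqrt(2) * l2.
assert np.all(C_l2 <= C_l1 + 1e-12)
assert np.all(C_l1 <= np.sqrt(2) * C_l2 + 1e-12)
```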
Our defence directly applies to the 1-Wasserstein metric using the ℓ1 distance as the metric d, while the attack developed by Wong et al. (2019) uses the ℓ2 distance. However, because images are two-dimensional, these metrics differ by at most a constant factor of √2, so we adapt our certificates to the setting of Wong et al. (2019) by simply scaling them down by a factor of √2. All experimental results will be presented with this scaling. We emphasize that this is not the distinction between the 1-Wasserstein and 2-Wasserstein distances: this paper uses the 1-Wasserstein metric throughout, to match the majority of the experimental results of Wong et al. (2019).
To develop our certificate, we rely on an alternative linear-program formulation of the 1-Wasserstein distance on a two-dimensional image with the ℓ1 distance metric, provided by Ling and Okada (2007):

    W(x, y) = min_{g ≥ 0} Σ_{(i,j)} Σ_{(k,l) ∈ N(i,j)} g_{(i,j),(k,l)}
    subject to  x_{i,j} + Σ_{(k,l) ∈ N(i,j)} ( g_{(k,l),(i,j)} − g_{(i,j),(k,l)} ) = y_{i,j}  for all (i, j)   (4)

Here, N(i, j) denotes the (up to) four immediate (non-diagonal) neighbors of the position (i, j); in other words, N(i, j) = {(i ± 1, j), (i, j ± 1)}, intersected with the pixel grid. For the ℓ1 distance in two dimensions, Ling and Okada (2007) prove that this formulation is in fact equivalent to the linear program given in Equation 1. Note that only elements of g between neighboring pixels need to be defined: this means that the number of variables in the linear program is approximately 4hw, compared to the (hw)² elements of Π in Equation 1. While this formulation was originally used to make the linear program more tractable to solve directly, we exploit its form to devise a randomized smoothing scheme in the next section.
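To make the size of this linear program concrete, the neighbor sets N(i, j) can be enumerated directly; the count of directed flow variables is 2(2hw − h − w) ≈ 4hw, far fewer than the (hw)² entries of the full transport plan. A small sketch (helper name ours):

```python
def neighbors(i, j, h, w):
    """The (up to) four non-diagonal grid neighbors of pixel (i, j)."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(k, l) for (k, l) in cand if 0 <= k < h and 0 <= l < w]

h, w = 28, 28  # MNIST-sized grid
n_directed = sum(len(neighbors(i, j, h, w)) for i in range(h) for j in range(w))

# Each adjacent pixel pair contributes two directed flow variables g.
assert n_directed == 2 * (2 * h * w - h - w)
print(n_directed, (h * w) ** 2)  # ~4hw flow variables vs (hw)^2 plan entries
```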
3 Robustness Certificate
In order to present our robustness certificate, we first introduce some notation. Let Δ = (Δ^{(↓)}, Δ^{(→)}) denote a local flow plan, with Δ^{(↓)} ∈ R^{(h−1)×w} and Δ^{(→)} ∈ R^{h×(w−1)}. It specifies a net flow between adjacent pixels in an image x (downward between vertically adjacent pixels, and rightward between horizontally adjacent pixels), which, when applied, transforms x into a new image x′. See Figure 2 for an explanation of the indexing. In general, we refer to the space R^{2hw−h−w} of possible local flow plans as the flow domain. We define the function Γ, which applies a local flow plan to a distribution.
Definition 3.1.
The local flow plan application function Γ is defined as:

    Γ(x, Δ)_{i,j} := x_{i,j} + Δ^{(↓)}_{i−1,j} − Δ^{(↓)}_{i,j} + Δ^{(→)}_{i,j−1} − Δ^{(→)}_{i,j}   (5)

where we let out-of-range components of Δ equal zero.³

³ Note that the new image Γ(x, Δ) is not necessarily a probability distribution, because it may have negative components. However, note that normalization is preserved: Σ_{i,j} Γ(x, Δ)_{i,j} = Σ_{i,j} x_{i,j}. This is because every component of Δ is added once and subtracted once across the elements of Γ(x, Δ).
Note that local flow plans are additive:
    Γ(Γ(x, Δ), Δ′) = Γ(x, Δ + Δ′)   (6)
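Both the normalization preservation noted in the footnote and the additivity of local flow plans can be checked numerically. The sketch below uses one plausible indexing convention for Γ, with Δ split into vertical and horizontal components; the helper name `apply_flow` is ours:

```python
import numpy as np

def apply_flow(x, d_v, d_h):
    """Apply a local flow plan to image x.

    d_v has shape (h-1, w): net flow from row i down to row i+1.
    d_h has shape (h, w-1): net flow from column j right to column j+1.
    """
    out = x.copy()
    out[:-1, :] -= d_v   # mass leaving each row downward...
    out[1:, :] += d_v    # ...arrives in the row below
    out[:, :-1] -= d_h
    out[:, 1:] += d_h
    return out

rng = np.random.default_rng(0)
h, w = 4, 5
x = rng.random((h, w)); x /= x.sum()
dv1, dh1 = rng.normal(size=(h - 1, w)), rng.normal(size=(h, w - 1))
dv2, dh2 = rng.normal(size=(h - 1, w)), rng.normal(size=(h, w - 1))

y = apply_flow(x, dv1, dh1)
assert np.isclose(y.sum(), x.sum())  # normalization is preserved
# Additivity: applying two plans in sequence equals applying their sum.
assert np.allclose(apply_flow(y, dv2, dh2), apply_flow(x, dv1 + dv2, dh1 + dh2))
```

Both properties follow from the fact that Γ is an affine function of Δ in which every flow component appears once with each sign.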
Using this notation, we make a simple transformation of the linear program given in Equation 4, removing the positivity constraint from the variables and reducing the number of variables to 2hw − h − w:
Lemma 1.
For any normalized probability distributions x, y ∈ R^{h×w}:

    W(x, y) = min_{Δ : Γ(x, Δ) = y} ‖Δ‖_1   (7)

where W denotes the 1-Wasserstein metric, using the ℓ1 distance as the underlying distance metric d.
In other words, we can upper-bound the Wasserstein distance between two images by the ℓ1 norm of any feasible local flow plan between them. This enables us to extend existing results for smoothing-based certificates (Lecuyer et al., 2019) to the Wasserstein metric, by adding noise in the flow domain.
Definition 3.2.
We denote by L(σ) the distribution of i.i.d. Laplace noise with scale parameter σ in each of the 2hw − h − w coordinates of the flow domain.

Given a classification score function p, we define the Wasserstein-smoothed classification function p̄^W as follows:

    p̄^W(x) := E_{δ ∼ L(σ)} [ p(Γ(x, δ)) ]   (8)

Let c*(x) be the class assignment of x using the Wasserstein-smoothed classifier (i.e. c*(x) := argmax_c p̄^W_c(x)).
Theorem 1.
For any normalized probability distribution x, if

    p̄^W_{c*}(x) ≥ e^{2ρ/σ} · max_{c ≠ c*} p̄^W_c(x)   (9)

then for any perturbed probability distribution x′ such that W(x, x′) ≤ ρ, we have:

    c*(x′) = c*(x)   (10)
All proofs are presented in the appendix.
4 Intuition: OneDimensional Case
To provide intuition for the proposed Wasserstein smoothing certified robustness scheme, we consider a simplified model, in which the support of x is a one-dimensional array of length n, rather than a two-dimensional grid (i.e. x ∈ R^n). In this case, we can denote a local flow plan Δ ∈ R^{n−1}, so that for all i:

    Γ(x, Δ)_i = x_i + Δ_{i−1} − Δ_i   (11)

where we let Δ_0 = Δ_n = 0. In this one-dimensional case, for any fixed x, x′ (with the normalization constraint that Σ_i x_i = Σ_i x′_i = 1), there is a unique solution Δ to Γ(x, Δ) = x′:

    Δ_i = Σ_{j ≤ i} (x_j − x′_j)   (12)
Note that this reminds us of a well-known identity describing optimal transport between two distributions which share a continuous, one-dimensional support (see Section 2.6 of Peyré et al. (2019), for example):

    W(x, x′) = ∫ |F_x(t) − F_{x′}(t)| dt   (13)

where F_x, F_{x′} denote cumulative distribution functions. If we apply this result to our discretized case, with the index i taking the place of t, and apply the identity to x and x′, this becomes:

    W(x, x′) = Σ_i | Σ_{j ≤ i} (x_j − x′_j) | = ‖Δ‖_1   (14)
By the uniqueness of the solution given in Equation 12, for any x, we can define Δ_x as the solution to Γ(u, Δ_x) = x, where u is an arbitrary fixed reference distribution (e.g. the uniform distribution, u_i = 1/n for all i). Therefore, instead of operating on the images x directly, we can equivalently operate on the representations Δ_x in the flow domain. We will therefore define a flow-domain version of our classifier p:

    p̃(Δ) := p(Γ(u, Δ))   (15)

We will now perform classification entirely in the flow domain, by first calculating Δ_x and then using p̃(Δ_x) as our classifier. Now, consider an image x and an adversarial perturbation x′ of x, and let δ be the unique solution to Γ(x, δ) = x′. By Equation 14, ‖δ‖_1 = W(x, x′). Then:

    p̃(Δ_x + δ) = p(Γ(u, Δ_x + δ)) = p(Γ(Γ(u, Δ_x), δ)) = p(Γ(x, δ)) = p(x′)   (16)

where the second equality is by Equation 6. Moreover, by the uniqueness of Equation 12, Δ_{x′} = Δ_x + δ, or δ = Δ_{x′} − Δ_x. Therefore

    ‖Δ_x − Δ_{x′}‖_1 = ‖δ‖_1 = W(x, x′)   (17)
In other words, if we classify in the flow domain using p̃, the ℓ1 distance between points in the flow domain is the Wasserstein distance between the corresponding distributions x and x′. We can then perform smoothing in the flow domain, and use the existing robustness certificate provided by Lecuyer et al. (2019) to certify robustness. Extending this argument to two-dimensional images adds some complication: images can no longer be represented uniquely in the flow domain, and the relationship between the ℓ1 distance and the Wasserstein distance becomes an upper bound. Nevertheless, the same conclusion still holds for 2D images, as stated in Theorem 1. Proofs for the two-dimensional case are given in the appendix.
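The one-dimensional identities above are easy to verify numerically: the unique flow plan is the cumulative sum of x − x′, and its ℓ1 norm matches the 1-Wasserstein distance computed by scipy on the same integer support. A sketch (variable names ours):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
n = 10
x = rng.random(n); x /= x.sum()
y = rng.random(n); y /= y.sum()

# Unique 1-D flow plan (Equation 12): Delta_i = sum_{j <= i} (x_j - y_j).
delta = np.cumsum(x - y)
assert np.isclose(delta[-1], 0.0)  # total mass matches, so the last entry vanishes

# Equation 14: the l1 norm of the flow plan equals the 1-Wasserstein distance.
support = np.arange(n)
assert np.isclose(np.abs(delta).sum(), wasserstein_distance(support, support, x, y))
```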
Table 1: MNIST certification results, comparing Wasserstein smoothing (top) to naive Laplace smoothing (bottom).

Wasserstein smoothing:
Noise standard deviation | Classification accuracy (percent abstained) | Median certified robustness | Base classifier accuracy
0.005 | 98.71 (0.04) | 0.0101 | 97.94
0.01 | 97.98 (0.19) | 0.0132 | 94.95
0.02 | 93.99 (0.58) | 0.0095 | 79.72
0.05 | 74.22 (3.95) | 0 | 43.67
0.1 | 49.41 (1.29) | 0 | 30.26
0.2 | 31.80 (8.40) | N/A | 25.13
0.5 | 22.58 (0.84) | N/A | 22.67

Laplace smoothing:
Noise standard deviation | Classification accuracy (percent abstained) | Median certified robustness | Base classifier accuracy
0.005 | 98.87 (0.06) | 0.0062 | 97.47
0.01 | 97.44 (0.19) | 0.0053 | 89.32
0.02 | 91.11 (1.29) | 0.0030 | 67.08
0.05 | 61.44 (7.45) | 0 | 33.80
0.1 | 34.92 (9.36) | N/A | 25.56
0.2 | 24.02 (5.67) | N/A | 22.85
0.5 | 22.57 (1.05) | N/A | 22.70
5 Practical Certification Scheme
To generate probabilistic robustness certificates from randomly sampled evaluations of the base classifier, we adapt the procedure outlined by Cohen et al. (2019) for ℓ2 certificates. We consider a hard smoothed classifier approach: we set p_c(x) = 1 if the base classifier selects class c at point x, and p_c(x) = 0 otherwise. We also use a stricter form of the condition given as Equation 9:

    p̄^W_{c*}(x) ≥ e^{2ρ/σ} · (1 − p̄^W_{c*}(x))   (18)

This means that we only need to provide a probabilistic lower bound on the expectation of the largest class score, rather than bounding every class score. This reduces the number of samples necessary to estimate a high-confidence lower bound on p̄^W_{c*}(x), and therefore to estimate the certificate with high confidence. Cohen et al. (2019) provide a statistically sound procedure for this, which we use: refer to that paper for details. Note that, when simply evaluating the classification given by the smoothed classifier, we also need to approximate c*(x) using random samples. Cohen et al. (2019) also provide a method for this which yields the expected classification with high confidence, but may abstain from classifying. We use this method when evaluating accuracies.
Since the Wasserstein adversarial attack introduced by Wong et al. (2019) uses the ℓ2 distance metric, to have a fair performance evaluation against this attack we are interested in certifying a radius in the 1-Wasserstein distance with underlying ℓ2 distance metric, rather than ℓ1. Let us denote this radius as ρ_{ℓ2}. In two-dimensional images, the elements of the cost matrix in the ℓ2 metric may be smaller than their ℓ1 counterparts by up to a factor of √2, so we have:

    W^{ℓ2}(x, y) ≥ W^{ℓ1}(x, y) / √2   (19)

Therefore, by certifying to a radius of √2 · ρ_{ℓ2} under the ℓ1 metric, we can effectively certify against ℓ2-metric 1-Wasserstein attacks of radius ρ_{ℓ2}; our condition then becomes:

    p̄^W_{c*}(x) ≥ e^{2√2 · ρ_{ℓ2}/σ} · (1 − p̄^W_{c*}(x))   (20)
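Inverting the condition above for the largest admissible radius gives ρ_{ℓ2} = σ · ln(p/(1−p)) / (2√2), where p is a high-confidence lower bound on the top-class probability. The sketch below estimates p with a one-sided Clopper-Pearson binomial bound, in the spirit of the procedure of Cohen et al. (2019); the function names, sample counts, and confidence level are ours, not the paper's settings:

```python
import math
from scipy.stats import beta

def lower_confidence_bound(k, n, alpha):
    """One-sided Clopper-Pearson lower bound on a binomial proportion:
    with probability >= 1 - alpha, the true success rate is at least this."""
    if k == 0:
        return 0.0
    return beta.ppf(alpha, k, n - k + 1)

def certified_radius(k, n, sigma, alpha=0.001):
    """Largest rho satisfying p >= exp(2*sqrt(2)*rho/sigma) * (1 - p),
    using the Clopper-Pearson lower bound p; returns 0.0 if p <= 1/2."""
    p = lower_confidence_bound(k, n, alpha)
    if p <= 0.5:
        return 0.0
    return sigma * math.log(p / (1.0 - p)) / (2.0 * math.sqrt(2.0))

# Example: top class observed in 9,900 of 10,000 noised samples.
print(certified_radius(9900, 10000, sigma=0.01))
```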
6 Experimental Results
In all experiments, we use 10,000 random noised samples to predict the smoothed classification of each image; to generate certificates, we first use 1,000 samples to infer which class has the highest smoothed score, and then 10,000 samples to lower-bound this score. All probabilistic certificates and classifications are reported to a fixed high confidence level. The model architectures used for the base classifiers for each data set are the same as those used in Wong et al. (2019). When reporting results, median certified robustness refers to the maximum radius such that at least half of the classifications of images in the data set are certified to be robust to at least this radius, with certificates for the correct ground-truth class. If over half of the images are not certified for the correct class, this statistic is reported as N/A.
6.1 Comparison to naive Laplace Smoothing
Note that one can derive a trivial, but sometimes tight, bound: under any of the distance metrics considered here, if W(x, y) ≤ ρ, then ‖x − y‖_1 ≤ 2ρ. (See Corollary 1 in the appendix.) This enables us to write a condition for radius-ρ Wasserstein certified robustness by applying Laplace smoothing directly to the pixels, and simply converting the certificate. In our notation, this condition is:

    p̄^L_{c*}(x) ≥ e^{4√2 · ρ_{ℓ2}/σ} · (1 − p̄^L_{c*}(x))   (21)

where p̄^L is a smoothed classifier with Laplace noise added to every pixel independently. It may appear as if our Wasserstein-smoothed bound should only be an improvement over this bound by a factor of 2 in the certified radius. However, as shown in Table 1, we in fact improve our certificates by a larger factor. This is because, for a fixed noise standard deviation, the base classifier is able to achieve a higher accuracy after adding noise in the flow domain, compared to adding noise directly to the pixels. When adding noise in the flow domain, we add and subtract noise in equal amounts between adjacent pixels, preserving more information for the base classifier.
To give a concrete example, consider some d × d square patch of an image. Suppose that the overall aggregate pixel intensity in this patch (i.e. the sum of the pixel values) is a salient feature for classification. (This is a highly plausible situation: for example, in MNIST, this may indicate whether or not some region of an image is occupied by part of a digit.) Let us call this feature F, and calculate the variance of F in smoothing samples under Laplace and Wasserstein smoothing, both with per-coordinate noise variance σ². Under Laplace smoothing (Figure 4a), d² independent instances of Laplace noise are added to F, so the resulting variance will be d²σ²: this is proportional to the area of the region. In the case of Wasserstein smoothing, by contrast, probability mass exchanged between pixels in the interior of the patch has no effect on the aggregate quantity F. Instead, only noise on the perimeter will affect the total feature value F: the variance is therefore approximately 4dσ² (Figure 4b). Wasserstein smoothing thus reduces the effective noise variance on this feature by a factor of d/4.
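This area-versus-perimeter argument can be checked by counting, for each noise variable, whether it has a nonzero net coefficient in the patch-sum feature F: pixel noise contributes one unit of variance per pixel of area, while flow noise contributes only along edges crossing the patch boundary. A small illustrative sketch for an interior d × d patch (the helper name is ours):

```python
def noise_coefficient_counts(h, w, top, left, d):
    """For a d-by-d patch with upper-left corner (top, left), count noise
    variables affecting the patch-sum feature F = sum of pixels in the patch."""
    in_patch = lambda i, j: top <= i < top + d and left <= j < left + d
    # Pixel (Laplace) noise: every pixel of the patch contributes to F.
    n_pixel = d * d
    # Flow noise: an edge variable affects F only if exactly one endpoint
    # lies in the patch (flows internal to the patch cancel in the sum).
    n_flow = 0
    for i in range(h):
        for j in range(w):
            for k, l in [(i + 1, j), (i, j + 1)]:  # downward / rightward edges
                if k < h and l < w and in_patch(i, j) != in_patch(k, l):
                    n_flow += 1
    return n_pixel, n_flow

n_pixel, n_flow = noise_coefficient_counts(h=20, w=20, top=5, left=5, d=8)
assert n_pixel == 8 * 8  # variance scales with the area d^2
assert n_flow == 4 * 8   # variance scales with the perimeter 4d
```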
6.2 Empirical adversarial accuracy
We measure the performance of our smoothed classifier against the Wasserstein-metric adversarial attack proposed in Wong et al. (2019), and compare to models tested in that work. Results are presented in Figure 5. For testing, we use the same attack parameters as in Wong et al. (2019): the 'Standard' and 'Adversarial Training' results are therefore replications of the experiments from that paper, using the publicly available code and pretrained models.
In order to attack our hard smoothed classifier, we adapt the method proposed by Salman et al. (2019): in particular, note that we cannot directly calculate the gradient of the classification loss with respect to the image for a hard smoothed classifier, because the derivatives of the logits of the base classifier are not propagated. Therefore, we must instead attack a soft smoothed classifier: we take the expectation over samples of the softmaxed logits of the base classifier, instead of the final classification output. In each step of the attack, we use 128 noised samples to estimate this gradient, as used in Salman et al. (2019). In the attack proposed by Wong et al. (2019), the images are attacked over 200 iterations of projected gradient descent, projected onto a Wasserstein ball, with the radius of the ball increased every 10 iterations. The attack succeeds, and the final radius is recorded, once the classifier misclassifies the image. In order to preserve as much of the structure (and code) of the attack as possible, and so provide a fair comparison, it is thus necessary for us to evaluate each image using our hard classifier, with the full 10,000 smoothing samples, at each iteration of the attack. We count the classifier abstaining as a misclassification in these experiments. However, note that this may somewhat underestimate the true robustness of our classifier: recall that our classifier is nondeterministic; because we repeatedly evaluate the classifier and report a perturbed image as adversarial the first time it is misclassified, we may tend to overcount misclassifications. Because we use a large number of noise samples to generate our classifications, this is only likely to happen for examples which are close to being adversarial. Still, the presented data should be regarded as a lower bound on the true accuracy of our Wasserstein-smoothed classifier under attack.
Table 2: CIFAR-10 certification results for Wasserstein smoothing.

Noise standard deviation | Classification accuracy (percent abstained) | Median certified robustness | Base classifier accuracy
0.00005 | 87.01 (0.24) | 0.000101 | 86.02
0.0001 | 83.39 (0.42) | 0.000179 | 82.08
0.0002 | 77.57 (0.66) | 0.000223 | 75.46
0.0005 | 68.75 (1.01) | 0.000209 | 65.12
0.001 | 61.65 (1.77) | 0.000127 | 57.03
In Figure 5, we note two things. First, our Wasserstein smoothing technique appears to be an effective empirical defence against Wasserstein adversarial attacks, compared to an unprotected ('Standard') network. (It is also more robust than the binarized and ℓ∞-robust models tested by Wong et al. (2019): see appendix.) However, for large perturbations, our defence is less effective than the adversarial training defence proposed by Wong et al. (2019). This suggests a promising direction for future work: Salman et al. (2019) proposed an adversarial training method for smoothed classifiers, which could be applied in this case. Note, however, that both Wasserstein adversarial attacks and smoothed adversarial training are computationally expensive, so this may require significant computational resources. Second, the median radius of attack to which our smoothed classifier is empirically robust is larger than the median certified robustness of our smoothed classifier by two orders of magnitude. This calls for future work both to develop improved robustness certificates and to develop more effective attacks in the Wasserstein metric.
6.3 Experiments on color images (CIFAR10)
Wong et al. (2019) also apply their attack to color images in CIFAR-10. In this case, the attack does not transport probability mass between color channels: therefore, in our defence, it is sufficient to add noise in the flow domain of each channel independently to certify robustness (see Corollary 2 in the appendix for a proof of the validity of this method). Certificates are presented in Table 2, while empirical robustness is shown in Figure 6. Again, we compare directly to models from Wong et al. (2019). We note that, again, our model empirically outperforms an unprotected model significantly, but is not as robust as an adversarially trained model. We also note that the certified robustness is orders of magnitude smaller than that computed for MNIST; however, the unprotected model is also significantly less robust empirically than the equivalent MNIST model.
7 Conclusion
In this paper, we developed a smoothing-based certifiably robust defence against Wasserstein-metric adversarial examples. To do this, we add noise in the space of possible flows of pixel intensity between images. To our knowledge, this is the first certified defence method specifically tailored to the Wasserstein threat model. Our method proves to be an effective practical defence against Wasserstein adversarial attacks, with significantly improved empirical adversarial robustness compared to a baseline model.
References

Assion, F., et al. (2019). The attack generator: a systematic approach towards constructing adversarial attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Athalye, A., Carlini, N., and Wagner, D. (2018). Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning, pp. 274-283.
Carlini, N. and Wagner, D. (2016). Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311.
Carlini, N. and Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39-57.
Cohen, J., Rosenfeld, E., and Kolter, J. Z. (2019). Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.
Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., and Madry, A. (2019). Exploring the landscape of spatial robustness. In Proceedings of the 36th International Conference on Machine Learning, PMLR 97, pp. 1802-1811.
Gowal, S., et al. (2018). On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715.
Laidlaw, C. and Feizi, S. (2019). Functional adversarial attacks. arXiv preprint arXiv:1906.00001.
Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. (2019). Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 726-742.
Li, B., et al. (2018). Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113.
Ling, H. and Okada, K. (2007). An efficient earth mover's distance algorithm for robust histogram comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5), pp. 840-853.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
Papernot, N., et al. (2017). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, pp. 506-519.
Papernot, N., et al. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582-597.
Peyré, G. and Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning 11(5-6), pp. 355-607.
Salman, H., et al. (2019). Provably robust deep learning via adversarially trained smoothed classifiers. arXiv preprint arXiv:1906.04584.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
Tramèr, F., et al. (2017). Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204.
Wong, E. and Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pp. 5283-5292.
Wong, E., Schmidt, F. R., and Kolter, J. Z. (2019). Wasserstein adversarial examples via projected Sinkhorn iterations. In Proceedings of the 36th International Conference on Machine Learning, PMLR 97, pp. 6808-6817.
Appendix A Proofs
Lemma 1.
For any normalized probability distributions x, y ∈ R^{h×w}, there exists at least one local flow plan Δ such that Γ(x, Δ) = y. Furthermore:

    W(x, y) = min_{Δ : Γ(x, Δ) = y} ‖Δ‖_1   (22)

where W denotes the 1-Wasserstein metric, using the ℓ1 distance as the underlying distance metric.
Proof.
We first show the equivalence of the above minimization problem with the linear program proposed by Ling and Okada (2007), restated here:

    W(x, y) = min_{g ≥ 0} Σ_{(i,j)} Σ_{(k,l) ∈ N(i,j)} g_{(i,j),(k,l)}
    subject to  x_{i,j} + Σ_{(k,l) ∈ N(i,j)} ( g_{(k,l),(i,j)} − g_{(i,j),(k,l)} ) = y_{i,j}  for all (i, j)   (23)

where N(i, j) denotes the (up to) four immediate non-diagonal neighbors of position (i, j).
It suffices to show that (1) there is a transformation from the variables g in Equation 23 to the variables Δ in Equation 22, such that all points which are feasible in Equation 23 are feasible in Equation 22 and the minimization objective in Equation 22 is less than or equal to the minimization objective in Equation 23; and (2) there is a transformation from the variables Δ in Equation 22 to the variables g in Equation 23, such that all points which are feasible in Equation 22 are feasible in Equation 23 and the minimization objective in Equation 23 is less than or equal to the minimization objective in Equation 22.
We start with (1). We give the transformation as:

    Δ^{(↓)}_{i,j} := g_{(i,j),(i+1,j)} − g_{(i+1,j),(i,j)},   Δ^{(→)}_{i,j} := g_{(i,j),(i,j+1)} − g_{(i,j+1),(i,j)}   (24)

where we let out-of-range components equal zero. To show feasibility, we write out fully the flow constraint of Equation 23:

    x_{i,j} + Σ_{(k,l) ∈ N(i,j)} ( g_{(k,l),(i,j)} − g_{(i,j),(k,l)} ) = y_{i,j}   (25)
Substituting in Equation 24:

    x_{i,j} + Δ^{(↓)}_{i−1,j} − Δ^{(↓)}_{i,j} + Δ^{(→)}_{i,j−1} − Δ^{(→)}_{i,j} = y_{i,j}   (26)

But by Definition 3.1, this is exactly:

    Γ(x, Δ) = y   (27)
which is the sole constraint in Equation 22: thus any solution which is feasible in Equation 23 is feasible in Equation 22. Also note that:

    ‖Δ‖_1 = Σ_{i,j} ( |Δ^{(↓)}_{i,j}| + |Δ^{(→)}_{i,j}| )
          = Σ_{i,j} ( |g_{(i,j),(i+1,j)} − g_{(i+1,j),(i,j)}| + |g_{(i,j),(i,j+1)} − g_{(i,j+1),(i,j)}| )
          ≤ Σ_{(i,j)} Σ_{(k,l) ∈ N(i,j)} g_{(i,j),(k,l)}   (28)

where the inequality follows from the triangle inequality applied to Equation 24, and we use the fact that each adjacent pair of pixels appears once on the left and once in each direction on the right. This shows that the minimization objective in Equation 22 is less than or equal to the minimization objective in Equation 23.
Moving on to (2), we give the transformation as:

    g_{(i,j),(i+1,j)} := max(Δ^{(↓)}_{i,j}, 0),   g_{(i+1,j),(i,j)} := max(−Δ^{(↓)}_{i,j}, 0),
    g_{(i,j),(i,j+1)} := max(Δ^{(→)}_{i,j}, 0),   g_{(i,j+1),(i,j)} := max(−Δ^{(→)}_{i,j}, 0)   (29)
Note that the nonnegativity constraint of Equation 23 is automatically satisfied by the form of these definitions. Shifting indices, we also have:

    g_{(k,l),(i,j)} − g_{(i,j),(k,l)} = Δ^{(↓)}_{i−1,j}, −Δ^{(↓)}_{i,j}, Δ^{(→)}_{i,j−1}, or −Δ^{(→)}_{i,j},
    for (k, l) = (i−1, j), (i+1, j), (i, j−1), (i, j+1), respectively   (30)

From the constraint of Equation 22, Γ(x, Δ) = y, we then have:

    x_{i,j} + Σ_{(k,l) ∈ N(i,j)} ( g_{(k,l),(i,j)} − g_{(i,j),(k,l)} ) = y_{i,j}   (31)

which is exactly the flow constraint of Equation 23: thus any solution which is feasible in Equation 22 is feasible in Equation 23. Also note that:

    Σ_{(i,j)} Σ_{(k,l) ∈ N(i,j)} g_{(i,j),(k,l)} = Σ_{i,j} ( |Δ^{(↓)}_{i,j}| + |Δ^{(→)}_{i,j}| ) = ‖Δ‖_1   (32)

since max(a, 0) + max(−a, 0) = |a|. This shows that the minimization objective in Equation 23 is less than or equal to the minimization objective in Equation 22, completing (2).
Finally, now that we have shown that Equations 22 and 23 are in fact equivalent minimizations (i.e., we have proven Equation 22 correct), we would like to show that there is always a feasible solution to Equation 22, as claimed. By the above transformations, it suffices to show that there is always a feasible solution to Equation 23. Ling and Okada (2007) show that any feasible solution to the general Wasserstein minimization LP (Equation 1) can be transformed into a solution to Equation 23, so it suffices to show that the LP in Equation 1 always has a feasible solution. This is trivially satisfied by taking Π = x yᵀ, where we note that each entry of Π, being a product of probabilities, is nonnegative, and that Π1 = x and Πᵀ1 = y.
∎
Theorem 1.
Consider a normalized probability distribution x, and a classification score function p. Let p̄^W refer to the Wasserstein-smoothed classification function:

    p̄^W(x) := E_{δ ∼ L(σ)} [ p(Γ(x, δ)) ]   (33)

Let c*(x) be the class assignment of x using the smoothed classifier (i.e. c*(x) := argmax_c p̄^W_c(x)). If

    p̄^W_{c*}(x) ≥ e^{2ρ/σ} · max_{c ≠ c*} p̄^W_c(x)   (34)

then for any perturbed probability distribution x′ such that W(x, x′) ≤ ρ:

    c*(x′) = c*(x)   (35)
Proof.
Let u be the uniform probability vector. As a consequence of Lemma 1, for any distribution x, there exists a nonempty set of local flow plans D_x:

    D_x := { Δ | Γ(u, Δ) = x } ≠ ∅   (36)

Also, we may define a version of the classifier on the local flow plan domain:

    p̃(Δ) := p(Γ(u, Δ))   (37)

Let Δ_x be an arbitrary element of D_x, and consider any perturbed x′ such that W(x, x′) ≤ ρ. By Lemma 1:

    min_{δ : Γ(x, δ) = x′} ‖δ‖_1 = W(x, x′) ≤ ρ   (38)

Then, using Equation 6, for any δ with Γ(x, δ) = x′:

    Γ(u, Δ_x + δ) = Γ(Γ(u, Δ_x), δ) = Γ(x, δ) = x′   (39)

Let the minimum in Equation 38 be achieved at δ*. Making a change of variables (Δ_{x′} := Δ_x + δ*), we have:

    Δ_{x′} ∈ D_{x′},  with ‖Δ_{x′} − Δ_x‖_1 ≤ ρ   (40)

Note that for any x (and any class c):

    E_{δ ∼ L(σ)} [ p̃_c(Δ_x + δ) ] = E_{δ ∼ L(σ)} [ p_c(Γ(x, δ)) ] = p̄^W_c(x)   (41)
We can now apply Proposition 1 from Lecuyer et al. (2019), restated here:
Proposition.
Consider a vector z ∈ R^m, and a classification score function q. Let c*(z) be the class assignment of z using a Laplace-smoothed version q̄ of the classifier q:

    q̄(z) := E_{δ ∼ L(σ)} [ q(z + δ) ],   c*(z) := argmax_c q̄_c(z)   (42)

If:

    q̄_{c*}(z) ≥ e^{2ρ/σ} · max_{c ≠ c*} q̄_c(z)   (43)

then for any perturbed vector z′ such that ‖z − z′‖_1 ≤ ρ:

    c*(z′) = c*(z)   (44)
We apply this proposition to p̃ at the point Δ_x, noting that ‖Δ_x − Δ_{x′}‖_1 ≤ ρ:

    argmax_c E_δ [ p̃_c(Δ_{x′} + δ) ] = argmax_c E_δ [ p̃_c(Δ_x + δ) ]   (45)

Then, using Equation 41:

    argmax_c p̄^W_c(x′) = argmax_c p̄^W_c(x) = c*(x)   (46)

which was to be proven. ∎
Corollary 1.
For any normalized probability distributions x, y, if W(x, y) ≤ ρ, then ‖x − y‖_1 ≤ 2ρ, where W is the 1-Wasserstein metric using any ℓp norm on pixel coordinates as the underlying distance metric. Furthermore, there exist distributions for which this inequality is tight.
Proof.
Let Π indicate the optimal transport plan between x and y. From Equation 1, we have Π1 = x and Πᵀ1 = y. Then:

    ‖x − y‖_1 = ‖Π1 − Πᵀ1‖_1   (47)

Let Π̃ represent a modified version of Π with the diagonal elements set to zero. Note that Π̃1 − Π̃ᵀ1 = Π1 − Πᵀ1, and that every unit of mass in Π̃ is transported a distance of at least 1. Then, using the triangle inequality: