which aim to develop Deep Neural Networks that are robust against adversarial attacks. Among them, adversarial training methods (e.g., FGSM and PGD adversarial training [13, 22] and TRADES) that utilize adversarial examples as training data have been among the most effective approaches, as they truly boost model robustness without facing the problem of obfuscated gradients. In adversarial training, recent works [34, 4]
show that reducing the divergence of the representations of images and their adversarial examples in latent space (e.g., the feature space output by an intermediate layer of a classifier) can significantly improve the robustness. For example, in [4], latent representations of images in the same class are pulled closer together than those in different classes, which leads to a more compact latent space and, consequently, better robustness.
In this paper, we show that the principle of robustifying classifiers by enhancing compactness in the latent space has a strong connection with contrastive learning (CL), a recent but increasingly popular and effective self-supervised representation learning approach [8, 15, 18, 14]. Specifically, CL learns representations of unlabeled data by choosing an anchor, pulling the anchor and its positive samples closer together in latent space while pushing the anchor away from many negative samples. Given the anchor, a clearly important factor in the success of CL is the approach used to choose the positive and negative sets. This is even more important if CL is to be applied in a supervised setting to robustify models against adversarial attacks. This inspires us to investigate the role of adversarial samples in CL for improving model robustness. Our observations and experiments demonstrate that directly adopting CL into adversarial machine learning (AML) can hardly improve adversarial robustness, indicating that a deeper understanding of the relationships between the CL mechanism, latent space compactness, and adversarial robustness is required. Pursuing this comprehension, we give a detailed study of the above aspects and subsequently propose a new framework for enhancing robustness using the principles of CL. Our paper provides answers to three research questions:
(Q1) Why can CL help to improve the adversarial robustness? By comprehensively investigating the behavior of divergence in latent space under different kinds of augmentations, our study shows that the robustness of a model can be interpreted by the ratio between two divergences in the latent space: the intra-class divergence, measured on benign images and their adversarial examples of the same class, and the inter-class divergence, measured on such samples of different classes; the lower this ratio, the more robustness can be achieved. These observations motivate the idea that a robust model can be obtained by simultaneously contrasting the intra-class divergence between images and their adversarial examples with the inter-class divergence.
(Q2) How to integrate CL with adversarial training in the context of AML?
CL originally works in the case where data labels are unavailable, which does not fit the AML context, where we are more interested in robustifying classifiers for supervised learning. The recent research on Supervised Contrastive Learning (SCL) [18] extends CL by leveraging label information, where latent representations from the same class are pulled closer together than those from different classes. While it might seem that SCL could be applied to the AML problem directly, we show in this paper that doing so is highly nontrivial. To this end, we propose Adversarial Supervised Contrastive Learning (ASCL), which effectively and efficiently utilizes adversarial samples to improve model robustness. First, for an anchor image, we use its adversarial images as the transformed/augmented samples, which differs from the standard data augmentation techniques used in conventional CL methods [8, 18]. Second, due to the high cost of generating adversarial examples, we use only a single transformation instead of multiple samples as in CL. Third, we integrate SCL with adversarial training, in addition to the clustering assumption, to enforce compactness in the latent space and subsequently improve adversarial robustness.
(Q3) What are the important factors for the application of the ASCL framework in the context of AML? One of the key steps of CL/SCL is the selection of positive and negative samples for an anchor image. Although different approaches have been proposed, most of them focus on natural images and have little effect in AML. Specifically, in a data batch, CL and SCL consider the samples that are not from the same instance or not in the same class as the anchor image as its negative samples, respectively, without taking into account the correlation between a sample and the anchor image. This can lead to many true-negative but useless samples which are highly uncorrelated with the anchor in the latent space, as illustrated in Figure 1. Pushing the uncorrelated samples away from the anchor can be ineffective and can make the training unstable. This issue is aggravated with more diverse data and in the AML context, making the original CL/SCL approaches inapplicable in AML. Based on a comprehensive study of the intrinsic properties of adversarial images and their relationships to their benign counterparts, we develop a novel series of strategies for selecting positive and negative samples in our ASCL framework, which precisely picks the samples most relevant to the anchor, improving adversarial robustness the most while significantly improving training stability. Having provided the answers to the above research questions, we summarize our contributions in this paper as follows:
We provide a comprehensive and insightful understanding of adversarial robustness regarding the divergences in latent space, which sheds light on adapting the contrastive learning principle to enhance robustness.
We propose a novel Adversarial Supervised Contrastive Learning (ASCL) framework, where the well-established contrastive learning mechanism is leveraged to make the latent space of a classifier more compact, leading to a model more robust against adversarial attacks. Recent works [19, 17] integrated Self-Supervised Contrastive Learning (SSCL) to learn unsupervised robust representations for improving robustness in unsupervised/semi-supervised settings. Further discussion can be found in our supplementary material. To our knowledge, ours is the first work to integrate SCL with adversarial training to improve adversarial robustness in the fully-supervised setting.
By analyzing the intrinsic characteristics of AML, we develop effective strategies for selecting positive and negative samples more precisely, which are critical to make the contrastive learning principle work in AML.
As shown in extensive experiments, our proposed framework is able to significantly improve a classifier’s robustness, outperforming several state-of-the-art adversarial training defense methods (e.g., ADV and TRADES) against strong attacks (e.g., multi-targeted PGD and Auto-Attack) on the benchmark datasets.
2 Anchor Divergence in the Latent Space
Examining the question “Why can CL help to improve the adversarial robustness?”, we design experiments to show the connection between the natural and robust accuracies and the latent divergence between an anchor and its contrastive samples.
Let $\{(x_i, y_i)\}_{i=1}^{N}$ be a batch of benign images with labels. Consider two types of transformations: $T_{adv}$, an adversarial transformation from an adversary (e.g., the PGD attack), and $T_{aug}$, a standard augmentation (e.g., a combination of random cropping and random jittering). Applying a transformation $T$ to a benign image $x_i$, we obtain $x'_i = T(x_i)$, where $T = T_{adv}$ in the adversarial case and $T = T_{aug}$ in the augmentation case.
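To make the adversarial transformation concrete, the following is a minimal NumPy sketch of an L-infinity PGD attack on a toy linear softmax classifier. The model, step size, budget, and step count are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def pgd_attack(x, y, W, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD on a linear softmax classifier (a toy stand-in for
    the adversarial transformation T_adv): ascend the cross-entropy loss
    by sign-gradient steps, projecting back into the eps-ball around x."""
    x_adv = x.copy()
    num_classes = W.shape[0]
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        grad = W.T @ (p - onehot)          # d CE / d x for a linear model
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
    return x_adv
```

For a real network the input gradient would come from autodiff; the structure of the loop (gradient-sign ascent plus projection) is the same.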
Given a transformation $T$, we consider two kinds of samples w.r.t. an anchor $x_i$: the positive set $P_i$, including benign examples and their transformed counterparts in the same class as the anchor, and the negative set $N_i$, including benign examples and their transformed counterparts in classes different from the anchor’s. Note that we omit the transformation superscript in the two aforementioned sets for notational simplicity.
From now on, the positive and negative sets are understood in the specific context w.r.t. a fixed transformation $T$, which can be either $T_{adv}$ or $T_{aug}$. We are interested in the representations of benign and transformed images at a specific intermediate layer of the neural network classifier $f$. Let us further denote these representations by $z_i$ for benign images and $z'_i$ for transformed images according to transformation $T$.
We now examine several types of divergence between benign images and images transformed via $T$ at intermediate layers of $f$.
(i) Absolute intra-class divergence $D_{intra} = \frac{1}{|P_i|} \sum_{x \in P_i} d(z_i, z_x)$ (i.e., evaluated on the positive set) and absolute inter-class divergence $D_{inter} = \frac{1}{|N_i|} \sum_{x \in N_i} d(z_i, z_x)$ (i.e., evaluated on the negative set), where $z_x$ denotes the representation of a sample $x$. Here we note that $d(\cdot, \cdot)$ is the cosine distance between two representations, and $|\cdot|$ represents the cardinality of a set.
(ii) Relative intra-class divergence $D_{rel} = D_{intra} / D_{inter}$; the relative divergence represents how large the intra-class divergence is relative to the inter-class divergence.
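These divergences can be sketched in a few lines of NumPy. The function names and the representation of the positive/negative sets as lists of latent vectors are our own illustrative choices.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance d(u, v) = 1 - cos(u, v) between two representations."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def divergences(anchor, positives, negatives):
    """Absolute intra-/inter-class divergence of an anchor w.r.t. its
    positive and negative latent sets, plus their ratio (the relative
    intra-class divergence)."""
    d_intra = np.mean([cosine_distance(anchor, p) for p in positives])
    d_inter = np.mean([cosine_distance(anchor, n) for n in negatives])
    return d_intra, d_inter, d_intra / d_inter
```

A compact latent space (positives near the anchor, negatives far) yields a small ratio, which per O1 below should correlate with robustness.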
We conduct an empirical study on the CIFAR-10 dataset to figure out the relationship between the relative intra-class divergences for adversarial/augmented examples and the robust/natural accuracies. These findings and observations are important for devising our framework in the sequel. More specifically, we train a CNN in two modes: natural mode (NAT, which provides no defense at all) and adversarial training mode (AT, which defends quite well). We observe how the robust/natural accuracies together with the relative intra-class divergences vary along the training progress to draw conclusions. The detailed settings and further demonstrations can be found in the supplementary material. Some observations drawn from our experiment:
(O1) The robustness varies inversely with the relative intra-class divergence between benign images and their adversarial examples (the adversarial relative intra-class divergence). As shown in Figure (b), during the training process, the robust accuracy of the AT model tends to improve, which concurs with the decrease of the adversarial relative intra-class divergence. Similarly, when the robust accuracy of the NAT model starts increasing at epoch 100, the adversarial relative intra-class divergence concurrently starts decreasing. In addition, the robust accuracy of the AT model is significantly higher than that of the NAT model, and its adversarial relative intra-class divergence is far lower than that of the NAT model. These observations support our claim of the relation between robust accuracy and relative intra-class divergence.
(O2) The natural accuracy varies inversely with the relative intra-class divergence between benign images and their augmented images (the augmented relative intra-class divergence). As shown in Figure (c), along the training process, the natural accuracies of both NAT and AT stably increase, while the augmented relative intra-class divergences for the NAT and AT models also stably decrease. In addition, the augmented relative intra-class divergence for the NAT model is remarkably lower than that of the AT model, which concurs with the superior natural accuracy of the NAT model compared to the AT one. These observations confirm our conclusion regarding natural accuracy.
(O3) In Figure (a), we visualize the relative intra-class divergences for the adversarial/augmented cases and the NAT/AT models in the three last layers (before the prediction layer). It can be observed that the adversarial relative intra-class divergences in all three layers for the AT model are smaller than those for the NAT model. This explains why the NAT model is easily attacked and again confirms our O1. Meanwhile, the augmented relative intra-class divergences in all three layers for the NAT model are smaller than those for the AT model. This explains why the NAT model has better natural accuracy than the AT model and again confirms our O2.
Conclusions from the observations.
Bui et al. [4] reached the conclusion that the absolute adversarial intra-class divergence is a key factor for robustness against adversarial examples. However, as indicated by our O1, this is only one side of the coin. The reason is that the absolute adversarial intra-class divergence only captures how far adversarial examples of a class are from their benign counterparts and pays no attention to the inter-class divergence to other classes. It might happen that adversarial examples of other classes are very close to those of the given class, hence possibly compromising the robust accuracy. This further indicates that the absolute adversarial inter-class divergence also needs to be taken into account. Minimizing the relative adversarial intra-class divergence better controls both the absolute adversarial intra-class divergence and the absolute adversarial inter-class divergence for strengthening robustness.
The observation O1 regarding minimizing the relative adversarial intra-class divergence for improving robustness motivates us to leverage the principle and spirit of the Supervised Contrastive Learning (SCL) framework [18] in the context of AML to achieve more robust models. Although the key principle of contrastive learning, which is to contrast the representations of positive and negative examples w.r.t. given anchors, naturally matches the AML defense context as pointed out by O1, applying it is highly non-trivial and requires considerable effort.
3 Proposed Method
In this section, we answer the question “How to integrate CL with adversarial training in the context of AML?”. We first propose an adapted version of SCL, which we call Adversarial Supervised Contrastive Learning (ASCL), for the AML problem. We then introduce three sample selection strategies to select the most relevant positives and negatives for the anchor, which further improve robustness with far fewer samples.
3.1 Adversarial Supervised Contrastive Learning
We consider a prediction model $f = c \circ g$, where $g$ is the encoder which outputs the latent representation $z = g(x)$ and $c$ is the classifier on top of the latent representation. We also have a batch of $N$ pairs $\{(x_i, y_i)\}_{i=1}^{N}$ of benign images and their labels. With an adversarial transformation (e.g., PGD or TRADES), each pair has two corresponding sets, a positive set $P_i$ and a negative set $N_i$. We then have the corresponding sets in the latent space.
Supervised Contrastive Loss.
The supervised contrastive loss for an anchor $x_i$ is defined as $\mathcal{L}_{SCL}(x_i) = \frac{-1}{|P_i|} \sum_{p \in P_i} \log \frac{\exp(\mathrm{sim}(z_i, z_p)/\tau)}{\sum_{a \in P_i \cup N_i} \exp(\mathrm{sim}(z_i, z_a)/\tau)}$,
where $\mathrm{sim}(\cdot, \cdot)$ represents the similarity metric between two latent representations and $\tau$ is a temperature parameter. It is worth noting that there are two changes in our SCL loss compared with the original one in [18]. Firstly, $\mathrm{sim}(\cdot, \cdot)$ is a general form of similarity, which can be any similarity metric such as cosine similarity or an Lp norm. Secondly, in terms of terminology, in [18] the positive set was defined to include both the samples in the same class as the anchor and the anchor’s transformation. However, in our paper we want to emphasize the importance of the anchor’s transformation; therefore, we keep them as two separate notions. The SCL loss for an adversarial anchor $x'_i$ is defined in the same way.
The average SCL loss over a batch is the mean of the per-anchor losses over all benign and adversarial anchors.
As mentioned in [18], there is a major advantage of SCL compared with Self-Supervised CL (SSCL) in the context of regular machine learning. Unlike SSCL, in which each anchor has only a single positive sample, SCL takes advantage of the labels to obtain many positives within the same batch of size N. This strategy helps to reduce the false-negative cases in SSCL, where two samples in the same class are pushed apart. As shown in [18], SCL training is more stable than SSCL and also achieves better performance.
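The supervised contrastive loss above can be sketched as follows in NumPy. This is a minimal illustration in the spirit of SCL [18]: the batch layout, the use of all same-class samples as positives, and the temperature value are illustrative assumptions.

```python
import numpy as np

def scl_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of latent vectors z
    (shape [n, d]); all other same-class samples act as positives,
    different-class samples as negatives."""
    n = z.shape[0]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise
    sim = (z @ z.T) / tau                             # pairwise similarities
    losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pos = [j for j in others if labels[j] == labels[i]]
        if not pos:
            continue  # anchor has no positives in this batch
        denom = np.sum(np.exp(sim[i, others]))
        # average negative log-probability of each positive vs. all others
        losses.append(-np.mean([np.log(np.exp(sim[i, p]) / denom)
                                for p in pos]))
    return float(np.mean(losses))
```

Batches whose same-class samples are already close in latent space incur a lower loss, which is exactly the compactness the paper targets.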
Adaptations in the context of AML.
However, SCL alone is not sufficient to achieve adversarial robustness. In the context of adversarial machine learning, we need the following adaptations:
(i) Figures (b) and (c) show that adversarial attacks are powerful enough to find adversarial examples that are more diverse than standard augmentations. Therefore, we use an adversary (e.g., PGD or TRADES) as the transformation instead of the traditional data augmentation (e.g., a combination of random cropping and random jittering) used in other contrastive learning frameworks [8, 18, 15]. This directly reduces the divergence between the latent representations of a benign image and its adversarial example.
(ii) Because of the cost of generating adversarial examples, we use only one adversarial example for each input instead of multiple transformations as in other frameworks. Using more adversarial examples has been shown to increase performance, but comes at a much higher computational cost.
(iii) We apply SCL as a regularization on top of the Adversarial Training (AT) method [22, 36, 30, 33]. Therefore, instead of pre-training the encoder with a contrastive loss as in previous work, we can optimize the AT objective and the SCL objective simultaneously. The AT objective minimizes the cross-entropy loss on the adversarial examples.
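The AT cross-entropy term can be sketched as follows, assuming the model has already produced logits for the adversarial examples. The function signature and the logit/label layout are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise numerically stable softmax."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def at_cross_entropy(adv_logits, labels):
    """AT objective term: average cross-entropy of the classifier's
    predictions on adversarial examples against the true labels."""
    probs = softmax(adv_logits)
    n = adv_logits.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))
```

Confidently correct adversarial predictions drive this term toward zero; confidently wrong ones make it large, which is what AT penalizes.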
Regularization on the prediction space.
The clustering assumption [7] encourages the classifier to preserve its predictions for data examples within a cluster. Theoretically, the clustering assumption enforces the decision boundary of a given classifier to lie in the gaps between the data clusters and never cross over any cluster. As shown in [8, 18], with the help of CL, latent representations of samples in the same class form clusters. Therefore, coupling our SCL framework with the clustering assumption can help to increase the margin from data samples to the decision boundary. The experimental results in Section 4.2 show that the clustering assumption indeed helps to improve robustness. To enforce the clustering assumption, we use Virtual Adversarial Training (VAT) to maintain classifier smoothness.
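The smoothness that VAT enforces can be sketched as a KL divergence between the predictions on a benign input and on its perturbed version. This is only the penalty term; the full VAT procedure, which searches for the worst-case perturbation direction, is omitted here as an assumption-laden simplification.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete prediction distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def smoothness_penalty(pred_benign, pred_perturbed):
    """VAT-style smoothness term: penalise prediction changes under
    perturbation, encouraging the decision boundary to stay in the
    low-density gaps between clusters."""
    return float(np.mean([kl_div(p, q)
                          for p, q in zip(pred_benign, pred_perturbed)]))
```

The penalty is zero when the classifier's predictions are invariant to the perturbation and grows as the predictions diverge.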
Putting all together.
We combine the relevant terms into the final objective function of our framework, which we name Adversarial Supervised Contrastive Learning (ASCL): $\mathcal{L}_{ASCL} = \mathcal{L}_{AT} + \lambda_{scl} \mathcal{L}_{SCL} + \lambda_{vat} \mathcal{L}_{VAT}$,
where $\lambda_{scl}$ and $\lambda_{vat}$ are hyper-parameters controlling the SCL loss and the VAT loss, respectively.
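Given precomputed values of the three loss terms, the combination is a straightforward weighted sum; the default weights here are placeholders, not tuned values from the paper.

```python
def ascl_objective(at_loss, scl_loss, vat_loss, lam_scl=1.0, lam_vat=1.0):
    """Final ASCL objective: the adversarial-training loss regularised by
    the supervised contrastive loss and the VAT smoothness loss,
    i.e. L_AT + lam_scl * L_SCL + lam_vat * L_VAT."""
    return at_loss + lam_scl * scl_loss + lam_vat * vat_loss
```

In a training loop, all three terms would be computed on the same batch and the sum back-propagated jointly, rather than pre-training the encoder with the contrastive loss alone.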
3.2 Global and Local Selection Strategies
3.2.1 Global Selection
The SCL in Equations 1 and 2 can be understood as SCL with a Global Selection strategy, where each anchor takes all other samples in the current batch into account and splits them into a positive set and a negative set. For example, as illustrated in Figure (a), with the help of SCL, an anchor will push away all negatives and pull in all positives regardless of their correlation in the latent space. However, there are two issues with this Global Selection.
(I1) The high inter-class divergence issue on a diverse dataset. Specifically, there are true-negative (but uncorrelated) samples which are very different in both appearance (e.g., a dog and a shark) and latent representation. Pushing them away does not contribute to the learning other than making it more unstable, and the number of uncorrelated negatives increases as the dataset becomes more diverse. Moreover, as shown in Figures (b) and (c), the representations of adversarial examples are much more diverse than those of benign images. Therefore, the inter-class divergence is much higher in the context of AML.
(I2) The high intra-class divergence issue when the dataset is very diverse within some classes. For example, the class “dog” in the ImageNet dataset may include many sub-classes (breeds) of dog. Specifically, there are true-positive (but uncorrelated) samples which are in the same class as the anchor but different in appearance. The intra-class divergence in these classes is already high; therefore, enforcing such samples to be too close can make the training unstable. In the context of AML, two samples in the same class (e.g., “dog”) can be attacked toward very different classes (e.g., one toward “cat”, one toward “shark”), so the latent representations of their adversarial examples are even more uncorrelated. Consequently, this issue is more serious in the context of AML.
3.2.2 Local Selection
Based on the above analysis, we leverage the label supervision to propose a series of Local Selection (LS) strategies for the SCL framework, which consider only local and important samples and ignore the other samples in the batch, as illustrated in Figure (b). They are Hard-LS, Soft-LS and Leaked-LS, as defined in Table 1.
More specifically, in Hard-LS and Soft-LS, we consider the same set of positives as in Global Selection. However, we filter out the true-negative but uncorrelated samples by considering only those that are currently predicted as the anchor’s true label (Hard-LS) or as the anchor’s predicted label (Soft-LS). These two strategies deal with issue (I1) by choosing the negative samples that are most correlated with the current anchor. Because they are very close in the prediction space, their representations are likely to be highly correlated with the anchor’s representation.
In Leaked-LS, we add an additional constraint on the positive set to deal with issue (I2). Specifically, we filter out the true-positive but uncorrelated samples by choosing only those that are currently predicted as the anchor’s prediction. It is worth noting that the additional constraint is applied to the positive set only; each anchor and its adversarial example are always pulled close together. However, instead of pulling in all other positive samples in the current batch, we pull in only those samples that are close to the anchor’s representation, to further support and stabilize the contrastive learning.
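The three selection strategies can be sketched as index filters over a batch. This is our own approximate reading of the descriptions above; the exact criteria live in Table 1, so treat the function and its branch conditions as illustrative assumptions.

```python
def local_selection(i, true_labels, pred_labels, strategy="leaked"):
    """Return (positive, negative) index lists for anchor i under the
    Hard-/Soft-/Leaked-LS strategies, given current model predictions."""
    n = len(true_labels)
    others = [j for j in range(n) if j != i]
    same = [j for j in others if true_labels[j] == true_labels[i]]
    diff = [j for j in others if true_labels[j] != true_labels[i]]
    if strategy == "hard":
        # negatives currently predicted as the anchor's TRUE label
        pos = same
        neg = [j for j in diff if pred_labels[j] == true_labels[i]]
    elif strategy == "soft":
        # negatives currently predicted as the anchor's PREDICTED label
        pos = same
        neg = [j for j in diff if pred_labels[j] == pred_labels[i]]
    else:
        # leaked: additionally filter positives by the anchor's prediction
        pos = [j for j in same if pred_labels[j] == pred_labels[i]]
        neg = [j for j in diff if pred_labels[j] == pred_labels[i]]
    return pos, neg
```

Note that in a full implementation the anchor's own adversarial example would always remain in the positive set, per the text above.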
From a practical perspective, as shown later in the experimental section, ASCL with Leaked Local Selection (Leaked-ASCL) improves the robustness over ASCL with Global Selection and, more notably, does so with far fewer positive and negative samples.
4 Experiments
In this section, we empirically answer the question “What are the important factors for the application of the ASCL framework in the context of AML?” through our experiments. We first introduce the experimental settings for adversarial attacks and defenses. We then provide ablation studies to investigate the importance of each component to the performance, together with a comparison among the Global/Local Selection strategies. We show that Leaked-ASCL not only outperforms Global ASCL but also makes use of far fewer positives and negatives. Finally, we apply our ASCL and Leaked-ASCL as a regularization technique on top of either ADV or TRADES and demonstrate that our method significantly improves the robustness of AT methods.
4.1 Experimental Setting
4.1.1 General Setting
We use CIFAR10 and CIFAR100 [20] as the benchmark datasets in our experiments. Both datasets have 50,000 training images and 10,000 test images. However, while the CIFAR10 dataset has 10 classes, CIFAR100 is more diverse, with 100 classes. The inputs were normalized, and we apply random horizontal flips and random shifts for data augmentation. We use three architectures: a standard CNN, ResNet20 and ResNet50 [16]. The architecture and training setting for each dataset are provided in our supplementary material.
4.1.2 Contrastive Learning Setting
We choose the penultimate layer as the intermediate layer at which to apply our regularization. An ablation study on the effect of the projection head in the context of AML can be found in the supplementary material; in the main paper, we report the experimental results without the projection head. The temperature is set as in [18].
4.1.3 Attack Setting
We use several state-of-the-art attacks to evaluate the defense methods, including:
(i) the PGD attack, which is a gradient-based attack. We use different attack budgets for the CIFAR10 and CIFAR100 datasets, and two versions of the PGD attack: the non-targeted PGD attack (PGD) and the multi-targeted PGD attack (mPGD).
(ii) Auto-Attack, which is an ensemble-based attack. We again use dataset-specific budgets for CIFAR10 and CIFAR100, both with the standard version of Auto-Attack (AA), which is an ensemble of four different attacks.
We use the same distortion metric for all measures. We use the full test set for the attacks in (i) and 1000 test samples for the attack in (ii).
4.1.4 Generating Adversarial Examples for Defenders
We employ either PGD or TRADES as the stochastic adversary to generate adversarial examples, which are used as the transformations of benign images in our contrastive framework. We use the same setting for both PGD and TRADES, with dataset-specific configurations for CIFAR10 and CIFAR100.
4.2 Ablation study
We first provide an ablation study to investigate the contribution of each of ASCL’s components to the performance. We experiment on the CIFAR10 dataset with the ResNet20 architecture. The comparison in Table 2 shows that: (i) using SCL alone cannot improve adversarial robustness; (ii) adding SCL to the adversarial training method ADV increases the natural accuracy but reduces the robustness; (iii) in contrast, adding VAT increases the robustness but reduces the natural accuracy; (iv) adding both SCL and VAT significantly improves the robustness of the model.
Observation (ii) can be explained by the fact that SCL forces the latent space to be more compact, which helps the classifier distinguish between clusters more easily. On the other hand, the VAT regularization enforces the predictions of a benign image and its adversarial example to be close together in the prediction space, which helps to improve the robustness, as in observation (iii). This also concurs with the theory of the trade-off between natural accuracy and robustness discussed in the literature. When adding SCL to VAT, the compactness in the prediction space is extended backward to the latent space, which further improves the robustness, as in observation (iv).
4.3 Global and Local Selection strategies
In this subsection, we compare the effect of the different global/local selection strategies on the final performance. The experiment is conducted on the CIFAR10 dataset with the ResNet20 architecture. The comparison in Table 3 shows that while Hard-ASCL and Soft-ASCL show a small improvement over ASCL, Leaked-ASCL achieves the best robustness compared with the other strategies.
We also measure the average number of positive and negative samples per batch corresponding to the different selection strategies, as shown in Figure (a). With batch size 128, we have a total of 256 samples per batch, including benign images and their adversarial examples. It can be seen that the average numbers of positives and negatives under Global Selection are stable at 26.4 and 228.6, respectively. In contrast, the numbers of positives and negatives under Leaked-LS vary according to the current performance of the model. We provide an example of the positive and negative samples chosen by Leaked-LS in Figure 4. More specifically, Leaked-LS has four advantages over Global Selection:
(i) at the beginning of training, approximately 7.5 positive samples and 25 negative samples are selected (Figure (a)), because of the low classification performance of the model. Moreover, the strength of the contrastive loss is directly proportional to the size of the positive set. Therefore, with a small positive set, the contrastive loss is weak in comparison with the other components of ASCL, which helps the model focus on improving the classification performance first.
(ii) as the model improves, the number of positive samples increases, while the number of negative samples decreases significantly (Figure (b)). With the bigger positive set, the contrastive loss becomes stronger in comparison with the other components, so the model now focuses more on the contrastive learning and on learning a compact latent representation.
(iii) unlike Global Selection, Leaked Local Selection treats natural images and adversarial images differently based on their hardness with respect to the current anchor. As shown in Figure (b), there are more adversarial images than natural images in the negative set, which helps the encoder focus on contrasting the anchor with the adversarial images.
(iv) at the last epoch, Leaked-LS chooses only 11.3 positives and 14.3 negatives on average, which are equivalent to approximately 43% and 6% of the positive set and the negative set under the Global Selection strategy, respectively.
4.4 Robustness evaluation
Finally, we conduct extensive robustness evaluations to demonstrate the advantages of the proposed method. We apply the two versions, ASCL and Leaked-ASCL, as a regularization on top of two adversarial training methods, PGD adversarial training (ADV) and TRADES. We compare our methods with ADR, a state-of-the-art regularization technique. The experiments are conducted on the CIFAR10 and CIFAR100 datasets. The comparison in Tables 4, 5 and 6 shows that our ASCL method significantly improves the robust accuracy of both adversarial training based models. Moreover, our ASCL also outperforms the ADR method with both ResNet20 and ResNet50. Finally, our Leaked-ASCL consistently improves over our ASCL method with both ResNet20 and ResNet50, which again demonstrates the benefit of the Local Selection in the context of AML.
As mentioned in [8, 18], the batch size is an important factor which strongly affects the performance of a contrastive learning framework. A larger batch size comes with larger positive and negative sets, which helps to generalize the contrastive correlation better and therefore improves the performance. However, because of limited computational resources, we only used a small batch size (128), which likely limits the contribution of our method.
In this paper, we have shown the connection between the robust/natural accuracies and the divergences in latent space. We demonstrated that Supervised Contrastive Learning can be applied to improve adversarial robustness by reducing the intra-class divergence while maintaining the inter-class divergence. Moreover, we have shown that, instead of using all negatives and positives as in the regular contrastive learning framework, judiciously picking highly correlated samples can further improve adversarial robustness.
-  Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
-  Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision, pages 484–501. Springer, 2020.
-  Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283, 2018.
-  Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, and Dinh Phung. Improving adversarial robustness by enforcing local and global compactness. arXiv preprint arXiv:2007.05123, 2020.
-  Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
-  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
-  Olivier Chapelle and Alexander Zien. Semi-supervised classification by low density separation. In AISTATS, volume 2005, pages 57–64, 2005.
-  Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
-  Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. arXiv preprint arXiv:1907.02044, 2019.
-  Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. arXiv preprint arXiv:2003.01690, 2020.
-  Francesco Croce, Jonas Rauber, and Matthias Hein. Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks. International Journal of Computer Vision, pages 1–19, 2019.
-  Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations, 2018.
-  Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
-  Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33, 2020.
-  Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Ziyu Jiang, Tianlong Chen, Ting Chen, and Zhangyang Wang. Robust pre-training by adversarial contrastive learning. arXiv preprint arXiv:2010.13337, 2020.
-  Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. arXiv preprint arXiv:2004.11362, 2020.
-  Minseon Kim, Jihoon Tack, and Sung Ju Hwang. Adversarial self-supervised contrastive learning. arXiv preprint arXiv:2006.07589, 2020.
-  Alex Krizhevsky et al. Learning multiple layers of features from tiny images. 2009.
-  Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672. IEEE, 2019.
-  Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
-  Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, and Baishakhi Ray. Metric learning for adversarial robustness. In Advances in Neural Information Processing Systems, pages 480–491, 2019.
-  Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267, 2017.
-  Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
-  T. Miyato, S. Maeda, M. Koyama, and S. Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, Aug 2019.
-  Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In International Conference on Machine Learning, pages 4970–4979, 2019.
-  Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
-  Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015.
-  A. Shafahi, M. Najibi, M. A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, pages 3353–3364, 2019.
-  Aravind Srinivas, Michael Laskin, and Pieter Abbeel. Curl: Contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136, 2020.
-  Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2015.
-  Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, and Quoc V Le. Smooth adversarial training. arXiv preprint arXiv:2006.14536, 2020.
-  C. Xie, Y. Wu, L. van der Maaten, A. L. Yuille, and K. He. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 501–509, 2019.
-  Haichao Zhang and Jianyu Wang. Defense against adversarial attacks using feature scattering-based adversarial training. In Advances in Neural Information Processing Systems, pages 1829–1839, 2019.
-  Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.
7 Training Setting
We use the standard CNN architecture described in  for the experiment investigating the anchor divergence in Section 2 and the ResNet architecture  for all other experiments. For the ResNet architecture, we use the same training setting as in . More specifically, we use the Adam optimizer with a step-wise learning-rate schedule that changes the learning rate at epochs 0, 80, 120, and 160. We also use the Adam optimizer for training the standard CNN architecture. Training runs for 200 epochs on both the CIFAR10 and CIFAR100 datasets with batch size 128.
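The step-wise schedule above can be sketched as follows. The boundary epochs (0, 80, 120, 160) come from our setting, while the concrete learning-rate values are illustrative assumptions, since they are not restated here:

```python
def lr_at_epoch(epoch, boundaries=(0, 80, 120, 160),
                rates=(1e-3, 1e-4, 1e-5, 1e-6)):
    """Piecewise-constant learning-rate schedule.

    Returns the rate attached to the latest boundary epoch that has
    been reached. The rate values are placeholders, not the ones used
    in the paper.
    """
    lr = rates[0]
    for boundary, rate in zip(boundaries, rates):
        if epoch >= boundary:
            lr = rate
    return lr
```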
8 Anchor Divergence on the Latent Space
The training setting has been described in Section 7. Because averaging the intra-class/inter-class divergences over all pairs of latent representations is beyond our computational capacity, we instead calculate these divergences within each mini-batch (of size 128) and take the average over all mini-batches.
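The mini-batch computation can be sketched as follows; the squared Euclidean distance stands in for the divergence measure purely for illustration, since the exact measure is defined in the main paper:

```python
import numpy as np

def batch_divergences(z, y):
    """Average intra-class and inter-class divergences within one mini-batch.

    z: (B, d) latent representations; y: (B,) class labels. The squared
    Euclidean distance is an illustrative stand-in for the divergence measure.
    """
    diff = z[:, None, :] - z[None, :, :]      # (B, B, d) pairwise differences
    d2 = (diff ** 2).sum(axis=-1)             # (B, B) squared distances
    same = y[:, None] == y[None, :]           # same-class indicator
    off_diag = ~np.eye(len(y), dtype=bool)    # exclude self-pairs
    intra = d2[same & off_diag].mean()        # same class, distinct samples
    inter = d2[~same].mean()                  # different classes
    return intra, inter
```

The per-batch values are then averaged over all mini-batches, as described above.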
Benefit of ASCL to the Anchor Divergence.
In addition to the result in the main paper, which motivates the use of CL in the context of AML, in this section we provide a further result showing the benefit of our ASCL, which reduces the relative intra-class divergence and consequently improves adversarial robustness. Figure (a) shows that our ASCL has a much lower divergence than standard adversarial training (AT) over the whole training process and therefore achieves higher robust accuracy. Figure (b) shows that, measured at different intermediate layers, our ASCL consistently attains the lowest divergence.
9 Additional Experimental Results
9.1 Projection Head in the context of AML
In this section we provide an additional ablation study to further understand the effect of the projection head in the context of AML. We apply our methods (ASCL and Leaked ASCL) with three options for the projection head, as shown in Figure 6: (i) a projection head with a single linear layer, (ii) a projection head with two fully connected layers without biases, and (iii) an identity mapping. Table 7 shows the performance of the three options on the CIFAR10 dataset with the ResNet20 architecture. We observe that the linear projection head is better than the identity mapping in both natural accuracy (by around 1%) and robust accuracy (by 0.7% on average), which enlarges the gap between our methods and the baseline methods reported in Section 4 and again emphasizes the advantage of our methods. In contrast, the non-linear projection head reduces the robust accuracy by 0.5% on average. The improvement in natural accuracy concurs with the finding in , which can be explained by the fact that the projection head reduces the dimensionality so that the contrastive loss can be applied more effectively. As shown in Section B.4 in , even with the same output size, the weight matrix of the projection head has relatively few large eigenvalues, indicating that it is approximately low-rank. On the other hand, the effect of the projection head on the robust accuracy is due to its non-linearity. Figure (a) demonstrates the training flow and attack flow in our framework with the projection head. The contrastive loss is applied at the projected layer, which induces compactness at the projected layer but not at the intermediate layer. Therefore, when using a non-linear projection head, the compactness at the intermediate layer is weaker than at the projected layer; for example, a similarity relationship in the projected layer does not necessarily imply the same relationship in the intermediate layer. This explains why using the non-linear projection head reduces the effectiveness of SCL for adversarial robustness.
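For concreteness, the three head options can be sketched in a minimal NumPy form; the dimensions and random initialization are illustrative assumptions, not the actual weights used in our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 32                               # illustrative dimensions

W = rng.normal(scale=0.1, size=(d_in, d_out))      # linear head weight
W1 = rng.normal(scale=0.1, size=(d_in, d_in))      # first MLP layer (no bias)
W2 = rng.normal(scale=0.1, size=(d_in, d_out))     # second MLP layer (no bias)

def head_linear(h):     # option (i): a single linear layer
    return h @ W

def head_mlp(h):        # option (ii): two FC layers with a ReLU, no biases
    return np.maximum(h @ W1, 0.0) @ W2

def head_identity(h):   # option (iii): identity mapping
    return h
```

The contrastive loss is then computed on the head's output rather than on the intermediate features themselves.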
9.2 Contribution of each component in ASCL
We provide an additional experiment to further understand the contribution of each component in our framework. Table 8 shows the results on the CIFAR10 dataset with the ResNet20 architecture. We observe that using SCL alone helps to improve the natural accuracy, but weighting the contrastive loss too heavily reduces its effectiveness. On the other hand, increasing the VAT weight increases robustness but significantly reduces natural performance, which concurs with the finding in . Therefore, to balance the trade-off between natural accuracy and robustness, we choose weights that balance the two terms as the default setting in our framework.
10 Background and Related Work
In this section, we present fundamental background and work related to our approach. First, we introduce well-known contrastive learning frameworks, followed by a brief introduction of adversarial attack and defense methods. We then compare our approach with defense methods operating on the latent space, especially those integrated with contrastive learning frameworks.
10.1 Contrastive Learning
10.1.1 General formulation
Self-Supervised Learning (SSL) has become an important tool that helps Deep Neural Networks exploit structure in massive amounts of unlabeled data and transfer it to downstream tasks. The key success factor of SSL is choosing a pretext task that heuristically introduces interactions among different parts of the data (e.g., CBOW and Skip-gram , predicting rotation ). Recently, Self-Supervised Contrastive Learning (SSCL), with contrastive learning as the pretext task, has surpassed other SSL frameworks and nearly matches the performance of supervised learning. The main principle of SSCL is to introduce a contrastive correlation among the visual representations of positives ('similar') and negatives ('dissimilar') with respect to an anchor. Several SSCL frameworks have been proposed (e.g., MoCo , BYOL , CURL ); however, in this section, we mainly introduce the SSCL in , which has been integrated with adversarial examples to improve adversarial robustness in [19, 17], followed by the Supervised Contrastive Learning (SCL) , which is used in our approach.
Consider a batch of $N$ pairs of benign images and their labels $\{(x_k, y_k)\}_{k=1}^{N}$. With two random transformations $t$ and $t'$ we obtain a set of $2N$ transformed images $\{(\tilde{x}_k, \tilde{x}'_k)\}_{k=1}^{N}$. The general formulation of contrastive learning is as follows:
$$\mathcal{L}_{CL} = \frac{1}{2N} \sum_{k=1}^{N} \big[ \ell(\tilde{x}_k) + \ell(\tilde{x}'_k) \big], \quad (7)$$
where $\ell(\tilde{x}_k)$ is the contrastive loss w.r.t. the anchor $\tilde{x}_k$:
$$\ell(\tilde{x}_k) = \frac{-1}{|P_k|} \sum_{z_p \in P_k} \log \frac{\exp(z_k \cdot z_p / \tau)}{\sum_{z_a \in P_k \cup N_k} \exp(z_k \cdot z_a / \tau)}, \quad (8)$$
with $z_k$ the latent representation of $\tilde{x}_k$, $\tau$ a temperature parameter, and $P_k$ and $N_k$ the positive and negative sets; $\ell(\tilde{x}'_k)$ is the contrastive loss w.r.t. the anchor $\tilde{x}'_k$, defined analogously.
The formulation shows the general principles of contrastive learning: (i) the positive and negative sets are defined differently depending on the self-supervised or supervised setting; (ii) without loss of generality, in Equation 8 the similarity between the anchor and a positive sample is normalized by the sum over all possible pairs between the anchor and the union of the positive and negative sets, which ensures that the log argument is not higher than 1; (iii) the contrastive loss in Equation 8 pulls the anchor representation and the positives' representations close together while pushing apart those of the negatives. It is worth noting that our derivation gives the general formulation of contrastive learning, which can be adapted to SSCL , SCL , or our Local ASCL by defining the positive and negative sets differently. Moreover, by treating the positive set and the samples from the same instance as separate notions, we emphasize the importance of the anchor's transformation, which stands out from the other positives. Last but not least, our derivation normalizes the contrastive loss in Equation 7 to the same scale as the cross-entropy loss and the VAT loss in Section 3, which helps to combine all terms appropriately.
Self-Supervised Contrastive Learning.
In SSCL, the positive set is empty (excluding the samples from the same instance, which are treated separately), while the negative set includes all other samples except those from the same instance. Denoting by $z'_k$ the representation of the other transformed view of the anchor's instance, the formulation of SSCL is as follows:
$$\ell_{SSCL}(\tilde{x}_k) = -\log \frac{\exp(z_k \cdot z'_k / \tau)}{\exp(z_k \cdot z'_k / \tau) + \sum_{z_a \in N_k} \exp(z_k \cdot z_a / \tau)}.$$
Supervised Contrastive Learning.
The SCL framework leverages the idea of contrastive learning with the presence of label supervision to improve on the regular cross-entropy loss. The positive set contains all other samples sharing the anchor's label, while the negative set contains the samples from different classes. As mentioned in , SCL has a major advantage over SSCL in the context of regular machine learning. Unlike SSCL, in which each anchor has only a single positive sample, SCL takes advantage of the labels to have many positives in the same batch of size N. This strategy helps to reduce the false-negative cases in SSCL in which two samples from the same class are pushed apart. As shown in , SCL training is more stable than SSCL and also achieves better performance.
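The shared structure of the SSCL and SCL losses can be sketched with a positive mask that encodes the choice of positive set; the InfoNCE-style form and the temperature value below are standard assumptions rather than our exact notation:

```python
import numpy as np

def contrastive_loss(z, pos_mask, tau=0.5):
    """Generic contrastive loss over a batch of embeddings.

    z: (M, d) embeddings; pos_mask[i, j] is True iff j is a positive for
    anchor i (the diagonal must be False). SSCL marks only the other view
    of the anchor's instance; SCL marks every same-class sample.
    """
    sim = z @ z.T / tau                                   # scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # drop self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # mean log-probability of the positives, averaged over anchors
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()
```

For example, with labels `y` over the batch, the SCL mask is `(y[:, None] == y[None, :])` with the diagonal cleared.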
10.1.2 Important factors for Contrastive Learning
Chen et al.  empirically found that SSCL needs stronger data augmentation than supervised learning. While SSCL's performance showed a gap of up to 5% across different data augmentations (Table 1 in ), the supervised performance changed little with the same set of augmentations. Therefore, in our paper, to reduce the hyper-parameter search space, we use only one adversarial transformation (e.g., PGD or TRADES) together with the identity transformation, and leave the investigation of different data augmentations for future work.
As shown in Figure 9 in , the batch size is an important factor that strongly affects the performance of the contrastive learning framework. A larger batch size comes with larger positive and negative sets, which helps to generalize the contrastive correlation better and therefore improves performance. He et al.  proposed a memory bank that stores information from previous batches, which can alleviate the batch-size issue. In our framework, because of limited computational resources, we only experimented with a small batch size (128), which likely limits the contribution of our methods.
Normally, the representation vector, which is the output of the encoder network, has very high dimensionality; e.g., the final pooling layer in ResNet-50 and ResNet-200 has 2048 dimensions. Therefore, applying contrastive learning directly on this intermediate layer is less effective. Instead, CL frameworks usually use a projection network to project the normalized representation vector into a lower-dimensional vector that is more suitable for computing the contrastive loss. To avoid over-parameterization, CL frameworks usually choose a small projection head with only one or two fully connected layers.
10.2 Adversarial attack
Projected Gradient Descent (PGD).
is an iterative version of the FGSM attack  with random initialization. It first randomly initializes an adversarial example in the perturbation ball by adding uniform noise to a clean image, then performs multiple steps of one-step gradient ascent, projecting back onto the perturbation ball after each step. The one-step update is as follows:
$$x^{t+1} = \Pi_{\mathcal{B}_\epsilon(x)}\left(x^{t} + \eta \,\mathrm{sign}\big(\nabla_{x}\mathcal{L}(f_\theta(x^{t}), y)\big)\right),$$
where $\mathcal{B}_\epsilon(x)$ is the perturbation ball with radius $\epsilon$ around $x$, $\Pi$ denotes projection onto that ball, and $\eta$ is the step size for each update.
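A minimal PGD sketch on a toy differentiable model (logistic regression with a hand-derived gradient) looks as follows; the model and the values of the radius, step size, and step count are illustrative assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pgd_attack(x, y, w, b, eps=0.3, eta=0.1, steps=10, seed=0):
    """L-infinity PGD against the toy model p = sigmoid(w . x + b).

    Starts from a uniformly random point in the eps-ball around x, then
    repeats signed gradient ascent on the cross-entropy loss, projecting
    back onto the ball after each step.
    """
    rng = np.random.default_rng(seed)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random initialization
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        grad = (p - y) * w                            # d(cross-entropy)/d(x_adv)
        x_adv = x_adv + eta * np.sign(grad)           # one-step gradient ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project onto the ball
    return x_adv
```

For a point the model classifies correctly, the returned adversarial example stays within the perturbation ball while driving the predicted probability of the true label down.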
Even though PGD is the most popular attack, it can still fail in some extreme cases  because of two issues: (i) a fixed step size, which leads to sub-optimal solutions, and (ii) the sensitivity of the gradient to the scale of the logits in the standard cross-entropy loss. Auto-Attack proposed two variants of PGD to deal with these potential issues by (i) automatically selecting the step size across iterations and (ii) using an alternative logit loss that is invariant to both shifting and rescaling. Moreover, to increase the diversity among the attacks used, Auto-Attack combines the two new versions of PGD with the white-box FAB attack  and the black-box Square Attack  to form a parameter-free, computationally affordable, and user-independent ensemble of complementary attacks for estimating adversarial robustness. Therefore, besides PGD, Auto-Attack is considered the new standard evaluation for adversarial robustness.
10.3 Adversarial defense
10.3.1 Adversarial training
Adversarial training (AT) originated in , which proposed incorporating a model's adversarial examples into the training data to make the model's loss surface smoother and thus improve its robustness. Despite its simplicity, AT  was among the few defenses that remained resilient against attacks, while many others gave a false sense of robustness because of obfuscated gradients . Building on its success, many AT variants have been proposed, including (1) different types of adversarial examples (e.g., the worst-case examples  or the most divergent examples ), (2) different search strategies (e.g., non-iterative FGSM, Rand-FGSM with a random initial point, or PGD with multiple iterative gradient-descent steps ), (3) additional regularizations (e.g., adding constraints in the latent space [35, 4]), and (4) differences in model architecture (e.g., the activation function  or ensemble models ).
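A minimal sketch of the basic AT recipe on a toy logistic-regression model is shown below; a single FGSM step approximates the inner maximization, and all hyper-parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def adversarial_train(X, y, eps=0.2, lr=0.5, epochs=200):
    """Toy adversarial training for the model p = sigmoid(w . x + b).

    Each epoch approximates the inner maximization with one FGSM step
    (x + eps * sign of the input gradient) and then takes a gradient
    step of the outer minimization on the perturbed batch.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        X_adv = X + eps * np.sign((p - y)[:, None] * w)   # FGSM inner step
        p_adv = sigmoid(X_adv @ w + b)
        w -= lr * X_adv.T @ (p_adv - y) / n               # outer update on adversarial batch
        b -= lr * (p_adv - y).mean()
    return w, b
```

On separable data whose margin exceeds the perturbation radius, the resulting model classifies correctly even after worst-case perturbations along the decision direction.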
10.3.2 Defense with a latent space
Unlike the input space, the latent space has lower dimensionality and higher mutual information with the prediction space . Therefore, defenses operating in the latent space have particular advantages in dealing with adversarial attacks, notably [35, 4, 23, 34, 28]. For example, Defense-GAN  used a pretrained GAN that emulates the data distribution to generate a denoised version of an adversarial example. On the other hand, instead of removing noise in the input image, Xie et al.  attempted to remove noise in the feature space by using non-local means as a denoising block. However, these works were criticized by  as being easy to attack by approximating the backward gradient signal.
10.3.3 Defense with contrastive learning
The idea of defending with contrastive correlation in the latent space can be traced back to , which proposed adding a triplet regularization to adversarial training. However, the triplet loss can only handle one positive and one negative at a time and, moreover, requires computationally expensive hard-negative mining . As discussed in , the triplet loss is a special case of the contrastive loss in which the numbers of positives and negatives are each one, and it generally performs worse than the contrastive loss. Recently, [17, 19] integrated SSCL  to learn unsupervised robust representations for improving robustness in unsupervised/semi-supervised settings. Specifically, both methods proposed a new kind of adversarial example based on the SSCL loss instead of the regular cross-entropy loss  or KL divergence . By adversarially pre-training with these adversarial examples, the encoder becomes robust against instance-wise attacks and obtains robustness comparable to supervised adversarial training, as reported in . On the other hand, Jiang et al.  proposed three options for pre-training; however, their best method makes use of two adversarial examples, which require a much higher computational cost to generate. Although these works share our general idea of using contrastive learning to improve adversarial robustness, we do not compare our methods with them due to the major difference in problem setting.
Most closely related to our work is , which also aims to achieve compactness in the latent space to improve robustness in the supervised setting. They proposed a label-weighting technique that assigns a positive weight to the divergence of two examples from the same class and a negative weight in all other cases. Therefore, when minimizing the divergence loss with label weighting, the divergences of samples in the same class (positives) are encouraged to be small, while those of samples in different classes (negatives) are encouraged to be large.