Understanding and Achieving Efficient Robustness with Adversarial Contrastive Learning

01/25/2021 · Anh Bui et al. · Monash University, CSIRO, and the Australian Government Department of Defence

Contrastive learning (CL) has recently emerged as an effective approach to learning representations for a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets that provide the model the opportunity to 'contrast' between data and class representations in the latent space. In this paper, we investigate CL for improving model robustness using adversarial samples. We first design and perform a comprehensive study to understand how adversarial vulnerability behaves in the latent space. Based on this empirical evidence, we propose an effective and efficient supervised contrastive learning approach to achieve model robustness against adversarial attacks. Moreover, we propose a new sample selection strategy that optimizes the positive/negative sets by removing redundancy and improving correlation with the anchor. Experiments conducted on benchmark datasets show that our Adversarial Supervised Contrastive Learning (ASCL) approach outperforms the state-of-the-art defenses by 2.6% in terms of robust accuracy, whilst our ASCL with the proposed selection strategy can further gain a 1.4% improvement with only 42.8% positives and 6.3% negatives compared with ASCL without a selection strategy.


1 Introduction

Recently, there has been a considerable research effort on adversarial defense methods, including [1, 21, 5, 24], which aim to develop Deep Neural Networks that are robust against adversarial attacks. Among them, adversarial training methods (e.g., FGSM and PGD adversarial training [13, 22] and TRADES [36]), which utilize adversarial examples as training data, have been one of the most effective approaches, truly boosting model robustness without facing the problem of obfuscated gradients [3]. In adversarial training, recent works [34, 4] show that reducing the divergence between the representations of images and their adversarial examples in the latent space (e.g., the feature space output from an intermediate layer of a classifier) can significantly improve robustness. For example, in [4], latent representations of images in the same class are pulled closer together than those in different classes, which leads to a more compact latent space and, consequently, better robustness.

In this paper, we show that the principle of robustifying classifiers by enhancing compactness in the latent space has a strong connection with contrastive learning (CL), a recent but increasingly popular and effective self-supervised representation learning approach [8, 15, 18, 14]. Specifically, CL learns representations of unlabeled data by choosing an anchor and pulling the anchor and its positive samples closer in the latent space while pushing the anchor away from many negative samples. Given the anchor, a clearly important factor in the success of CL is how the positive and negative sets are chosen. This is even more important if CL is to be applied in a supervised setting to robustify models against adversarial attacks. This inspires us to investigate the role of adversarial samples used in CL for improving model robustness. Our observations and experiments demonstrate that directly adopting CL in AML can hardly improve adversarial robustness, indicating that a deeper understanding of the relationships between the CL mechanism, latent space compactness, and adversarial robustness is required. Pursuing this understanding, we give a detailed study of the above aspects and subsequently propose a new framework for enhancing robustness using the principles of CL. Our paper provides answers to three research questions:

(a) Global Selection
(b) (Leaked) Local Selection
Figure 1: Illustration of SCL with Global/Local Selection strategies in the latent space. While Global Selection considers all other images in the batch as either positives or negatives, Local Selection nominates the most relevant samples to the anchor when operating contrastive learning. The decision is based on the correlation between the true labels and the predicted labels as in Table 1.

(Q1) Why can CL help to improve adversarial robustness? By comprehensively investigating the behavior of divergence in the latent space under different kinds of augmentations, our study shows that the robustness of a model can be interpreted through the ratio between two divergences in the latent space: the intra-class divergence, measured on benign images and their adversarial examples of the same class, and the inter-class divergence, measured on such samples of different classes; the lower this ratio is, the more robustness can be achieved. These observations motivate the idea that a robust model can be achieved by simultaneously contrasting the intra-class divergence between images and their adversarial examples with the inter-class divergence.

(Q2) How to integrate CL with adversarial training in the context of AML?

CL originally works in the case where data labels are unavailable, which does not fit the AML context, where we are more interested in robustifying classifiers for supervised learning. The recent work on Supervised Contrastive Learning (SCL) [18] extends CL by leveraging label information, where latent representations from the same class are pulled closer together than those from different classes. While it might seem that SCL could be readily applied to the AML problem, we show in this paper that doing so is highly nontrivial. To this end, we propose Adversarial Supervised Contrastive Learning (ASCL), which effectively and efficiently utilizes adversarial samples to improve model robustness. First, for an anchor image, we use its adversarial images as the transformed/augmented samples, which differs from the standard data augmentation techniques used in conventional CL methods [8, 18]. Second, due to the high cost of generating adversarial examples, we use only a single transformation instead of multiple samples as in CL. Third, we integrate SCL with adversarial training [22] in addition to the clustering assumption [7] to enforce compactness in the latent space and subsequently improve adversarial robustness.

(Q3) What are the important factors for the application of the ASCL framework in the context of AML? One of the key steps of CL/SCL is the selection of positive and negative samples for an anchor image. Although different approaches have been proposed, most of them focus on natural images and have little effect in AML. Specifically, in a data batch, CL and SCL consider the samples that are not from the same instance or not in the same class as the anchor image as its negative samples, respectively, without taking into account the correlation between a sample and the anchor image. This can lead to many true-negative but useless samples that are highly uncorrelated with the anchor in the latent space, as illustrated in Figure 1. Pushing these uncorrelated samples away from the anchor can be ineffective and make the training unstable. This issue is aggravated with more diverse data and in the AML context, making the original CL/SCL approaches inapplicable to AML. Based on a comprehensive study of the intrinsic properties of adversarial images and their relationships to their benign counterparts, we develop a novel series of strategies for selecting positive and negative samples in our ASCL framework, which precisely pick the most relevant samples for the anchor, improving adversarial robustness the most and significantly improving training stability. By providing the answers to the above research questions, we summarize our contributions in this paper as follows:

  • We provide a comprehensive and insightful understanding of adversarial robustness regarding the divergences in latent space, which sheds light on adapting the contrastive learning principle to enhance robustness.

  • We propose a novel Adversarial Supervised Contrastive Learning (ASCL) framework, where the well-established contrastive learning mechanism is leveraged to make the latent space of a classifier more compact, leading to a more robust model against adversarial attacks. Recent works [19, 17] integrated Self-Supervised Contrastive Learning (SSCL) [8] to learn unsupervised robust representations for improving robustness in unsupervised/semi-supervised settings; further discussion can be found in our supplementary material. To our knowledge, ours is the first work to integrate SCL with adversarial training to improve adversarial robustness in a fully-supervised setting.

  • By analyzing the intrinsic characteristics of AML, we develop effective strategies for selecting positive and negative samples more precisely, which are critical to make the contrastive learning principle work in AML.

  • As shown in extensive experiments, our proposed framework is able to significantly improve a classifier’s robustness, outperforming several state-of-the-art adversarial training defense methods (e.g. ADV [22] and TRADES [36]) against strong attacks (e.g. Multi-targeted PGD [22] and Auto-Attack [10]) on the benchmark datasets.

2 Anchor Divergence on the Latent Space


Figure 2: Relative intra-class divergence comparison on the CIFAR10 dataset with the standard CNN architecture. (a) Divergence at different layers at the final epoch; (b) robust accuracies for the AT and NAT models; (c) natural accuracies for the AT and NAT models.

Examining the question "Why can CL help to improve the adversarial robustness?", we design experiments to show the connection of the natural accuracy and robust accuracy to the latent divergence between an anchor and its contrastive samples.

Let $\{(x_i, y_i)\}_{i=1}^{N}$ be a batch of benign images $x_i$ with labels $y_i$. Consider two types of transformations: $T_{adv}$, an adversarial transformation from an adversary (e.g., the PGD attack), and $T_{aug}$, a standard augmentation (e.g., a combination of random cropping and random jittering). Applying a transformation $T$ to a benign image $x_i$, we obtain the transformed image $\tilde{x}_i = T(x_i)$, which is an adversarial example if $T = T_{adv}$ and an augmented image if $T = T_{aug}$.

Given a transformation $T$, we consider two kinds of samples w.r.t. an anchor: the positive set, including benign examples and their transformed counterparts in the same class as the anchor, and the negative set, including benign examples and their transformed counterparts in classes different from the anchor's. Note that we omit the transformation superscript in the two aforementioned sets for notational simplicity.

From now on, the positive and negative sets are understood in the specific context of a fixed transformation $T$, which could be either $T_{adv}$ or $T_{aug}$. We are interested in the representations of benign and transformed images at a specific intermediate layer of the neural-net classifier $f$. Let us further denote those representations by $z_i$ for benign images and $\tilde{z}_i$ for transformed images under transformation $T$.

We examine several types of divergence between benign images and their transformed images (via transformation $T$) at some intermediate layers of $f$.

(i) Absolute intra-class divergence $d_{intra} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{|P_i|} \sum_{z_j \in P_i} d(\tilde{z}_i, z_j)$ (i.e., evaluated on the positive sets); and absolute inter-class divergence $d_{inter} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{|N_i|} \sum_{z_j \in N_i} d(\tilde{z}_i, z_j)$ (i.e., evaluated on the negative sets). Here $d(\cdot,\cdot)$ is the cosine distance between two representations, and $|\cdot|$ represents the cardinality of a set.

(ii) Relative intra-class divergence $d_{rel} = d_{intra} / d_{inter}$; hence the relative divergence represents how large the magnitude of the intra-class divergence is relative to the inter-class divergence.
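To make these measures concrete, the following PyTorch-style sketch computes the absolute and relative divergences within a batch; the function name, the batch-level averaging, and the use of cosine distance as $d(\cdot,\cdot)$ follow the definitions above, while the implementation details are our own assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def relative_intra_class_divergence(z_benign, z_transformed, labels):
    """Sketch of the divergence measures defined above, assuming the cosine
    distance d(u, v) = 1 - cos(u, v) and averaging within a single batch."""
    z_b = F.normalize(z_benign, dim=1)        # [N, D] benign representations
    z_t = F.normalize(z_transformed, dim=1)   # [N, D] transformed (e.g., adversarial)
    dist = 1.0 - z_t @ z_b.t()                # [N, N] pairwise cosine distances

    same_class = labels.unsqueeze(1) == labels.unsqueeze(0)  # [N, N] class mask
    d_intra = dist[same_class].mean()    # positive pairs: same class
    d_inter = dist[~same_class].mean()   # negative pairs: different classes
    return d_intra / d_inter, d_intra, d_inter
```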

We conduct an empirical study on the CIFAR-10 dataset to figure out the relationship between the relative intra-class divergences for adversarial/augmented examples and the robust/natural accuracies. These findings and observations are important for devising our framework in the sequel. More specifically, we train a CNN in two modes: natural mode (NAT, which cannot defend at all) and adversarial training mode (AT, which defends quite well). We observe how the robust/natural accuracies together with the relative intra-class divergences vary along the training process to draw conclusions. The detailed settings and further demonstrations can be found in the supplementary material. Several observations are drawn from our experiment:

(O1) The robustness varies inversely with the relative intra-class divergence between benign images and their adversarial examples (the adversarial relative intra-class divergence). As shown in Figure 2b, during the training process the robust accuracy of the AT model tends to improve, which concurs with the decrease of its adversarial relative intra-class divergence. Similarly, when the robust accuracy of the NAT model starts increasing at epoch 100, its adversarial relative intra-class divergence concurrently starts decreasing. In addition, the robust accuracy of the AT model is significantly higher than that of the NAT model because its adversarial relative intra-class divergence is far lower than that of the NAT model. These observations support our claim about the relation between robust accuracy and the relative intra-class divergence.

(O2) The natural accuracy varies inversely with the relative intra-class divergence between benign images and their augmented images (the augmented relative intra-class divergence). As shown in Figure 2c, along the training process the natural accuracies of both the NAT and AT models steadily increase, while their augmented relative intra-class divergences steadily decrease. In addition, the augmented relative intra-class divergence of the NAT model is remarkably lower than that of the AT model, which concurs with the superiority in natural accuracy of the NAT model over the AT one. These observations confirm our conclusion about the natural accuracy.

(O3) In Figure 2a, we visualize the relative intra-class divergences for the adversarial/augmented cases and the NAT/AT models in the last three layers (before the prediction layer). It can be observed that the adversarial relative intra-class divergences in all three layers are smaller for the AT model than for the NAT model, which explains why the NAT model is easy to attack and again confirms O1. Meanwhile, the augmented relative intra-class divergences in the three layers are smaller for the NAT model than for the AT model, which explains why the NAT model has better natural accuracy than the AT model and again confirms O2.

Conclusions from the observations.

Bui et al. [4] reached the conclusion that the absolute adversarial intra-class divergence is a key factor for robustness against adversarial examples. However, as indicated by our O1, this is only one side of the coin. The reason is that the absolute adversarial intra-class divergence only cares about how far adversarial examples of a class are from their counterpart benign images and pays no attention to the inter-class divergence to other classes. It might happen that adversarial examples of other classes are very close to those of the given class, hence possibly compromising the robust accuracy. This further indicates that the absolute adversarial inter-class divergence needs to be taken into account. Minimizing the relative adversarial intra-class divergence better controls both the absolute adversarial intra-class divergence and the absolute adversarial inter-class divergence, strengthening robustness.

The observation O1, regarding minimizing the relative adversarial intra-class divergence to improve robustness, motivates us to leverage the principle and spirit of the Supervised Contrastive Learning (SCL) framework [18] in the context of AML for achieving more robust models. Although the key principle of contrastive learning, which is to contrast the representations of positive and negative examples w.r.t. given anchors, naturally matches the AML defense context as pointed out by O1, this leverage is highly non-trivial and requires considerable effort.

3 Proposed method

In this section, we provide the answer to the question "How to integrate CL with adversarial training in the context of AML?". We first propose an adapted version of SCL, which we call Adversarial Supervised Contrastive Learning (ASCL), for the AML problem. We then introduce three sample selection strategies that nominate the most relevant positives and negatives for the anchor, further improving robustness with far fewer samples.

3.1 Adversarial Supervised Contrastive Learning

Terminologies.

We consider a prediction model $f = h \circ g$, where $g$ is the encoder, which outputs the latent representation $z = g(x)$, and $h$ is the classifier on top of the latent $z$. We also have a batch of $N$ pairs of benign images and their labels $\{(x_i, y_i)\}_{i=1}^{N}$. With an adversarial transformation (e.g., PGD [22] or TRADES [36]), each benign image $x_i$ has an adversarial example $\tilde{x}_i$, with latent representations $z_i = g(x_i)$ and $\tilde{z}_i = g(\tilde{x}_i)$, and each pair has two corresponding sets, a positive set $P_i$ and a negative set $N_i$. We then have the corresponding sets in the latent space, $P_i^z$ and $N_i^z$.

Supervised Contrastive Loss.

The supervised contrastive loss for an anchor $z_i$ is as follows:

$$\mathcal{L}_{SCL}(z_i) = \frac{-1}{\left|P_i^z \cup \{\tilde{z}_i\}\right|} \sum_{z_p \in P_i^z \cup \{\tilde{z}_i\}} \log \frac{\exp\left(\mathrm{sim}(z_i, z_p)/\tau\right)}{\sum_{z_a \in P_i^z \cup \{\tilde{z}_i\} \cup N_i^z} \exp\left(\mathrm{sim}(z_i, z_a)/\tau\right)} \quad (1)$$

where $\mathrm{sim}(\cdot,\cdot)$ represents the similarity metric between two latent representations and $\tau$ is a temperature parameter. It is worth noting that there are two changes in our SCL loss compared with the original one in [18]. Firstly, $\mathrm{sim}(\cdot,\cdot)$ is a general form of similarity, which can be any similarity metric such as the cosine similarity or an $L_p$ norm. Secondly, in terms of terminology, in [18] the positive set was defined to include both the samples in the same class as the anchor and the anchor's transformation. However, in our paper we want to emphasize the importance of the anchor's transformation; therefore, we use the two separate notations $P_i^z$ and $\tilde{z}_i$. Similarly, the SCL loss for an anchor $\tilde{z}_i$ is as follows:

$$\mathcal{L}_{SCL}(\tilde{z}_i) = \frac{-1}{\left|P_i^z \cup \{z_i\}\right|} \sum_{z_p \in P_i^z \cup \{z_i\}} \log \frac{\exp\left(\mathrm{sim}(\tilde{z}_i, z_p)/\tau\right)}{\sum_{z_a \in P_i^z \cup \{z_i\} \cup N_i^z} \exp\left(\mathrm{sim}(\tilde{z}_i, z_a)/\tau\right)} \quad (2)$$

The average SCL loss over a batch is as follows:

$$\mathcal{L}_{SCL} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \mathcal{L}_{SCL}(z_i) + \mathcal{L}_{SCL}(\tilde{z}_i) \right] \quad (3)$$
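For reference, a minimal PyTorch sketch of the loss in Equations 1-3 is given below, assuming cosine similarity, a pooled batch of benign and adversarial latents, and a mask-based implementation; the function name and tensor layout are our own choices rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Sketch of the batch SCL loss (Eq. 3), assuming cosine similarity.
    z: [2N, D] latents of benign and adversarial examples (L2-normalized below);
    labels: [2N] class labels, each adversarial example keeping its benign label."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)              # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Positives: all other samples (benign or adversarial) with the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()
```

The denominator runs over all other samples in the batch, which corresponds to the Global Selection strategy discussed in Section 3.2.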

As mentioned in [18], there is a major advantage of SCL compared with Self-Supervised CL (SSCL) in the context of regular machine learning. Unlike SSCL, in which each anchor has only a single positive sample, SCL takes advantage of the labels to obtain many positives within the same batch of size N. This strategy helps to reduce the false-negative cases in SSCL, where two samples of the same class are pushed apart. As shown in [18], SCL training is more stable than SSCL training and also achieves better performance.

Adaptations in the context of AML.

However, SCL alone is not sufficient to achieve adversarial robustness. In the context of adversarial machine learning, we need the following adaptations to improve the adversarial robustness:

(i) Figures 2b and 2c show that adversarial attacks are powerful enough to find adversarial examples that are more diverse than those produced by augmentation techniques. Therefore, we use an adversary (e.g., PGD or TRADES) as the transformation, instead of traditional data augmentation (e.g., a combination of random cropping and random jittering) as in other contrastive learning frameworks [8, 18, 15]. This directly reduces the divergence between the latent representations of a benign image and its adversarial example.

(ii) Because of the cost of generating adversarial examples, we use only one adversarial example for each input instead of multiple transformations as in other frameworks. Using more adversarial examples has been shown to increase performance [17], but it comes at a much higher computational cost.

(iii) We apply SCL as a regularization on top of the adversarial training (AT) methods [22, 36, 30, 33]. Therefore, instead of pre-training the encoder with a contrastive loss as in previous work, we optimize the AT and SCL objectives simultaneously. The AT objective function with the cross-entropy loss is as follows:

$$\mathcal{L}_{AT} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}\left(f(\tilde{x}_i), y_i\right) \quad (4)$$

where $\tilde{x}_i$ is the adversarial example of $x_i$.
Regularization on the prediction space.

The clustering assumption [7] encourages the classifier to preserve its predictions for data examples within a cluster. Theoretically, the clustering assumption enforces the decision boundary of a given classifier to lie in the gaps between data clusters and never cross any cluster. As shown in [8, 18], with the help of CL, the latent representations of samples in the same class form clusters. Therefore, coupling our SCL framework with the clustering assumption helps to increase the margin from data samples to the decision boundary. The experimental results in Section 4.2 show that the clustering assumption indeed helps to improve robustness. To enforce the clustering assumption, we use Virtual Adversarial Training (VAT) [26] to maintain the classifier smoothness:

$$\mathcal{L}_{VAT} = \frac{1}{N} \sum_{i=1}^{N} D_{KL}\left( f(x_i) \,\|\, f(x_i + r_i) \right) \quad (5)$$

where $r_i$ is the virtual adversarial perturbation of $x_i$ as in [26].
Putting it all together.

We combine the relevant terms into the final objective function of our framework, which we name Adversarial Supervised Contrastive Learning (ASCL), as follows:

$$\mathcal{L}_{ASCL} = \mathcal{L}_{AT} + \lambda_{SCL}\, \mathcal{L}_{SCL} + \lambda_{VAT}\, \mathcal{L}_{VAT} \quad (6)$$

where $\lambda_{SCL}$ and $\lambda_{VAT}$ are hyper-parameters that control the SCL loss and the VAT loss, respectively.
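The following schematic training step shows how the three terms in Equation 6 can be combined; the adversary callable, the KL-based consistency term standing in for the VAT regularizer of Equation 5, and the default weights are our own assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def ascl_training_step(encoder, classifier, x, y, adversary,
                       lambda_scl=1.0, lambda_vat=1.0, temperature=0.1):
    """Schematic ASCL objective (Eq. 6): adversarial cross-entropy (Eq. 4) +
    SCL (Eq. 3) + a prediction-consistency term in place of VAT (Eq. 5)."""
    x_adv = adversary(x, y)                       # adversarial transformation (e.g., PGD)

    z, z_adv = encoder(x), encoder(x_adv)         # latent representations
    logits, logits_adv = classifier(z), classifier(z_adv)

    # (4) Adversarial training loss on the adversarial examples.
    loss_at = F.cross_entropy(logits_adv, y)

    # (1)-(3) Supervised contrastive loss over benign + adversarial latents,
    # reusing the supervised_contrastive_loss sketched earlier.
    loss_scl = supervised_contrastive_loss(
        torch.cat([z, z_adv]), torch.cat([y, y]), temperature)

    # (5) Smoothness on the prediction space (KL between benign/adversarial predictions).
    loss_vat = F.kl_div(F.log_softmax(logits_adv, dim=1),
                        F.softmax(logits, dim=1), reduction='batchmean')

    # (6) Full ASCL objective.
    return loss_at + lambda_scl * loss_scl + lambda_vat * loss_vat
```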

3.2 Global and Local Selection Strategies

3.2.1 Global Selection

The SCL loss in Equations 1 and 2 can be understood as SCL with a Global Selection strategy, where each anchor takes all other samples in the current batch into account and splits them into a positive set and a negative set. For example, as illustrated in Figure 1a, given an anchor, SCL will push away all negatives and pull in all positives regardless of their correlation with the anchor in the latent space. However, there are two issues with this Global Selection.

(I1) The high inter-class divergence issue of a diverse dataset. Specifically, there are true negative (but uncorrelated) samples that are very different from the anchor in appearance (e.g., a dog and a shark) and in latent representation. Pushing them away makes no contribution to the learning other than making it more unstable, and the number of uncorrelated negatives increases as the dataset becomes more diverse. Moreover, as shown in Figures 2b and 2c, the representations of adversarial examples are much more diverse than those of benign images; therefore, the inter-class divergence is much higher in the context of AML.

(I2) The high intra-class divergence issue when the dataset is very diverse within some classes. For example, the class "dog" in the ImageNet dataset may include many sub-classes (breeds) of dog. Specifically, there are true positive (but uncorrelated) samples that are in the same class as the anchor but different in appearance. The intra-class divergence in these classes is already high; therefore, forcing such samples to be too close can make the training unstable. In the context of AML, two samples in the same class (e.g., "dog") can be attacked toward very different classes (e.g., one toward "cat", the other toward "shark"), so the latent representations of their adversarial examples are even more uncorrelated. Consequently, this issue is more serious in the context of AML.

3.2.2 Local Selection

Based on the above analysis, we leverage the label supervision to propose a series of Local Selection (LS) strategies for the SCL framework, which consider only local and important samples and ignore the other samples in the batch, as illustrated in Figure 1b. They are Hard-LS, Soft-LS, and Leaked-LS, as defined in Table 1.

Strategy | Positives | Negatives
Global | all same-class samples | all different-class samples
Hard-LS | all same-class samples | different-class samples predicted as the anchor's true label
Soft-LS | all same-class samples | different-class samples predicted as the anchor's predicted label
Leaked-LS | same-class samples predicted as the anchor's prediction (plus the anchor's own adversarial example) | different-class samples predicted as the anchor's predicted label
Table 1: Definitions of positives and negatives under the Global Selection and Local Selection strategies, given an anchor and its predicted label.

More specifically, in Hard-LS and Soft-LS, we consider the same set of positives as in Global Selection. However, we filter out the true negative but uncorrelated samples by only considering those predicted as the anchor's true label (Hard-LS) or as the anchor's predicted label (Soft-LS). These two strategies address issue (I1) by choosing the negative samples that are most correlated with the current anchor: because they are very close in the prediction space, their representations are likely to be highly correlated with the anchor's representation.

In Leaked-LS, we add an additional constraint on the positive set to address issue (I2). Specifically, we filter out the true positive but uncorrelated samples by only choosing those currently predicted as the anchor's prediction. It is worth noting that the additional constraint is applied to the positive set only; each anchor and its adversarial example are always pulled close together. However, instead of pulling all other positive samples in the current batch, we only pull those samples that are close to the anchor's representation, which further supports and stabilizes the contrastive learning.

From a practical perspective, as shown later in the experimental section, ASCL with Leaked Local Selection (Leaked-ASCL) improves the robustness over ASCL with Global Selection and, more notably, does so with much fewer positive and negative samples.
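As a sketch of how the Local Selection strategies can be realized in practice (our own mask-based formulation, assuming the model's current predictions are available for every sample in the batch), Leaked-LS reduces to two boolean masks that replace the global pos_mask of the SCL sketch above and restrict its denominator to the selected positives and negatives:

```python
import torch

def leaked_ls_masks(labels, preds, adv_pair_mask):
    """Sketch of Leaked Local Selection masks. Convention: mask[i, j] relates
    anchor i to candidate j. labels: [M] true labels; preds: [M] currently
    predicted labels; adv_pair_mask: [M, M] boolean, True where j is the
    adversarial counterpart of i (always kept as a positive)."""
    same_class = labels.unsqueeze(1) == labels.unsqueeze(0)   # y_i == y_j
    same_pred = preds.unsqueeze(1) == preds.unsqueeze(0)      # p_i == p_j
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Positives: same class AND currently predicted like the anchor,
    # plus the anchor's own adversarial counterpart.
    pos_mask = (same_class & same_pred & ~self_mask) | adv_pair_mask
    # Negatives: different class, but predicted as the anchor's predicted label.
    neg_mask = ~same_class & same_pred
    return pos_mask, neg_mask
```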

4 Experiments

In this section, we empirically answer the question "What are the important factors for the application of the ASCL framework in the context of AML?" through our experiments. We first introduce the experimental settings for adversarial attacks and defenses. We then provide ablation studies to investigate the importance of each component and a comparison among the Global/Local Selection strategies. We show that Leaked-ASCL not only outperforms Global ASCL but also uses far fewer positives and negatives. We apply our ASCL and Leaked-ASCL as regularization techniques on top of either ADV [22] or TRADES [36] and demonstrate that our method significantly improves the robustness of AT methods.

4.1 Experimental Setting

4.1.1 General Setting

We use CIFAR10 and CIFAR100 [20] as the benchmark datasets in our experiments. Both datasets have 50,000 training images and 10,000 test images; however, while the CIFAR10 dataset has 10 classes, CIFAR100 is more diverse with 100 classes. The inputs are normalized, and we apply random horizontal flips and random shifts for data augmentation as in [27]. We use three architectures in our experiments: a standard CNN, ResNet20, and ResNet50 [16]. The architecture and training setting for each dataset are provided in our supplementary material.
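For concreteness, a typical torchvision pipeline for this augmentation is sketched below; implementing the random shift as padding plus random cropping, and the 4-pixel pad, are our assumptions, since the exact shift scale is not reproduced here.

```python
from torchvision import transforms

# Assumed CIFAR-style augmentation: random horizontal flip plus a small random
# shift implemented as padding + random crop; the 4-pixel pad is illustrative.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
```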

4.1.2 Contrastive Learning Setting

We choose the penultimate layer as the intermediate layer at which to apply our regularization. An ablation study on the effect of the projection head in the context of AML can be found in the supplementary material. In the main paper, we report the experimental results without the projection head. The temperature is set as in [18].

4.1.3 Attack Setting

We use different state-of-the-art attacks to evaluate the defense methods including:

(i) The PGD attack, which is a gradient-based attack, with different perturbation budgets for the CIFAR10 and CIFAR100 datasets. We use two versions of the PGD attack: the non-targeted PGD attack (PGD) and the multi-targeted PGD attack (mPGD).

(ii) Auto-Attack [10], which is an ensemble-based attack, again with different perturbation budgets for the CIFAR10 and CIFAR100 datasets. We use the standard version of Auto-Attack (AA), which is an ensemble of four different attacks.

The same distortion metric is used for all measures. We use the full test set for attack (i) and 1000 test samples for attack (ii).
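Schematically, the robust accuracy reported below is measured as follows; the attack is passed in as a callable (e.g., a PGD or Auto-Attack wrapper), and the loop itself is our own sketch rather than the authors' evaluation code.

```python
import torch

def robust_accuracy(model, loader, attack):
    """Sketch of robust-accuracy evaluation: attack(model, x, y) returns
    adversarial examples crafted against the model under evaluation."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = attack(model, x, y)          # the attack needs gradients internally
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return correct / total
```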

4.1.4 Generating Adversarial Examples for Defenders

We employ either PGD or TRADES as the stochastic adversary to generate adversarial examples, which are then used as the transformations of benign images in our contrastive framework. We use the same attack setting for both PGD and TRADES on each dataset.

4.2 Ablation study

We first provide an ablation study to investigate the contribution of each of ASCL's components to the overall performance. We experiment on the CIFAR10 dataset with the ResNet20 architecture. The comparison in Table 2 shows that: (i) using SCL alone does not improve adversarial robustness; (ii) adding SCL to adversarial training (ADV) increases the natural accuracy but reduces the robustness; (iii) in contrast, adding VAT increases the robustness but reduces the natural accuracy; (iv) adding both SCL and VAT significantly improves the robustness of the model.

Nat. PGD mPGD AA
SCL 88.7 0.0 0.0 0.0
ADV 78.8 48.1 36.4 36.1
ADV+SCL 80.1 46.5 35.2 34.7
ADV+VAT 77.4 50.6 39.4 38.2
ADV+SCL+VAT 76.4 52.7 40.4 40.9
Table 2: Ablation study on the CIFAR10 dataset with ResNet20.

Observation (ii) can be explained by the fact that SCL forces the latent space to be more compact, which helps the classifier distinguish between clusters more easily. On the other hand, the VAT regularization pulls the predictions of a benign image and its adversarial example close together in the prediction space, which improves robustness, as in observation (iii). This also concurs with the trade-off between natural accuracy and robustness discussed in [36]. When adding SCL to VAT, the compactness in the prediction space is extended backward to the latent space, which further improves robustness, as in observation (iv).

4.3 Global and Local Selection strategies

In this subsection, we compare the effect of the different Global/Local Selection strategies on the final performance. The experiment is conducted on the CIFAR10 dataset with the ResNet20 architecture. The comparison in Table 3 shows that while Hard-ASCL and Soft-ASCL yield a small improvement over ASCL, Leaked-ASCL achieves the best robustness among all strategies.

We also measure the average numbers of positive and negative samples per batch for the different selection strategies, as shown in Figure 3a. With batch size 128, we have a total of 256 samples per batch, including the benign images and their adversarial examples. It can be seen that the average numbers of positives and negatives under the Global Selection are stable at 26.4 and 228.6, respectively. In contrast, the numbers of positives and negatives under Leaked-LS vary with the current performance of the model. We provide examples of positive and negative samples chosen by Leaked-LS in Figure 4. More specifically, there are four advantages of Leaked-LS over the Global Selection:

(i) At the beginning of training, approximately 7.5 positive samples and 25 negative samples are selected (e.g., Figure 4a). This is because of the low classification performance of the model at this stage. Moreover, the strength of the contrastive loss is directly proportional to the size of the positive set; therefore, with a small positive set, the contrastive loss is weak in comparison with the other components of ASCL. This helps the model focus on improving its classification performance first.

(ii) As the model improves, the number of positive samples increases while the number of negative samples decreases significantly (e.g., Figure 4b). With the bigger positive set, the contrastive loss becomes stronger in comparison with the other components, so the model now focuses more on contrastive learning and on learning a compact latent representation.

(iii) Unlike the Global Selection, the Leaked Local Selection treats natural images and adversarial images differently based on their hardness with respect to the current anchor. As shown in Figure 3b, there are more adversarial images than natural images in the negative set, which helps the encoder focus on contrasting the anchor with the adversarial images.

(iv) At the last epoch, Leaked-LS chooses only 11.3 positives and 14.3 negatives, which are equivalent to 42.8% and 6.3% of the positive and negative sets under the Global Selection strategy, respectively.

Nat. PGD mPGD AA
ASCL 76.4 52.7 40.4 40.9
Hard-ASCL 75.5 53.1 41.0 41.3
Soft-ASCL 75.5 53.4 40.6 40.4
Leaked-ASCL 75.5 53.7 41.0 42.0
Table 3: Comparison among Global/Local Selection Strategies on the CIFAR10 dataset with ResNet20
(a) Global/Local
(b) Detail of Leaked Local
Figure 3: Number of positives and negatives with different Global/Local Selection strategies on the CIFAR10 dataset with batch size 128
(a) Epoch 1
(b) Epoch 200
Figure 4: Positive and negative samples from the Leaked Local Selection strategy. In each image, the first column represents the anchor followed by its positive and negative samples. Row 1 and 2 represent the natural and adversarial positive samples respectively. Row 3 and 4 represent the natural and adversarial negative samples respectively.

4.4 Robustness evaluation

Finally, we conduct extensive robustness evaluations to demonstrate the advantages of the proposed method. We apply the two versions, ASCL and Leaked-ASCL, as a regularization on top of two adversarial training methods, PGD adversarial training (ADV) [22] and TRADES [36]. We compare our methods with ADR, the state-of-the-art regularization technique proposed in [4]. The experiments are conducted on the CIFAR10 and CIFAR100 datasets. The comparison in Tables 4, 5, and 6 shows that our ASCL method significantly improves the robust accuracy of both adversarial-training-based models. Moreover, our ASCL also outperforms the ADR method with both the ResNet20 and ResNet50 architectures. Finally, our Leaked-ASCL consistently improves over our ASCL method with both architectures, which again demonstrates the benefit of the Local Selection in the context of AML.

Nat. PGD mPGD AA
ADV 78.8 48.1 36.4 36.1
ADV-ADR 76.8 51.5 38.9 38.6
ADV-ASCL 76.4 52.7 40.4 40.9
ADV-(Leaked)ASCL 75.5 53.7 41.0 42.0
TRADES 76.1 51.9 38.2 36.3
TRADES-ADR 72.3 53.3 40.1 39.5
TRADES-ASCL 72.5 54.8 40.8 40.3
TRADES-(Leaked)ASCL 71.9 55.2 40.6 40.2
Table 4: Robustness Evaluation on the CIFAR10 dataset with ResNet20 architecture
Nat. PGD mPGD AA
ADV 60.7 35.7 25.3 25.7
ADV-ADR 59.1 40.0 29.1 28.6
ADV-ASCL 57.6 41.3 29.8 30.2
ADV-(Leaked)ASCL 59.0 42.5 31.1 31.9
TRADES 59.0 37.2 25.3 25.7
TRADES-ADR 58.3 41.1 30.0 30.3
TRADES-ASCL 56.5 42.2 30.7 29.0
TRADES-(Leaked)ASCL 56.7 43.4 31.2 32.0
Table 5: Robustness Evaluation on the CIFAR100 dataset with ResNet20 architecture
Nat. PGD mPGD AA
ADV 82.2 48.6 38.2 38.7
ADV-ADR 80.6 51.8 41.4 41.5
ADV-ASCL 79.4 54.4 42.8 42.8
ADV-(Leaked)ASCL 78.7 55.8 43.8 43.6
Table 6: Robustness Evaluation on the CIFAR10 dataset with ResNet50 architecture

5 Discussion

As mentioned in [8, 18], the batch size is an important factor that strongly affects the performance of a contrastive learning framework. A larger batch size comes with larger positive and negative sets, which helps to generalize the contrastive correlation better and therefore improves the performance. However, because of limited computational resources, we only experimented with a small batch size (128), which likely limits the contribution of our method.

6 Conclusion

In this paper, we have shown the connection between the robust/natural accuracies and the divergences in the latent space. We demonstrated that Supervised Contrastive Learning can be applied to improve adversarial robustness by reducing the intra-class divergence between images and their adversarial examples relative to the inter-class divergence. Moreover, we have shown that, instead of using all negatives and positives as in the regular contrastive learning framework, judiciously picking highly correlated samples can further improve adversarial robustness.

References

7 Training setting

We use the standard CNN architecture described in [6] for the anchor-divergence experiment in Section 2 and the ResNet architectures [16] for all other experiments. For the ResNet architectures, we use the same training setting as in [27]: the Adam optimizer with a learning rate that is decayed at epochs 80, 120, and 160. We also use Adam for training the standard CNN architecture. We train for 200 epochs on both the CIFAR10 and CIFAR100 datasets with batch size 128.

8 Anchor Divergence on the Latent Space

Experimental Setting.

The training setting has been described in Section 7. Because calculating the intra-class/inter-class divergences over all pairs of latent representations exceeds our computational capacity, we instead calculate these divergences within each mini-batch (of size 128) and take the average over all mini-batches.

Benefit of ASCL to the Anchor Divergence.
Figure 5: Relative intra-class divergence comparison on the CIFAR10 dataset with the standard CNN architecture. (a) Divergence at different layers at the final epoch; (b) robust accuracies for the AT, NAT, and our ASCL models.

In addition to the results in the main paper, which show the motivation for using CL in the context of AML, in this section we provide a further result showing the benefit of our ASCL, which reduces the relative intra-class divergence and subsequently improves adversarial robustness. Figure 5b shows that our ASCL has a much lower divergence than standard adversarial training (AT) over the whole training process and therefore achieves higher robust accuracy. Figure 5a shows that, when measured at different intermediate layers, our ASCL consistently attains the lowest divergence.

9 Additional Experimental Results

9.1 Projection Head in the context of AML

In this section we provide an additional ablation study to further understand the effect of the projection head in the context of AML. We apply our methods (ASCL and Leaked-ASCL) with three options of the projection head, as shown in Figure 6: (i) a projection head with only a single linear layer, (ii) a projection head with two fully-connected layers without bias, and (iii) the identity mapping. Table 7 shows the performance of the three options on the CIFAR10 dataset with the ResNet20 architecture. We observe that the linear projection head is better than the identity mapping in both natural accuracy (by around 1%) and robust accuracy (0.7% on average), which enlarges the gap between our methods and the baseline methods reported in Section 4 and again emphasizes the advantage of our methods. In contrast, the non-linear projection head reduces the robust accuracy by 0.5% on average. The improvement in natural accuracy concurs with the finding in [8], and can be explained by the fact that the projection head reduces the dimensionality so that the contrastive loss can be applied more efficiently. As shown in Section B.4 of [8], even with the same output size, the weight matrix of the projection head has relatively few large eigenvalues, indicating that it is approximately low-rank. On the other hand, the effect of the projection head on the robust accuracy is due to its non-linearity. Figure 6a illustrates the training flow and attack flow of our framework with the projection head. The contrastive loss is applied at the projected layer, which induces compactness at the projected layer but not at the intermediate layer. Therefore, when using a non-linear projection head, the compactness at the intermediate layer is weaker than at the projected layer; for example, a relationship in the projected layer cannot imply a relationship in the intermediate layer. This explains why using the non-linear projection head reduces the effectiveness of SCL for adversarial robustness.

(a) with the projection head
(b) without the projection head
Figure 6: Training/Attack flows with/without the projection head
Nat. PGD mPGD AA
ASCL without projection head 76.4 52.7 40.4 40.9
ASCL with one-layer head 77.3 53.3 41.1 41.3
ASCL with two-layer head 76.6 52.3 40.0 39.7
(Leaked)ASCL without projection head 75.5 53.7 41.0 42.0
(Leaked)ASCL with one-layer head 76.5 54.1 41.9 42.3
(Leaked)ASCL with two-layer head 75.7 52.9 40.7 41.1
Table 7: Performance comparison with/without the projection head on the CIFAR10 dataset with the ResNet20 architecture. The one-layer and two-layer heads denote the projection heads with one and two fully-connected layers, respectively.
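For reference, the three projection-head options described above can be written as small modules as sketched below; the layer widths are placeholders, not the paper's exact dimensions.

```python
import torch.nn as nn

def make_projection_head(option, dim_in=256, dim_out=128):
    """Sketch of the three projection-head options; widths are placeholders."""
    if option == "identity":                     # (iii) apply SCL on the latent directly
        return nn.Identity()
    if option == "linear":                       # (i) a single linear layer
        return nn.Linear(dim_in, dim_out, bias=False)
    if option == "nonlinear":                    # (ii) two FC layers without bias
        return nn.Sequential(nn.Linear(dim_in, dim_in, bias=False),
                             nn.ReLU(inplace=True),
                             nn.Linear(dim_in, dim_out, bias=False))
    raise ValueError(f"unknown option: {option}")
```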

9.2 Contribution of each component in ASCL

We provide an additional experiment to further understand the contribution of each component in our framework. Table 8 shows the results on the CIFAR10 dataset with the ResNet20 architecture. We observe that using SCL alone can help to improve the natural accuracy, but enforcing the contrastive loss too strongly reduces its effectiveness. On the other hand, increasing the VAT weight increases the robustness but significantly reduces the natural performance, which concurs with the finding in [36]. Therefore, to balance the trade-off between natural accuracy and robustness, we choose our default weights accordingly.

Nat. PGD mPGD AA
78.8 48.1 36.4 36.1
80.1 46.5 35.2 34.7
79.5 46.7 35.2 34.7
79.6 45.8 34.3 34.4
79.2 45.6 34.2 34.3
77.4 50.6 39.4 38.2
75.4 53.0 40.6 40.0
73.3 54.4 41.5 42.3
71.2 55.0 42.2 43.1
Table 8: Ablation study on the CIFAR10 dataset with ResNet20.

10 Background and Related work

In this section, we present fundamental background and work related to our approach. First, we introduce well-known contrastive learning frameworks, followed by a brief introduction to adversarial attack and defense methods. We then compare our approach with defense methods operating on the latent space, especially those integrated with contrastive learning frameworks.

10.1 Contrastive Learning

10.1.1 General formulation

Self-Supervised Learning (SSL) has become an important tool that helps Deep Neural Networks exploit structure from gigantic amounts of unlabeled data and transfer it to downstream tasks. The key success factor of SSL is choosing a pretext task that heuristically introduces interaction among different parts of the data (e.g., CBOW and Skip-gram [25], predicting rotation [12]). Recently, Self-Supervised Contrastive Learning (SSCL), with contrastive learning as the pretext task, has surpassed other SSL frameworks and nearly achieves the performance of supervised learning. The main principle of SSCL is to introduce a contrastive correlation among the visual representations of positives ('similar') and negatives ('dissimilar') with respect to an anchor. Several SSCL frameworks have been proposed (e.g., MoCo [15], BYOL [14], CURL [31]); however, in this section we mainly introduce the SSCL framework of [8], which has been integrated with adversarial examples to improve adversarial robustness in [19, 17], followed by the Supervised Contrastive Learning (SCL) framework [18], which is used in our approach.

Consider a batch of $N$ pairs of benign images and their labels $\{(x_i, y_i)\}_{i=1}^{N}$. With two random transformations $T^{(1)}$ and $T^{(2)}$ we obtain a set of transformed images whose latent representations are $\{\tilde{z}_i^{(1)}, \tilde{z}_i^{(2)}\}_{i=1}^{N}$. The general formulation of contrastive learning is as follows:

$$\mathcal{L}_{CL} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \ell\left(\tilde{z}_i^{(1)}\right) + \ell\left(\tilde{z}_i^{(2)}\right) \right] \quad (7)$$

where $\ell(\tilde{z}_i^{(1)})$ is the contrastive loss w.r.t. the anchor $\tilde{z}_i^{(1)}$:

$$\ell\left(\tilde{z}_i^{(1)}\right) = \frac{-1}{\left|P_i \cup \{\tilde{z}_i^{(2)}\}\right|} \sum_{z_p \in P_i \cup \{\tilde{z}_i^{(2)}\}} \log \frac{\exp\left(\mathrm{sim}(\tilde{z}_i^{(1)}, z_p)/\tau\right)}{\sum_{z_a \in P_i \cup \{\tilde{z}_i^{(2)}\} \cup N_i} \exp\left(\mathrm{sim}(\tilde{z}_i^{(1)}, z_a)/\tau\right)} \quad (8)$$

and $\ell(\tilde{z}_i^{(2)})$ is the contrastive loss w.r.t. the anchor $\tilde{z}_i^{(2)}$, defined analogously with the roles of the two transformations swapped:

$$\ell\left(\tilde{z}_i^{(2)}\right) = \frac{-1}{\left|P_i \cup \{\tilde{z}_i^{(1)}\}\right|} \sum_{z_p \in P_i \cup \{\tilde{z}_i^{(1)}\}} \log \frac{\exp\left(\mathrm{sim}(\tilde{z}_i^{(2)}, z_p)/\tau\right)}{\sum_{z_a \in P_i \cup \{\tilde{z}_i^{(1)}\} \cup N_i} \exp\left(\mathrm{sim}(\tilde{z}_i^{(2)}, z_a)/\tau\right)} \quad (9)$$

The formulation shows the general principles of contrastive learning: (i) the positive set $P_i$ and negative set $N_i$ are defined differently depending on the self-supervised/supervised setting; (ii) without loss of generality, in Equation 8 the similarity between the anchor and a positive sample is normalized by the sum over all possible pairs between the anchor and the union of $P_i$, $\{\tilde{z}_i^{(2)}\}$, and $N_i$, which ensures that the log argument is not higher than 1; (iii) the contrastive loss in Equation 8 pulls the anchor representation and the positives' representations close together while pushing apart those of the negatives. It is worth noting that our derivation gives a general formulation of contrastive learning that can be adapted to SSCL [8], SCL [18], or our Local ASCL by defining the positive and negative sets differently. Moreover, by using the terminology of the positive set and the sample from the same instance separately, we emphasize the importance of the anchor's transformation, which stands out among the other positives. Last but not least, our derivation normalizes the contrastive loss in Equation 7 to the same scale as the cross-entropy loss and the VAT loss in Section 3, which helps to put all terms together appropriately.

Self-Supervised Contrastive Learning.

In SSCL [8], the positive set $P_i$ (excluding the sample from the same instance) is empty, while the negative set $N_i$ includes all other samples in the batch except those from the same instance. In this case, the SSCL formulation follows from Equations 8 and 9 with $P_i = \emptyset$, so that each anchor is contrasted against its own transformed counterpart only:

$$\ell\left(\tilde{z}_i^{(1)}\right) = -\log \frac{\exp\left(\mathrm{sim}(\tilde{z}_i^{(1)}, \tilde{z}_i^{(2)})/\tau\right)}{\sum_{z_a \in \{\tilde{z}_i^{(2)}\} \cup N_i} \exp\left(\mathrm{sim}(\tilde{z}_i^{(1)}, z_a)/\tau\right)} \quad (10)$$

and

$$\ell\left(\tilde{z}_i^{(2)}\right) = -\log \frac{\exp\left(\mathrm{sim}(\tilde{z}_i^{(2)}, \tilde{z}_i^{(1)})/\tau\right)}{\sum_{z_a \in \{\tilde{z}_i^{(1)}\} \cup N_i} \exp\left(\mathrm{sim}(\tilde{z}_i^{(2)}, z_a)/\tau\right)} \quad (11)$$
Supervised Contrastive Learning.

The SCL framework leverages the idea of contrastive learning in the presence of label supervision to improve on the regular cross-entropy loss. The positive set and the negative set consist of the samples in the same class as the anchor and the samples in different classes, respectively. As mentioned in [18], there is a major advantage of SCL compared with SSCL in the context of regular machine learning. Unlike SSCL, in which each anchor has only a single positive sample, SCL takes advantage of the labels to obtain many positives within the same batch of size N. This strategy helps to reduce the false-negative cases in SSCL, where two samples of the same class are pushed apart. As shown in [18], SCL training is more stable than SSCL training and also achieves better performance.

10.1.2 Important factors for Contrastive Learning

Data augmentation.

Chen et al. [8] empirically found that SSCL needs stronger data augmentation than supervised learning. While SSCL's performance experienced a huge gap of 5% across different data augmentations (Table 1 in [8]), the supervised performance did not change much with the same set of augmentations. Therefore, in our paper, to reduce the space of hyper-parameters, we use only one adversarial transformation (e.g., PGD [22] or TRADES [36]) together with the identity transformation for the other view, and leave the investigation of different data augmentations for future work.

Batch size.

As shown in Figure 9 in [8], the batch size is an important factor that strongly affects the performance of the contrastive learning framework. A larger batch size comes with larger positive and negative sets, which helps to generalize the contrastive correlation better and therefore improves the performance. He et al. [15] proposed a memory bank to store the previous batch information which can lessen the batch size issue. In our framework, because of the limitation on computational resources, we only tried with a small batch size (128) which likely limits the contribution of our methods.

Projection head.

Normally, the representation vector output by the encoder network has very high dimensionality; e.g., the final pooling layer in ResNet-50 and ResNet-200 has 2048 dimensions. Therefore, applying contrastive learning directly on this intermediate layer is less effective. Alternatively, CL frameworks usually use a projection network to project the normalized representation vector into a lower-dimensional vector that is more suitable for computing the contrastive loss. To avoid over-parameterization, CL frameworks usually choose a small projection head with only one or two fully-connected layers.

10.2 Adversarial attack

Projected Gradient Descent (PGD).

PGD is an iterative version of the FGSM attack [13] with random initialization [22]. It first randomly initializes an adversarial example within a perturbation ball by adding uniform noise to a clean image, and then performs multiple one-step gradient ascent updates, projecting back onto the perturbation ball after each step. The one-step update is as follows:

$$x^{t+1} = \Pi_{\mathcal{B}_\epsilon(x)}\left( x^{t} + \eta\, \mathrm{sign}\left( \nabla_{x^{t}} \mathcal{L}\left(f(x^{t}), y\right) \right) \right) \quad (12)$$

where $\mathcal{B}_\epsilon(x)$ is the perturbation ball with radius $\epsilon$ around $x$ and $\eta$ is the gradient scale for each step update.
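A minimal PyTorch sketch of this update is shown below, assuming an $\ell_\infty$ perturbation ball, inputs scaled to [0, 1], and a cross-entropy objective; the default parameter values are illustrative only.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    """Sketch of the PGD update in Eq. 12 under an L-infinity ball of radius eps."""
    # Random start inside the perturbation ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                 # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project onto the ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```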

Auto-Attack.

Even the most popular attack, PGD, can still fail in some extreme cases [11] because of two issues: (i) a fixed step size, which leads to sub-optimal solutions, and (ii) the sensitivity of the gradient to the scale of the logits in the standard cross-entropy loss. Auto-Attack [10] proposed two variants of PGD to deal with these potential issues by (i) automatically selecting the step size across iterations and (ii) using an alternative logit loss that is both shift and rescaling invariant. Moreover, to increase the diversity among the attacks used, Auto-Attack combines the two new versions of PGD with the white-box attack FAB [9] and the black-box Square Attack [2] to form a parameter-free, computationally affordable, and user-independent ensemble of complementary attacks for estimating adversarial robustness. Therefore, besides PGD, Auto-Attack is considered the new standard evaluation for adversarial robustness.

10.3 Adversarial defense

10.3.1 Adversarial training

Adversarial training (AT) originated in [13], which proposed incorporating a model's adversarial examples into the training data to make the model's loss surface smoother and thus improve its robustness. Despite its simplicity, AT [22] was among the few defenses that remained resilient against attacks rather than giving a false sense of robustness due to obfuscated gradients [3]. Following its success, many AT variants have been proposed, including (1) different types of adversarial examples (e.g., the worst-case examples [13] or the most divergent examples [36]), (2) different search strategies (e.g., non-iterative FGSM, Rand FGSM with a random initial point, or PGD with multiple iterative gradient-descent steps [22]), (3) additional regularizations, e.g., adding constraints in the latent space [35, 4], and (4) differences in model architecture, e.g., the activation function [33] or ensemble models [27].

10.3.2 Defense with a latent space

Unlike the input space, a latent space has lower dimensionality and higher mutual information with the prediction space [32]. Therefore, defenses on the latent space have particular characteristics for dealing with adversarial attacks, notably [35, 4, 23, 34, 28]. For example, DefenseGAN [28] used a pretrained GAN that emulates the data distribution to generate a denoised version of an adversarial example. On the other hand, instead of removing noise in the input image, Xie et al. [34] attempted to remove noise in the feature space by using non-local means as a denoising block. However, these works were criticized by [3] as being easy to attack by approximating the backward gradient signal.

10.3.3 Defense with contrastive learning

The idea of defending with contrastive correlation in the latent space can be traced back to [23], which proposed an additional triplet regularization for adversarial training. However, the triplet loss can handle only one positive and one negative at a time and, moreover, requires computationally expensive hard-negative mining [29]. As discussed in [18], the triplet loss is a special case of the contrastive loss in which the numbers of positives and negatives are each one, and it generally performs worse than the contrastive loss. Recently, [17, 19] integrated SSCL [8] to learn unsupervised robust representations for improving robustness in unsupervised/semi-supervised settings. Specifically, both methods proposed a new kind of adversarial example based on the SSCL loss instead of the regular cross-entropy loss [13] or the KL divergence [36]. By adversarially pre-training with these adversarial examples, the encoder becomes robust against instance-wise attacks and obtains robustness comparable to supervised adversarial training, as reported in [19]. Jiang et al. [17] proposed three pre-training options; however, their best method makes use of two adversarial examples, which requires a much higher computational cost. Although these works share the general idea of using contrastive learning to improve adversarial robustness with ours, we do not compare our methods with them due to the major difference in problem setting.

Most closely related to our work is [4], which also aims to realize compactness in the latent space to improve robustness in the supervised setting. They proposed a label-weighting technique that assigns a positive weight to the divergence of two examples in the same class and a negative weight otherwise. Therefore, when minimizing the divergence loss with label weighting, the divergences of those in the same class (positives) are encouraged to be small, while those of different classes (negatives) are encouraged to be large.