On the Robustness of Domain Adaption to Adversarial Attacks

by Liyuan Zhang, et al.
Chongqing University

State-of-the-art deep neural networks (DNNs) have been shown to perform excellently on unsupervised domain adaption (UDA). However, recent work shows that DNNs perform poorly when attacked with adversarial samples, which are crafted by simply adding small disturbances to the original images. Although plenty of work has focused on adversarial robustness, as far as we know there is no systematic research on the robustness of unsupervised domain adaption models. Hence, we discuss the robustness of unsupervised domain adaption against adversarial attacks for the first time. We benchmark various settings of adversarial attack and defense in domain adaption, and propose a cross-domain attack method based on pseudo labels. Most importantly, we analyze the impact of different datasets, models, attack methods and defense methods. Our work directly demonstrates the limited robustness of unsupervised domain adaptation models, and we hope it encourages the community to pay more attention to improving the robustness of these models against attacks.



1. Introduction

Their powerful representation ability makes DNNs perform better and better in domain adaption. Great effort has been devoted to developing robust domain adaption models (Ganin and Lempitsky, 2015; Ganin et al., 2016; Long et al., 2018b; Tzeng et al., 2017; Long et al., 2017; Cai et al., 2019) in order to overcome the large shift among domains. For example, the state-of-the-art accuracy of domain adaption models on the Office-31 dataset (Saenko et al., 2010) is 86.6% (Long et al., 2018a), up from 34% when the dataset was first released in 2010.

Yet (Szegedy et al., 2014) showed that DNNs can be very vulnerable to adversarial attacks (Biggio et al., 2013; Dalvi et al., 2004; Szegedy et al., 2014). Fig. 1 shows an adversarial case, where a backpack image from Office-31 is presented. After noise is added to the sample, the network mistakes the backpack for a speaker, a mobile phone or a projector. This phenomenon has attracted a lot of attention. Numerous strategies have been proposed to make models more robust to adversarial examples (Goodfellow et al., 2014; Kurakin et al., 2016; Carlini and Wagner, 2017). However, current works focus almost exclusively on standard image classification models (Inception in (Kurakin et al., 2016), and GoogleNet, VGG and ResNet in (Liu et al., 2016)) using small datasets like MNIST and CIFAR-10. Hence, these works can hardly guide more concrete and complex tasks, and the vulnerability of modern DNNs to adversarial attacks on more complex systems, such as unsupervised domain adaption covering different domains, remains unclear. Besides, existing attack methods usually require label information to generate the adversarial samples (Goodfellow et al., 2014; Carlini and Wagner, 2017; Su et al., 2017), which is unrealistic for domain adaption since the labels of the target domain are not available.

Therefore, it is meaningful to explore the robustness of models against adversarial attacks in more complex domain adaption systems, and a new method for attacking domain adaption models is required. In this paper, we present what is, to our knowledge, the first systematic research on the robustness of unsupervised domain adaption models to adversarial attacks. Our main contributions can be summarized as follows:

(1) We are the first to define and benchmark various experimental settings for Adversarial Domain Attack and Defense (ADAD), including white-box (Goodfellow et al., 2014; Carlini and Wagner, 2017; Brown et al., 2017) and black-box (Zhou et al., 2020; Dolatabadi et al., 2020) attack, non-targeted and targeted attack, and white-box and black-box defense. We also choose three different methods and explore their robustness on three datasets of small, medium and large scale;

(2) We propose a Fast Gradient Sign Method based on Pseudo Label (PL-FGSM), which can serve as a basic attack method for domain adaptation. This method attacks target-domain samples by assigning pseudo labels to them. Extensive experiments show that it is effective against domain adaptation models;

(3) We are the first to extensively study the robustness of different existing domain adaption models to adversarial samples, and to compare different attack settings against the same model. Besides, the defense capability of different defense paradigms is also discussed. Finally, we discuss some interesting phenomena that may exist only in domain adaption.

Some works relevant to ours are (Arnab et al., [n.d.]) on semantic segmentation, (Bai et al., 2019) on person re-identification, and (Carlini and Wagner, 2017) on face detection. We hope that our work can facilitate the development of robust feature learning, and the research of adversarial attack and defense of domain adaption.

Figure 1. Adversarial attack examples. After noise that does not affect human discrimination is added, the backpack is misclassified by the model as a speaker, a mobile phone and a projector.

2. Related Work

2.1. Domain Adaption

Transferring knowledge from a source domain with sufficient supervision to an unlabeled target domain is an important yet challenging problem. Unsupervised domain adaptation (UDA) addresses the challenge by learning a model that can be used across domains with different distributions. In shallow architectures, domain adaptation is mainly achieved by matching the marginal distributions (Sugiyama et al., 2007; Pan et al., 2010; Gong et al., 2013) or the conditional distributions (Courty et al., 2017; Zhang et al., 2013). In recent years, unsupervised domain adaptation has made remarkable advances with deep learning architectures, and the strategies can be divided into two main categories.

The first category uses statistical discrepancy metrics to measure the distance across domains, and reduces domain shift by constraining these statistics (Long et al., 2015, 2017; Li et al., 2020a; Zhang et al., 2019a; Pan et al., 2019; Cui et al., 2020; Li et al., 2020b). For example, DDC (Tzeng et al., 2014) uses linear-kernel maximum mean discrepancy (MMD) (Sejdinovic et al., 2013) to adapt a single layer in order to maximize domain invariance. The later work DAN (Long et al., 2015) extends this idea by using multi-kernel MMD (Gretton et al., 2012a, b) in multiple task-specific layers. Further, JAN (Long et al., 2017) explores joint MMD to enforce joint distribution alignment between domains. Based on the optimal transport (OT) distance, (Zhang et al., 2019b; Li et al., 2020a) learn the optimal transport plan with the enhanced transport distance (ETD).

The second category learns domain-invariant features through the confrontation between a domain classifier and a feature extractor (Saito et al., 2018; Wang et al., 2019; Ganin and Lempitsky, 2015; Long et al., 2018a; Li et al., 2019). For example, DANN (Ganin and Lempitsky, 2015) introduces a domain discriminator to domain adaptation. Further, CDAN (Long et al., 2018a) proposes a conditional domain discriminator, which conditions the feature representation on the class information and can therefore match the discrepancy between distributions better. Besides, MCD (Saito et al., 2018) employs a new adversarial strategy in which the adversarial game is played between the feature extractor and the classifiers rather than between the feature extractor and the domain discriminator.

Although these models have achieved better and better performance, their robustness to adversarial samples has not been fully explored.

2.2. Adversarial Attack and Defense

Since the discovery of adversarial examples for DNNs (Szegedy et al., 2014), plenty of attack methods have been proposed in the CV community. Adversarial samples aim at fooling the network while keeping the noise imperceptible to human beings. (Goodfellow et al., 2014) proposes to generate adversarial examples with a single step based on the sign of the gradient for each pixel; (Kurakin et al., 2016) implements the above method in an iterative way; (Moosavi-Dezfooli et al., 2016) attacks the deep classifier by finding the nearest classification boundary. (Papernot et al., 2016a) utilizes the Jacobian matrix to implicitly apply a fixed length of noise along the direction of each axis. (Su et al., 2017) proposes an adversarial attack that modifies a single pixel. In addition, some methods that use visible noise have also been proposed. (Brown et al., 2017) shows a way to generate special glasses, which can be used to successfully achieve targeted or non-targeted attacks. (Brown et al., 2017) proposes to overlay a visible adversarial patch on the image and successfully fools the deep network. Adversarial attack is also discussed in some more complex tasks such as person re-identification (Wang et al., 2020; Bai et al., 2020) and semantic segmentation (Arnab et al., [n.d.]). However, as discussed in Section 1, the above methods all require label information to generate the adversarial samples, which is unrealistic for domain adaptation.

For defense, (Goodfellow et al., 2014) proposes adversarial training, which means that the generated adversarial samples are added to the training set as new training samples. (Papernot et al., 2016b) mentions that in a distillation network the gradient of the student network is smoother, a feature that can be used to resist adversarial samples generated from gradients. (Liao et al., 2018) proposes a denoiser network; (Xie et al., 2017) discusses that random pixel-level changes may cause the specific disturbance points in the adversarial samples to fail. (Zheng et al., 2016) proposes to force the network to learn similar feature representations from the original sample and the adversarial sample of the same image.

However, the above defense methods only discuss the defense performance on standard classification tasks. Most experiments were only carried out on the small MNIST dataset, and some defenses were not effective on CIFAR-10 (Arnab et al., [n.d.]), underlining the importance of testing on multiple datasets. (Tramèr et al., 2017) also found that adversarially trained models are still susceptible to black-box attacks generated from other networks. Besides, as far as we know, there is no defense method that can effectively resist all attacks. Hence, these defense methods cannot provide an exact reference for the state-of-the-art networks we consider in this work.

3. Methodology

Our methodology consists of three components. In Section 3.1 we formulate the generalized domain adaptation model, and in Section 3.2 we propose the loss with pseudo labels. In Sections 3.3 and 3.4, we first benchmark various settings of adversarial domain attack and defense, and then propose our methods.

3.1. Generalized Domain Adaptation

In unsupervised domain adaptation (UDA) (Pan, 2016), we are provided with a source domain of labeled examples and a target domain of unlabeled examples. The source and target domains are sampled from two different distributions. The challenge of UDA is to train a classifier model with a low target risk. The model consists of two components, a feature extractor and a class predictor. The target risk can be bounded by the source risk plus the distribution discrepancy between the two domains.

To keep the source risk low, UDA methods compute the loss of the classifier on the labeled source domain, referred to below as Eq. (1).

To constrain the distribution discrepancy, methods based on statistical metrics (DDC (Tzeng et al., 2014), DAN (Long et al., 2015), JAN (Long et al., 2017), etc.) compute the distance between the source and target distributions and use it as the transfer loss.

Methods based on adversarial training (DANN (Ganin and Lempitsky, 2015), CDAN (Long et al., 2018a), MCD (Saito et al., 2018), etc.) instead use the loss of a domain discriminator, which predicts the domain label of each sample, as the transfer loss.

The general UDA loss, referred to below as Eq. (4), combines the source classifier loss with the transfer loss.


We choose the classic DAN (Long et al., 2015), DANN (Ganin and Lempitsky, 2015) and CDAN (Long et al., 2018a) models as the attacked models in UDA.
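The losses of this subsection can be summarized in one place. The following is only a sketch in notation assumed here for illustration, not necessarily the symbols of the original formulation: $F$ is the feature extractor, $C$ the class predictor, $D$ a domain discriminator, $\mathcal{D}_s$ and $\mathcal{D}_t$ the source and target sample sets, $L_y$ and $L_d$ the class and domain losses, $d_i$ the domain label of $x_i$, $d(\cdot,\cdot)$ a statistical distance such as MMD, and $\lambda$ a trade-off weight.

```latex
\begin{align}
L_{cls} &= \mathbb{E}_{(x_i^s,\, y_i^s) \sim \mathcal{D}_s}\, L_y\big(C(F(x_i^s)),\, y_i^s\big) \tag{1}\\
L_{trans}^{stat} &= d\big(F(\mathcal{D}_s),\, F(\mathcal{D}_t)\big) \tag{2}\\
L_{trans}^{adv} &= \mathbb{E}_{x_i \sim \mathcal{D}_s \cup \mathcal{D}_t}\, L_d\big(D(F(x_i)),\, d_i\big) \tag{3}\\
L_{DA} &= L_{cls} + \lambda\, L_{trans} \tag{4}
\end{align}
```

Here $L_{trans}$ stands for whichever transfer loss the method uses. In adversarial methods the discriminator and the feature extractor optimize the discriminator term in opposite directions, so the sign in front of $\lambda$ depends on which component is being updated.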

3.2. Loss with Pseudo Labels

Existing adversarial attack methods, such as FGSM (Goodfellow et al., 2014), FGSM ll (Kurakin et al., 2016), I-FGSM and I-FGSM ll (Kurakin et al., 2016; Carlini and Wagner, 2017), are implemented for the standard classification task, where the ground truth of each sample is needed to calculate the loss function, such as the cross-entropy loss. However, in domain adaptation, we are unable to get the label information of the target domain. So we propose the loss with pseudo labels (PL) for domain adversarial attack.

The pseudo-label loss is the cross-entropy loss computed on target samples, with their pseudo labels used in place of ground-truth labels.

The final objective combines this pseudo-label loss with the domain adaptation loss in Eq. (4).
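The pseudo-label loss can be illustrated with a small sketch. It assumes, for illustration only, that the pseudo label of a target sample is the class the trained model itself predicts with highest confidence; the helper names `pseudo_label` and `pl_loss` are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(logits):
    """Assumed rule: the most confident class stands in for the missing label."""
    return max(range(len(logits)), key=lambda k: logits[k])

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def pl_loss(target_logits):
    """Mean cross-entropy of target samples against their own pseudo labels."""
    losses = [cross_entropy(z, pseudo_label(z)) for z in target_logits]
    return sum(losses) / len(losses)

# Two unlabeled target samples, described only by the model's logits.
target_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [pseudo_label(z) for z in target_logits]
loss = pl_loss(target_logits)
```

Because the pseudo label is the arg-max class, its probability is at least 1/K for K classes, so this loss is bounded above by log K on a clean sample; an attack then tries to increase it.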

3.3. Adversarial Domain Attack

For a given classifier, an adversarial sample is obtained by adding a disturbance to the original sample. Such a disturbance may be imperceptible to human eyes. (Moosavi-Dezfooli et al., 2016) formulates the adversarial attack task as a constrained optimization problem:

$$\Delta(x; \hat{k}) = \min_{r} \|r\|_2 \quad \text{s.t.} \quad \hat{k}(x + r) \neq \hat{k}(x), \tag{7}$$

where $x$ is an image and $\hat{k}$ is the predictor. We call $\Delta(x; \hat{k})$ the robustness of $\hat{k}$ at point $x$. A greater $\Delta(x; \hat{k})$ means that a greater disturbance is required to fool the model.
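For intuition about this robustness measure, i.e., the norm of the smallest perturbation that changes the prediction, consider a linear binary classifier f(x) = w·x + b. In that special case the distance from x to the decision boundary has the closed form |f(x)| / ||w||. The sketch below uses a made-up weight vector; the function name is hypothetical.

```python
import math

def robustness_linear(w, b, x):
    """L2 distance from x to the decision boundary of f(x) = w.x + b."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(f) / norm_w

# Illustrative classifier with ||w|| = 5.
w, b = [3.0, 4.0], -5.0
near = robustness_linear(w, b, [1.0, 1.0])  # f = 2, close to the boundary
far = robustness_linear(w, b, [3.0, 4.0])   # f = 20, far from the boundary
```

A point with a larger value needs a larger disturbance before its predicted class flips, which matches the reading of robustness given above.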

We benchmark several different attack settings for domain adaption in the following parts, including non-targeted attack, targeted attack, white-box attack and black-box attack.

3.3.1. Non-targeted Attack

A non-targeted attack tends to destroy the similarity between images of the same label, or to push a sample across the classification decision boundary until it is assigned to another class, i.e., the model is fooled into predicting a label that differs from the true one. The attacker does not care which class the adversarial samples are assigned to, as long as it is not the true class.

In the non-targeted attack, to obtain a misclassified adversarial image, the optimization goal of the attack is to maximize the loss with respect to the input.

We achieve this objective by taking iterative steps along the sign of the gradient of the loss with respect to the input, i.e., opposite to the gradient descent direction. The adversarial samples in the non-targeted attack are generated by the update rule referred to below as Eq. (10): each step adds the signed gradient, scaled by the step size, to the current adversarial sample and then applies a constraint function that keeps the adversarial sample within a small ball around the clean sample. The label used in the loss is the real label for a sample from the source domain, and the pseudo label for a sample from the target domain. The process is shown in Fig. 2.
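The iterative non-targeted attack can be sketched on a toy linear softmax classifier, for which the input gradient of the cross-entropy loss has a closed form. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `eps` (radius of the L-infinity ball), `alpha` (step size) and `iters`, the L-infinity choice of ball, and the arg-max pseudo label are all assumptions made for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def grad_ce_wrt_x(W, b, x, y):
    """Input gradient of cross-entropy for a linear softmax model: W^T (p - onehot(y))."""
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pl_fgsm(W, b, x, y=None, eps=0.5, alpha=0.1, iters=10):
    """Iterative sign-gradient ascent on the loss; y=None means use a pseudo label."""
    if y is None:
        y = predict(W, b, x)  # assumed pseudo label for an unlabeled target sample
    x_adv = list(x)
    for _ in range(iters):
        g = grad_ce_wrt_x(W, b, x_adv, y)
        x_adv = [xa + alpha * sign(gj) for xa, gj in zip(x_adv, g)]
        # Clip back into the L-infinity ball of radius eps around the clean sample.
        x_adv = [min(max(xa, xc - eps), xc + eps) for xa, xc in zip(x_adv, x)]
    return x_adv

# Toy two-class model and one unlabeled "target-domain" sample.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_clean = [0.6, 0.4]
x_adv = pl_fgsm(W, b, x_clean)
```

For a labeled source sample the true label would be passed as `y`; for a deep network the closed-form gradient would be replaced by automatic differentiation.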

Figure 2. Process of PL-FGSM. The inputs to the model are samples of the source and target domains. The model is a domain adaptation model trained on clean samples. We use source and target samples to calculate the loss and obtain the disturbance from the gradient of the loss with respect to the sample x. Then we add the disturbance to the clean images to get the adversarial samples and use the adversarial samples to attack the model.

3.3.2. Targeted Attack

A targeted attack adds a disturbance to the original image so that the adversarial sample is predicted as a selected target label.

Unlike the non-targeted attack, the targeted attack finds adversarial perturbations with predetermined target labels during the learning procedure. In the targeted attack, we choose the least-likely class according to the prediction of the trained network on the image as the desired target class.

To obtain an adversarial image that is classified as the target class, the targeted attack minimizes the loss with respect to the target label.

We achieve this objective by taking iterative steps in the direction opposite to the sign of the gradient of this loss with respect to the input. The adversarial samples in the targeted attack are generated by the corresponding update rule, referred to below as Eq. (14).
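The targeted variant with a least-likely target class can be sketched in the same toy setting. Again this is an illustration under assumptions (a linear softmax model, an L-infinity ball of radius `eps`, step size `alpha`), with hypothetical function names; the only change from the non-targeted case is that the step descends the loss of the chosen target label.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def least_likely_class(W, b, x):
    """Class with the lowest predicted probability on the clean sample."""
    p = softmax(logits(W, b, x))
    return min(range(len(p)), key=lambda k: p[k])

def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def targeted_attack(W, b, x, eps=1.0, alpha=0.2, iters=10):
    y_ll = least_likely_class(W, b, x)
    x_adv = list(x)
    for _ in range(iters):
        g = grad_ce_wrt_x(W, b, x_adv, y_ll)
        # Descend the loss of the target label (note the minus sign).
        x_adv = [xa - alpha * ((gj > 0) - (gj < 0)) for xa, gj in zip(x_adv, g)]
        x_adv = [min(max(xa, xc - eps), xc + eps) for xa, xc in zip(x_adv, x)]
    return x_adv, y_ll

# Toy three-class model: class 2 is least likely for the clean sample.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x_clean = [0.6, 0.4]
x_adv, y_ll = targeted_attack(W, b, x_clean)
```

No label is needed to pick the target, since the least-likely class comes from the model's own prediction, which is what makes this variant usable on unlabeled target-domain samples.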


3.3.3. White-box and Black-box Attack

White-box attack

A white-box attack requires the attacker to have prior knowledge of the target network (Kurakin et al., 2016; Dong et al., 2018), including its architecture, loss function and parameters. This means that the adversarial samples are generated from a network that has exactly the same properties as the target network. Sometimes those samples are created by the target network directly. In the classification task, the network structure includes both backbones and classifiers.

Black-box attack

A black-box attack means that the attacker cannot obtain the information of the target network. We name the common methods in this scenario ’Random Initialization Method’ (RIM) and ’Avatar Network Method’ (ANM). For the RIM, (Guo et al., 2019) proposes to iteratively select a random direction from a set of specified orthogonal representations, use the confidence score to check whether it points toward or away from the decision boundary, and add or subtract the vector directly to disturb the image. In this way, each update moves the image away from the original image and toward the decision boundary. For the ANM, which is also called transfer attack, we usually train an avatar model, which serves as a substitute for the target model, to generate adversarial samples, and then use these samples to attack the target model. The success rate of a black-box attack depends heavily on the transferability of adversarial samples. Therefore, the key point of a black-box attack based on an avatar model is to train a substitute model whose attributes are as similar to the target model as possible. Usually, we train the avatar model either on the same datasets as the target model (ground-truth based) or on the collected outputs of the target model (pseudo-label based).
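The dependence of a transfer attack on the similarity between avatar and target can be seen in a deliberately simple sketch with linear binary scorers. All weights below are made up for illustration, and the one-step sign attack on a linear scorer is a stand-in for the attack method, not the procedure used in the experiments.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    return 1 if score(w, x) >= 0 else -1

def fgsm_linear(w, x, y, eps):
    """One-step sign attack: the input gradient of the margin y * w.x is y * w."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

target_w = [1.0, 2.0]         # parameters hidden from the attacker
close_avatar = [0.9, 2.2]     # substitute that resembles the target
distant_avatar = [1.0, -2.0]  # substitute that differs from the target

x = [0.5, 0.5]
y = predict(target_w, x)  # pseudo label collected from the target's output

adv_close = fgsm_linear(close_avatar, x, y, eps=0.6)
adv_far = fgsm_linear(distant_avatar, x, y, eps=0.6)

fooled_by_close = predict(target_w, adv_close) != y
fooled_by_far = predict(target_w, adv_far) != y
```

With the same perturbation budget, only the sample crafted on the similar avatar flips the target's prediction in this toy case.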

3.4. Adversarial Domain Defense

Currently, there is no defense method that works well against all attack methods. This motivates us to first study the properties of state-of-the-art domain adaption networks, and how they affect robustness to various adversarial attacks. A successful defense strategy should meet the following requirements:

(1) On clean samples, the defense model, which is trained from a randomly initialized model, should have accuracy similar to that of the target model;

(2) On adversarial samples, the defense model should perform much better than the target model, and ideally reach an accuracy similar to the baseline.

For black-box attack and white-box attack, we benchmark the defense strategies for the domain adaption classification task:

Defense for white-box attack. Since the white-box model is completely visible and controllable, our defense strategy is designed as: 1) Learn a target model with clean samples and the original loss function. This target model is our attack target. Use it to generate adversarial samples; 2) Mix the clean samples and the adversarial samples to obtain a mixed training set. Then train a completely new model on the mixed set until it converges; 3) Feed the samples into the new model and update the adversarial samples; 4) Repeat 2) and 3) several times until convergence.

Our experimental results show that after several iterations, the defense network performs better on the classification of adversarial samples. In this sense, we think the model has better robustness to the updated adversarial attack.

Defense for black-box attack. It is meaningless to update the adversarial samples of black-box attacks based on transfer attack, since these samples are generated from the avatar model, whose parameters are fixed during defense. So we mix the clean samples and the adversarial samples to train the target network. This kind of defense can be regarded as a new data augmentation method.
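The control flow of the two defense strategies above can be sketched as follows. The callables `train`, `attack` and `avatar_attack` are hypothetical placeholders for a real training routine and a real attack such as PL-FGSM, and a fixed number of rounds stands in for the convergence check; sample sets are assumed to be Python lists.

```python
def white_box_defense(train, attack, clean, rounds=3):
    """Alternate between retraining on mixed data and refreshing the adversarial set."""
    model = train(clean)                    # step 1: target model trained on clean samples
    adversarial = attack(model, clean)      # step 1: adversarial samples from that model
    for _ in range(rounds):                 # step 4: repeat (fixed rounds assumed here)
        mixed = clean + adversarial         # step 2: mix clean and adversarial samples
        model = train(mixed)                # step 2: train a new model on the mixture
        adversarial = attack(model, clean)  # step 3: update the adversarial samples
    return model, adversarial

def black_box_defense(train, avatar_attack, clean):
    """The avatar is fixed, so adversarial samples are generated only once."""
    adversarial = avatar_attack(clean)
    return train(clean + adversarial)
```

The difference between the two is that the white-box loop regenerates adversarial samples from the model being defended, whereas the black-box variant reduces to a single round of data augmentation.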

4. Experiments and Results

4.1. Datasets and Implementation Details

4.1.1. Datasets.

Office-31 is a mainstream benchmark dataset in visual transfer learning, including 4652 images of 31 categories, which come from three real domains: Amazon (A), DSLR (D) and Webcam (W). A task such as AD denotes adaptation from A to D. In this experiment, it is used as a small dataset.

Office-Home is a benchmark dataset for domain adaptation which contains 4 domains, where each domain consists of 65 categories. In this experiment, it is used as a medium-scale dataset.

VisDA-2017 is a simulation-to-real dataset for domain adaptation with over 280,000 images across 12 categories in the training, validation and testing domains. In this experiment, it is used as a large-scale dataset.

4.1.2. Adversarial Domain Attack and Defense.

Our main attack method is PL-FGSM. We tried attack step sizes of 0.01, 0.05, 0.1 and 0.5 with ITER = 40 iterations. Non-targeted and targeted attacks are implemented by Eq. (10) and Eq. (14), respectively. White-box attack is implemented directly with PL-FGSM, and black-box attack with PL-FGSM in the ANM way described in Section 3.3.3. Our defense strategy is described in Section 3.4.

In Section 4.2, we take non-targeted attack as an example to discuss the influence of different settings of adversarial attack and defense on different models and datasets. In Section 4.3, we give the results of targeted attack and defense for comparison with Section 4.2.

4.2. Non-targeted Attack and Defense

4.2.1. White-Box Attack.

The attack results of 3 models with 4 step sizes on 3 datasets are shown in Fig. 3. The noise does not visibly disturb the image when the step size is 0.01. The samples change little at the pixel level, and the performance degradation of the models is very small, mostly within 10%. When the step size is 0.1 or 0.5, the performance of the models decreases greatly. However, the noise in the images is very large at this point and the images are already blurry, so these settings are difficult to use in real scenarios. For this reason, we focus the analysis on a step size of 0.05, where the attack ability is considerable and the disturbance at the pixel level is acceptable.

The accuracy decrease rates of 3 models on 3 datasets under attack are shown in Fig. 3. We compare CDAN with DANN because CDAN is an enhanced DANN model with higher transfer performance. On the same dataset, the decrease rate of CDAN is lower than that of DANN, according to the experimental results. This suggests that the model's robustness increases as the transfer performance improves. Then we compare DANN and DAN, two basic transfer learning framework models. Because DANN's decrease rate is larger than DAN's, we may conclude that an adversarial transfer learning model like DANN is less robust than a method like DAN that directly reduces the distribution difference between the two domains.

Figure 3. Non-Targeted White-Box Attack. Accuracy of 3 models after attacks with 4 step sizes, alongside the accuracy on the original samples, on 3 datasets (Office-31, Office-Home, VisDA).
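The decrease-rate comparison can be reproduced for Office-31 from the clean and white-box averages reported in Table 1. The sketch below assumes that "decrease rate" means the drop in accuracy relative to the clean accuracy; that definition is an assumption, as it is not spelled out in the text.

```python
# (clean average, non-targeted white-box average) on Office-31, from Table 1.
office31 = {
    "DAN": (81.88, 41.17),
    "DANN": (84.07, 34.83),
    "CDAN": (87.67, 39.06),
}

def decrease_rate(clean, attacked):
    """Assumed definition: accuracy drop relative to the clean accuracy."""
    return (clean - attacked) / clean

rates = {model: decrease_rate(*acc) for model, acc in office31.items()}
ranking = sorted(rates, key=rates.get)  # most robust first
```

Under this definition the ordering on Office-31 is DAN, then CDAN, then DANN, which agrees with the two comparisons drawn above.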

4.2.2. White-Box Defense.

We select CDAN for white-box defense on 3 datasets. The average results before and after defense are shown in Fig. 4. After white-box defense, the accuracy on the adversarial samples of the first 2 datasets is greatly improved, but the accuracy on clean samples is reduced to the same level as that on the adversarial samples. We speculate that adding adversarial samples changes the original distribution of the dataset, which makes it more difficult to fit the model to the original level. However, the same defense method does not perform well on the VisDA dataset. We speculate that this phenomenon is caused by the sample imbalance between the training set and the test set, which may need some special regularization method.

Figure 4. Non-Targeted White-Box Defense. It shows the accuracy of CDAN after white-box defense.

Target  Samples     AD(%)   AW(%)   DA(%)   DW(%)   WA(%)   WD(%)    average(%)
DAN     clean       83.73   79.62   65.70   96.85   65.56   99.79    81.88
        white-box   40.36   48.80   34.07   57.61   19.38   46.78    41.17
        DANN2DAN    41.76   39.49   44.51   66.16   40.14   53.21    47.55
        CDAN2DAN    42.36   40.88   44.44   64.40   39.75   51.80    47.27
DANN    clean       85.14   87.42   66.70   98.23   66.95   100.00   84.07
        white-box   22.69   32.32   40.64   42.76   40.43   30.12    34.83
        DAN2DANN    36.04   45.65   41.67   52.76   23.35   33.33    38.80
        CDAN2DANN   23.29   31.19   40.25   43.39   40.57   32.325   35.17
CDAN    clean       93.17   92.32   71.31   98.61   70.64   100.00   87.67
        white-box   28.91   37.48   44.62   45.28   32.72   45.38    39.06
        DAN2CDAN    38.95   53.68   46.10   60.65   19.46   47.29    44.36
        DANN2CDAN   28.71   38.23   45.01   45.28   33.26   46.18    39.44
Table 1. Non-Targeted Black-Box Attack (Office-31). ’clean’ means the accuracy on the clean samples. ’white-box’ means the accuracy under the white-box attack shown in Fig. 3. ’CDAN2DAN’ means that the samples generated by CDAN are used to attack DAN. ’average’ means the average accuracy, and this applies to the following tables, too.

Target  Avatar  Samples   AD(%)   AW(%)   DA(%)   DW(%)   WA(%)   WD(%)    (attack)   average(%)
DAN     DANN    clean     76.14   71.56   42.89   87.63   41.26   88.11    (81.88)    67.93
        DANN    ADV       68.48   63.48   44.01   85.66   46.39   89.26    (47.55)    66.21
        CDAN    clean     82.48   65.48   47.13   91.59   43.17   93.75    (81.88)    70.60
        CDAN    ADV       61.45   64.41   50.42   90.68   45.47   89.48    (47.27)    66.98
DANN    DAN     clean     79.91   78.84   48.56   95.72   51.04   99.19    (84.07)    75.53
        DAN     ADV       59.63   64.15   50.12   87.67   37.09   88.35    (38.80)    64.50
        CDAN    clean     72.08   68.55   41.74   92.45   42.17   95.78    (84.07)    68.79
        CDAN    ADV       65.46   66.66   49.69   92.57   46.92   96.78    (35.17)    69.68
CDAN    DAN     clean     89.35   63.29   87.16   98.36   55.91   100.00   (87.67)    82.34
        DAN     ADV       77.30   61.09   81.63   93.58   49.16   97.18    (44.36)    76.66
        DANN    clean     85.14   79.25   48.84   96.47   52.28   98.99    (87.67)    76.83
        DANN    ADV       79.71   75.97   54.70   97.61   53.28   99.79    (39.44)    76.84
Table 2. Non-targeted Black-Box Defense (Office-31). (attack) means the black-box attack average.

4.2.3. Black-Box Attack.

We discuss the results of the three models' transfer attacks on Office-31 with a step size of 0.05. The accuracy on each task and the averages are shown in Table 1. Transfer-based attacks result in interesting phenomena: 1) Due to the similar model structure of DANN and CDAN, their mutual attack performance is close. The result of DANN attacking CDAN is 39.44%, while that of CDAN attacking DANN is 35.17%. In contrast, due to the large difference between the models, the attack effect of DAN on the other two is poor. This suggests that we may get better results when we attack simple models with complex models, while the attack power of simple models against complex models may be limited; 2) Due to the similarity of the CDAN and DANN models, the performance of the mutual black-box attack between CDAN and DANN is very close to the result of the respective white-box attack. The result of the CDAN white-box attack is 39.06%, and that of DANN attacking CDAN is 39.44%. This demonstrates the feasibility of black-box attack based on transfer attack. When the avatar network is close to the original network, this black-box attack method can even achieve performance comparable to the white-box attack.

4.2.4. Black-Box Defense.

We select a step size of 0.05. The defense method is the black-box defense described in Section 3.4. The results are shown in Table 2.

After defense, the accuracy of the model on adversarial samples is greatly improved, but it is difficult to reach the accuracy of the original model on clean samples. In general, the average accuracy of the original model on clean samples is more than 80%, while the accuracy of the trained model on adversarial samples is around 60% - 80%. Besides, in most cases, the accuracy of the defense model on adversarial samples is slightly lower than that on clean samples after defense. For example, in the transfer attack of DAN2CDAN, the accuracy of the trained model is 82.34% for clean samples and 76.66% for adversarial samples. This phenomenon will be discussed in Section 5. Another interesting observation is that, after the defense training, the accuracy of the defense model on clean samples is slightly reduced, by about 10%.

4.3. Targeted Attack and Defense

In this subsection, we compare targeted attack and defense with non-targeted attack and defense. We select a subset of models and datasets and repeat the experiments of Section 4.2 in a targeted manner.

Task      CDAN(%)   DANN(%)   DAN(%)
AD        22.28     21.08     34.73
AW        13.83     14.71     30.18
DA        32.62     33.83     31.16
DW        47.16     45.91     54.59
WA        27.90     30.35     18.17
WD        29.31     22.08     42.36
average   28.85     27.99     35.20
Table 3. Targeted White-Box Attack (Office-31)

Task      clean before defense(%)   ADV before defense(%)   clean after defense(%)   ADV after defense(%)
AD        93.17                     22.28                   88.35                    64.65
AW        92.32                     13.83                   85.91                    75.72
DA        71.31                     32.62                   53.49                    38.58
DW        98.61                     47.16                   97.10                    94.84
WA        70.64                     27.90                   48.88                    43.94
WD        100.00                    29.31                   99.79                    86.14
average   87.67                     28.85                   78.92                    67.31
Table 4. Targeted White-Box Defense (Office-31)

Target  Samples     AD(%)   AW(%)   DA(%)   DW(%)   WA(%)   WD(%)    average(%)
DAN     clean       83.73   79.62   65.70   96.85   65.56   99.79    81.88
        white-box   34.73   30.18   31.16   54.59   18.17   42.36    35.20
        DANN2DAN    50.20   43.14   42.52   78.92   39.97   67.67    53.74
        CDAN2DAN    51.00   38.74   40.61   76.85   39.36   68.07    52.44
DANN    clean       85.14   87.42   66.70   98.23   66.95   100.00   84.07
        white-box   21.08   14.71   33.83   45.91   30.35   22.08    27.99
        DAN2DANN    36.22   37.86   40.67   54.53   28.35   38.66    39.38
        CDAN2DANN   22.48   15.72   31.27   47.04   33.93   23.29    28.96
CDAN    clean       93.17   92.32   71.31   98.61   70.64   100.00   87.67
        white-box   22.28   13.83   32.62   47.16   27.90   29.31    28.85
        DAN2CDAN    41.55   45.86   40.14   58.40   21.64   39.33    41.15
        DANN2CDAN   25.10   31.57   37.02   54.84   26.23   32.93    34.61
Table 5. Targeted Black-Box Attack (Office-31)

Target  Avatar  Samples   AD(%)   AW(%)   DA(%)   DW(%)   WA(%)   WD(%)    (attack)   average(%)
DAN     DANN    clean     76.70   75.09   56.86   96.60   58.75   98.99    (81.88)    77.17
        DANN    ADV       66.06   62.38   48.81   93.60   51.75   95.78    (53.74)    69.73
        CDAN    clean     70.48   68.67   52.25   94.96   51.40   97.99    (81.88)    72.62
        CDAN    ADV       60.04   58.23   46.36   91.44   46.39   93.97    (52.44)    66.07
DANN    DAN     clean     77.91   77.73   45.15   97.10   50.33   97.99    (84.07)    74.37
        DAN     ADV       67.47   71.95   49.91   92.32   47.78   93.17    (39.38)    70.43
        CDAN    clean     72.28   75.22   38.33   94.46   46.25   98.19    (84.07)    70.79
        CDAN    ADV       62.44   66.28   45.61   88.05   43.02   79.51    (28.96)    64.15
CDAN    DAN     clean     89.75   88.93   53.46   98.36   50.40   99.79    (87.67)    80.12
        DAN     ADV       77.51   86.28   55.69   97.23   52.96   98.39    (41.15)    78.01
        DANN    clean     87.14   87.04   54.34   97.61   53.67   100.00   (87.67)    79.97
        DANN    ADV       78.31   84.15   56.44   95.97   54.31   95.38    (34.61)    77.42
Table 6. Targeted Black-Box Defense (Office-31)

We attack the three models on the Office-31 dataset in a white-box manner with a step size of 0.05. The attack results are shown in Table 3. In general, the targeted white-box attack is slightly stronger than the non-targeted attack, where CDAN decreased by 4.46%, DANN decreased by 3.59%, DAN decreased by 2.38%.

For white-box defense, we choose the CDAN model and experiment on the Office-31 dataset. The experimental results are shown in Table 4, and they show properties similar to those of the non-targeted white-box defense: the accuracy on adversarial samples increases greatly, the accuracy on clean samples decreases slightly, and the accuracy on clean samples is always higher than that on adversarial samples. Moreover, the accuracy of the model attacked by the targeted attack is higher than that of the model attacked by the non-targeted method. The former is 78.32%, the latter is 69.93%.

For the black-box attack, we again attack each model with the adversarial samples generated by the other models, with a step size of 0.05. The results are shown in Table 5. Similarly, the effect of the targeted black-box attack is slightly stronger than that of the non-targeted black-box attack. For example, for CDAN2DANN, the accuracy under the targeted black-box attack is 1.07% lower than under the non-targeted black-box attack. Generally speaking, the result of the targeted black-box attack is 1% - 2% lower than that of the non-targeted black-box attack.

For the targeted black-box defense, we choose the same settings as for the non-targeted black-box defense, and the results are shown in Table 6. Similarly, the accuracy on clean samples decreases slightly, and the accuracy on adversarial samples is improved after defense, but neither reaches the original accuracy. As in the non-targeted case, the average accuracy of the three models after defense is reduced by about 10%.

5. Discussion

During white-box defense, we find that the accuracy on individual tasks can be lower than before defense; e.g., the accuracy on adversarial samples for D→A changes from 44.44% to 43.84%. Sometimes the test accuracy on clean samples is even lower than that on adversarial samples. We try to control the defense loss to reduce overfitting of the model.

The new loss combines the cross-entropy losses of adversarial samples and clean samples with the domain adaptation loss. A weight on the adversarial samples' loss controls how strongly that term constrains the model relative to the others. We test the new loss in white-box defense (Table 7): the accuracy on both clean and adversarial samples improves, but the test accuracy on clean samples is still lower than that on adversarial samples. This problem is worth further discussion.
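One plausible reading of the weighted defense loss described above is an additive combination of the three terms, with the weight applied to the adversarial cross-entropy. The sketch below assumes that additive form. The function names (`cross_entropy`, `defense_loss`) and the parameter `adv_weight` are hypothetical, and the domain adaptation loss is passed in as a precomputed number rather than derived from any particular model.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class under predicted probabilities.
    return -math.log(probs[label])

def defense_loss(clean_probs, adv_probs, label, da_loss, adv_weight):
    # Assumed form: clean CE + adv_weight * adversarial CE + domain adaptation loss.
    l_clean = cross_entropy(clean_probs, label)
    l_adv = cross_entropy(adv_probs, label)
    return l_clean + adv_weight * l_adv + da_loss

# Toy predictions for one sample whose true class is 0 (hypothetical values).
loss_no_adv = defense_loss([0.5, 0.5], [0.25, 0.75], 0, da_loss=0.0, adv_weight=0.0)
loss_full = defense_loss([0.5, 0.5], [0.25, 0.75], 0, da_loss=0.0, adv_weight=1.0)
```

With `adv_weight` set to 0 the loss reduces to the clean and domain adaptation terms, and larger values make the adversarial term dominate.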

Task | clean | adv | weight clean | weight adv
D→A | 45.23 | 43.84 | 50.48 | 51.44
Table 7. White-box defense with weighted loss (Office-31)

6. Conclusion

In this work, we discuss the robustness of domain adaptation models against adversarial attacks. Based on the characteristics of existing domain adaptation methods, we propose a fast gradient sign method based on pseudo labels (PL-FGSM), which can serve as a basic adversarial attack method for domain adaptation. Experiments show that the performance of existing models degrades greatly under PL-FGSM attack, which reveals the limited robustness of cross-domain models.

To stimulate the community's research on these problems, we benchmark various adversarial settings in domain adaptation, including white-box and black-box attacks, targeted and non-targeted attacks, etc. Through extensive experiments with three models on three datasets of small, medium and large scale, we observe some interesting phenomena and draw meaningful conclusions. We hope that these results can provide a meaningful reference for future work. In addition, we discuss some of the unique properties of cross-domain defense. Our code is open source and available to promote the development of cross-domain attacks and effective defenses.

