Introduction
Domain adaptation (DA) aims to train a target-domain classifier with samples from source and target domains (Lu et al. 2015). When the labels of samples in the target domain are unavailable, DA is known as unsupervised DA (UDA) (Zhong et al. 2020; Fang et al. 2020), which has been applied to address diverse real-world problems, such as computer vision (Zhang et al. 2020c; Dong et al. 2019, 2020b), natural language processing (Lee and Jha 2019; Guo, Pasunuru, and Bansal 2020), and recommender systems (Zhang et al. 2017; Yu, Wang, and Yuan 2019; Lu et al. 2020).
Significant theoretical advances have been achieved in UDA. Pioneering theoretical work was proposed by Ben-David et al. (2007), which shows that the target risk is upper bounded by three terms: the source risk, the marginal distribution discrepancy, and the combined risk. This earliest learning bound has been extended from many perspectives, such as considering more surrogate loss functions (Zhang et al. 2019a) or distributional discrepancies (Mohri and Medina 2012; Shen et al. 2018) (see Redko et al. (2020) for a survey). Recently, Zhang et al. (2019a) proposed a new distributional discrepancy termed Margin Disparity Discrepancy and developed a tighter and more practical UDA learning bound.
The UDA learning bounds proposed by Ben-David et al. (2007, 2010) and the recent UDA learning bounds proposed by Shen et al. (2018); Xu et al. (2020); Zhang et al. (2020b) consist of three terms: the source risk, the marginal distribution discrepancy, and the combined risk. Minimizing the source risk aims to obtain a source-domain classifier, and minimizing the distribution discrepancy aims to learn domain-invariant features so that the source-domain classifier performs well on the target domain. The combined risk embodies the adaptability between the source and target domains (Ben-David et al. 2010). In particular, when the hypothesis space is fixed, the combined risk is a constant.
Based on UDA learning bounds in which the combined risk is assumed to be a small constant, many existing UDA methods focus on learning domain-invariant features (Fang et al. 2019; Dong et al. 2020c, a; Liu et al. 2019) by minimizing estimators of the source risk and the distribution discrepancy. In the learned feature space, the source and target distributions are similar while the source-domain classifier is required to achieve a small error. The generalization error of the source-domain classifier is then expected to be small on the target domain.
However, the combined risk may increase when learning domain-invariant features, and this increase may degrade the performance of the source-domain classifier on the target domain. As shown in Figure 1, we calculate the value of the combined risk and the accuracy on a real-world UDA task (see the green line). The performance worsens as the combined risk increases. Zhao et al. (2019) also pointed out that an increase of the combined risk causes the failure of the source-domain classifier on the target domain.
To investigate how the combined risk affects performance on domain-invariant features, we rethink and develop the UDA learning bounds by introducing feature transformations. In the new bound (see Eq. (5)), the combined risk is a function of the feature transformation rather than a constant (compared to the bounds in Ben-David et al. (2010)). We also reveal that the combined risk is deeply related to the conditional distribution discrepancy (see Theorem 3). Theorem 3 shows that the conditional distribution discrepancy increases when the combined risk increases. Hence, it is hard to achieve satisfactory target-domain accuracy if we only focus on learning domain-invariant features and neglect to control the combined risk.
To estimate the combined risk, the key challenge lies in the unavailability of labeled samples in the target domain. A simple solution is to leverage target-domain pseudo labels with high confidence to estimate the combined risk. However, since samples with high confidence are insufficient, the value of the combined risk may still increase (see the green line in Figure 1). Inspired by semi-supervised learning methods, an advanced solution is to directly use the mixup technique to augment the pseudo-labeled target samples, which helps estimate the combined risk slightly better than the simple solution (see the orange line in Figure 1). However, the target-domain pseudo labels provided by the source-domain classifier may be inaccurate due to the discrepancy between domains, so mixup may not perform well with inaccurate labels. To mitigate this issue, we propose enhanced mixup (e-mixup) as a substitute for mixup to compute a proxy of the combined risk. The purple line in Figure 1 shows that the proxy based on e-mixup can significantly boost the performance. Details of the proxy are given in the section Motivation.
To this end, we design a novel UDA method referred to as E-MixNet. E-MixNet learns the target-domain classifier by simultaneously minimizing the source risk, the marginal distribution discrepancy, and the proxy of the combined risk. By minimizing the proxy of the combined risk, we effectively control the increase of the combined risk and thus the conditional distribution discrepancy between the two domains.
We conduct experiments on three public datasets (Office-31, Office-Home, and ImageCLEF) and compare E-MixNet with a series of existing state-of-the-art methods. Furthermore, we introduce the proxy of the combined risk into four representative UDA methods (i.e., DAN (Long et al. 2015), DANN (Ganin et al. 2016), CDAN (Long et al. 2018), and SymNets (Zhang et al. 2019b)). Experiments show that E-MixNet outperforms all baselines, and the four representative methods achieve better performance when the proxy of the combined risk is added to their loss functions.
Problem Setting and Concepts
In this section, we first introduce the definition of UDA and then some important concepts used in this paper.
Let $\mathcal{X}$ be a feature space and $\mathcal{Y}$ be the label space, where each label $\mathbf{y}_k \in \mathcal{Y}$ is a one-hot vector whose $k$-th coordinate is $1$ and whose other coordinates are $0$.
Definition 1 (Domains in UDA).
Then, we propose the UDA problem as follows.
Problem 1 (UDA).
Given independent and identically distributed (i.i.d.) labeled samples drawn from the source domain and i.i.d. unlabeled samples drawn from the target marginal distribution, the aim of UDA is to train a classifier with both the labeled source samples and the unlabeled target samples such that it accurately classifies the target data.
Given a loss function $\ell$ and any scoring functions $f, f'$ from a function space $\mathcal{F}$, the source risk, target risk, and classifier discrepancy are
$$R_S(f) = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim P_S}\,\ell(f(\mathbf{x}), \mathbf{y}), \quad R_T(f) = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim P_T}\,\ell(f(\mathbf{x}), \mathbf{y}), \quad R_D(f', f) = \mathbb{E}_{\mathbf{x} \sim D}\,\ell(f'(\mathbf{x}), f(\mathbf{x})).$$
Lastly, we define the disparity discrepancy based on double losses, which will be used to design our method.
Definition 2 (Double Loss Disparity Discrepancy).
Given distributions $D_1, D_2$ over some feature space $\mathcal{X}$, two losses $\ell_1, \ell_2$, a hypothesis space $\mathcal{F}$ and any scoring function $f \in \mathcal{F}$, the double loss disparity discrepancy is
$$d^{\ell_1, \ell_2}_{f, \mathcal{F}}(D_1, D_2) = \sup_{f' \in \mathcal{F}} \Big( \mathbb{E}_{\mathbf{x} \sim D_2}\,\ell_2(f'(\mathbf{x}), f(\mathbf{x})) - \mathbb{E}_{\mathbf{x} \sim D_1}\,\ell_1(f'(\mathbf{x}), f(\mathbf{x})) \Big). \quad (1)$$
When both losses are the margin loss (Zhang et al. 2019a), the double loss disparity discrepancy reduces to the Margin Disparity Discrepancy (Zhang et al. 2019a).
Compared with the classical discrepancy distance (Mansour, Mohri, and Rostamizadeh 2009):
$$\mathrm{disc}(D_1, D_2) = \sup_{f, f' \in \mathcal{F}} \big| \mathbb{E}_{\mathbf{x} \sim D_1}\,\ell(f'(\mathbf{x}), f(\mathbf{x})) - \mathbb{E}_{\mathbf{x} \sim D_2}\,\ell(f'(\mathbf{x}), f(\mathbf{x})) \big|, \quad (2)$$
the double loss disparity discrepancy is tighter and more flexible.
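To make the "tighter" claim concrete, the following toy sketch (ours, not from the paper) compares the two quantities on a one-dimensional problem with a finite hypothesis space of threshold classifiers and the 0-1 loss for both $\ell_1$ and $\ell_2$; the classical distance takes a supremum over pairs of hypotheses, while the disparity discrepancy fixes one of them:

```python
import numpy as np

# Toy illustration (not the paper's implementation): compare the classical
# discrepancy distance (Eq. 2) with the double loss disparity discrepancy
# (Eq. 1) using threshold classifiers and the 0-1 loss on synthetic data.

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, 500)   # "source" features
xt = rng.normal(0.7, 1.0, 500)   # shifted "target" features

thresholds = np.linspace(-2, 2, 9)

def predict(th, x):               # threshold classifier h_th(x) = 1[x > th]
    return (x > th).astype(int)

def disagreement(th_a, th_b, x):  # empirical 0-1 loss between two classifiers
    return np.mean(predict(th_a, x) != predict(th_b, x))

# Classical discrepancy distance: sup over PAIRS (f, f') of |disp_S - disp_T|.
disc = max(abs(disagreement(a, b, xs) - disagreement(a, b, xt))
           for a in thresholds for b in thresholds)

# Double loss disparity discrepancy with a FIXED scoring function f:
# the sup runs over f' only, so it can never exceed the classical distance.
f = 0.0
dd = max(disagreement(f, b, xt) - disagreement(f, b, xs) for b in thresholds)

print(f"disc = {disc:.3f}, double-loss DD = {dd:.3f}")
```

The fixed scoring function removes one supremum, which is exactly why the resulting bound is tighter.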
Theorem 1 (DA Learning Bound).
Given a loss $\ell$ satisfying the triangle inequality and a hypothesis space $\mathcal{F}$, then for any $f \in \mathcal{F}$, we have
$$R_T(f) \le R_S(f) + \mathrm{disc}(P^X_S, P^X_T) + \lambda^*,$$
where $\mathrm{disc}$ is the discrepancy distance defined in Eq. (2) and $\lambda^* = \min_{f' \in \mathcal{F}} \big( R_S(f') + R_T(f') \big)$ is known as the combined risk.
Theoretical Analysis
Here we introduce our main theoretical results. All proofs can be found at https://github.com/zhonglii/EMixNet.
Rethinking DA Learning Bound
Many existing UDA methods (Wang and Breckon 2020; Zou et al. 2019; Tang and Jia 2020) learn a suitable feature transformation such that the discrepancy between the transformed source and target distributions is reduced. By introducing the transformation into the classical DA learning bound, we discover that the combined risk is not a fixed value.
Theorem 2.
Given a loss $\ell$ satisfying the triangle inequality, a transformation space $\mathcal{G}$ and a hypothesis space $\mathcal{H}$, then for any $g \in \mathcal{G}$ and $h \in \mathcal{H}$,
$$R_T(h \circ g) \le R_S(h \circ g) + \mathrm{disc}\big(P^{g(X)}_S, P^{g(X)}_T\big) + \lambda(g),$$
where $\mathrm{disc}$ is the discrepancy distance defined in Eq. (2) and
$$\lambda(g) = \min_{h' \in \mathcal{H}} \big( R_S(h' \circ g) + R_T(h' \circ g) \big) \quad (3)$$
is known as the combined risk.
According to Theorem 2, it is not enough to minimize the source risk and the distribution discrepancy by seeking the optimal classifier and the optimal transformation from the spaces $\mathcal{H}$ and $\mathcal{G}$, because we cannot guarantee that the value of the combined risk stays small during the training process.
For convenience, we define
$$\Lambda(g, h') = R_S(h' \circ g) + R_T(h' \circ g), \quad (4)$$
hence $\lambda(g) = \min_{h' \in \mathcal{H}} \Lambda(g, h')$.
Meaning of Combined Risk
To further understand the meaning of the combined risk, we prove the following theorem.
Theorem 3.
Theorem 3 implies that the combined risk is deeply related to the optimal classifier discrepancy, which can be regarded as the conditional distribution discrepancy between the two domains. If the combined risk increases, the conditional distribution discrepancy becomes larger.
Double Loss DA Learning Bound
Note that there exist methods, such as MDD (Zhang et al. 2019a), whose source and target losses are different. To understand these UDA methods and bridge the gap between theory and algorithms, we extend the classical DA learning bound to a more general scenario.
Theorem 4.
Given losses $\ell_s$ and $\ell_t$ satisfying the triangle inequality, a transformation space $\mathcal{G}$ and a hypothesis space $\mathcal{H}$, then for any $g \in \mathcal{G}$ and $h \in \mathcal{H}$, the target risk $R_T(h \circ g)$ is bounded by
$$R_T(h \circ g) \le R_S(h \circ g) + d^{\ell_s, \ell_t}_{h, \mathcal{H}}\big(P^{g(X)}_S, P^{g(X)}_T\big) + \lambda_{\ell_s, \ell_t}(g), \quad (5)$$
where $d^{\ell_s, \ell_t}$ is the double loss disparity discrepancy defined in Eq. (1) and $\lambda_{\ell_s, \ell_t}(g)$ is the combined risk:
$$\lambda_{\ell_s, \ell_t}(g) = \min_{h' \in \mathcal{H}} \Lambda_{\ell_s, \ell_t}(g, h'), \quad (6)$$
$$\Lambda_{\ell_s, \ell_t}(g, h') = R_S(h' \circ g) + R_T(h' \circ g). \quad (7)$$
In Theorem 4, the condition that $\ell_s$ and $\ell_t$ satisfy the triangle inequality can be replaced by a weaker condition. If we set $\ell_t$ as the margin loss, the losses do not satisfy the triangle inequality, but they do satisfy the weaker condition.
Proposed Method: E-MixNet
Here we introduce motivation and details of our method.
Motivation
Theorem 3 has shown that the combined risk is related to the conditional distribution discrepancy: as the combined risk increases, so does the conditional distribution discrepancy. Hence, ignoring the combined risk may negatively impact the target-domain accuracy. Figure 1 (blue line) verifies this observation.
To control the combined risk, we consider the following problem: for any $h' \in \mathcal{H}$,
$$\lambda(g) \le \Lambda(g, h'), \quad (8)$$
where $\Lambda$ is defined in Eq. (4). Eq. (8) shows that we can control the combined risk by minimizing $\Lambda(g, h')$. However, it is prohibitive to optimize the combined risk directly, since labeled target samples are indispensable to estimate $\Lambda$.
To alleviate the above issue, a simple method is to use target pseudo labels with high confidence to estimate $\Lambda$. Given the source samples and the target samples whose pseudo labels have high confidence, the empirical form of $\Lambda$ can be computed by summing the empirical source risk and the empirical risk on the pseudo-labeled target samples. However, the combined risk may still increase, as shown in Figure 1 (green line). The reason may be that the target samples whose pseudo labels have high confidence are insufficient.
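The confidence-based selection of pseudo-labeled target samples can be sketched as follows (a minimal numpy sketch; the helper names and the threshold value are ours, not the paper's):

```python
import numpy as np

# Hedged sketch: keep only target samples whose maximum softmax probability
# exceeds a threshold, and attach one-hot pseudo labels to them, so they can
# be used to estimate the combined-risk term.

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

def select_confident(logits, threshold=0.95):
    """Return indices and one-hot pseudo labels of confident target samples."""
    probs = softmax(logits)
    conf = probs.max(axis=1)                       # confidence per sample
    pseudo = probs.argmax(axis=1)                  # predicted class per sample
    keep = np.flatnonzero(conf >= threshold)       # confident rows only
    onehot = np.eye(logits.shape[1])[pseudo[keep]]
    return keep, onehot

logits = np.array([[4.0, 0.0, 0.0],    # confident prediction -> kept
                   [0.6, 0.5, 0.4]])   # uncertain prediction -> dropped
idx, labels = select_confident(logits, threshold=0.9)
print(idx, labels)
```

As the text notes, the drawback is that few samples survive a strict threshold, so the estimate of $\Lambda$ stays noisy.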
Inspired by semi-supervised learning, an advanced solution is to use the mixup technique (Zhang et al. 2018) to augment the pseudo-labeled target samples. Mixup produces new samples by a convex combination: given any two samples $(\mathbf{x}_i, \mathbf{y}_i)$ and $(\mathbf{x}_j, \mathbf{y}_j)$,
$$\tilde{\mathbf{x}} = \mu \mathbf{x}_i + (1 - \mu)\mathbf{x}_j, \quad \tilde{\mathbf{y}} = \mu \mathbf{y}_i + (1 - \mu)\mathbf{y}_j,$$
where $\mu$ is sampled from a distribution governed by a hyperparameter. Zhang et al. (2018) have shown that mixup not only reduces the memorization of adversarial samples but also performs better than Empirical Risk Minimization (ERM) (Vapnik and Chervonenkis 2015). By applying mixup to the target samples with high confidence, new samples are produced, and we use them to compute a proxy of $\Lambda$.
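The convex combination above can be sketched as follows (a minimal sketch; following Zhang et al. (2018), we assume $\mu$ is drawn from a Beta distribution whose parameter is a hyperparameter we pick here):

```python
import numpy as np

# Hedged sketch of the mixup augmentation described above: convexly combine
# two (sample, soft-label) pairs with a Beta-distributed mixing weight mu.

def mixup(x_i, y_i, x_j, y_j, alpha=0.6, rng=None):
    """Convexly combine two (sample, one-hot/soft label) pairs."""
    rng = rng if rng is not None else np.random.default_rng()
    mu = rng.beta(alpha, alpha)
    x_new = mu * x_i + (1.0 - mu) * x_j
    y_new = mu * y_i + (1.0 - mu) * y_j
    return x_new, y_new

x_new, y_new = mixup(np.array([1.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                     np.array([0.0, 1.0]), np.array([0.0, 1.0, 0.0]))
print(x_new, y_new)   # a point on the segment between the two inputs
```

Because the combination is convex, the mixed label is still a valid probability vector, which is what lets the augmented pairs stand in for labeled target samples.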
The aforementioned issue can be mitigated, since mixup can be regarded as data augmentation (Zhang et al. 2018). However, the target-domain pseudo labels provided by the source-domain classifier may be inaccurate due to the discrepancy between domains, so mixup may not perform well with inaccurate labels. We propose enhanced mixup (e-mixup) as a substitute for mixup to compute the proxy. E-mixup introduces the clean, true-labeled source samples to mitigate the issue caused by bad pseudo labels.
Furthermore, to increase the diversity of new samples, e-mixup produces each new sample from two distant samples, where the distance between the two samples is expected to be large. Compared with the ordinary mixup technique (i.e., producing new samples from randomly selected samples), e-mixup can produce new samples more effectively. We also verify that e-mixup can further boost the performance (see Table 5). Details of e-mixup are given in Algorithm 1. Corresponding to the double-loss situation, denoting the samples produced by e-mixup, the proxy of $\Lambda_{\ell_s, \ell_t}$ (defined in Eq. (7)) is
(9) 
The purple line in Figure 1 and the ablation study show that e-mixup can further boost the performance.
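The two ingredients of e-mixup described above, mixing with true-labeled source samples and pairing distant samples, can be sketched as follows. This is our reading of the text, not the paper's Algorithm 1: we pair each confident target sample with its most distant source sample, which is one simple way to make the pairwise distance large.

```python
import numpy as np

# Hedged sketch of the e-mixup idea: mix each confident pseudo-labeled target
# sample with a DISTANT true-labeled source sample, so bad pseudo labels are
# tempered by clean source labels and the new samples are diverse.
# The exact pairing rule is given in the paper's Algorithm 1.

def e_mixup(xs, ys, xt, yt_pseudo, alpha=0.6, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    new_x, new_y = [], []
    for x, y in zip(xt, yt_pseudo):
        d = np.linalg.norm(xs - x, axis=1)   # distance to every source sample
        j = int(np.argmax(d))                # pick the most distant one
        mu = rng.beta(alpha, alpha)
        new_x.append(mu * x + (1 - mu) * xs[j])
        new_y.append(mu * y + (1 - mu) * ys[j])
    return np.stack(new_x), np.stack(new_y)

xs = np.array([[0.0, 0.0], [5.0, 5.0]])
ys = np.eye(2)                               # true one-hot source labels
xt = np.array([[0.1, 0.1]])
yt = np.array([[0.9, 0.1]])                  # soft pseudo label
mx, my = e_mixup(xs, ys, xt, yt)
print(mx, my)                                # mixes with the distant (5, 5) sample
```

Mixing with a clean source label bounds the damage a wrong pseudo label can do, since every mixed label contains a fraction of a correct one.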
Algorithm
The optimization of the combined risk plays a crucial role in UDA. Accordingly, we propose a method based on the aforementioned analyses to address UDA more thoroughly.
Objective Function
According to the theoretical bound in Eq. (5), we need to minimize the sum of the source risk, the double loss disparity discrepancy defined in Eq. (1), and the proxy of the combined risk defined in Eq. (9).
Minimizing the double loss disparity discrepancy is a minimax game, since it is defined as a supremum over the hypothesis space. Thus, we revise the above problem as follows:
(10) 
where are parameters to make our model more flexible,
(11) 
To solve problem (10), we construct a deep network. The architecture is shown in Fig. 2(b); it consists of a generator G, a discriminator, and two classifiers. Next, we introduce the details of our method.
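A common way to implement such a minimax game in a single backward pass is a gradient reversal layer (GRL) between the generator and the adversarial branch. The following is a standard, self-contained sketch of that technique (it is the usual construction from Ganin et al. (2016), not code from this paper):

```python
import torch

# Hedged sketch of a gradient reversal layer: the forward pass is the
# identity, while the backward pass multiplies incoming gradients by -coeff.
# Placed between the generator and the adversarial classifier, it lets one
# branch descend while the generator effectively ascends the same objective.

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)            # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # reversed, scaled gradient for x; None for the non-tensor coeff
        return -ctx.coeff * grad_output, None

x = torch.ones(3, requires_grad=True)
y = GradReverse.apply(x, 2.0).sum()
y.backward()
print(x.grad)                          # every entry is -2.0
```

With this layer in place, a plain optimizer step updates both players of the game at once.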
We use the standard cross-entropy as the source loss $\ell_s$ and the modified cross-entropy (Goodfellow et al. 2014; Zhang et al. 2019a) as the target loss $\ell_t$.
For any scoring function $f$ and label $\mathbf{y}$ with class index $y$,
$$\ell_s(f(\mathbf{x}), \mathbf{y}) = -\log \sigma_y(f(\mathbf{x})), \quad (12)$$
where $\sigma$ is the softmax function: for any $\mathbf{z} \in \mathbb{R}^K$,
$$\sigma_k(\mathbf{z}) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}},$$
and
$$\ell_t(f(\mathbf{x}), \mathbf{y}) = -\log\big(1 - \sigma_y(f(\mathbf{x}))\big), \quad (13)$$
where $\sigma_k$ is the $k$-th coordinate function of $\sigma$.
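The two losses can be sketched numerically as follows (a hedged numpy sketch under our reconstruction of Eqs. (12) and (13): standard cross-entropy on the source side and the GAN-style modified cross-entropy on the target side):

```python
import numpy as np

# Hedged sketch of the two losses: both act on the softmax sigma of the
# scoring function's logits, picking out the coordinate of the given class.

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def source_loss(logits, label):      # Eq. (12): -log sigma_y(f(x))
    return -np.log(softmax(logits)[label])

def target_loss(logits, label):      # Eq. (13): modified cross-entropy
    return -np.log(1.0 - softmax(logits)[label])

z = np.array([2.0, 0.0, 0.0])
print(source_loss(z, 0), target_loss(z, 0))
```

The modified loss has stronger gradients when the prediction is confident, which is the same trick used for the generator loss in GAN training (Goodfellow et al. 2014).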
Source risk. Given the source samples $\{(\mathbf{x}_i^s, \mathbf{y}_i^s)\}_{i=1}^{n}$, then
$$\widehat{R}_S(h \circ g) = \frac{1}{n} \sum_{i=1}^{n} \ell_s\big(h(g(\mathbf{x}_i^s)), \mathbf{y}_i^s\big), \quad (14)$$
where $y_i^s$ is the label corresponding to the one-hot vector $\mathbf{y}_i^s$.
Double loss disparity discrepancy. Given the source samples $\{\mathbf{x}_i^s\}_{i=1}^{n}$ and the target samples $\{\mathbf{x}_j^t\}_{j=1}^{m}$, then
$$\widehat{d}(S, T) = \frac{1}{m} \sum_{j=1}^{m} \ell_t\big(h'(g(\mathbf{x}_j^t)), h(g(\mathbf{x}_j^t))\big) - \frac{\gamma}{n} \sum_{i=1}^{n} \ell_s\big(h'(g(\mathbf{x}_i^s)), h(g(\mathbf{x}_i^s))\big), \quad (15)$$
where $\ell_s$ and $\ell_t$ are defined in Eqs. (12) and (13).
Combined risk. As discussed in the Motivation section, the combined risk cannot be optimized directly. To mitigate this problem, we use the proxy in Eq. (9) as a substitute.
Further, motivated by Berthelot et al. (2019), we use the mean square error (MSE) to calculate the proxy of the combined risk because, unlike the cross-entropy loss, MSE is bounded and less sensitive to incorrect predictions. Denote by $(\tilde{\mathbf{x}}, \tilde{\mathbf{y}})$ the outputs of e-mixup. Then the proxy is calculated by
$$\widehat{\Lambda} = \frac{1}{m} \sum_{k=1}^{m} \big\| \sigma\big(h'(g(\tilde{\mathbf{x}}_k))\big) - \tilde{\mathbf{y}}_k \big\|^2_2. \quad (16)$$
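The MSE proxy can be sketched as follows (a hedged numpy sketch with toy logits and mixed labels of our choosing, standing in for the classifier outputs on e-mixup samples):

```python
import numpy as np

# Hedged sketch of the combined-risk proxy: mean square error between the
# classifier's softmax outputs on e-mixup samples and the mixed soft labels.
# MSE is bounded, so it is less sensitive than cross-entropy to bad labels.

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def combined_risk_proxy(logits, mixed_labels):
    probs = softmax(logits)
    return float(np.mean(np.sum((probs - mixed_labels) ** 2, axis=1)))

logits = np.array([[3.0, 0.0], [0.0, 3.0]])
mixed = np.array([[0.7, 0.3], [0.4, 0.6]])   # soft labels from e-mixup
print(combined_risk_proxy(logits, mixed))
```

Because both arguments are probability vectors, the per-sample term is at most 2, so a few wrong pseudo labels cannot blow up the proxy the way an unbounded cross-entropy term could.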
Method  A→W  D→W  W→D  A→D  D→A  W→A  Avg
ResNet-50 (He et al. 2016)  68.4±0.2  96.7±0.1  99.3±0.1  68.9±0.2  62.5±0.3  60.7±0.3  76.1
DAN (Long et al. 2015)  80.5±0.4  97.1±0.2  99.6±0.1  78.6±0.2  63.6±0.3  62.8±0.2  80.4
RTN (Long et al. 2016)  84.5±0.2  96.8±0.1  99.4±0.1  77.5±0.3  66.2±0.2  64.8±0.3  81.6
DANN (Ganin et al. 2016)  82.0±0.4  96.9±0.2  99.1±0.1  79.7±0.4  68.2±0.4  67.4±0.5  82.2
ADDA (Tzeng et al. 2017)  86.2±0.5  96.2±0.3  98.4±0.3  77.8±0.3  69.5±0.4  68.9±0.5  82.9
JAN (Long et al. 2013)  86.0±0.4  96.7±0.3  99.7±0.1  85.1±0.4  69.2±0.3  70.7±0.5  84.6
MADA (Pei et al. 2018)  90.0±0.1  97.4±0.1  99.6±0.1  87.8±0.2  70.3±0.3  66.4±0.3  85.2
SimNet (Pinheiro 2018)  88.6±0.5  98.2±0.2  99.7±0.2  85.3±0.3  73.4±0.8  71.8±0.6  86.2
MCD (Saito et al. 2018)  89.6±0.2  98.5±0.1  100.0±0.0  91.3±0.2  69.6±0.1  70.8±0.3  86.6
CDAN+E (Long et al. 2018)  94.1±0.1  98.6±0.1  100.0±0.0  92.9±0.2  71.0±0.3  69.3±0.3  87.7
SymNets (Zhang et al. 2019b)  90.8±0.1  98.8±0.3  100.0±0.0  93.9±0.5  74.6±0.6  72.5±0.5  88.4
MDD (Zhang et al. 2019a)  94.5±0.3  98.4±0.1  100.0±0.0  93.5±0.2  74.6±0.3  72.2±0.1  88.9
E-MixNet  93.0±0.3  99.0±0.1  100.0±0.0  95.6±0.2  78.9±0.5  74.7±0.7  90.2
Method  I→P  P→I  I→C  C→I  C→P  P→C  Avg
ResNet-50 (He et al. 2016)  74.8±0.3  83.9±0.1  91.5±0.3  78.0±0.2  65.5±0.3  91.2±0.3  80.7
DAN (Long et al. 2015)  74.5±0.4  82.2±0.2  92.8±0.2  86.3±0.4  69.2±0.4  89.8±0.4  82.5
DANN (Ganin et al. 2016)  75.0±0.6  86.0±0.3  96.2±0.4  87.0±0.5  74.3±0.5  91.5±0.6  85.0
JAN (Long et al. 2013)  76.8±0.4  88.0±0.2  94.7±0.2  89.5±0.3  74.2±0.3  91.7±0.3  85.8
MADA (Pei et al. 2018)  75.0±0.3  87.9±0.2  96.0±0.3  88.8±0.3  75.2±0.2  92.2±0.3  85.8
CDAN+E (Long et al. 2018)  77.7±0.3  90.7±0.2  97.7±0.3  91.3±0.3  74.2±0.2  94.3±0.3  87.7
SymNets (Zhang et al. 2019b)  80.2±0.3  93.6±0.2  97.0±0.3  93.4±0.3  78.7±0.3  96.4±0.1  89.9
E-MixNet  80.5±0.4  96.0±0.1  97.7±0.3  95.2±0.4  79.9±0.2  97.0±0.3  91.0
Training Procedure
Finally, the UDA problem can be solved by the following minimax game:
(17) 
The training procedure is shown in Algorithm 2.
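One step of such a minimax game can be sketched as below. This is a hedged, minimal sketch in the spirit of MDD-style adversarial training, not the paper's Algorithm 2: the module sizes, the random stand-in data, and the pseudo-label construction are ours, and the e-mixup proxy term is only indicated by a comment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A gradient reversal layer (GRL) lets one backward pass train the
# adversarial classifier h_adv while updating the generator G adversarially.
class GRL(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g                      # reverse the gradient flowing into G

torch.manual_seed(0)
G = nn.Linear(10, 8)                   # generator (backbone stand-in)
h = nn.Linear(8, 3)                    # main classifier
h_adv = nn.Linear(8, 3)                # adversarial classifier
params = list(G.parameters()) + list(h.parameters()) + list(h_adv.parameters())
opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)

xs, ys = torch.randn(4, 10), torch.randint(0, 3, (4,))   # labeled source batch
xt = torch.randn(4, 10)                                  # unlabeled target batch
gamma = 4.0

feat_s, feat_t = G(xs), G(xt)
src_risk = F.cross_entropy(h(feat_s), ys)                # empirical source risk
pl_s = h(feat_s).argmax(1).detach()                      # pseudo labels from h
pl_t = h(feat_t).argmax(1).detach()
adv_s = F.cross_entropy(h_adv(GRL.apply(feat_s)), pl_s)
p_t = F.softmax(h_adv(GRL.apply(feat_t)), 1).gather(1, pl_t[:, None]).squeeze(1)
adv_t = -torch.log((1 - p_t).clamp_min(1e-6)).mean()     # modified cross-entropy
disc = gamma * adv_s + adv_t                             # discrepancy term
loss = src_risk + disc   # the e-mixup proxy of the combined risk is added here
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

The GRL makes the single `backward()` serve both players, so the whole game reduces to an ordinary SGD loop.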
Experiments
Method  A→C  A→P  A→R  C→A  C→P  C→R  P→A  P→C  P→R  R→A  R→C  R→P  Avg

ResNet-50 (He et al. 2016)  34.9  50.0  58.0  37.4  41.9  46.2  38.5  31.2  60.4  53.9  41.2  59.9  46.1
DAN Long et al. (2015)  54.0  68.6  75.9  56.4  66.0  67.9  57.1  50.3  74.7  68.8  55.8  80.6  64.7 
DANN Ganin et al. (2016)  44.1  66.5  74.6  57.9  62.0  67.2  55.7  40.9  73.5  67.5  47.9  77.7  61.3 
JAN Long et al. (2013)  45.9  61.2  68.9  50.4  59.7  61.0  45.8  43.4  70.3  63.9  52.4  76.8  58.3 
CDAN+E Long et al. (2018)  47.0  69.4  75.8  61.0  68.8  70.8  60.2  47.1  77.9  70.8  51.4  81.7  65.2 
SymNets Zhang et al. (2019b)  46.0  73.8  78.2  64.1  69.7  74.2  63.2  48.9  80.0  74.0  51.6  82.9  67.2 
MDD Zhang et al. (2019a)  54.9  73.7  77.8  60.0  71.4  71.8  61.2  53.6  78.1  72.5  60.2  82.3  68.1 
E-MixNet  57.7  76.6  79.8  63.6  74.1  75.0  63.4  56.4  79.7  72.8  62.4  85.5  70.6
Method  A→C  A→P  A→R  C→A  C→P  C→R  P→A  P→C  P→R  R→A  R→C  R→P  Avg

DAN Long et al. (2015)  54.0  68.6  75.9  56.4  66.0  67.9  57.1  50.3  74.7  68.8  55.8  80.6  64.7 
DAN+  57.0  71.0  77.9  59.9  72.6  70.1  58.1  57.1  77.3  72.7  64.7  84.6  68.6 
DANN Ganin et al. (2016)  44.1  66.5  74.6  57.9  62.0  67.2  55.7  40.9  73.5  67.5  47.9  77.7  61.3 
DANN+  50.9  69.6  77.8  61.9  70.7  71.6  60.0  49.5  78.4  71.8  55.7  83.7  66.8 
CDAN+E Long et al. (2018)  47.0  69.4  75.8  61.0  68.8  70.8  60.2  47.1  77.9  70.8  51.4  81.7  65.2 
CDAN+E+  49.5  70.1  77.8  64.3  71.3  74.2  61.6  50.6  80.0  73.5  56.6  84.1  67.8 
SymNets Zhang et al. (2019b)  46.0  73.8  78.2  64.1  69.7  74.2  63.2  48.9  80.0  74.0  51.6  82.9  67.2 
SymNets+  48.8  74.7  79.7  64.9  72.5  75.6  63.9  47.0  80.8  73.9  52.4  83.9  68.2 
s  t  m  e  I→P  P→I  I→C  C→I  C→P  P→C  Avg

80.2  94.2  96.7  94.7  79.2  95.5  90.1  
79.9  92.2  97.7  93.8  79.4  96.5  89.9  
79.7  93.7  97.5  94.5  79.7  96.2  90.2  
79.4  95.0  97.8  94.8  81.4  96.5  90.8  
80.5  96.0  97.7  95.2  79.9  97.0  91.0 
We evaluate E-MixNet on three public datasets and compare it with several existing state-of-the-art methods. Code will be available at https://github.com/zhonglii/EMixNet.
Datasets
Three common UDA datasets are used to evaluate the efficacy of E-MixNet.
Office-31 (Saenko et al. 2010) is an object recognition dataset consisting of three domains with a slight discrepancy: amazon (A), dslr (D) and webcam (W). Each domain contains 31 kinds of objects, so there are six domain adaptation tasks on Office-31: A→D, A→W, D→A, D→W, W→A, W→D.
Office-Home (Venkateswara et al. 2017) is an object recognition dataset containing four domains with a more obvious domain discrepancy than Office-31: Artistic (A), Clipart (C), Product (P), and Real-World (R). Each domain contains 65 kinds of objects, so there are 12 domain adaptation tasks on Office-Home: A→C, A→P, A→R, …, R→P.
ImageCLEF-DA (http://imageclef.org/2014/adaptation/) is a dataset organized by selecting the 12 common classes shared by three public datasets (domains): Caltech-256 (C), ImageNet ILSVRC 2012 (I), and Pascal VOC 2012 (P). We permute the three domains and build six transfer tasks: I→P, P→I, I→C, C→I, C→P, P→C.
Experimental Setup
Following the standard protocol for unsupervised domain adaptation in Ganin et al. (2016); Long et al. (2018), all labeled source samples and unlabeled target samples are used in the training process, and we report the average classification accuracy over three random runs. The parameter $\gamma$ in Eq. (15) is selected from {2, 4, 8}: it is set to 2 for Office-Home, 4 for Office-31, and 8 for ImageCLEF.
ResNet-50 (He et al. 2016) pretrained on ImageNet is employed as the backbone network (the generator G). The discriminator and the two classifiers each consist of two fully connected layers with 1024 hidden units. A gradient reversal layer between G and the adversarial classifier is employed for adversarial training. The algorithm is implemented in PyTorch. Mini-batch stochastic gradient descent with momentum 0.9 is employed as the optimizer, and the learning rate is adjusted by an annealing schedule whose progress increases linearly from 0 to 1 during training. We follow Zhang et al. (2019a) in employing a progressive strategy for the trade-off coefficient, whose value is set to 0.1. The hyperparameter of e-mixup is set to 0.6 in all experiments.
Results
The results on Office-31 are reported in Table 1. E-MixNet achieves the best results, exceeding the baselines on 4 of the 6 tasks. Compared to the competitive baseline MDD, E-MixNet surpasses it by 4.3% on the difficult task D→A.
The results on ImageCLEF are reported in Table 2. E-MixNet significantly outperforms the baselines on 5 of the 6 tasks. On the hard task C→P, E-MixNet surpasses the competitive baseline SymNets by 2.7%.
The results on Office-Home are reported in Table 3. Although Office-Home is a challenging dataset, E-MixNet still achieves better performance than all the baselines on 9 of the 12 tasks. On the difficult tasks A→C, P→A, and R→C, E-MixNet has significant advantages.
To further verify the efficacy of the proposed proxy of the combined risk, we add the proxy to the loss functions of four representative UDA methods. As shown in Fig. 2(a), we add a new classifier, identical to the classifier in the original method, to formulate the proxy of the combined risk. The results are shown in Table 4. All four methods achieve better performance after optimizing the proxy; it is worth noting that DANN obtains a 5.5% increase. These experiments adequately demonstrate that the combined risk plays a crucial role for methods that aim to learn a domain-invariant representation and that the proxy can indeed curb the increase of the combined risk.
Ablation Study and Parameter Analysis
Ablation Study. To further verify the efficacy of the proxy of the combined risk calculated by mixup and e-mixup respectively, ablation experiments are shown in Table 5, where s indicates that the source samples are introduced to augment the target samples, t indicates augmenting the target samples, m denotes mixup, and e denotes e-mixup. Table 5 shows that E-MixNet achieves the best performance, which further shows that the proxy can effectively control the combined risk.
Parameter analysis. Here we study how the trade-off parameter affects the performance, and compare mean square error (MSE) with cross-entropy for the proxy of the combined risk. First, as shown in Fig. 3(a), a relatively larger value obtains better performance and faster convergence. Second, when mixup operates between two samples, the accuracy of the pseudo labels of the target samples is important; to be robust against inaccurate labels, MSE is employed instead of cross-entropy. As shown in Fig. 3(b), MSE obtains more stable and better performance. Furthermore, the A-distance is an important indicator of the distribution discrepancy, defined as $d_A = 2(1 - 2\epsilon)$, where $\epsilon$ is the test error of a domain classifier. As shown in Fig. 3(c), E-MixNet achieves a smaller A-distance, implying the efficacy of the proposed proxy.
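The proxy A-distance used above can be sketched as follows (a hedged sketch on synthetic features; a trivial nearest-centroid domain classifier stands in for the usual choice of an SVM or MLP):

```python
import numpy as np

# Hedged sketch of the proxy A-distance: d_A = 2 * (1 - 2 * eps), where eps
# is the test error of a classifier trained to separate source features from
# target features. Well-aligned features give eps near 0.5 and d_A near 0.

def proxy_a_distance(err):
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
fs = rng.normal(0.0, 1.0, (200, 2))   # "source" features
ft = rng.normal(2.0, 1.0, (200, 2))   # "target" features

cs, ct = fs.mean(0), ft.mean(0)       # per-domain centroids
x = np.vstack([fs, ft])
y = np.array([0] * 200 + [1] * 200)   # domain labels
pred = (np.linalg.norm(x - ct, axis=1) < np.linalg.norm(x - cs, axis=1)).astype(int)
eps = float(np.mean(pred != y))
print(f"eps = {eps:.3f}, proxy A-distance = {proxy_a_distance(eps):.3f}")
```

With these clearly separated toy domains the error is small and the A-distance is close to its maximum of 2; adapted features would push the error toward 0.5 and the A-distance toward 0.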
Conclusion
Though numerous UDA methods have been proposed and have achieved significant success, the issue caused by the combined risk has not been brought to the forefront, and none of the proposed methods solves it. Theorem 3 reveals that the combined risk is deeply related to the conditional distribution discrepancy and plays a crucial role in transfer performance. We therefore propose a method termed E-MixNet, which employs enhanced mixup to calculate a proxy of the combined risk. Experiments show that our method achieves competitive performance compared with existing state-of-the-art methods and that the performance of four representative methods can be boosted by adding the proxy to their loss functions.
Acknowledgments
The work presented in this paper was supported by the Australian Research Council (ARC) under DP170101632 and FL190100149. The first author particularly thanks the support of UTS-AAII during his visit.
References
 Ben-David et al. (2010) Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; and Vaughan, J. W. 2010. A theory of learning from different domains. Machine Learning 79(1-2): 151–175.
 Ben-David et al. (2007) Ben-David, S.; Blitzer, J.; Crammer, K.; and Pereira, F. 2007. Analysis of representations for domain adaptation. In NeurIPS, 137–144.
 Berthelot et al. (2019) Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; and Raffel, C. A. 2019. Mixmatch: A holistic approach to semisupervised learning. In NeurIPS, 5049–5059.
 Dong et al. (2019) Dong, J.; Cong, Y.; Sun, G.; and Hou, D. 2019. SemanticTransferable WeaklySupervised Endoscopic Lesions Segmentation. In ICCV, 10711–10720.
 Dong et al. (2020a) Dong, J.; Cong, Y.; Sun, G.; Liu, Y.; and Xu, X. 2020a. CSCL: Critical SemanticConsistent Learning for Unsupervised Domain Adaptation. In Vedaldi, A.; Bischof, H.; Brox, T.; and Frahm, J.M., eds., ECCV, 745–762. Cham: Springer International Publishing. ISBN 9783030585983.
 Dong et al. (2020b) Dong, J.; Cong, Y.; Sun, G.; Yang, Y.; Xu, X.; and Ding, Z. 2020b. WeaklySupervised CrossDomain Adaptation for Endoscopic Lesions Segmentation. IEEE Transactions on Circuits and Systems for Video Technology 1–1. doi:10.1109/TCSVT.2020.3016058.
 Dong et al. (2020c) Dong, J.; Cong, Y.; Sun, G.; Zhong, B.; and Xu, X. 2020c. What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation. In CVPR, 4022–4031.

 Fang et al. (2020) Fang, Z.; Lu, J.; Liu, F.; Xuan, J.; and Zhang, G. 2020. Open set domain adaptation: Theoretical bound and algorithm. IEEE Transactions on Neural Networks and Learning Systems.
 Fang et al. (2019) Fang, Z.; Lu, J.; Liu, F.; and Zhang, G. 2019. Unsupervised domain adaptation with sphere retracting transformation. In 2019 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.
 Ganin et al. (2016) Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; and Lempitsky, V. 2016. Domainadversarial training of neural networks. The Journal of Machine Learning Research 17: 2096–2030.
 Gong et al. (2016) Gong, M.; Zhang, K.; Liu, T.; Tao, D.; Glymour, C.; and Schölkopf, B. 2016. Domain adaptation with conditional transferable components. In ICML, 2839–2848.
 Goodfellow et al. (2014) Goodfellow, I.; PougetAbadie, J.; Mirza, M.; Xu, B.; WardeFarley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative Adversarial Nets. In NeurIPS, 2672–2680. Curran Associates, Inc.
 Guo, Pasunuru, and Bansal (2020) Guo, H.; Pasunuru, R.; and Bansal, M. 2020. MultiSource Domain Adaptation for Text Classification via DistanceNetBandits. In AAAI, 7830–7838.
 He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
 Lee and Jha (2019) Lee, S.; and Jha, R. 2019. Zeroshot adaptive transfer for conversational language understanding. In AAAI, volume 33, 6642–6649.
 Liu et al. (2019) Liu, F.; Lu, J.; Han, B.; Niu, G.; Zhang, G.; and Sugiyama, M. 2019. Butterfly: A panacea for all difficulties in wildly unsupervised domain adaptation. arXiv preprint arXiv:1905.07720 .
 Long et al. (2015) Long, M.; Cao, Y.; Wang, J.; and Jordan, M. I. 2015. Learning transferable features with deep adaptation networks. In ICML, 97–105.
 Long et al. (2018) Long, M.; Cao, Z.; Wang, J.; and Jordan, M. I. 2018. Conditional adversarial domain adaptation. In NeurIPS, 1640–1650.
 Long et al. (2013) Long, M.; Wang, J.; Ding, G.; Sun, J.; and Yu, P. S. 2013. Transfer feature learning with joint distribution adaptation. In ICCV, 2200–2207.
 Long et al. (2016) Long, M.; Zhu, H.; Wang, J.; and Jordan, M. I. 2016. Unsupervised domain adaptation with residual transfer networks. In NeurIPS, 136–144.
 Lu et al. (2015) Lu, J.; Behbood, V.; Hao, P.; Zuo, H.; Xue, S.; and Zhang, G. 2015. Transfer learning using computational intelligence: A survey. KnowledgeBased Systems 80: 14–23.
 Lu et al. (2020) Lu, W.; Yu, Y.; Chang, Y.; Wang, Z.; Li, C.; and Yuan, B. 2020. A Dual Inputaware Factorization Machine for CTR Prediction. In Proceedings of the 29th International Joint Conference on Artificial Intelligence.
 Mansour, Mohri, and Rostamizadeh (2009) Mansour, Y.; Mohri, M.; and Rostamizadeh, A. 2009. Domain Adaptation: Learning Bounds and Algorithms. In COLT.
 Mohri and Medina (2012) Mohri, M.; and Medina, A. M. 2012. New analysis and algorithm for learning with drifting distributions. In ALT, 124–138. Springer.
 Pei et al. (2018) Pei, Z.; Cao, Z.; Long, M.; and Wang, J. 2018. Multiadversarial domain adaptation. arXiv preprint arXiv:1809.02176 .
 Pinheiro (2018) Pinheiro, P. O. 2018. Unsupervised domain adaptation with similarity learning. In CVPR, 8004–8013.
 Redko et al. (2020) Redko, I.; Morvant, E.; Habrard, A.; Sebban, M.; and Bennani, Y. 2020. A survey on domain adaptation theory. arXiv preprint arXiv:2004.11829 .
 Saenko et al. (2010) Saenko, K.; Kulis, B.; Fritz, M.; and Darrell, T. 2010. Adapting visual category models to new domains. In ECCV, 213–226. Springer.
 Saito et al. (2018) Saito, K.; Watanabe, K.; Ushiku, Y.; and Harada, T. 2018. Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, 3723–3732.
 Shen et al. (2018) Shen, J.; Qu, Y.; Zhang, W.; and Yu, Y. 2018. Wasserstein distance guided representation learning for domain adaptation. In AAAI.
 Tang and Jia (2020) Tang, H.; and Jia, K. 2020. Discriminative Adversarial Domain Adaptation. In AAAI, 5940–5947.
 Tzeng et al. (2017) Tzeng, E.; Hoffman, J.; Saenko, K.; and Darrell, T. 2017. Adversarial discriminative domain adaptation. In CVPR, 7167–7176.

Vapnik and Chervonenkis (2015)
Vapnik, V. N.; and Chervonenkis, A. Y. 2015.
On the uniform convergence of relative frequencies of events to their probabilities.
In Measures of complexity, 11–30. Springer.  Venkateswara et al. (2017) Venkateswara, H.; Eusebio, J.; Chakraborty, S.; and Panchanathan, S. 2017. Deep hashing network for unsupervised domain adaptation. In CVPR, 5018–5027.
 Wang and Breckon (2020) Wang, Q.; and Breckon, T. P. 2020. Unsupervised Domain Adaptation via Structured Prediction Based Selective PseudoLabeling. In AAAI, 6243–6250. AAAI Press.
 Xu et al. (2020) Xu, M.; Zhang, J.; Ni, B.; Li, T.; Wang, C.; Tian, Q.; and Zhang, W. 2020. Adversarial Domain Adaptation with Domain Mixup. In AAAI, 6502–6509. AAAI Press.
 Yu, Wang, and Yuan (2019) Yu, Y.; Wang, Z.; and Yuan, B. 2019. An Inputaware Factorization Machine for Sparse Prediction. In IJCAI, 1466–1472.
 Zhang et al. (2018) Zhang, H.; Cisse, M.; Dauphin, Y. N.; and LopezPaz, D. 2018. mixup: Beyond Empirical Risk Minimization. In ICLR.
 Zhang et al. (2020a) Zhang, K.; Gong, M.; Stojanov, P.; Huang, B.; Liu, Q.; and Glymour, C. 2020a. Domain Adaptation As a Problem of Inference on Graphical Models. In NeurIPS.
 Zhang et al. (2017) Zhang, Q.; Wu, D.; Lu, J.; Liu, F.; and Zhang, G. 2017. A crossdomain recommender system with consistent information transfer. Decision Support Systems 104: 49–63.
 Zhang et al. (2020b) Zhang, Y.; Deng, B.; Tang, H.; Zhang, L.; and Jia, K. 2020b. Unsupervised multiclass domain adaptation: Theory, algorithms, and practice. arXiv preprint arXiv:2002.08681 .
 Zhang et al. (2020c) Zhang, Y.; Liu, F.; Fang, Z.; Yuan, B.; Zhang, G.; and Lu, J. 2020c. Clarinet: A Onestep Approach Towards Budgetfriendly Unsupervised Domain Adaptation. arXiv preprint arXiv:2007.14612 .
 Zhang et al. (2019a) Zhang, Y.; Liu, T.; Long, M.; and Jordan, M. 2019a. Bridging Theory and Algorithm for Domain Adaptation. In Chaudhuri, K.; and Salakhutdinov, R., eds., ICML, volume 97 of PMLR, 7404–7413. PMLR.
 Zhang et al. (2019b) Zhang, Y.; Tang, H.; Jia, K.; and Tan, M. 2019b. Domainsymmetric networks for adversarial domain adaptation. In CVPR, 5031–5040.
 Zhao et al. (2019) Zhao, H.; des Combes, R. T.; Zhang, K.; and Gordon, G. 2019. On Learning Invariant Representation for Domain Adaptation. ICML .
 Zhong et al. (2020) Zhong, L.; Fang, Z.; Liu, F.; Yuan, B.; Zhang, G.; and Lu, J. 2020. Bridging the Theoretical Bound and Deep Algorithms for Open Set Domain Adaptation. arXiv preprint arXiv:2006.13022 .
 Zou et al. (2019) Zou, H.; Zhou, Y.; Yang, J.; Liu, H.; Das, H. P.; and Spanos, C. J. 2019. Consensus adversarial domain adaptation. In AAAI, volume 33, 5997–6004.