1 Introduction
Many machine-learning algorithms assume that the training and test data, typically feature-label pairs denoted as (x, y), are drawn from the same feature-label space with the same distribution, where x is the feature and y is the label of x. However, this assumption rarely holds in practice, as the data distribution is likely to change over time and space. Though state-of-the-art deep convolutional features have been shown to be invariant to low-level variations to some degree, they are still susceptible to domain shift, as it is expensive to manually label sufficient training data covering diverse application domains. A typical solution is to further fine-tune a learned deep model on task-specific datasets. However, it is still prohibitively difficult and expensive to obtain enough labeled data for fine-tuning a big deep network. Instead of recollecting labeled data for every possible new task, unsupervised domain-adaptation methods are adopted to alleviate performance degradation by transferring knowledge from related labeled source domains to an unlabeled target domain [Ganin et al.2016, Li et al.2017, Zhou et al.2019].

When adopting domain adaptation, certain assumptions must be imposed on how distributions change across domains. For instance, most existing domain-adaptation methods consider a covariate-shift situation where the distributions on the source and target domains differ only in the marginal feature distribution P(X), with an identical conditional distribution P(Y | X) assumed. Here we use X and Y to denote random variables whose realizations are features x and labels y, from either the source data or the target data. In this setting, an early attempt was to match the feature distributions on the source and target domains by importance reweighting [Huang et al.2007]. State-of-the-art approaches reduce domain shift by learning domain-invariant representations through a deep neural transformation F(·), parameterized by θ_f, such that P_s(F(X)) ≈ P_t(F(X)). This is often achieved by optimizing a deep network to minimize some distribution-discrepancy measure [Sun and Saenko2016, Tzeng et al.2017]. Because no target-domain labels are available in the unsupervised domain-adaptation scenario, most existing methods simply assume P_s(Y | F(X)) = P_t(Y | F(X)) by sharing a classifier learned with source labeled data only. However, this is typically not true in practice, as the source-learned classifier tends to be biased toward the source. As shown in Figure 1 (a), though the feature distributions are well matched, the classifier may still perform poorly in the target domain due to label-distribution mismatch.

In this paper, we alleviate the above problem by proposing an approximate joint-distribution matching scheme. Specifically, due to the lack of label information in the target domain, we propose to match the model prediction uncertainty, a second-order statistic equivalent, induced by the conditional distribution P(Y | F(X)). We obtain the prediction uncertainty by imposing a Bayesian neural network (BNN), which induces posterior distributions over the weights of a neural network. Without uncertainty matching, the BNN classifier is expected to produce high uncertainty for the unseen target-domain data and low uncertainty for the source-domain data, due to the bias induced by training on the source domain. By contrast, with prediction-uncertainty matching, one is able to achieve approximate joint-distribution matching, alleviating domain shift on the classifier. The contributions of our work are summarized as follows:

Different from most existing domain-adaptation methods, which focus only on reducing the marginal feature-distribution discrepancy, we propose to match joint feature-label distributions by exploiting model prediction uncertainty, effectively alleviating the conditional-distribution shift imposed by the classifier.

We employ BNNs to quantify prediction uncertainty. Through additional minimization of the source-target uncertainty discrepancy, both fine-grained marginal feature-distribution and conditional label-distribution matching are achieved.

Extensive experimental results on standard domain-adaptation benchmarks demonstrate the effectiveness of the proposed method, which outperforms current state-of-the-art approaches.
2 Related Works
2.1 Domain Adaptation
Domain-adaptation methods seek to transfer discriminative features from related source domains to target domains. This is usually achieved by learning domain-invariant features [BenDavid et al.2010]. Earlier methods typically align source and target features through subspace learning [Gong et al.2012]. Recently, deep adversarial domain-adaptation approaches have taken over and achieved state-of-the-art performance [Hoffman et al.2018, Wen et al.2019]. These methods reduce domain discrepancy by optimizing deep networks with an adversarial objective produced by a discriminator network trained to distinguish target features from source features. Though significant marginal distribution shift can be reduced, these methods fail to fully address the conditional label-distribution shift problem. Some recent models try to address this issue by utilizing pseudo-labels [Long et al.2018, Chen et al.2018]. However, most of them are deterministic models, which cannot fundamentally reduce the conditional domain shift, due to the unavailability of target-domain labels.
2.2 Bayesian Uncertainty
Uncertainty estimates can be obtained by adopting Bayesian neural networks. A typical BNN places a prior distribution, e.g., a Gaussian, over the weights, instead of using deterministic weights as in standard neural networks. Given observed data, approximate inference is performed to calculate the posterior distribution of the weights, as in [Graves2011, Blundell et al.2015]. A more efficient way to obtain Bayesian uncertainty is to employ dropout variational inference [Gal and Ghahramani2016], which is adopted in this paper.
3 The Proposed Method
3.1 The Overall Idea
Given a labeled source-domain dataset D_s and an unlabeled target-domain dataset D_t, the goal of unsupervised domain adaptation is to adapt a model learned on the labeled source-domain data to the unlabeled target-domain data. The source and target domains are assumed to be sampled from two joint distributions P_s(X, Y) and P_t(X, Y), respectively, with P_s(X, Y) ≠ P_t(X, Y). The joint distribution of feature-label pairs can be decomposed as:

P(X, Y) = P(X) P(Y | X).   (1)
Limitations of Traditional Methods.
Most existing domain-adaptation methods reduce domain shift by learning a deep feature transformation F(·) such that P_s(F(X)) ≈ P_t(F(X)), together with a shared classifier network C, parameterized by θ_c, trained on the labeled source data D_s. To adapt to a target domain, the learned source conditional P_s(Y | F(X)) is adopted to form the target-domain joint distribution P_t(F(X)) P_s(Y | F(X)). It is easy to see that directly adopting P_s(Y | F(X)) in the target domain is unable to match the true joint distributions P_s(F(X), Y) and P_t(F(X), Y), as P_s(Y | F(X)) only reflects feature-label information in the source domain.

3.1.1 Our Method
In this paper, we propose to jointly reduce the marginal-distribution shift (between P_s(F(X)) and P_t(F(X))) and the conditional-distribution shift (between P_s(Y | F(X)) and P_t(Y | F(X))) by exploiting prediction uncertainty. Specifically, our model consists of a probabilistic BNN feature extractor F, which takes inputs x^s or x^t, and a BNN classifier C, which takes inputs F(x^s) or F(x^t). The classifier C, which corresponds to the conditional distribution P(Y | F(X)) and is parameterized by θ_c, learns to classify samples from both domains.

As discussed in the Introduction, directly learning to match P_s(Y | F(X)) and P_t(Y | F(X)) is infeasible due to the unavailability of target labels. To overcome this difficulty, we instead learn to match the prediction uncertainty, a second-order statistic equivalent. The intuition is that if the second-order statistics of two distributions are matched, the two distributions are brought closer. Another intuition is that, if target samples are not well matched with source samples in the feature space, these outliers are likely to be predicted with high uncertainty by a source-trained classifier. If one can quantify the uncertainty and minimize the cross-domain uncertainty discrepancy (source uncertainty is supposed to be low), the feature extractor F will be encouraged to produce target features that best match the source both in the feature space and in the classifier prediction. In the following, we first introduce an effective way to obtain Bayesian uncertainty by adopting the dropout technique, and then describe the proposed framework of joint-distribution matching.

3.2 Bayesian Uncertainty
We employ a Bayesian neural network (BNN) to quantify model prediction uncertainty. A BNN is a variant of a standard neural network that treats the weights as distributions instead of deterministic values. However, it is often computationally prohibitive to perform inference over the weight distributions in a large-scale deep BNN. In this paper, we employ the practical dropout variational inference of [Gal and Ghahramani2016] for approximate inference and efficient uncertainty estimation. In the proposed method, inference is done by training the model with dropout [Srivastava et al.2014]. At test time, dropout is also performed to generate approximate samples from the posterior distribution. This approach is equivalent to using a Bernoulli variational distribution q_θ(ω) [Gal and Ghahramani2016], parameterized by θ, to approximate the true posterior over the model weights ω. As proven in [Gal and Ghahramani2016], dropout inference essentially minimizes the KL divergence between the approximate distribution and the posterior of a deep Gaussian process. For classification, the objective can be formulated as:
L_dropout(θ, p) = −(1/N) Σ_{i=1}^{N} log p(y_i | x_i, ω̂_i) + ((1 − p) / (2N)) ||θ||²,   (2)

where N is the number of training samples, p denotes the dropout probability, ω̂_i is sampled according to the dropout variational distribution q_θ(ω) [Gal and Ghahramani2016], and θ is the set of the variational distribution's parameters. The final prediction is obtained by marginalizing over the approximate posterior distribution of the weights, which is approximated using Monte Carlo integration as follows:
p(y = c | x) ≈ (1/T) Σ_{t=1}^{T} softmax(f^{ω̂_t}(x)),   (3)

with T sampled masked weights ω̂_t ~ q_θ(ω), i.e., forwarding each sample through the feature extractor and classifier T times with weights sampled according to the dropout distribution. The uncertainty of the prediction can be summarized with different metrics. In this paper, we use two: 1) the entropy of the averaged probabilistic prediction, and 2) the variance of the prediction vectors across the T passes. The entropy- and variance-based prediction uncertainties are denoted as
u_e and u_v, respectively. Writing p_t(x) = softmax(f^{ω̂_t}(x) / τ) for the t-th stochastic prediction and p̄(x) for the average of the T predictions, they are formulated as:

u_e(x) = H( p̄(x) ),   (4)

u_v(x) = (1/T) Σ_{t=1}^{T} || p_t(x) − p̄(x) ||²,   (5)

where H(·) denotes the information entropy function and τ is the temperature of the softmax, which controls the uncertainty level.
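As a concrete illustration, the two uncertainty summaries of Equations (4) and (5) can be computed from the T stochastic softmax outputs of an MC-dropout model. The NumPy sketch below assumes the softmax (and any temperature scaling) has already been applied upstream; the exact variance formulation used in the paper may differ, so treat the squared-deviation form as an assumption.

```python
import numpy as np

def mc_dropout_uncertainty(probs, eps=1e-12):
    """Summarize prediction uncertainty from T stochastic (MC-dropout)
    forward passes.

    probs: array of shape (T, C) -- softmax outputs of T dropout samples.
    Returns (entropy_uncertainty, variance_uncertainty).
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)                       # averaged prediction, Eq. (3)
    # Entropy of the averaged prediction, Eq. (4).
    u_entropy = -np.sum(mean_p * np.log(mean_p + eps))
    # Mean squared deviation of the T prediction vectors around their
    # mean -- one plausible reading of the variance metric in Eq. (5).
    u_variance = np.mean(np.sum((probs - mean_p) ** 2, axis=1))
    return u_entropy, u_variance
```

A confident model (all T passes agreeing on one class) yields near-zero values for both metrics, while disagreement across passes raises both.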
3.3 Distribution Adaptation
In this section, we describe how to simultaneously alleviate the marginal and conditional domainshift by matching the approximate joint distributions of the source and target featurelabel pairs.
3.3.1 JointDistribution Adaptation
We employ adversarial learning to match source and target statistics and reduce distribution discrepancy, as adversarial domain-adaptation methods have achieved state-of-the-art performance [Goodfellow et al.2014, Tzeng et al.2017]. The procedure is a two-player game: the first player, a domain discriminator D, is trained to distinguish source data from target data, while the second player, the feature extractor F, is trained to learn features that confuse the domain discriminator. By learning the best possible discriminator, the feature extractor is expected to learn features that are maximally domain-invariant. This learning procedure can be described by the following minimax game:

min_{θ_f} max_{θ_d}  (1/n_s) Σ_{i=1}^{n_s} log D(F(x_i^s)) + (1/n_t) Σ_{j=1}^{n_t} log(1 − D(F(x_j^t))),   (6)

where n_s and n_t are the numbers of training samples from the source and target domains, respectively.
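The objective of Equation (6) is the standard binary logistic form used in domain-adversarial training; the discriminator ascends it while the feature extractor descends it. A minimal NumPy sketch of its value, given discriminator outputs for a batch of source and target features, might look as follows (function and argument names are illustrative):

```python
import numpy as np

def adversarial_objective(d_src, d_tgt, eps=1e-12):
    """Value of the domain-adversarial objective in Eq. (6).

    d_src: discriminator outputs D(F(x)) in (0, 1) for source features.
    d_tgt: discriminator outputs for target features.
    """
    d_src = np.asarray(d_src, dtype=float)
    d_tgt = np.asarray(d_tgt, dtype=float)
    # Source samples should be scored close to 1, target samples close
    # to 0, from the discriminator's point of view.
    return (np.mean(np.log(d_src + eps))
            + np.mean(np.log(1.0 - d_tgt + eps)))
```

A well-trained discriminator (d_src near 1, d_tgt near 0) drives this value toward 0, while a maximally confused one (all outputs 0.5) yields 2 log 0.5 ≈ −1.386.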
However, this typical adversarial minimax game for domain adaptation may be problematic in two respects: 1) trivial feature alignment, and 2) unstable training. The domain discriminator fails to consider the relationship between the learned features and the decision boundary of the classifier during feature alignment, which may leave target samples near the boundary or lead to trivial alignment with a high-capacity discriminator [Shu et al.2018]. We aim to achieve non-trivial feature alignment by enforcing additional classifier-prediction consistency during matching. Furthermore, noisy or hard-to-match samples may lead to unstable adversarial training. These confusing samples, which typically carry high prediction uncertainty, may produce unreliable gradients and deteriorate the training. They may also drive the feature extractor F to learn features that are non-discriminative for classifying target samples, especially with a high-capacity F. Thus, we aim to attenuate the influence of noisy samples and reinforce the influence of easy-to-match target samples by adaptively reweighting the adversarial loss. Specifically, we propose the following modified objective:
min_{θ_f} max_{θ_d}  (1/n_s) Σ_{i=1}^{n_s} w_i^s log D(F(x_i^s)) + (1/n_t) Σ_{j=1}^{n_t} w_j^t log(1 − D(F(x_j^t))),   (7)

where the prediction uncertainty u is formulated as in Equation (4) or Equation (5). Both w_i^s and w_j^t are adaptation-loss weights, defined as:
(8) 
where n is the number of training samples and u_0 denotes the uncertainty threshold: the influence of samples with uncertainty larger than u_0 is suppressed. For samples with uncertainty less than u_0, the weights are normalized within each training batch, with more attention paid to the certain samples. It is worth noting that directly using the unnormalized uncertainty for the reweighting, as done in [Kendall and Gal2017, Long et al.2018], tends to discourage the model from predicting low uncertainty for all samples. With this adaptive joint-distribution adaptation objective, we aim to achieve non-trivial feature alignment and enable safer transfer.
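The thresholding-and-batch-normalization scheme for the adaptation-loss weights can be sketched as below. The exp(−u) certainty score and the normalization so that the kept weights sum to the batch size are our own assumptions for illustration; the paper's exact weight formula is not recoverable from this text.

```python
import numpy as np

def adaptation_weights(uncertainties, u0):
    """Per-sample adversarial-loss weights from prediction uncertainty,
    following the scheme around Eq. (8): samples whose uncertainty
    exceeds the threshold u0 get zero weight, and the remaining weights
    are normalized within the batch so more certain samples count more.
    """
    u = np.asarray(uncertainties, dtype=float)
    keep = u <= u0
    score = np.exp(-u) * keep          # more certain samples score higher
    total = score.sum()
    if total == 0:                     # whole batch above the threshold
        return np.zeros_like(u)
    # Normalize within the batch so the kept weights sum to the batch size.
    return len(u) * score / total
```

This keeps the average loss magnitude comparable to the unweighted objective while shifting emphasis toward confidently predicted samples.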
3.3.2 ConditionalDistribution Adaptation
Note that the joint-distribution-matching scheme described in the last section does not necessarily guarantee good conditional-distribution adaptation. In this section, we aim to reduce the conditional distribution shift and learn a domain-invariant classifier. Since directly minimizing the conditional distribution discrepancy between P_s(Y | F(X)) and P_t(Y | F(X)) is infeasible, we propose to approximate it by matching prediction uncertainty, a second-order statistic equivalent, with a BNN as the classifier. We exploit prediction uncertainty to detect and quantify the domain shift of the classifier. By minimizing the uncertainty discrepancy between source and target, we aim to approximately reduce the domain shift of the classifier, and the objective can be formulated as:
L_u = | (1/n_s) Σ_{i=1}^{n_s} u(x_i^s) − (1/n_t) Σ_{j=1}^{n_t} u(x_j^t) |,   (9)

where the specific configuration was chosen empirically, as we found it achieves better performance than the alternative. The prediction-uncertainty discrepancy is estimated within each batch during training.
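A simple batch estimator of the source-target uncertainty discrepancy is the absolute difference of the batch means of the per-sample uncertainties; this is one plausible form of the discrepancy minimized here, shown only as a sketch:

```python
import numpy as np

def uncertainty_discrepancy(u_src, u_tgt):
    """Batch estimate of the cross-domain uncertainty discrepancy:
    absolute difference between the mean source and mean target
    prediction uncertainty. The paper's exact discrepancy measure is
    not recoverable from the text, so this is an illustrative choice.
    """
    return abs(float(np.mean(u_src)) - float(np.mean(u_tgt)))
```

Minimizing this quantity (with source uncertainty kept low by the supervised loss) pushes target uncertainty down toward the source level.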
To enable discriminative feature transfer, the feature extractor and classifier are also trained to minimize a supervised loss on the source data, using the source labels, defined as:
L_c = −(1/n_s) Σ_{i=1}^{n_s} log p(y_i^s | x_i^s),   (10)

where y_i^s is the true label of the source sample x_i^s, and the prediction p(· | x) is computed with the softmax temperature τ_c for source classification.
Integrating all objectives together, the final learning procedure is formulated as:
min_{θ_f, θ_c} max_{θ_d}  L_c + α L_adv + β L_u,   (11)

where L_adv denotes the reweighted adversarial objective of Equation (7), and α and β are hyper-parameters that trade off the objectives in the unified optimization problem.
According to the analysis of [BenDavid et al.2010], the expected target error is upper-bounded by three terms: 1) the source error, 2) the domain divergence, and 3) the conditional-distribution discrepancy across domains. We improve marginal distribution matching to reduce the second term by minimizing the adversarial objective, achieving joint feature-uncertainty adaptation. While the third term is ignored by most existing domain-adaptation methods, we are able to reduce it via uncertainty matching and minimization.
4 Experiments
We compare our method with state-of-the-art domain-adaptation approaches on several benchmark datasets: the USPS-MNIST-SVHN digits benchmark [Hoffman et al.2018], the Office-31 dataset [Saenko et al.2010], and the recently introduced Office-Home dataset [Venkateswara et al.2017].
USPS-MNIST-SVHN.
This benchmark is used for digit recognition with three domains: MNIST, USPS, and SVHN. MNIST is composed of 28×28 grey images; USPS contains 16×16 grey digits; and SVHN consists of 32×32 color digit images, which are more challenging and may contain more than one digit per image. We evaluate our method on the three typical adaptation tasks: USPS↔MNIST (two tasks) and SVHN→MNIST (one task). Following the evaluation protocol of [Hoffman et al.2018], we use the standard training sets for domain-adaptation training and report adaptation results on the test sets.
Office-31.
This dataset is widely used for visual domain adaptation [Saenko et al.2010]. It consists of 4,652 images in 31 categories collected from three different domains: Amazon (A), from amazon.com, and Webcam (W) and DSLR (D), taken by a web camera and a digital SLR camera in different environmental settings, respectively. We evaluate all methods on four challenging settings: A→W, A→D, W→A, and D→A.
Office-Home.
This is one of the most challenging visual domain-adaptation datasets [Venkateswara et al.2017], consisting of 15,588 images from 65 categories of everyday objects in office and home settings. There are four significantly different domains: Art (Ar), consisting of 2,427 paintings, sketches, or artistic depictions; Clipart (Cl), containing 4,365 images; Product (Pr), with 4,439 images; and Real-World (Rw), comprising 4,357 regularly captured images. We report performance on all 12 adaptation tasks between these domains to enable thorough evaluation.
Compared Methods.
The state-of-the-art deep domain-adaptation methods we compare with include: Domain Adversarial Neural Network (DANN) [Ganin et al.2016], Adversarial Discriminative Domain Adaptation (ADDA) [Tzeng et al.2017], Joint Adaptation Networks (JAN) [Long et al.2017], Conditional Domain Adversarial Network (CDAN) [Long et al.2018], Cycle-Consistent Adversarial Domain Adaptation (CyCADA) [Hoffman et al.2018], Re-weighted Adversarial Adaptation Network (RAAN) [Chen et al.2018], and Local Feature Patterns for Domain Adaptation (LFP-DA) [Wen et al.2019]. We follow the standard evaluation protocols of unsupervised domain adaptation as in [Long et al.2017]. For our model, we report performance with uncertainty estimated by the entropy and variance formulations, denoted as Ours(Entro) and Ours(Var), respectively.
4.1 Implementation Details
CNN Architectures.
For the digit classification datasets, we use the same architecture as in ADDA [Tzeng et al.2017]. All digit images are resized to the same resolution for fair comparison.
On the Office-31 and Office-Home datasets, we fine-tune an AlexNet pre-trained on ImageNet. Following DANN [Ganin et al.2016], a bottleneck layer fcb with 256 units is added after the fc7 layer for adaptation. We adopt the same random image flipping and cropping strategy as in JAN [Long et al.2017].

[Table 1: Classification accuracy (%) on the digit adaptation tasks SVHN→MNIST, MNIST→USPS, and USPS→MNIST for ADDA, RAAN, LFP-DA, CyCADA, CDAN-M, Ours(Var), and Ours(Entro); most numeric entries were lost in extraction.]
[Table 2: Classification accuracy (%) on the Office-31 tasks A→W, A→D, W→A, and D→A for AlexNet, DANN, ADDA, LFP-DA, JAN, CDAN-M, and Ours(Entro); numeric entries were lost in extraction.]
Hyperparameters.
To enable stable training, we progressively increase the importance of the adaptation loss according to a schedule of the training progress p, which ranges from 0 to 1. We use a hyper-parameter selection strategy similar to the reverse validation of DANN. The uncertainty threshold u_0 is set to ensure uncertainty reduction. Each sample is forwarded T times to obtain the prediction uncertainty. The coefficients for the adaptation-loss reweighting and the temperature for the source classification loss are likewise set by validation. We apply dropout to all fully-connected layers with a fixed dropout ratio; improvements are not observed with further dropout on convolution layers.
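One common choice for such a progressive schedule, used in DANN, is λ(p) = 2 / (1 + exp(−γp)) − 1 with γ = 10; whether this paper uses exactly this form is not recoverable from the text, so treat the sketch below as an assumption:

```python
import math

def dann_lambda(progress, gamma=10.0):
    """Progressive adversarial-loss weight, ramping smoothly from 0 to
    (almost) 1 as the training progress p goes from 0 to 1. This is the
    standard DANN schedule; gamma = 10 is the value used in that paper
    and an assumption here."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0
```

Early in training the adaptation loss contributes little, which avoids noisy adversarial gradients before the features are meaningful; by the end it contributes at nearly full weight.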
4.2 Results
The results on the digit recognition tasks are shown in Table 1. Ours(Entro) achieves the best performance on most of the tasks. CyCADA aligns features at both the pixel level and the feature level. RAAN alleviates conditional distribution shift by matching label distributions. CDAN-M attempts to learn domain-invariant interactions between the learned features and the classifier through conditional adversarial learning. On these tasks, the abundance of source labels prevents the low-capacity LeNet-like model from overfitting the source labels; thus the advantage of our method over DANN and CDAN-M mainly comes from the uncertainty-discrepancy minimization that alleviates the classifier bias.
[Table 3: Classification accuracy (%) on the 12 Office-Home tasks (all ordered pairs of Ar, Cl, Pr, Rw) for AlexNet, DANN, JAN, CDAN-M, and Ours(Entro); most numeric entries were lost in extraction.]
Ours(Entro) consistently outperforms Ours(Var). The distinct performance gap can be explained as follows. The entropy captures the cross-category probability spread of the prediction, while the variance measures the deviation of the prediction probabilities around their mean. The entropy uncertainty is more sensitive to a multi-peak probability spread across different categories. During training, the output probabilities of unmatched or boundary target samples usually cluster around two or more peaks, i.e., they are uncertain among several neighboring categories. In this case, the variance measure obfuscates this multi-peak information. In the following, we only report the performance of Ours(Entro).
[Table 4: Results of the negative-transfer experiments on Office-31 for AlexNet, DANN, MADA, and Ours(Entro); numeric entries were lost in extraction.]
Performance on the Office-31 and Office-Home datasets is reported in Table 2 and Table 3, respectively. Again, our model achieves the best performance on most of the tasks. Due to the smaller size of the labeled source dataset and the large capacity of AlexNet, the models easily overfit the source labels while being jointly trained to reduce the marginal distribution discrepancy. This overfitting harms the transferability of the aligned features, resulting in trivial features for the target domain. In this case, our model alleviates the problem by jointly enforcing feature alignment and classifier-prediction consistency.
Negative Transfer.
Negative transfer happens when features are falsely aligned and domain adaptation deteriorates performance. Existing marginal-distribution-matching methods easily induce negative transfer when the marginal distributions of source and target are inherently different, e.g., when the source domain is smaller or larger than the target. We conduct experiments on the Office-31 dataset with a 31→25 task, obtained by removing 6 classes from the target, and a 25→25(+6) task, obtained by treating 6 extra target classes as noise images. We compare our method with DANN and with MADA [Pei et al.2018], which has been shown effective at alleviating negative transfer. The results are reported in Table 4. DANN suffers obvious negative transfer on the 31→25 task, whereas the effectiveness of our method at alleviating negative transfer is significant. Adaptive joint feature-uncertainty distribution matching encourages the model to mix source and target samples that best match each other, thus alleviating the harmful effects of noisy samples.
Alignment Visualization.
We visualize the learned source and target representations on the USPS→MNIST and A→D adaptation tasks using the t-SNE embedding [Maaten and Hinton2008]. In Figure 3, we show the features of the non-adapted model, DANN, and our adapted model. Compared with the non-adapted model, DANN significantly reduces the marginal distribution shift. Our method further prevents generating unmatched target samples that lie close to the decision boundary of the classifier and tend to be incorrectly classified.
Convergence and Uncertainty.
In Figure 4, we show the convergence (test accuracy) and target uncertainty of the non-adapted model, DANN, and our model on the USPS→MNIST and A→D tasks. As we can see, DANN adaptation effectively reduces the target prediction uncertainty (source uncertainty is kept low) and improves target test accuracy. Our model further significantly reduces the discrepancy between the source and target prediction uncertainty. The nearly synchronous increase of target accuracy and decrease of the cross-domain prediction-uncertainty discrepancy further indicates that uncertainty matching alleviates domain shift and improves domain adaptation.
5 Conclusions
We have proposed a novel and effective approach to joint-distribution matching that exploits prediction uncertainty. To achieve this, we adopt a Bayesian neural network to model prediction uncertainty. Unlike most existing deep domain-adaptation methods, which only reduce the marginal feature-distribution shift, the proposed method additionally alleviates the conditional distribution shift residing in the classifier. Experimental results verify the advantages of the proposed method over state-of-the-art unsupervised domain-adaptation approaches. More interestingly, we have also shown that the proposed method can effectively alleviate negative transfer in domain adaptation.
Acknowledgments
This work is supported by the Zhejiang Provincial Natural Science Foundation (LR19F020005), the National Natural Science Foundation of China (61572433, 31471063, 31671074), and a gift grant from Baidu Inc. It is also partially supported by the Fundamental Research Funds for the Central Universities.
References
 [BenDavid et al.2010] Shai BenDavid, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine learning, 79(12):151–175, 2010.
 [Blundell et al.2015] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In International Conference on Machine Learning, pages 1613–1622, 2015.

 [Chen et al.2018] Qingchao Chen, Yang Liu, Zhaowen Wang, Ian Wassell, and Kevin Chetty. Re-weighted adversarial adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7976–7985, 2018.
 [Gal and Ghahramani2016] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016.
 [Ganin et al.2016] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
 [Gong et al.2012] Boqing Gong, Yuan Shi, Fei Sha, and Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2066–2073. IEEE, 2012.
 [Goodfellow et al.2014] Ian Goodfellow, Jean PougetAbadie, Mehdi Mirza, Bing Xu, David WardeFarley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
 [Graves2011] Alex Graves. Practical variational inference for neural networks. In Advances in neural information processing systems, pages 2348–2356, 2011.
 [Hoffman et al.2018] Judy Hoffman, Eric Tzeng, Taesung Park, JunYan Zhu, Phillip Isola, Kate Saenko, Alexei Efros, and Trevor Darrell. CyCADA: Cycleconsistent adversarial domain adaptation. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 1989–1998. PMLR, 2018.
 [Huang et al.2007] Jiayuan Huang, Arthur Gretton, Karsten M Borgwardt, Bernhard Schölkopf, and Alex J Smola. Correcting sample selection bias by unlabeled data. In Advances in neural information processing systems, pages 601–608, 2007.
 [Kendall and Gal2017] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In Advances in neural information processing systems, pages 5574–5584, 2017.
 [Li et al.2017] Zheng Li, Yun Zhang, Ying Wei, Yuxiang Wu, and Qiang Yang. Endtoend adversarial memory network for crossdomain sentiment classification. In IJCAI, pages 2237–2243, 2017.

 [Long et al.2017] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2208–2217. JMLR.org, 2017.
 [Long et al.2018] Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I Jordan. Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems 31, pages 1647–1657. Curran Associates, Inc., 2018.
 [Maaten and Hinton2008] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

 [Pei et al.2018] Zhongyi Pei, Zhangjie Cao, Mingsheng Long, and Jianmin Wang. Multi-adversarial domain adaptation. In AAAI Conference on Artificial Intelligence, pages 3934–3941, 2018.
 [Saenko et al.2010] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision, pages 213–226. Springer, 2010.
 [Shu et al.2018] Rui Shu, Hung H Bui, Hirokazu Narui, and Stefano Ermon. A DIRT-T approach to unsupervised domain adaptation. In Proceedings of the 6th International Conference on Learning Representations, 2018.
 [Srivastava et al.2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
 [Sun and Saenko2016] Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443–450. Springer, 2016.
 [Tzeng et al.2017] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167–7176, 2017.
 [Venkateswara et al.2017] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In Proc. CVPR, pages 5018–5027, 2017.
 [Wen et al.2019] Jun Wen, Risheng Liu, Nenggan Zheng, Qian Zheng, Zhefeng Gong, and Junsong Yuan. Exploiting local feature patterns for unsupervised domain adaptation. In ThirtyThird AAAI Conference on Artificial Intelligence, 2019.
 [Zhou et al.2019] Joey Tianyi Zhou, Ivor W. Tsang, Sinno Jialin Pan, and Mingkui Tan. Multiclass heterogeneous domain adaptation. Journal of Machine Learning Research, 20(57):1–31, 2019.