Discriminative Adversarial Domain Adaptation

11/27/2019 ∙ by Hui Tang, et al. ∙ South China University of Technology International Student Union 0

Given labeled instances on a source domain and unlabeled ones on a target domain, unsupervised domain adaptation aims to learn a task classifier that can well classify target instances. Recent advances rely on domain-adversarial training of deep networks to learn domain-invariant features. However, due to an issue of mode collapse induced by the separate design of task and domain classifiers, these methods are limited in aligning the joint distributions of feature and category across domains. To overcome it, we propose a novel adversarial learning method termed Discriminative Adversarial Domain Adaptation (DADA). Based on an integrated category and domain classifier, DADA has a novel adversarial objective that encourages a mutually inhibitory relation between category and domain predictions for any input instance. We show that under practical conditions, it defines a minimax game that can promote the joint distribution alignment. Except for the traditional closed set domain adaptation, we also extend DADA for extremely challenging problem settings of partial and open set domain adaptation. Experiments show the efficacy of our proposed methods and we achieve the new state of the art for all the three settings on benchmark datasets.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Introduction

Many machine learning tasks are advanced by large-scale learning of deep models, with image classification

[23] as one of the prominent examples. A key factor to achieve such advancements is the availability of massive labeled data on the domains of the tasks of interest. For many other tasks, however, training instances on the corresponding domains are either difficult to collect, or their labeling costs prohibitively. To address the scarcity of labeled data for these target tasks/domains, a general strategy is to leverage the massively available labeled data on related source ones via domain adaptation [19]. Even though the source and target tasks share the same label space (i.e. closed set domain adaptation), domain adaptation still suffers from the shift in data distributions. The main objective of domain adaptation is thus to learn domain-invariant features, so that task classifiers learned from the source data can be readily applied to the target domain. In this work, we focus on the unsupervised setting where training instances on the target domain are completely unlabeled.

Recent domain adaptation methods are largely built on modern deep architectures. They rely on great model capacities of these networks to learn hierarchical features that are empirically shown to be more transferable across domains [35, 38]. Among them, those based on domain-adversarial training [6, 33] achieve the current state of the art. Based on the seminal work of DANN [6], they typically augment a classification network with an additional domain classifier. The domain classifier takes features from the feature extractor of the classification network as inputs, which is trained to differentiate between instances from the two domains. By playing a minimax game [7], adversarial training aims to learn domain-invariant features.

Such domain-adversarial networks can largely reduce the domain discrepancy. However, the separate design of task and domain classifiers has the following shortcomings. Firstly, feature distributions can only be aligned to a certain level, since model capacity of the feature extractor could be large enough to compensate for the less aligned feature distributions. More importantly, given practical difficulties of aligning the source and target distributions with high granularity to the category level (especially for complex distributions with multi-mode structures), the task classifier obtained by minimizing the empirical source risk cannot well generalize to the target data due to an issue of mode collapse [12, 31], i.e., the joint distributions of feature and category are not well aligned across the source and target domains.

Recent methods [12, 31] take the first step to address the above shortcomings by jointly parameterizing the task and domain classifiers into an integrated one. To further push this line, based on such a classifier, we propose a novel adversarial learning method termed Discriminative Adversarial Domain Adaptation (DADA), which encourages a mutually inhibitory relation between its domain prediction and category prediction for any input instance, as illustrated in Figure 1

. This discriminative interaction between category and domain predictions underlies the ability of DADA to reduce domain discrepancy at both the feature and category levels. Intuitively, the adversarial training of DADA mainly conducts competition between the domain neuron (output) and the true category neuron (output). Different from the work

[31] whose mechanism to align the joint distributions is rather implicit, DADA enables explicit alignment between the joint distributions, thus improving the classification of target data. Except for closed set domain adaptation, we also extend DADA for partial domain adaptation [4], i.e. the target label space is subsumed by the source one, and open set domain adaptation [27], i.e. the source label space is subsumed by the target one. Our main contributions can be summarized as follows.

  • We propose in this work a novel adversarial learning method, termed DADA, for closed set domain adaptation. Based on an integrated category and domain classifier, DADA has a novel adversarial objective that encourages a mutually inhibitory relation between category and domain predictions for any input instance, which can promote the joint distribution alignment across domains.

  • For more realistic partial domain adaptation, we extend DADA by a reliable category-level weighting mechanism, termed DADA-P, which can significantly reduce the negative influence of outlier source instances.

  • For more challenging open set domain adaptation, we extend DADA by balancing the joint distribution alignment in the shared label space with the classification of outlier target instances, termed DADA-O.

  • Experiments show the efficacy of our proposed methods and we achieve the new state of the art for all the three adaptation settings on benchmark datasets.

Figure 1: (Best viewed in color.) Discriminative Adversarial Domain Adaptation (DADA), which includes a feature extractor and an integrated category and domain classifier . The blue and orange colors denote and , and the losses applied to them, respectively. Note that DADA explicitly establishes a discriminative interaction between category and domain predictions. Please refer to the main text for how the adversarial training objective of DADA is defined.

Related Works

Closed Set Domain Adaptation After the seminal work of DANN [6], ADDA [32] proposes an untied weight sharing strategy to align the target feature distribution to a fixed source one. SimNet [22] replaces the standard FC-based cross-entropy classifier by a similarity-based one. MADA [20] and CDAN [16] integrate the discriminative category information into domain-adversarial training. VADA [30] reduces the cluster assumption violation to constrain domain-adversarial training. Some methods [33, 34] focus on transferable regions to learn domain-invariant features and task classifier. TAT [14] enhances the discriminability of features to guarantee the adaptability. Some methods [26, 25, 13] utilize category predictions from two task classifiers to measure the domain discrepancy. The most related works [12, 31] to us propose joint parameterization of the task and domain classifiers, which implicitly align the joint distributions. Differently, our proposed DADA makes the joint distribution alignment more explicit, thus promoting classification on the target domain.

Partial Domain Adaptation The work [36] weights each source instance by its importance to the target domain based on one domain classifier, and then trains another domain classifier on target and weighted source instances. The works [3, 4] reduce the contribution of outlier source instances to the task or domain classifiers by utilizing category predictions. Differently, DADA-P weights the proposed source discriminative adversarial loss by a reliable category confidence.

Open Set Domain Adaptation Previous research [10] proposes to reject an instance as the unknown category by threshold filtering. The work [27]

proposes to utilize adversarial training for both domain adaptation and unknown outlier detection. Differently, DADA-O balances the joint distribution alignment in the shared label space with the outlier rejection.

Method

Given of labeled instances sampled from the source domain , and of unlabeled instances sampled from the target domain , the objective of unsupervised domain adaptation is to learn a feature extractor and a task classifier such that the expected target risk

is low for a certain classification loss function

. The domains and are assumed to have different distributions. To achieve a low target risk, a typical strategy is to learn and by minimizing the sum of the source risk and some notion of distance between the source and target domain distributions, inspired by domain adaptation theories [2, 1]. This strategy is based on a simple rational that the source risk would become a good indicator of the target risk when the distance between the two distributions is getting closer. While most of existing methods use distance measures based on the marginal distributions, it is arguably better to use those based on the joint distributions.

The above strategy is generally implemented by domain-adversarial learning [6, 33], where separate task classifier and domain classifier are typically stacked on top of the feature extractor . As discussed before, this type of design has the following shortcomings: (1) model capacity of could be large enough to make and hardly differentiable for any instance, even though the marginal feature distributions are not well aligned; (2) more importantly, it is difficult to align the source and target distributions with high granularity to the category level (especially for complex distributions with multi-mode structures), and thus obtained by minimizing the empirical source risk cannot perfectly generalize to the target data due to an issue of mode collapse, i.e. the joint distributions are not well aligned.

To alleviate the above shortcomings, inspired by semi-supervised learning methods based on GANs

[28, 5], the recent work [31] proposes joint parameterization of and into an integrated one . Suppose the classification task of interest has categories, is formed simply by augmenting the last FC layer of with one additional neuron.

Denote

as the output vector of class probabilities of

for an instance , and , , as its element. The element of the conditional probability vector is written as follows

(1)

For ease of subsequent notations, we also write and . Then, such a network is trained by the classification-aware adversarial learning objective

(2)

where balances category classification and domain adversarial losses. The mechanism of this objective to align the joint distributions across domains is rather implicit.

To make it more explicit, based on the integrated classifier , we propose a novel adversarial learning method termed Discriminative Adversarial Domain Adaptation (DADA), which explicitly enables a discriminative interplay of predictions among the domain and categories for any input instance, as illustrated in Figure 1. This discriminative interaction underlies the ability of DADA to promote the joint distribution alignment, as explained shortly.

Discriminative Adversarial Learning

To establish a direct interaction between category and domain predictions, we propose a novel source discriminative adversarial loss that is tailored to the design of the integrated classifier . The proposed loss is inspired by the principle of binary cross-entropy loss. It is written as

(3)

Intuitively, the proposed loss (3) establishes a mutually inhibitory relation between of the prediction on the true category of , and of the prediction on the domain of . We first discuss how the proposed loss (3) works during adversarial training, and we show that under practical conditions, minimizing (3) over the classifier has the effects of discriminating among task categories while distinguishing the source domain from the target one, and maximizing (3) over the feature extractor can discriminatively align the source domain to the target one.

Discussion We first write the gradient formulas of on any source instance w.r.t. and as

Since both and are among the output probabilities of the classifier , we always have and , suggesting . When the loss (3) is minimized over

via stochastic gradient descent (SGD), we have the update

where is the learning rate, and since , increases; when it is maximized over via stochastic gradient ascent (SGA), we have the update , and since , decreases. Then, we discuss the change of in two cases: (1) in case of that guarantees , when minimizing the loss (3) over by SGD update , we have decreased , and when maximizing it over by SGA update , we have increased ; (2) in case of that guarantees , when minimizing the loss (3) over by SGD update, we have increased , and when maximizing it over by SGA update, we have decreased , as shown in Figure 2.

Figure 2: Changes of and when minimizing and maximizing the loss (3) in the two cases.

For discriminative adversarial domain adaptation, we expect that (1) when minimizing the proposed loss (3) over , task categories of the source domain is discriminative and the source domain is distinctive from the target one, which can be achieved when increases and decreases; (2) when maximizing it over , the source domain is aligned to the target one while retains discriminability, which can be achieved when decreases and increases in the case of . To meet the expectations, the condition of for all source instances should be always satisfied. This is practically achieved by pre-training DADA on the labeled source data using a -way cross-entropy loss, and maintaining in the adversarial training of DADA the same supervision signal. We present in the supplemental material empirical evidence on benchmark datasets that shows the efficacy of our used scheme.

To achieve the joint distribution alignment, the explicit interplay between category and domain predictions for any target instance should also be created. Motivated by recent works [20, 16] which alleviate the issue of mode collapse by aligning each instance to several most related categories, we propose a target discriminative adversarial loss based on the design of the integrated classifier , by using the conditional category probabilities to weight the domain predictions. It is written as

(4)

where the element of the domain prediction vector for the category is written as follows

(5)

An intuitive explanation for our proposed (4) is provided in the supplemental material.

Established knowledge from cluster analysis

[18]

indicates that we can estimate clusters with a low probability of error only if the conditional entropy is small. To this end, we adopt the entropy minimization principle

[8], which is written as

(6)

where computes the entropy of a probability vector. Combining (3), (4), and (6) gives the following minimax problem of our proposed DADA

(7)

where is a hyper-parameter that trade-offs the adversarial domain adaptation objective with the entropy minimization one in the unified optimization problem. Note that in the minimization problem of (7), serves as a regularizer for learning to avoid the trivial solution (i.e. all instances are assigned to the same category), and in the maximization problem of (7), it helps learn more target-discriminative features, which can alleviate the negative effect of adversarial feature adaptation on the adaptability [14].

By optimizing (7), the joint distribution alignment can be enhanced. This ability comes from the better use of discriminative information from both the source and target domains. Concretely, DADA constrains the domain classifier so that it clearly/explicitly knows the classification boundary, thus reducing false alignment between different categories. By deceiving such a strong domain classifier, DADA can learn a feature extractor that better aligns the two domains. We also theoretically prove in the supplemental material that DADA can better bound the expected target error.

Extension for Partial Domain Adaptation

Partial domain adaptation is a more realistic setting, where the target label space is subsumed by the source one. The false alignment between the outlier source categories and the target domain is unavoidable. To address it, existing methods [3, 36, 4] utilize the category or domain predictions, to decrease the contribution of source outliers to the training of task or domain classifiers. Inspired by these ideas, we extend DADA for partial domain adaptation by using a reliable category-level weighting mechanism, which is termed DADA-P.

Concretely, we average the conditional probability vectors over all target data and then normalize the averaged vector by dividing its largest element. The category weight vector with as its element is derived by a convex combination of the normalized vector and an all-ones vector , as follows

(8)

where is to suppress the detection noise of outlier source categories in the early stage of training. Then, we apply the category weight vector to the proposed discriminative adversarial loss for any source instance, leading to

(9)

Since predicted probabilities on the outlier source categories are more likely to increase when minimizing over , which incurs negative transfer. To avoid it, we minimize over and the objective of DADA-P is

(10)

By optimizing it, DADA-P can simultaneously alleviate negative transfer and promote the joint distribution alignment across domains in the shared label space.

Extension for Open Set Domain Adaptation

Open set domain adaptation is a very challenging setting, where the source label space is subsumed by the target one. We denominate the shared category and all unshared categories between the two domains as the “known category” and “unknown category” respectively. The goal of open set domain adaptation is to correctly classify any target instance as the known or unknown category. The false alignment between the known and unknown categories is inevitable. To this end, the work [27] proposes to make a pseudo decision boundary for the unknown category, which enables the feature extractor to reject some target instances as outliers. Inspired by this work, we extend DADA for open set domain adaptation by training the classifier to classify all target instances as the unknown category with a small probability , which is termed DADA-O. Assuming the predicted probability on the unknown category as the element of , i.e., , the modified target adversarial loss when minimized over the integrated classifier is

(11)

where . When maximized over the feature extractor , we still use the discriminative loss in (4). Replacing in (7) with (11) gives the overall adversarial objective of DADA-O, which can achieve a balance between domain adaptation and outlier rejection.

We utilize all target instances to obtain the concept of “unknown”, which is very helpful for the classification of unknown target instances as the unknown category but can cause the misclassification of known target instances as the unknown category. This issue can be alleviated by selecting an appropriate . If is too small, the unknown target instances cannot be correctly classified; if is too large, the known target instances can be misclassified. By choosing an appropriate , the feature extractor can separate the unknown target instances from the known ones while aligning the joint distributions in the shared label space.

Experiments

Datasets and Implementation Details

Office-31 [24] is a popular benchmark domain adaptation dataset consisting of images of categories collected from three domains: Amazon (A), Webcam (W), and DSLR (D). We evaluate on six settings.

Syn2Real [21] is the largest benchmark. Syn2Real-C has over images of shared categories in the combined training, validation, and testing domains. The images on the training domain are synthetic ones by rendering 3D models. The validation and test domains comprise real images, and the validation one has images. We use the training domain as the source domain and validation one as the target domain. For partial domain adaptation, we choose images of the first categories (in alphabetical order) in the validation domain as the target domain and form the setting: Synthetic 12 Real 6. For open set domain adaptation, we evaluate on Syn2Real-O, which includes two domains. The training/synthetic domain uses synthetic images from the categories of Syn2Real-C as “known”. The validation/real domain uses images of the categories from the validation domain of Syn2Real-C as “known”, and k images from other categories as “unknown”. We use the training and validation domains of Syn2Real-O as the source and target domains respectively.

Implementation Details We follow standard evaluation protocols for unsupervised domain adaptation [6, 33]: we use all labeled source and all unlabeled target instances as the training data. For all tasks of Office-31 and Synthetic 12 Real 6, based on ResNet-50 [9], we report the classification result on the target domain of mean(standard deviation) over three random trials. For other tasks of Syn2Real, we evaluate the accuracy of each category based on ResNet-101 and ResNet-152 (for closed and open set domain adaptation respectively). For each base network, we use all its layers up to the second last one as the feature extractor , and set the neuron number of its last FC layer as to have the integrated classifier . Exceptionally, we follow the work [21]

and replace the last FC layer of ResNet-152 with three FC layers of 512 neurons. All base networks are pre-trained on ImageNet

[23]. We firstly pre-train them on the labeled source data, and then fine-tune them on both the labeled source data and unlabeled target data via adversarial training, where we maintain the same supervision signal as the pre-training.

We follow DANN [6] to use the SGD training schedule: the learning rate is adjusted by , where denotes the process of training iterations that is normalized to be in , and we set , , and ; the hyper-parameter is initialized at and is gradually increased to by , where we set . We empirically set . We implement all our methods by PyTorch. The code will be available at https://github.com/huitangtang/DADA-AAAI2020.

Methods A W D W W D A D D A W A Avg
No Adaptation 79.90.3 96.80.4 99.50.1 84.10.4 64.50.3 66.40.4 81.9

DANN
81.20.3 98.00.2 99.80.0 83.30.3 66.80.3 66.10.3 82.5

DANN-CA
85.40.4 98.20.2 99.80.0 87.10.4 68.50.2 67.60.3 84.4
DADA (w/o em + w/o td) 91.00.2 98.70.1 100.00.0 90.80.2 70.90.3 70.20.3 86.9

DADA (w/o em)
91.80.1 99.00.1 100.00.0 92.50.3 72.80.2 72.30.3 88.1

DADA
92.30.1 99.20.1 100.00.0 93.90.2 74.40.1 74.20.1 89.0

Table 1: Ablation studies using Office-31 based on ResNet-50. Please refer to the main text for how they are defined.
Methods A W D W W D A D D A W A Avg
No Adaptation [9] 79.90.3 96.80.4 99.50.1 84.10.4 64.50.3 66.40.4 81.9

DAN [15]
81.30.3 97.20.0 99.80.0 83.10.2 66.30.0 66.30.1 82.3

DANN [6]
81.20.3 98.00.2 99.80.0 83.30.3 66.80.3 66.10.3 82.5

ADDA [32]
86.20.5 96.20.3 98.40.3 77.80.3 69.50.4 68.90.5 82.9

MADA [20]
90.00.1 97.40.1 99.60.1 87.80.2 70.30.3 66.40.3 85.2

VADA [30]
86.50.5 98.20.4 99.70.2 86.70.4 70.10.4 70.50.4 85.4

DANN-CA [31]
91.35 98.24 99.48 89.94 69.63 68.76 86.2

GTA [29]
89.50.5 97.90.3 99.80.4 87.70.5 72.80.3 71.40.4 86.5

MCD [26]
88.60.2 98.50.1 100.00.0 92.20.2 69.50.1 69.70.3 86.5

CDAN+E [16]
94.10.1 98.60.1 100.00.0 92.90.2 71.00.3 69.30.3 87.7

TADA [33]
94.30.3 98.70.1 99.80.2 91.60.3 72.90.2 73.00.3 88.4

SymNets [37]
90.80.1 98.80.3 100.00.0 93.90.5 74.60.6 72.50.5 88.4

TAT [14]
92.50.3 99.30.1 100.00.0 93.20.2 73.10.3 72.10.3 88.4
DADA 92.30.1 99.20.1 100.00.0 93.90.2 74.40.1 74.20.1 89.0
Table 2: Results for closed set domain adaptation on Office-31 based on ResNet-50. Note that SimNet is implemented by an unknown framework; MADA and DANN-CA are implemented by Caffe; all the other methods are implemented by PyTorch.

Analysis

Ablation Study We conduct ablation studies on Office-31 to investigate the effects of key components of our proposed DADA based on ResNet-50. Our ablation studies start with the very baseline termed “No Adaptation” that simply fine-tunes a ResNet-50 on the source data. To validate the mutually inhibitory relation enabled by DADA, we use DANN [6] and DANN-CA [31] respectively as the second and third baselines. To investigate how the entropy minimization principle helps learn more target-discriminative features, we remove the entropy minimization loss (6) from our main minimax problem (7), denoted as “DADA (w/o em)”. To know effects of the proposed source and target discriminative adversarial losses (3) and (4), we remove both (6) and (4) from (7), denoted as “DADA (w/o em + w/o td)”.

Figure 3: Average probability on the true category over all target instances by task classifiers of different methods.

Results in Table 1 show that although DANN improves over “No Adaptation”, its result is much worse than DANN-CA, verifying the efficacy of the design of the integrated classifier . “DADA (w/o em + w/o td)” improves over DANN-CA and “DADA (w/o em)” improves over “DADA (w/o em + w/o td)”, showing the efficacy of our proposed discriminative adversarial learning. DADA significantly outperforms DANN and DANN-CA, confirming the efficacy of the proposed mutually inhibitory relation between the category and domain predictions in aligning the joint distributions of feature and category across domains. Table 1 also confirms that entropy minimization is helpful to learn more target-discriminative features.

Methods plane bcycl bus car horse knife mcycl person plant sktbrd train truck mean

No Adaptation [9]
55.1 53.3 61.9 59.1 80.6 17.9 79.7 31.2 81.0 26.5 73.5 8.5 52.4

DANN [6]
81.9 77.7 82.8 44.3 81.2 29.5 65.1 28.6 51.9 54.6 82.8 7.8 57.4

DAN [15]
87.1 63.0 76.5 42.0 90.3 42.9 85.9 53.1 49.7 36.3 85.8 20.7 61.1

MCD [26]
87.0 60.9 83.7 64.0 88.9 79.6 84.7 76.9 88.6 40.3 83.0 25.8 71.9

GPDA [11]
83.0 74.3 80.4 66.0 87.6 75.3 83.8 73.1 90.1 57.3 80.2 37.9 73.3

ADR [25]
87.8 79.5 83.7 65.3 92.3 61.8 88.9 73.2 87.8 60.0 85.5 32.3 74.8

DADA
92.9 74.2 82.5 65.0 90.9 93.8 87.2 74.2 89.9 71.5 86.5 48.7 79.8
Table 3: Results for closed set domain adaptation on Syn2Real-C based on ResNet-101. Note that all compared methods are based on PyTorch implementation.
Known-to-Unknown Ratio
Methods plane bcycl bus car horse knife mcycl person plant sktbrd train truck unk Known Mean

No Adaptation [9]
49 20 29 47 62 27 79 3 37 19 70 1 62 36 38

DAN [15]
51 40 42 56 68 24 75 2 39 30 71 2 75 41 44

DANN [6]
59 41 16 54 77 18 88 4 44 32 68 4 61 42 43

AODA [27]
85 71 65 53 83 10 79 36 73 56 79 32 87 60 62
DADA-O 88 76 76 64 79 46 91 62 52 63 86 8 55 66 65
Known-to-Unknown Ratio
AODA [27] 80 63 59 63 83 12 89 5 61 14 79 0 69 51 52
DADA-O 77 63 75 71 38 33 92 58 47 50 89 1 50 58 57
Table 4: Results for open set domain adaptation on Syn2Real-O based on ResNet-152. Known indicates the mean classification result over the known categories whereas Mean also includes the unknown category. The table below shows the results when the Known-to-Unknown Ratio in the target domain is set to . All compared methods are based on PyTorch implementation.

Quantitative Comparison To compare the efficacy of different methods in reducing domain discrepancy at the category level, we visualize the average probability on the true category over all target instances by task classifiers of No Adaptation, DANN, DANN-CA, and DADA on A W in Figure 3. Note that here we use labels of the target data for the quantization of category-level domain discrepancy. Figure 3 shows that our proposed DADA gives the predicted probability on the true category of any target instance a better chance to approach , meaning that target instances are more likely to be correctly classified by DADA, i.e., a better category-level domain alignment.

Methods Synthetic 12Real 6
No Adaptation [9] 45.26

DAN [15]
47.60

DANN [6]
51.01

RTN [17]
50.04

PADA [4]
53.53
DADA-P 69.06
Table 5: Results for partial domain adaptation on Syn2Real-C based on ResNet-50. Note that all compared methods are based on PyTorch implementation.

Results

Closed Set Domain Adaptation We compare in Tables 2 and 3 our proposed method with existing ones on Office-31 and Syn2Real-C based on ResNet-50 and ResNet-101 respectively. Whenever available, results of existing methods are quoted from their respective papers or the recent works [20, 16, 14, 26]. Our proposed DADA outperforms existing methods, testifying the efficacy of DADA in aligning the joint distributions of feature and category across domains.

Partial Domain Adaptation We compare in Table 5 our proposed method to existing ones on Syn2Real-C based on ResNet-50. Results of existing methods are quoted from the work [4]. Our proposed DADA-P substantially outperforms all comparative methods by , showing the effectiveness of DADA-P on reducing the negative influence of source outliers while promoting the joint distribution alignment in the shared label space.

Open Set Domain Adaptation We compare in Table 4 our proposed method with existing ones on Syn2Real-O based on ResNet-152. Results of existing methods are quoted from the recent work [21]

. Our proposed DADA-O outperforms all comparative methods in both evaluation metrics of Known and Mean, showing the efficacy of DADA-O in both aligning joint distributions of the known instances and identifying the unknown target instances. It is noteworthy that DADA-O improves over the state-of-the-art method AODA by a large margin when the known-to-unknown ratio in the target domain is much smaller than

, i.e. the false alignment between the known source and unknown target instances will be much more serious. This observation confirms the efficacy of DADA-O.

We provide more results and analysis for the three problem settings in the supplemental material.

Conclusion

We propose a novel adversarial learning method termed Discriminative Adversarial Domain Adaptation (DADA) to overcome the limitation in aligning the joint distributions of feature and category across domains, which is due to an issue of mode collapse induced by the separate design of task and domain classifiers. Based on an integrated task and domain classifier, DADA has a novel adversarial objective that encourages a mutually inhibitory relation between the category and domain predictions, which can promote the joint distribution alignment. Unlike previous methods, DADA explicitly enables a discriminative interaction between category and domain predictions. Except for closed set domain adaptation, we also extend DADA for more challenging problem settings of partial and open set domain adaptation. Experiments on benchmark datasets testify the efficacy of our proposed methods for all the three settings.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Grant No.: 61771201), the Program for Guangdong Introducing Innovative and Enterpreneurial Teams (Grant No.: 2017ZT07X183), and the Guangdong R&D key project of China (Grant No.: 2019B010155001).

References

  • [1] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan (2010-05-01) A theory of learning from different domains. Machine Learning 79 (1), pp. 151–175. External Links: ISSN 1573-0565, Document Cited by: Method.
  • [2] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira (2007) Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.), pp. 137–144. Cited by: Method.
  • [3] Z. Cao, M. Long, J. Wang, and M. I. Jordan (2018)

    Partial transfer learning with selective adversarial networks

    .
    In Computer Vision and Pattern Recognition, Cited by: Related Works, Extension for Partial Domain Adaptation.
  • [4] Z. Cao, L. Ma, M. Long, and J. Wang (2018) Partial adversarial domain adaptation. In European Conference on Computer Vision, Cited by: Introduction, Related Works, Extension for Partial Domain Adaptation, Results, Table 5.
  • [5] Z. Dai, Z. Yang, F. Yang, W. W. Cohen, and R. R. Salakhutdinov (2017) Good semi-supervised learning that requires a bad gan. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 6510–6520. Cited by: Method.
  • [6] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016)

    Domain-adversarial training of neural networks

    .
    J. Mach. Learn. Res. 17 (1), pp. 2096–2030. External Links: ISSN 1532-4435 Cited by: Introduction, Related Works, Method, Datasets and Implementation Details, Datasets and Implementation Details, Analysis, Table 2, Table 3, Table 4, Table 5.
  • [7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. Cited by: Introduction.
  • [8] Y. Grandvalet and Y. Bengio (2005) Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou (Eds.), pp. 529–536. Cited by: Discriminative Adversarial Learning.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, Cited by: Datasets and Implementation Details, Table 2, Table 3, Table 4, Table 5.
  • [10] L. P. Jain, W. J. Scheirer, and T. E. Boult (2014) Multi-class open set recognition using probability of inclusion. In European Conference on Computer Vision, Cited by: Related Works.
  • [11] M. Kim, P. Sahu, B. Gholami, and V. Pavlovic (2019) Unsupervised visual domain adaptation: A deep max-margin gaussian process approach. In Computer Vision and Pattern Recognition, Cited by: Table 3.
  • [12] V. K. Kurmi and V. P. Namboodiri (2019) Looking back at labels: a class based domain adaptation technique. ArXiv abs/1904.01341. Cited by: Introduction, Introduction, Related Works.
  • [13] C. Lee, T. Batra, M. H. Baig, and D. Ulbricht (2019) Sliced wasserstein discrepancy for unsupervised domain adaptation. In Computer Vision and Pattern Recognition, Cited by: Related Works.
  • [14] H. Liu, M. Long, J. Wang, and M. Jordan (2019) Transferable adversarial training: a general approach to adapting deep classifiers. In International Conference on Machine Learning, Cited by: Related Works, Discriminative Adversarial Learning, Results, Table 2.
  • [15] M. Long, Y. Cao, Z. Cao, J. Wang, and M. I. Jordan (2018) Transferable representation learning with deep adaptation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. External Links: Document, ISSN 0162-8828 Cited by: Table 2, Table 3, Table 4, Table 5.
  • [16] M. Long, Z. CAO, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1640–1650. Cited by: Related Works, Discriminative Adversarial Learning, Results, Table 2.
  • [17] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2016) Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, Cited by: Table 5.
  • [18] R. F. Nalewajski (2012) Elements of information theory. In Perspectives in Electronic Structure Theory, pp. 371–395. External Links: ISBN 978-3-642-20180-6, Document Cited by: Discriminative Adversarial Learning.
  • [19] S. J. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, pp. 1345–1359. Cited by: Introduction.
  • [20] Z. Pei, Z. Cao, M. Long, and J. Wang (2018) Multi-adversarial domain adaptation. In

    Association for the Advancement of Artificial Intelligence

    ,
    Cited by: Related Works, Discriminative Adversarial Learning, Results, Table 2.
  • [21] X. Peng, B. Usman, K. Saito, N. Kaushik, J. Hoffman, and K. Saenko (2018) Syn2Real: a new benchmark forsynthetic-to-real visual domain adaptation. ArXiv abs/1806.09755. Cited by: Datasets and Implementation Details, Datasets and Implementation Details, Results.
  • [22] P. O. Pinheiro (2018) Unsupervised domain adaptation with similarity learning. In Computer Vision and Pattern Recognition, Cited by: Related Works.
  • [23] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. Cited by: Introduction, Datasets and Implementation Details.
  • [24] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. In European Conference on Computer Vision, Cited by: Datasets and Implementation Details.
  • [25] K. Saito, Y. Ushiku, T. Harada, and K. Saenko (2018) Adversarial dropout regularization. In International Conference on Learning Representations, Cited by: Related Works, Table 3.
  • [26] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In Computer Vision and Pattern Recognition, Cited by: Related Works, Results, Table 2, Table 3.
  • [27] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada (2018)

    Open set domain adaptation by backpropagation

    .
    In European Conference on Computer Vision, Cited by: Introduction, Related Works, Extension for Open Set Domain Adaptation, Table 4.
  • [28] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, and X. Chen (2016) Improved techniques for training gans. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 2234–2242. Cited by: Method.
  • [29] S. Sankaranarayanan, Y. Balaji, C. D. Castillo, and R. Chellappa (2018) Generate to adapt: aligning domains using generative adversarial networks. In Computer Vision and Pattern Recognition, Cited by: Table 2.
  • [30] R. Shu, H. Bui, H. Narui, and S. Ermon (2018) A DIRT-t approach to unsupervised domain adaptation. In International Conference on Learning Representations, Cited by: Related Works, Table 2.
  • [31] L. Tran, K. Sohn, X. Yu, X. Liu, and M. K. Chandraker (2019) Gotta adapt ’em all: joint pixel and feature-level domain adaptation for recognition in the wild. In Computer Vision and Pattern Recognition, Cited by: Introduction, Introduction, Related Works, Method, Analysis, Table 2.
  • [32] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition, Cited by: Related Works, Table 2.
  • [33] X. Wang, L. Li, W. Ye, M. Long, and J. Wang (2019) Transferable attention for domain adaptation. In Association for the Advancement of Artificial Intelligence, Cited by: Introduction, Related Works, Method, Datasets and Implementation Details, Table 2.
  • [34] J. Wen, R. Liu, N. Zheng, Q. Zheng, Z. Gong, and J. Yuan (2019) Exploiting local feature patterns for unsupervised domain adaptation. In Association for the Advancement of Artificial Intelligence, Cited by: Related Works.
  • [35] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson (2014) How transferable are features in deep neural networks?. In Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 3320–3328. Cited by: Introduction.
  • [36] J. Zhang, Z. Ding, W. Li, and P. Ogunbona (2018) Importance weighted adversarial nets for partial domain adaptation. In Computer Vision and Pattern Recognition, Cited by: Related Works, Extension for Partial Domain Adaptation.
  • [37] Y. Zhang, H. Tang, K. Jia, and M. Tan (2019) Domain-symmetric networks for adversarial domain adaptation. In Computer Vision and Pattern Recognition, Cited by: Table 2.
  • [38] Y. Zhang, H. Tang, and K. Jia (2018) Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data. In The European Conference on Computer Vision, Cited by: Introduction.