Mining Label Distribution Drift in Unsupervised Domain Adaptation

06/16/2020 · Peizhao Li, et al. · Brandeis University, Indiana University

Unsupervised domain adaptation aims to transfer task knowledge from a labeled source domain to a related yet unlabeled target domain, and is attracting extensive interest from academia and industry. Although tremendous efforts along this direction have been made to minimize domain divergence, unfortunately, most existing methods only manage part of the picture by aligning feature representations from different domains. Beyond the discrepancy in feature space, the gap between the known source label distribution and the unknown target label distribution, recognized as label distribution drift, is another crucial factor raising domain divergence, and it has not received enough attention or been well explored. From this point, in this paper, we first experimentally reveal how label distribution drift brings negative effects on current domain adaptation methods. Next, we propose the Label distribution Matching Domain Adversarial Network (LMDAN) to handle data distribution shift and label distribution drift jointly. In LMDAN, the label distribution drift problem is addressed by a proposed source sample weighting strategy, which selects samples that contribute to positive adaptation and avoids negative effects brought by the mismatch in label distributions. Finally, different from general domain adaptation experiments, we modify domain adaptation datasets to create considerable label distribution drift between source and target domains. Numerical results and empirical model analysis show that LMDAN delivers superior performance compared to other state-of-the-art domain adaptation methods under such scenarios.


1 Introduction

Figure 1: Domain adaptation with and without label distribution drift. Source and target domains differ in the border color of the circles. Circles with different inner colors denote different categories, and the circle size indicates the number of samples in that category. Straight lines denote the decision boundary learned by the classifier. Adaptation under label distribution drift misaligns features at the categorical level and yields a decision boundary that does not apply to the target domain.

Domain adaptation is a fundamental research topic in machine learning and transfer learning, and continually draws attention from academic and industrial communities [27, 2]. It aims to build models on labeled source data and related target data, and then make these models adapt and generalize on the target domain. Progress along this direction can serve many downstream tasks, including image-to-image translation [36], image segmentation [24], arrhythmia detection [13], and so on. Different settings for domain adaptation are applicable to complicated real-world problems [18, 20, 28, 9]. Among these settings, unsupervised domain adaptation, which provides no labels but only samples in the target domain, is a challenging yet practical one, since real-world scenarios often suffer from label scarcity.

Mitigating domain shift, i.e., reducing the domain divergence between source and target, is the primary solution for unsupervised domain adaptation problems. Existing methods [14, 21, 12, 23, 33, 22, 34] mainly focus on alleviating the negative effects brought by domain shift in feature representations. They reduce the discrepancy by pushing feature distributions from the two separate domains close to each other. Consequently, models are expected to generalize favorably to a related target data distribution. Adversarial learning has recently been introduced into domain adaptation with promising performance [12, 32, 22, 19]. By encouraging generated features to confuse the discriminator, while training the discriminator to distinguish source from target features, adversarial training aligns features through a min-max game and delivers domain-invariant feature representations.

However, existing methods mainly consider feature-level alignment, which is not enough to guarantee a positive adaptation. As another component of domain shift, the disparity between the label distributions of the source and target domains, i.e., the number of samples in each category differing from source to target, is known as label distribution drift, in correspondence to the shift in the distribution of samples across domains. As presented in Figure 1, label distribution drift brings negative effects from two aspects. First, features belonging to a large-size category in the target domain are pushed toward features of mismatched categories in the source domain, due to the imbalanced adaptation toward label distribution. As a result, the alignment corrupts the feature representations of those misaligned samples. Second, the decision boundary of the classifier is trained only on labeled source samples, and is not applicable to the target domain when the label distributions differ significantly. These two underlying reasons weaken the adaptation under label distribution drift. As a complement, empirical evidence in Section 3 also demonstrates that a huge label distribution drift between two sets of data brings a significant drop in adaptation capacity, and can even lead models to perform worse than non-transfer methods. More challenging, unlike the sample distribution shift between domains, label distributions cannot be aligned directly by existing methods because the target label distribution is unknown, and the problem becomes harder when a considerable label distribution drift exists.

In this paper, we take a further step toward huge label distribution drift in the unsupervised domain adaptation setting, and manage data shift together with label distribution drift in a unified framework. As mentioned before, domain adaptation with only feature space alignment is not enough. Therefore, we align the two domains on the premise that the corresponding label distributions are roughly matched, and continually alleviate data shift and label distribution drift simultaneously during training. To this end, we propose the Label distribution Matching Domain Adversarial Network (LMDAN). Specifically, we propose a novel weighting strategy for source sample re-weighting, which mines samples that contribute to positive adaptation while mitigating the negative effects that come from aligning irrelevant classes across domains. The proposed weighting function contributes to both adversarial feature alignment and classifier boundary learning, hence addressing the two-fold negative impacts brought by huge label distribution drift simultaneously. In summary, we highlight the major contributions of this paper in three folds as follows.

  • We experimentally investigate the negative impact brought by label distribution drift on current state-of-the-art domain adaptation methods, which, to the best of our knowledge, has not been well explored.

  • We propose a label matching strategy to re-weight source samples, which enables the source label distribution to match the unknown target one during the adaptation process, and addresses the two-fold negative impacts brought by huge label distribution drift simultaneously.

  • Different from experiments in previous literature, we evaluate the proposed method on three benchmark datasets with manual modifications to simulate considerable label distribution drift, and LMDAN achieves leading performance compared to state-of-the-art domain adaptation methods. Additionally, we provide comprehensive analyses for the proposed method.

2 Related Work

Following the two components of domain divergence, we introduce related work on unsupervised domain adaptation in terms of feature space and label space alignment, and then highlight the differences between existing work and the proposed method.

Feature alignment aims to reduce the domain divergence in feature space. Traditional methods construct projections for the two domains, mapping the two feature distributions into a manifold space or subspace to address the domain shift problem [14, 11, 10, 8]. Recently, Long et al. [21, 23] use deep models to reduce the discrepancy between feature spaces at multiple layer levels. Further, following the success of the Generative Adversarial Network [15], adversarial learning in deep models for unsupervised domain adaptation continually delivers favorable performance [22, 5, 34]. Ganin et al. [12] are the first to employ an adversarial-learning-based domain adaptation model and pave the way for many following works. Tzeng et al. [32] use two separate encoders for adaptation, decomposing the transfer process from an end-to-end fashion. Besides, the incorporation of conditional distributions [22, 6] into adaptation is also a promising way to reach domain-invariant representations. Optimal transport for domain adaptation [7, 30, 3, 1] is another interesting line of research, where source samples are mapped into the target domain with minimal transportation cost. Although great efforts have been made to seek better feature alignment, only handling the feature divergence is not enough to guarantee an adaptation without any negative transfer.

Label distribution drift is another problem in domain adaptation, but with less exploration. The problem results from the divergence between the known source label distribution and the unknown target label distribution. Ge et al. [13] focus on negative transfer and imbalanced distributions in multi-source transfer learning, while Hsu et al. [26] exploit label and structural information within and across domains based on maximum mean discrepancy. Similarly, Yan et al. [33] introduce class-specific auxiliary weights into the original maximum mean discrepancy to exploit the class prior probability in the source and target domains. Moreover, Chen et al. [4] employ a simple re-weighting function in Earth-Mover distance reduction. Although some pioneering works have touched the label distribution drift problem, they only explore preliminary scenarios and leave out cases with a vast gap in label distribution across domains.

Different from existing studies, we tackle the domain adaptation problem under considerable label distribution drift and build a comprehensive understanding of label distribution drift from both experimental and methodological perspectives. Based on these, we propose a novel label matching strategy that continually seeks samples that benefit positive adaptation while preventing negative transfer.

3 Motivation

In this section, we first illustrate the label distribution drift problem from theoretical and practical perspectives, and then present our problem formulation.

3.1 Notation

We start with the basic notations. Consider $n_s$ labeled samples $\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ drawn from the source domain distribution $p_s$ over $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ denote the input and label space, respectively, and $n_t$ unlabeled samples $\{x_j^t\}_{j=1}^{n_t}$, sharing the same label set as the source samples, drawn from the target domain distribution $p_t$. The goal of unsupervised domain adaptation is to utilize the labeled source data to predict labels for the unlabeled target samples. Suppose an encoder $F$ is designed to project samples drawn i.i.d. from $p_s$ and $p_t$ into a shared feature representation space.

3.2 Label Distribution Drift

Figure 2: Performance of DANN under varying degrees of label distribution drift on the Office-31 dataset, from source domain Amazon to target domain Webcam. The black line indicates training with the original label distribution, while red and blue lines denote sample drop rates of 50% and 75%, respectively. Solid lines indicate that dropped samples come from the first 15 classes in both the source and target domains, while dashed lines indicate that dropped samples come from the first 15 classes in the source domain and the last 16 classes in the target domain. The legend reports the KL divergence between the source and target label distributions.

Tremendous efforts have been made to explore solutions for unsupervised domain adaptation. Unfortunately, most existing studies only focus on the feature space divergence, that is, minimizing $d\big(p_s(F(x)),\, p_t(F(x))\big)$, while ignoring the negative effects brought by label distribution drift.

From the theoretical perspective, a generalization bound on the expected target error for the domain adaptation problem [2] is given as follows:

$\epsilon_T(h) \leq \epsilon_S(h) + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(p_s, p_t) + \lambda + C, \quad \forall h \in \mathcal{H}$,   (1)

where $\epsilon_T(h)$ and $\epsilon_S(h)$ are the expected errors on the target and source domain, respectively; $\mathcal{H}$ is the hypothesis space, $\lambda = \min_{h \in \mathcal{H}} \epsilon_S(h) + \epsilon_T(h)$ is the optimal joint risk over source and target samples, and $C$ is a constant related to the number of samples, the dimension, the confidence level, and the VC-dimension of $\mathcal{H}$. With the assumption that the source and target label distributions are close enough, methods with only feature alignment can achieve a small target error by reducing the domain distance term $d_{\mathcal{H}\Delta\mathcal{H}}(p_s, p_t)$. However, as pointed out by Zhao et al. [35], when this assumption does not hold, the huge label distribution gap between the two domains makes the joint error term $\lambda$ increase during the optimization of the domain distance term, which may counteract the reduction in domain distance and even increase the value of the upper bound.

The above perspective is supported by practical evidence. Figure 2 shows the performance of the classical domain adversarial method DANN [12] under varying degrees of label distribution drift on the Office-31 dataset [31]. Solid and dashed lines represent slight and huge label distribution drift, respectively, while all experiments use the same size of training data. Two observations are quite clear: (1) Compared to the black line trained on the original dataset, the solid red and blue lines deliver similar results, indicating that even if class sizes within the same domain are imbalanced, DANN still achieves high performance as long as the source and target label distributions are similar. Although the training sets of the solid lines contain different samples, the dropped samples do not bring negative effects on the performance; (2) The gap between solid and dashed lines indicates that when there is a significant label distribution drift between the two domains, the performance drops dramatically. The adaptation performance of DANN becomes much worse with a larger divergence between the source and target label distributions. More experimental results revealing the negative effects of label distribution drift can be found in Section 5.
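
To make the drop protocol behind Figure 2 concrete, the sketch below (a toy illustration with hypothetical label arrays, not the paper's released code) computes the KL divergence between the empirical source and target label distributions, the quantity reported in the legend of Figure 2.

```python
import numpy as np

def label_kl_divergence(src_labels, tgt_labels, num_classes, eps=1e-12):
    """KL(P_s(y) || P_t(y)) between the empirical source and target label distributions."""
    p_s = np.bincount(src_labels, minlength=num_classes) / len(src_labels)
    p_t = np.bincount(tgt_labels, minlength=num_classes) / len(tgt_labels)
    return float(np.sum(p_s * np.log((p_s + eps) / (p_t + eps))))

# Toy label arrays for 31 classes (Office-31): keeping only 25% of the first 15
# classes in source and of the last 16 classes in target (the "dashed-line"
# setting) produces a large divergence; dropping the same classes in both
# domains (the "solid-line" setting) would keep the divergence small.
src = np.concatenate([np.repeat(np.arange(15), 10), np.repeat(np.arange(15, 31), 40)])
tgt = np.concatenate([np.repeat(np.arange(15), 40), np.repeat(np.arange(15, 31), 10)])
print(label_kl_divergence(src, tgt, num_classes=31))
```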

Based on the above theoretical analysis and practical evidence, merely aligning the feature distributions is still far from a successful adaptation. This exploration motivates us to provide a unified problem formulation of unsupervised domain adaptation covering both data distribution shift and label distribution drift.

3.3 Problem Formulation

The two challenges for unsupervised domain adaptation brought by domain divergence are as follows.

Data Distribution Shift. Usually, the sample and extracted feature distributions differ from the source to the target domain, i.e., $p_s(F(x)) \neq p_t(F(x))$, which prohibits a classifier learned on labeled source samples from being directly applied to target sample prediction. For this reason, feature alignment can be achieved by minimizing the marginal divergence $d\big(p_s(F(x)),\, p_t(F(x))\big)$ or the conditional divergence $d\big(p_s(F(x) \mid y),\, p_t(F(x) \mid y)\big)$ to narrow the data distribution shift for domain adaptation.

Label Distribution Drift. Beyond the inconsistency in feature space, domain divergence also occurs in label space, where $p_s(y) \neq p_t(y)$. It is more challenging to handle label distribution drift than data shift because $p_t(y)$ is an agnostic distribution in unsupervised domain adaptation.

When a considerable label distribution divergence exists, excessive optimization toward feature distribution alignment leads to minimizing cross-class divergences of the form $d\big(p_s(F(x) \mid y = i),\, p_t(F(x) \mid y = j)\big)$ with $i \neq j$, which aligns target representations to irrelevant classes in the source domain during training. This corrupts categorical feature representations, raises the prediction error on the target domain, and further brings negative effects into the adaptation process.

Most unsupervised domain adaptation methods consider domain divergence merely in terms of data shift, and take the absence of drift in label space for granted. This is not only far from real scenarios but also suffers from degraded performance due to label distribution drift (see Figure 2). Considering that negative effects easily arise when label distributions are inconsistent and irrelevant classes are aligned across domains, not all samples should be used equally during training, since positive adaptation only comes from correctly matched pairs of source-target samples. Consequently, we exploit and emphasize the correctly matched samples in the two domains, while mitigating the alignment of class-mismatched samples, thereby increasing the ratio of positive adaptation and avoiding the negative transfer brought by the alignment between irrelevant categories across the source and target domains. This can be viewed as an unsupervised sample selection in which we continually seek samples that benefit adaptation while avoiding negative transfer during training.

Figure 3: The framework of the Label distribution Matching Domain Adversarial Network (LMDAN). Adversarial feature alignment aims to obtain domain-invariant features and reduce data shift, while the matching in label distribution yields class-wise weights. These weights dually contribute to decision boundary adaptation and feature alignment.

4 Methodology

We start this section by elaborating on the proposed LMDAN framework, then provide a detailed description of the label distribution matching and source sample weighting strategy, followed by the overall objective function and the corresponding optimization solution for LMDAN.

4.1 Overview

Figure 3 shows the framework of the proposed LMDAN. It minimizes the domain divergence embedded in feature space on the premise of close source-target label distributions. Specifically, to align the source and target domains under label distribution drift, LMDAN contains two interactive parts: adversarial training for domain-invariant feature generation, and a class-wise re-weighting strategy, based on optimal assignment, for source sample selection. In adversarial feature alignment, the encoder $F$ extracts features $F(x^s)$ and $F(x^t)$ from the two domains and tries to confuse the discriminator $D$, while $D$ tries to distinguish the two sets of features from each other. Finally, $F$ is trained to map the data distributions of the two domains close enough. In the source sample weighting part, by adding class-wise weights to both the adversarial training and the supervision of the classifier, we manipulate the feature alignment in adversarial training and the decision boundary of the classifier to tackle the label drift scenario simultaneously. The dual weighting strategy makes the network adapt to the target domain from two sides: (1) the weighting for the min-max game encourages features of the same category to get closer across domains while mitigating misalignment, and (2) the weighting for the classifier makes the decision boundary adapt to the target label distribution. In the following, we first illustrate the source sample weighting strategy, and then provide details of the adversarial training and the overall loss functions.
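
For concreteness, here is a minimal PyTorch sketch of the classifier $C$ and discriminator $D$ heads on top of features from a shared encoder $F$ (ResNet-50 features of dimension 2048); the layer sizes and the gradient-reversal implementation of the min-max game are our assumptions rather than the authors' released architecture.

```python
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; scales gradients by -lambd in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class LMDANHead(nn.Module):
    """Classifier C and domain discriminator D on top of features from a shared encoder F."""
    def __init__(self, feat_dim=2048, num_classes=31):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)      # C
        self.discriminator = nn.Sequential(                      # D
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1))

    def forward(self, features, lambd=1.0):
        class_logits = self.classifier(features)                                  # for C's loss
        domain_logits = self.discriminator(GradReverse.apply(features, lambd))    # min-max via GRL
        return class_logits, domain_logits
```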

4.2 Label Distribution Matching

Label distribution matching is one of the crucial components in the LMDAN framework. It addresses label distribution drift through source-target sample matching. Here, we exploit the classes that are matched across the source and target domains via optimal assignment, then enlarge matched classes and shrink less relevant classes in the source domain. As a result, the source samples engaged in adversarial feature alignment approach the target domain in terms of label distribution, which further increases positive transfer and mitigates negative transfer during training.

To achieve this, we employ the classified probability of every sample to measure the degree of matching. Under the distance measurement and optimal matching, mismatched pairs result in a larger distance, while matched pairs behave inversely. Consider a cost function $c(\cdot, \cdot)$ and the classified probabilities $C(F(x^s))$ and $C(F(x^t))$ obtained by the classifier $C$ for a source sample $x^s$ and a target sample $x^t$, respectively, over the output space $\mathcal{Y}$. Based on optimal assignment [17], we seek a joint probability distribution $\gamma$ over the classified probabilities of the two domains:

$\gamma^* = \arg\min_{\gamma \in \Pi}\ \mathbb{E}_{(u, v) \sim \gamma}\big[c(u, v)\big]$,   (2)

where $\Pi$ denotes the set of joint distributions whose marginals are the distributions of the source and target classified probabilities. This yields the optimal assignment, based on classified probabilities, from source to target with the least cost.

As for the discrete version used in the implementation, we employ the Euclidean distance to build the cost matrix between the source and target domains,

$M_{ij} = \big\| C(F(x_i^s)) - C(F(x_j^t)) \big\|_2$,   (3)

and other distance functions can be used as well. Based on $M$, the optimal assignment is written as:

$\gamma^* = \arg\min_{\gamma \geq 0}\ \langle \gamma, M \rangle_F \quad \text{s.t.} \quad \gamma \mathbf{1}_{n_t} = \tfrac{1}{n_s}\mathbf{1}_{n_s}, \ \ \gamma^\top \mathbf{1}_{n_s} = \tfrac{1}{n_t}\mathbf{1}_{n_t}$,   (4)

where $\langle \cdot, \cdot \rangle_F$ indicates the Frobenius inner product, and $\mathbf{1}_{n}$ is the vector of ones with $n$ dimensions.
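
Below is a sketch of Eqs. (3)-(4) on a mini-batch using the POT library's exact solver; the uniform marginals and the choice of `ot.emd` are assumptions for the discrete assignment, since the text only specifies the Euclidean cost on classified probabilities.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def optimal_assignment(p_src, p_tgt):
    """
    p_src: (n_s, K) classified probabilities of source samples, C(F(x^s)).
    p_tgt: (n_t, K) classified probabilities of target samples, C(F(x^t)).
    Returns the cost matrix M (Eq. 3) and the optimal assignment plan gamma (Eq. 4).
    """
    M = ot.dist(p_src, p_tgt, metric='euclidean')   # Eq. (3): pairwise Euclidean costs
    a = np.full(len(p_src), 1.0 / len(p_src))       # uniform source marginal (assumption)
    b = np.full(len(p_tgt), 1.0 / len(p_tgt))       # uniform target marginal (assumption)
    gamma = ot.emd(a, b, M)                         # exact linear-programming solver
    return M, gamma
```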

Input: labeled source samples $\{(x_i^s, y_i^s)\}$ and unlabeled target samples $\{x_j^t\}$
Output: encoder $F$, classifier $C$, and discriminator $D$
while not converged do
        Build the mini-batch set $\{x^s, y^s, x^t\}$;
        Compute class-wise weights $w$ by Eqs. (4), (5), & (6);
        Optimize $F$, $C$, and $D$ by Eq. (7);
end while
Predict target labels by $\hat{y}^t = C(F(x^t))$.
Algorithm 1 LMDAN

We then incorporate the distances between classified probabilities into the optimal assignment plan and let the conjunct term guide class-wise weights for each class. We obtain the weight guiding matrix $G$ by

$G = \gamma^* \odot M$,   (5)

where $\odot$ denotes the Hadamard product. By matching classification probabilities with the minimal cost, the weight guiding matrix $G$ provides guidance on misaligned samples. Moreover, following the above step, we compute the class-wise weight $w_k$ for the class with index $k$ in the source domain by:

(6)

where $\mathbb{1}[\cdot]$ denotes the indicator function. Note that $w_k$ consists of two parts: the first term manages the imbalanced class sizes within the source domain itself, and the second term rewards or punishes matched or mismatched classes between source and target accordingly. $\eta$ is the parameter controlling the influence of the imbalanced source class scale.

Using class-wise weights derived from the optimal matching of classified probabilities, we are able to distinguish classes that are misaligned and less relevant to positive transfer from well-aligned ones during training. By re-weighting source samples with these class-wise weights, the sizes of the corresponding categories are effectively enlarged or shrunk accordingly, which dynamically pushes the source label distribution toward the unknown target one.
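
A rough sketch of the class-wise weight computation is given below. Eq. (5) follows the Hadamard-product definition above; the per-class aggregation and normalization standing in for Eq. (6) are our assumptions and only mirror the two-term structure described in the text.

```python
import numpy as np

def class_wise_weights(M, gamma, src_labels, num_classes, eta=0.5):
    """
    Sketch of Eqs. (5)-(6). G = gamma * M is the weight-guiding matrix (Eq. 5);
    the aggregation below is an assumed stand-in for the exact form of Eq. (6).
    """
    G = gamma * M                                          # Eq. (5): Hadamard product
    counts = np.maximum(np.bincount(src_labels, minlength=num_classes).astype(float), 1.0)
    # First term: compensate class imbalance inside the source domain itself.
    balance = eta * len(src_labels) / (num_classes * counts)
    # Second term: reward classes matched with little transported cost,
    # punish classes mostly assigned across large costs.
    cost_per_class = np.zeros(num_classes)
    for k in range(num_classes):
        rows = np.where(src_labels == k)[0]
        if len(rows) > 0:
            cost_per_class[k] = G[rows].sum() / len(rows)
    match = 1.0 - cost_per_class / (cost_per_class.max() + 1e-12)
    return balance + match                                 # class-wise weights w_k
```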

Method  A→W  A→D  W→A  W→D  D→A  D→W  Average
ResNet-50 [16]  66.1±4.3  65.8±1.5  53.3±3.1  87.8±2.9  53.0±3.7  79.4±2.6  67.6±1.5
DANN [12]  50.7±2.6  54.0±2.7  35.4±3.4  62.6±4.2  34.6±3.8  56.3±2.9  49.0±0.8
JAN [23]  51.2±3.2  49.5±2.4  46.1±3.9  72.9±4.1  40.9±5.1  71.8±2.6  55.4±1.6
WMMD [33]  39.1±5.2  43.3±4.1  38.4±2.7  67.8±4.8  34.1±3.2  68.1±7.1  48.5±3.4
CDAN [22]  65.7±3.2  62.8±4.8  52.5±2.7  78.1±4.7  39.8±4.5  73.5±4.4  62.1±1.7
RAAN [4]  59.4±3.8  65.7±2.9  48.5±5.0  76.4±3.5  45.8±6.9  77.4±3.6  62.2±3.2
SymNets [34]  57.1±4.0  54.6±2.7  41.9±6.3  67.0±5.1  32.4±4.8  57.2±6.7  51.7±2.7
BSP [5]  61.5±2.1  58.9±2.6  47.5±3.2  85.0±3.6  40.4±2.9  84.1±3.0  62.9±2.2
LMDAN  73.1±1.7  71.0±2.5  56.5±2.4  84.4±2.6  57.8±4.9  88.8±3.5  71.9±2.1
Table 1: Results (mean±std) for unsupervised domain adaptation under label distribution drift on the Office-31[0.75;0.75] dataset.

4.3 Objective Function and Solution

Finally, we provide the objective function and the corresponding optimization solution for LMDAN. We first calculate class-wise weights on each mini-batch of samples, and then optimize the min-max game in adversarial learning together with the subsequent classification through a dual weighting strategy. The loss function of LMDAN can be written as:

$\min_{F, C} \max_{D} \ \frac{1}{n_s}\sum_{i=1}^{n_s} w_{y_i^s}\, \mathcal{L}_{cls}\big(C(F(x_i^s)),\, y_i^s\big) + \lambda \Big[ \frac{1}{n_s}\sum_{i=1}^{n_s} w_{y_i^s} \log D\big(F(x_i^s)\big) + \frac{1}{n_t}\sum_{j=1}^{n_t} \log\big(1 - D(F(x_j^t))\big) \Big]$,   (7)

where $w_{y_i^s}$ is the weight of the class to which $x_i^s$ belongs, and $\lambda$ is the trade-off hyperparameter between the classification loss and the adversarial loss. In our objective function, the weighting strategy acts in two places: the weighted classifier $C$ captures label distribution drift for better decision boundary adaptation on the target domain, and the weighted discriminator $D$ and encoder $F$ further adjust the feature alignment to fit label distribution drift as well.
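
A minimal PyTorch sketch of the dual weighting in Eq. (7) follows; the variable names are assumptions, and the adversarial term is written as a binary cross-entropy to be minimized (the negative of the log-likelihood terms in Eq. (7)), assuming the discriminator inputs pass through a gradient-reversal layer so that a single minimization implements the min-max game.

```python
import torch
import torch.nn.functional as F

def lmdan_loss(src_logits, src_dom_logits, tgt_dom_logits, src_labels, class_weights, lam=1.0):
    """
    src_logits:     (n_s, K) classifier outputs on source samples.
    *_dom_logits:   (n, 1) discriminator outputs, computed on gradient-reversed features.
    class_weights:  (K,) class-wise weights w from the label distribution matching step.
    """
    w = class_weights[src_labels]                                    # per-sample weight w_{y_i^s}
    # Weighted classification loss on labeled source samples.
    cls_loss = (w * F.cross_entropy(src_logits, src_labels, reduction='none')).mean()
    # Weighted adversarial (domain classification) loss: source = 1, target = 0.
    src_dom = F.binary_cross_entropy_with_logits(
        src_dom_logits.squeeze(1), torch.ones_like(src_dom_logits.squeeze(1)), reduction='none')
    tgt_dom = F.binary_cross_entropy_with_logits(
        tgt_dom_logits.squeeze(1), torch.zeros_like(tgt_dom_logits.squeeze(1)))
    adv_loss = (w * src_dom).mean() + tgt_dom
    return cls_loss + lam * adv_loss
```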

Algorithm 1 shows the complete optimization procedure. In our implementation, we use the cross-entropy loss as $\mathcal{L}_{cls}$, and set the trade-off parameter $\lambda$ to 1 by default for all experiments. Since the complexity of the optimal assignment is not scalable to the whole dataset, label matching is performed on mini-batches. Two benefits are clear: mini-batch training makes the complexity of optimal matching affordable for big-data adaptation, and equal numbers of data points can be sampled from the source and target domains, keeping the matching and feature alignment balanced. We use a pre-trained ResNet-50 [16] as the feature extractor. Following [12], we set the initial learning rate for the SGD optimizer and gradually anneal the learning rate of the classifier with the schedule of [12], where the training progress variable changes from 0 to 1 linearly. The learning rate of the discriminator is adjusted accordingly.
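
The annealing schedule in [12] has the form $\mu_p = \mu_0 / (1 + \alpha p)^{\beta}$, with $p$ the training progress going from 0 to 1; a sketch with DANN's default constants ($\mu_0 = 0.01$, $\alpha = 10$, $\beta = 0.75$) is shown below, noting that these constants are assumptions here since this paper does not restate them.

```python
def dann_lr_schedule(progress, mu0=0.01, alpha=10.0, beta=0.75):
    """Learning-rate annealing of Ganin et al. [12]; `progress` goes from 0 to 1 linearly."""
    return mu0 / (1.0 + alpha * progress) ** beta
```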

5 Experimental Analysis

We first describe the dataset details and the corresponding modifications for the experiments, then evaluate the performance of the proposed LMDAN by comparing it with other state-of-the-art unsupervised domain adaptation methods under the huge label distribution drift scenario, followed by comprehensive analyses of LMDAN.

5.1 Dataset and Modification

Three widely used real-world datasets are employed to evaluate the performance of LMDAN and the competitive methods. (1) Office-31 [31] contains 4,652 images in total across 31 categories. The dataset contains three domains: Amazon (A), Webcam (W), and DSLR (D), where images range from internet product photos to real scenes. (2) VisDA-2017 [29] is a challenging domain adaptation dataset, aiming to transfer knowledge from synthetic images (S) to real images (R). It contains around 152,000 synthetic images of 3D models for the source domain, and 72,000 real images for the target domain. Both domains contain 12 categories. (3) ImageCLEF-DA (https://www.imageclef.org/2014/adaptation) contains 600 images per domain taken from three object recognition datasets: Caltech-256 (C), ImageNet ILSVRC 2012 (I), and Pascal VOC 2012 (P). In the following experiments, "A→W" denotes domain adaptation from source domain Amazon to target domain Webcam.

Figure 4: Label distribution of 31 categories on original Office-31 and modified Office-31[0.75;0.75] dataset.

Different from experiments in previous domain adaptation literature, we simulate label distribution drift by randomly dropping 75% of the samples in the first half of the classes in the source domain, and 75% of the samples in the latter half of the classes in the target domain, and denote the modified dataset as "NAME[0.75;0.75]", where NAME is the name of the original dataset. Figure 4 shows the label distributions on the original Office-31 and the modified Office-31[0.75;0.75], respectively. The random sample dropping process is repeated five times, and we conduct experiments on all created datasets while reporting the average performance and its fluctuation to alleviate sample selection bias.
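
Below is a sketch of the "[0.75;0.75]" construction (the `make_drifted_split` helper and its interface are hypothetical; it only reproduces the dropping rule described above).

```python
import numpy as np

def make_drifted_split(src_labels, tgt_labels, num_classes, drop_rate=0.75, seed=0):
    """
    Drop `drop_rate` of the samples of the first half of the classes in the source
    domain and of the latter half of the classes in the target domain, yielding the
    NAME[0.75;0.75] setting. Returns boolean keep-masks for both domains.
    """
    rng = np.random.default_rng(seed)
    half = num_classes // 2

    def drop(labels, classes):
        keep = np.ones(len(labels), dtype=bool)
        for c in classes:
            idx = np.where(labels == c)[0]
            removed = rng.choice(idx, size=int(drop_rate * len(idx)), replace=False)
            keep[removed] = False
        return keep

    return drop(src_labels, range(half)), drop(tgt_labels, range(half, num_classes))
```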

Method  C→I  C→P  I→C  I→P  P→C  P→I  Average
ResNet-50 [16]  76.9±3.2  63.8±1.7  87.1±1.8  71.3±0.7  81.7±3.7  73.4±4.2  75.7±2.1
DANN [12]  47.4±2.8  40.8±2.7  55.0±1.2  50.4±2.3  55.0±3.1  51.2±3.6  50.0±1.4
JAN [23]  34.2±2.8  27.9±1.0  38.8±3.8  49.0±3.4  36.7±4.4  44.1±3.0  38.5±0.7
WMMD [33]  42.4±1.1  30.4±3.5  65.2±3.9  70.8±3.0  47.2±3.5  56.4±1.9  52.0±2.9
CDAN [22]  58.1±3.8  52.2±3.2  76.3±4.0  62.7±1.8  66.2±9.5  59.2±1.2  63.1±2.2
RAAN [4]  62.9±1.3  54.6±3.3  78.3±1.7  63.6±3.6  71.0±6.6  65.4±2.4  66.0±2.3
SymNets [34]  59.2±5.0  53.8±3.2  70.5±3.9  57.2±3.8  63.4±7.6  54.3±1.5  59.7±1.4
BSP [5]  52.6±1.7  43.4±2.5  70.5±2.9  58.6±4.6  67.0±4.3  62.9±1.8  59.2±1.8
LMDAN  79.1±2.8  67.7±2.7  89.8±2.3  71.6±2.8  88.1±2.4  80.5±1.0  79.5±0.8
Table 2: Results (mean±std) for unsupervised domain adaptation under label distribution drift on the ImageCLEF-DA[0.75;0.75] dataset.
(a) Class-wise prediction accuracy. (b) Accuracy during training. (c) Performance under different divergences.
Figure 5: Performance of different unsupervised domain adaptation methods on A→W. (a) shows class-wise prediction accuracy on the 31 classes, where the first 15 classes have fewer source samples than the last 16 classes; (b) shows the variation of overall accuracy during training; and (c) reports the performance under different levels of label distribution drift, including the original dataset, [0.25;0.25], [0.5;0.5], [0.625;0.625], and [0.75;0.75]. The x-axis denotes the KL divergence between the source and target label distributions, corresponding to the values 0.0390, 0.0882, 0.2817, 0.5085, and 0.8879.

5.2 Competitive Methods

We compare against seven recent deep unsupervised domain adaptation methods and ResNet-50 trained only on the source domain without adaptation. JAN [23] and WMMD [33] are deep transfer models based on maximum mean discrepancy; they learn the adaptation by aligning joint distributions across multiple domain-specific layers. DANN [12], CDAN [22], SymNets [34], and BSP [5] are based on adversarial training and let the two feature spaces confuse the discriminator. Further, CDAN [22] and SymNets [34] take the conditional feature distribution into consideration and enhance the adaptation performance under the standard domain adaptation setting. Moreover, RAAN [4] reduces feature distribution divergence by minimizing the Earth-Mover distance. For all competitive methods, we use ResNet-50 as the feature extractor for fair, apples-to-apples comparisons. We re-implement WMMD and RAAN, and conduct experiments for the remaining methods with their open-source code.

(a) ResNet-50 (b) DANN (c) CDAN (d) LMDAN
Figure 6: Feature space visualization on A→W. Red and blue dots denote source and target samples, respectively.
Method  Original  [0.75;0.75]
ResNet-50 [16]  44.4  41.0±0.7
DANN [12]  63.5  33.9±1.3
JAN [23]  61.6  27.7±2.0
WMMD [33]  45.8  28.4±2.2
CDAN [22]  66.8  38.5±0.9
RAAN [4]  59.0  50.9±1.9
SymNets [34]  51.6  22.6±0.7
BSP [5]  64.7  27.9±3.2
LMDAN  64.9  59.3±1.1
Table 3: Results for unsupervised domain adaptation on S→R of the VisDA-2017 dataset, in the original and [0.75;0.75] settings.

5.3 Numerical Evaluation

Table 1 reports quantitative results for unsupervised domain adaptation on the Office-31[0.75;0.75] dataset. The performance of all competitive methods drops significantly under the huge label distribution divergence and even falls below that of non-adapted ResNet-50. This indicates that only aligning the feature divergence is not enough for a positive adaptation, since label distribution drift is also a crucial component of domain shift that has not received sufficient attention in the domain adaptation area. To address this problem, LMDAN considers source-target sample pairs with different weights, enlarging the weights of matched samples and shrinking those of mismatched ones based on classified probabilities. By this means, LMDAN outperforms the other competitive methods by a large margin. Notably, WMMD and RAAN also embed source sample re-weighting strategies into training; however, their weights rely heavily on predictions of the target label distribution, which makes them struggle under huge label distribution drift. Figure 5 shows more experimental details on A→W. The undesirable performance of the other methods mainly stems from categories with large sizes in the target domain: DANN and ResNet-50 return almost zero accuracy on Classes 5 and 15, whereas, thanks to label distribution matching, LMDAN achieves much better predictions on these categories. Moreover, Figure 5(b) shows increasing performance of LMDAN over iterations, indicating that weighting source-target pairs per mini-batch gradually narrows the gap in label space. As expected, all methods degrade with increasing label distribution divergence in Figure 5(c), which demonstrates that the divergence in label space has a huge impact on adaptation performance. LMDAN delivers more robust results than the others even under a huge label distribution gap, which is essential in practice due to the agnostic target label distribution. Tables 2 and 3 provide further promising results of LMDAN on ImageCLEF-DA[0.75;0.75] and VisDA-2017[0.75;0.75].

5.4 Visualization

Figure 6 provides embedded feature space visualizations on A→W of Office-31[0.75;0.75] obtained by t-SNE [25]. With a huge label distribution divergence between the source and target domains, DANN and CDAN cannot preserve the original categorical source structure: representations are corrupted by aligning mismatched features, which in turn dramatically affects source-target feature alignment. These negative effects in adaptation further lead to inferior prediction performance on the target data. ResNet-50 preserves the source structure better but adapts less to the domain divergence. LMDAN re-weights samples in the source domain, which not only refines the classification decision boundary but also provides better-aligned features.

Figure 7: Label matching on C→I of ImageCLEF-DA[0.75;0.75], where LMDAN enlarges the first six and shrinks the last six categories of the source data.

5.5 Distribution Matching

Figure 7 shows the effectiveness of LMDAN on label distribution matching. ImageCLEF-DA[0.75;0.75] has few source samples in the first six categories but more source samples in the remaining categories, while the target samples behave inversely. With label distribution matching in LMDAN, the first six categories in the source domain are assigned larger weights, which is equivalent to enlarging their class sizes; on the contrary, the remaining classes are shrunk to match the target label distribution. Therefore, the source label distribution after matching becomes similar to the target one. With matched distributions between the source and target domains in both feature and label space during training, LMDAN consistently delivers positive transfer.

5.6 Hyperparameter Analysis

LMDAN employs $\eta$ as a hyperparameter to balance the imbalance within the source label distribution against the source-target label distribution drift. Table 4 shows the performance of LMDAN on D→A of the modified Office-31 with different $\eta$. When $\eta$ is set too small, label distribution matching yields inferior performance, owing to the poor performance of the imbalanced classifier on target sample predictions. With an increasing $\eta$, LMDAN gains improvements from the joint action of internal source label distribution balancing and source-target label distribution matching; however, the first term in Eq. (6) dominates the weights when $\eta$ becomes too large. We choose $\eta$ based on this hyperparameter analysis.

Divergence
53.9 62.3 64.8 56.2
34.9 50.4 62.4 51.5
12.2 53.0 57.8 53.3
Table 4: Analysis of the hyperparameter $\eta$ in LMDAN on D→A.

6 Conclusion

In this paper, we proposed the Label distribution Matching Domain Adversarial Network (LMDAN). To tackle considerable label distribution drift between two domains, we designed a label distribution matching and weighting strategy for source sample re-weighting, matching the known source label distribution with the agnostic target one. The resulting weights contribute to both decision boundary adaptation and adversarial feature alignment, thus minimizing the domain divergence. Experimental results demonstrated the superior performance of LMDAN over other methods.

References

  • [1] Y. Balaji, R. Chellappa, and S. Feizi (2019). Normalized Wasserstein for mixture distributions with applications in adversarial learning and domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision.
  • [2] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan (2010). A theory of learning from different domains. Machine Learning.
  • [3] B. Bhushan Damodaran, B. Kellenberger, R. Flamary, D. Tuia, and N. Courty (2018). DeepJDOT: Deep joint distribution optimal transport for unsupervised domain adaptation. In Proceedings of the European Conference on Computer Vision.
  • [4] Q. Chen, Y. Liu, Z. Wang, I. Wassell, and K. Chetty (2018). Re-weighted adversarial adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [5] X. Chen, S. Wang, M. Long, and J. Wang (2019). Transferability vs. discriminability: Batch spectral penalization for adversarial domain adaptation. In Proceedings of the 36th International Conference on Machine Learning.
  • [6] S. Cicek and S. Soatto (2019). Unsupervised domain adaptation via regularized conditional alignment. In Proceedings of the IEEE International Conference on Computer Vision.
  • [7] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy (2017). Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [8] Z. Ding and Y. Fu (2019). Deep transfer low-rank coding for cross-domain learning. IEEE Transactions on Neural Networks and Learning Systems.
  • [9] Z. Ding and H. Liu (2019). Marginalized latent semantic encoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [10] Z. Ding, M. Shao, and Y. Fu (2015). Deep low-rank coding for transfer learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
  • [11] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars (2013). Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision.
  • [12] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016). Domain-adversarial training of neural networks. Journal of Machine Learning Research.
  • [13] L. Ge, J. Gao, H. Ngo, K. Li, and A. Zhang (2014). On handling negative transfer and imbalanced distributions in multiple source transfer learning. Statistical Analysis and Data Mining: The ASA Data Science Journal.
  • [14] B. Gong, Y. Shi, F. Sha, and K. Grauman (2012). Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems.
  • [16] K. He, X. Zhang, S. Ren, and J. Sun (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [17] L. V. Kantorovich (2006). On the translocation of masses. Journal of Mathematical Sciences.
  • [18] E. Kodirov, T. Xiang, Z. Fu, and S. Gong (2015). Unsupervised domain adaptation for zero-shot learning. In Proceedings of the IEEE International Conference on Computer Vision.
  • [19] P. Li, H. Zhao, and H. Liu (2020). Deep fair clustering for visual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • [20] H. Liu, M. Shao, and Y. Fu (2016). Structure-preserved multi-source domain adaptation. In Proceedings of the IEEE 16th International Conference on Data Mining.
  • [21] M. Long, Y. Cao, J. Wang, and M. I. Jordan (2015). Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning.
  • [22] M. Long, Z. Cao, J. Wang, and M. I. Jordan (2018). Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems.
  • [23] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2017). Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning.
  • [24] Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang (2019). Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [25] L. van der Maaten and G. Hinton (2008). Visualizing data using t-SNE. Journal of Machine Learning Research.
  • [26] T. Ming Harry Hsu, W. Yu Chen, C. Hou, Y. Hubert Tsai, Y. Yeh, and Y. Frank Wang (2015). Unsupervised domain adaptation with imbalanced cross-domain data. In Proceedings of the IEEE International Conference on Computer Vision.
  • [27] J. Pan and Q. Yang (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering.
  • [28] P. Panareda Busto and J. Gall (2017). Open set domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision.
  • [29] X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko (2017). VisDA: The visual domain adaptation challenge. arXiv preprint arXiv:1710.06924.
  • [30] M. Perrot, N. Courty, R. Flamary, and A. Habrard (2016). Mapping estimation for discrete optimal transport. In Advances in Neural Information Processing Systems.
  • [31] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010). Adapting visual category models to new domains. In Proceedings of the European Conference on Computer Vision.
  • [32] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017). Adversarial discriminative domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision.
  • [33] H. Yan, Y. Ding, P. Li, Q. Wang, Y. Xu, and W. Zuo (2017). Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [34] Y. Zhang, H. Tang, K. Jia, and M. Tan (2019). Domain-symmetric networks for adversarial domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [35] H. Zhao, R. T. D. Combes, K. Zhang, and G. Gordon (2019). On learning invariant representations for domain adaptation. In Proceedings of the 36th International Conference on Machine Learning.
  • [36] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision.