Self-Adaptive Partial Domain Adaptation

09/18/2021
by   Jian Hu, et al.

Partial domain adaptation (PDA) addresses a more practical cross-domain learning problem in which the target label space is assumed to be a subset of the source label space. However, the mismatched label spaces cause significant negative transfer. A traditional solution is to use soft weights to increase the weights of the source shared classes and reduce those of the source outlier classes, but it still learns features of the outliers and leads to negative transfer. The other mainstream idea is to separate the source domain into shared and outlier parts with hard binary weights, yet it cannot correct tangled shared and outlier classes. In this paper, we propose an end-to-end Self-Adaptive Partial Domain Adaptation (SAPDA) network. A class weights evaluation mechanism is introduced to dynamically self-rectify the weights of shared, outlier and confused classes, so that samples with higher confidence receive larger weights. Meanwhile, it greatly reduces the negative transfer caused by the mismatch of label spaces. Moreover, our strategy can measure the transferability of samples in a broader sense, so that our method also achieves competitive results on unsupervised DA tasks. Extensive experiments on multiple benchmarks demonstrate the effectiveness of our SAPDA.

I Introduction

Transfer learning concentrates on transferring knowledge learned on a labelled source domain to an unlabeled target domain by narrowing the distance between the source and target domains [13] [32] [40]. As one of the core approaches to transfer learning, domain adaptation aims to relieve the need for labelled data [42] [20] [44]. There are two main classes of methods. The first is based on maximum mean discrepancy (MMD), which minimizes the distance between the means of the two domains in some high-dimensional mapping space. The second is the adversarial learning method, which attempts to extract domain-invariant features across domains. A basic assumption for domain adaptation is that the source and target domains share the same label space [30] [16]. However, this assumption is hard to satisfy in practice.
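
To make the first family concrete, below is a minimal sketch of a linear-kernel MMD penalty between two feature batches; the function and tensor names are our own illustration, not part of any particular method.

```python
import torch

def linear_mmd(source_feat: torch.Tensor, target_feat: torch.Tensor) -> torch.Tensor:
    """Squared MMD with a linear kernel: the distance between the two batch means.

    source_feat, target_feat: (batch, feature_dim) tensors from a shared extractor.
    Kernelized (e.g. multi-RBF) estimates are used in practice; this is the
    simplest biased estimate, for illustration only.
    """
    delta = source_feat.mean(dim=0) - target_feat.mean(dim=0)
    return torch.dot(delta, delta)

# Adding `lambda_mmd * linear_mmd(f_s, f_t)` to the source classification loss
# pulls the two domain means together in feature space.
```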

Recently, the partial domain adaptation task has been proposed. Different from standard domain adaptation (standard DA), PDA assumes that the source label space contains the target label space, relaxing the constraint that the two domains share the same label space. PDA thus enables knowledge transfer from a domain with plenty of labels to another without labels, where the source label space contains the target one. Since large-scale annotated datasets such as Google Open Images [17] and ImageNet-1k [31] are available, PDA can be applied to many practical applications.

The partial domain adaptation task involves two crucial points. First, compared with standard DA, PDA is more challenging because the target label space is an unknown subset of the source label space [14]. The classes shared by the two domains form the shared domain, while the classes that appear only in the source form the outlier domain. During PDA training, an important issue is to select the samples belonging to the shared domain: misclassifying outlier classes as shared ones causes negative transfer, and under this circumstance some target samples can be wrongly grouped into the outlier classes. Second, the gap between the source-shared and target domains should be narrowed during training. Thus, we need a universal framework that handles both issues. Some previous methods [4] [3] [5] [43] [15] handle PDA by weighing classes or samples in a domain adversarial network: they increase the weights of the shared classes while decreasing the weights of the outlier classes. Ideally, the weights of the shared classes should be 1 and those of the outlier classes should be 0. However, these methods all use probability-weights as class weights; even though the weights of the source shared classes are meaningfully higher than those of the source outliers, they are far from 1. As a result, negative transfer still cannot be avoided. Moreover, when the discrepancy between domains is large, the boundary between the shared and outlier domains is not clear, so if we simply cluster the source domain into two groups (shared and outlier), samples on the boundary are easily misclassified, which may lead to aligning target features to the outlier domain in subsequent training steps and, consequently, to performance degradation.

Fig. 1: Our self-adaptive partial domain adaptation mechanism; source samples have purple borders and target samples orange. If the best cluster number is three, the class weights are obtained as the mean of the class probability-weights within each group. Source and target samples are clustered into three groups by confidence: the groups with the highest confidence (weights close to 1 or 0) are the shared and outlier classes respectively, while the group with low confidence (weights close to neither 0 nor 1) contains the intermediate classes. In this way, the easily distinguishable shared and outlier classes are separated, while the confused shared and outlier classes are grouped together. Thus, the distance between the high-confidence source shared samples and the target samples can be effectively narrowed, and the negative transfer caused by misjudging the confused classes can be prevented. If the best cluster number is two, source shared and target samples are clustered into the shared group and source outlier samples into the outlier group; weights in the shared group are set to 1 and weights in the outlier group are set to 0.

To cope with the difficulties of partial domain adaptation, an end-to-end Self-Adaptive Partial Domain Adaptation (SAPDA) network is proposed. Previous PDA methods only cluster source classes into shared and outlier classes, but samples on the boundary are easily misclassified when the discrepancy between domains is large, which causes substantial negative transfer. To deal with this problem, SAPDA self-adaptively clusters the source label space into different groups. When the number of groups is three, our model not only selects the high-confidence samples as shared and outlier classes, but also puts the samples that are difficult to distinguish into the confused classes. In this way, when the shared and outlier classes are not easy to distinguish, our model can effectively narrow the overall gap between the high-confidence source shared classes and the target classes while temporarily ignoring the effects of the confused classes. If there are only two groups, SAPDA considers that it is confident enough to separate all the shared and outlier classes clearly. Samples in the same group share the same weight; the higher the weight, the more likely the sample belongs to the shared domain. Meanwhile, the self-adaptive weighting mechanism is updated every 500 iterations, which dynamically corrects incorrect clustering weights. In this way, we avoid misclassification due to the entanglement between the shared and outlier domains and greatly reduce negative transfer.

SAPDA builds a weighted adversarial network to narrow the gap between weighted source and target features and to reduce negative transfer from the outlier domain. We carry out comprehensive experiments on multiple representative domain adaptation benchmark datasets.

Fig. 2: Our SAPDA framework, consisting of the feature extractor, the domain discriminator, the source classifier, the self-adaptive class weights evaluation mechanism (which produces the class weight vector and the class weights), and the cluster classifier. The red and blue flows are from the source and target domains respectively.

II Related Work

II-A Domain Adaptation

Domain adaptation is an approach that tries to build domain invariance between different domains and mitigates the burden of annotating target data. Recent studies have indicated that deep neural networks can learn invariant representations, and these invariant representations help knowledge transfer between domains.

Even though deep neural networks can disentangle complex data distributions, the discrepancy across domains cannot be removed entirely. Hence, recent works focus on how to combine deep neural networks with domain adaptation. Two main approaches have been proposed. The first tries to match high-order statistics of the features by adding an adaptation layer [46] [25] [24] [39] [19]. The second tries to extract common features across domains by appending a domain discriminator [9] [35] [47] [36] [28].

Recently, applying domain adaptation to realistic applications has also attracted more and more attention. Domain adaptation has been designed as a universal module for object detection [2] [6] [33], semantic segmentation [29] [38] and person re-id [45] [41]. These works greatly alleviate the lack of labels in practical applications.

II-B Partial Domain Adaptation

Partial domain adaptation assumes the target label space is a subset of the source label space [14]. Several methods have been presented to deal with PDA. Selective Adversarial Network (SAN) [3] employs multiple adversarial networks and a relative-importance mechanism to filter out outlier classes. Partial Adversarial Domain Adaptation (PADA) [4] modifies SAN by adding a class-level relative-importance index to the source classifier, building a single general adversarial network. Importance Weighted Adversarial Nets (IWAN) [43] utilizes an auxiliary domain discriminator to evaluate sample-level weights and adds them to the adversarial network. Example Transfer Network (ETN) [5] likewise uses an auxiliary adversarial network to evaluate sample-level weights and applies them to the adversarial network. These approaches handle PDA more effectively than traditional methods.

These methods all use probability-weights to decide whether a class belongs to the shared classes. However, in practice, even if a class does belong to the shared classes, its weight hardly reaches 1, which still causes negative transfer. Moreover, when the discrepancy between domains is large, the boundary between shared and outlier classes is not clear, so samples on the boundary are hard to classify; in other words, if we directly divide the source domain into a shared part and an outlier part, some samples are easily misclassified. This paper proposes Self-Adaptive Partial Domain Adaptation (SAPDA), which automatically clusters source classes into several groups according to their class probability-weights. Samples in the same group share the same weight. If there are more than two groups, sample weights are obtained as the mean of the class probability-weights within the group; if there are only two groups, sample weights are set to 1 or 0. Furthermore, we utilize the Calinski-Harabasz index [1] to evaluate the optimal number of groups. In this way, we can ensure a reliable grouping.

III Self-Adaptive Partial Domain Adaptation

III-A Preliminaries

In PDA, we consider a source label space and a target label space. The source domain consists of labeled samples drawn from the source classes, while the target domain consists of unlabeled samples drawn from the target classes. Under the PDA setting, the source label space contains the target label space. The source label space consists of two parts: the shared label space, which is identical to the target label space (they contain the same set of classes), and the outlier label space, which is the part unique to the source. Accordingly, the source domain is divided into a source-shared domain and a source-outlier domain.

Different from previous PDA methods, we additionally introduce a confused domain containing the samples that are hard to distinguish. On the one hand, if our self-adaptive weight mechanism can confidently assign all source samples to the source shared and outlier classes, the confused domain is empty. On the other hand, if the model is not confident enough to clearly separate the samples near the border, we cluster these confused samples into a separate group, and the confused domain is non-empty. Our self-adaptive weights are updated every 500 iterations.

We assume that the source and target domains are sampled from different distributions; similarly, the source-shared domain and the source-outlier domain each follow their own distribution.

The key issue in the PDA task has two parts: first, because the shared and outlier classes are unknown in advance, it is essential to select out the unrelated source data belonging to the outlier classes in order to decrease negative transfer; second, the gap between the source-shared and target domains must be narrowed. These two issues should be dealt with simultaneously.

The architecture of SAPDA is shown in Fig. 2. The class weight is the core variable of our proposed method: it denotes the probability that a sample comes from the source-shared domain. Samples in the same group of classes (shared, outlier or confused) share the same class weight. The class weight acts on several modules, including the domain discriminator, the source classifier and the cluster classifier, helping the network focus on shared-domain samples and exclude outlier samples. In the following subsections, we introduce each module of the model and how the class weight helps it focus on shared-domain samples.

In subsection B, we introduce the basic building blocks of our model: the feature extractor, the source classifier, the domain discriminator and the cluster classifier.

In subsection C, we specify how the self-adaptive class weights evaluation mechanism computes the class weights. In a nutshell, the mechanism takes a class probability-weight vector as input and outputs the class weights. This vector has one dimension per source class and is the average of the source classifier's outputs over all target samples; its i-th element represents the possibility of the i-th source class being part of the shared classes. The class weights are then computed from this vector.

In subsection D, we explore how the class weight acts as a weighting value on the source classifier and the domain discriminator, guiding both to focus on shared-domain samples rather than outlier ones. In this way, positive transfer contributed by shared-domain samples is enhanced and negative transfer caused by outlier samples is mitigated.

In subsection E, we explain why and how the class weight is used as the cluster label for the cluster classifier.

III-B Basic Building Blocks

The feature extractor extracts features from the input image samples.

The source classifier takes the extracted features as input and outputs a probability vector over the source classes, whose i-th element represents the possibility of the sample belonging to the i-th source class. The loss of the source classifier is denoted as:

(1)

The domain discriminator plays a minimax game with the feature extractor to extract domain-invariant features: the discriminator tries to tell source samples from target samples using the extracted features, while the feature extractor tries to generate domain-agnostic features that confuse the discriminator. This adversarial interaction leads to domain-invariant features, so the source classifier can be applied to the target domain instead of just the source domain. The framework was proposed in [9] and utilizes a minimax loss for adversarial domain adaptation:

(2)
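
Since the bodies of Eqs. (1)-(2) were lost in extraction, the following is only a plausible reconstruction in DANN-style notation, assuming F is the feature extractor, C the source classifier, D the domain discriminator, d_i the binary domain label and L_ce the cross-entropy loss; the original paper's symbols may differ.

```latex
% Assumed notation: F = feature extractor, C = source classifier, D = domain discriminator
L_C = \frac{1}{n_s}\sum_{x_i \in \mathcal{D}_s} L_{ce}\big(C(F(x_i)),\, y_i\big)
\qquad \text{(cf. Eq. 1)}

L_D = \frac{1}{n_s+n_t}\sum_{x_i \in \mathcal{D}_s \cup \mathcal{D}_t}
      L_{ce}\big(D(F(x_i)),\, d_i\big),
\qquad
\min_{F,\,C}\;\max_{D}\; L_C - \lambda\, L_D
\qquad \text{(cf. Eq. 2)}
```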

The cluster classifier self-adaptively divides the source domain into several groups, including a shared-class group, an outlier-class group and a confused-class group. By minimizing its loss, samples within the same group are pulled together and samples from different groups are pushed apart. As a result, the model can better classify target-domain samples (which belong to the shared classes) without interference from outlier samples.

Fig. 3: Illustration of different variables

III-C Self-Adaptive Class Weights Evaluation

To compute the class weights, we first need to decide which classes of the source label space are the shared classes. Similar to PADA [4], we use the aforementioned probability-weight vector to make this decision. Obtained by averaging the source classifier's outputs over all target samples, it characterizes the probability of each source class being a shared class, because a basic assumption in PDA is that the target domain is much more similar to the source-shared domain than to the source-outlier domain.

We normalize the weight vector to avoid some elements being too small:

(3)
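
A minimal sketch of how this probability-weight vector can be obtained and normalized; we assume PADA-style rescaling by the largest element for Eq. (3), and the function and variable names are ours.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_probability_weights(classifier_logits_on_target: torch.Tensor) -> torch.Tensor:
    """Average softmax output of the source classifier over all target samples.

    classifier_logits_on_target: (num_target_samples, num_source_classes) logits.
    Returns a vector with one entry per source class; shared classes are
    expected to receive noticeably larger values than outlier classes.
    """
    probs = F.softmax(classifier_logits_on_target, dim=1)
    gamma = probs.mean(dim=0)
    # Normalize so the largest entry is 1 (assumed PADA-style rescaling, Eq. 3).
    return gamma / gamma.max()
```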

Ideally, the weight values for shared classes should be 1 and those for outlier classes should be 0, but in reality neither reaches its ideal value, which leads to negative transfer. A basic idea to solve this problem is to set a reasonable threshold: if an element of the weight vector is higher than the threshold, the corresponding class weight is converted to 1, otherwise to 0.

But this scheme has two difficulties. First, the weight vector is updated every few iterations, so it is hard to choose a constant threshold. Second, when the domain discrepancy is large, the boundary between the shared and outlier domains is not clear; if we rashly classify samples into the shared or outlier domain, misclassification can lead to performance degradation.

Here, we propose a self-adaptive class weights evaluation mechanism. It includes two steps. First, reasonable and dynamic thresholds are set to cluster the source classes into one, two and three groups respectively. Second, the Calinski-Harabasz index [1] is utilized to evaluate the optimal number of groups.

If the optimal group number is three, the source classes are clustered into the shared, outlier and confused groups. This means the model considers some classes (the confused group) difficult to judge as belonging to the shared label space or not. In this way, we avoid misclassification due to the entanglement between the shared and outlier domains and thus greatly reduce negative transfer.

If the optimal group number is two, the source classes are clustered into the shared and outlier groups, which means the model can clearly distinguish all the source classes.

In particular, if there is only one group, the shared group, partial domain adaptation degrades to standard domain adaptation.

After determining the shared, outlier and confused classes, we can evaluate the class weight for each group; in particular, the weight of the confused classes is the mean of their values in the probability-weight vector, as specified later in this subsection.

The complete steps of the mechanism are as follows:

Step 1. Determine the optimal clustering for a given number of groups

Define k as the number of groups (k = 1, 2, 3). Each group is a set of source classes, and its center is the mean of the weight values of the classes in that set. An illustration of these variables is shown in Fig. 3 (for k = 3).

The intra-class variance of each group is obtained as below:

(4)
(5)

The total intra-class variance of the k groups is defined as:

(6)

Combining (5) and (6), we have:

(7)

By iteratively minimizing the total intra-class variance, we can determine the best arrangement of the source classes among the k groups; the optimal partition determines the break points of the weight values by minimizing the variance within each group. A basic assumption is that the partition with the minimal total intra-class variance is the best clustering result for k groups [8]. Hence, we need to find the partition with minimal total intra-class variance to effectively cluster the elements of the weight vector into k groups. Since the total variance of the weight values is independent of the partition, minimizing the intra-class variance is equivalent to maximizing the inter-class variance:

(8)

Hence, when the inter-class variance achieves its maximum, the optimal clustering is obtained for k groups.
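
Since the bodies of Eqs. (4)-(8) were lost in extraction, the sketch below restates the standard within/between variance decomposition we believe they express; the symbols (γ_c for the normalized weight of class c, A_j for the j-th group, μ_j for its mean, μ for the overall mean) are our own.

```latex
% Assumed notation: gamma_c = weight of class c, A_j = j-th group, mu_j = group mean, mu = overall mean
\sigma_j^2 = \sum_{c \in A_j} (\gamma_c - \mu_j)^2 ,
\qquad
W(k) = \sum_{j=1}^{k} \sigma_j^2 \quad\text{(total intra-class variance)}

\sum_{c} (\gamma_c - \mu)^2 = W(k) + B(k),
\qquad
B(k) = \sum_{j=1}^{k} |A_j|\,(\mu_j - \mu)^2 ,

\text{so, the left-hand side being fixed,}\qquad
\arg\min_{\{A_j\}} W(k) = \arg\max_{\{A_j\}} B(k).
```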

Step 2. Determine the optimal number of groups

Now that we have obtained clustering results for each candidate number of groups, the next step is to determine which number of groups is best.

We utilize the Calinski-Harabasz index (CH index) [1] to determine the optimal number of groups. The CH index measures tightness through the intra-class dispersion matrix and separation through the inter-class dispersion matrix. The inter-class dispersion matrix B(k) is given by:

(9)

We also set

(10)

The intra-class dispersion matrix S(k) is given by:

(11)

and the CH index is defined as

(12)

Here, k is the current number of cluster groups, tr(S(k)) is the trace of the intra-class dispersion matrix, and tr(B(k)) is the trace of the inter-class dispersion matrix.

For a given number of groups k, the larger CH(k) is, the smaller the intra-group distance, the greater the inter-group distance, and the better the clustering result.

Hence, we get the following relation:

(13)
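
A compact sketch of Steps 1-2 under the assumption that the "iterative minimization" of Step 1 can be realized as an exhaustive search over contiguous splits of the sorted one-dimensional weights (the 1-D case of Fisher grouping [8]) and that the CH index of Step 2 is computed with scikit-learn; k = 1 is treated as a fallback since the CH index is undefined for a single group, and all function names are ours.

```python
import itertools
import numpy as np
from sklearn.metrics import calinski_harabasz_score

def best_grouping(gamma: np.ndarray, k: int) -> np.ndarray:
    """Step 1: split the sorted 1-D weights into k contiguous groups, returning
    the labeling with minimal total within-group variance."""
    order = np.argsort(gamma)
    best_labels, best_w = None, np.inf
    for cuts in itertools.combinations(range(1, len(gamma)), k - 1):
        labels = np.zeros(len(gamma), dtype=int)
        for g, (lo, hi) in enumerate(zip((0,) + cuts, cuts + (len(gamma),))):
            labels[order[lo:hi]] = g
        # total within-group sum of squared deviations
        w = sum(((gamma[labels == g] - gamma[labels == g].mean()) ** 2).sum()
                for g in range(k))
        if w < best_w:
            best_w, best_labels = w, labels
    return best_labels

def select_num_groups(gamma: np.ndarray):
    """Step 2: choose k in {2, 3} by the Calinski-Harabasz index (Eqs. 12-13)."""
    scored = {}
    for k in (2, 3):
        labels = best_grouping(gamma, k)
        scored[k] = (calinski_harabasz_score(gamma.reshape(-1, 1), labels), labels)
    k_star = max(scored, key=lambda k: scored[k][0])
    return k_star, scored[k_star][1]
```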

Step 3. Evaluate self-rectifying class weights

If the best group number is three, the source classes are clustered into three groups: the shared, outlier and confused classes. For any source sample, its weight is given as:

(14)

In this case, our model not only maintains its discriminative ability for the identified shared and outlier samples, but also avoids misjudging the difficult-to-identify samples.

If the best group number is two, there are just two groups, the shared and outlier classes, and the weight can be defined as:

(15)

In this situation, the model believes it can distinguish the shared classes and outlier classes clearly, so the weights of shared samples are 1, and the weights of the outlier ones are 0. As a result, we can exclude the impact of the outlier classes as much as possible.

In particular, if the best group number is one, the model perceives that the label spaces of the source and target domains are the same, and partial domain adaptation degrades to standard domain adaptation. All weights are set to 1.
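
Collecting the three cases and following the description in Sec. II-B (more than two groups: group means; exactly two groups: hard 0/1 weights), one consistent reading of Eqs. (14)-(15) is sketched below; ω(x_i) denotes the class weight of a source sample with label y_i, and the symbols are our own, not the paper's.

```latex
% k = 3: A_shared, A_confused, A_outlier are the three groups, gamma_c the class probability-weights
\omega(x_i) = \frac{1}{|A_g|}\sum_{c \in A_g} \gamma_c
\quad\text{where } A_g \text{ is the group containing } y_i
\qquad\text{(cf. Eq. 14)}

% k = 2:
\omega(x_i) =
\begin{cases}
1, & y_i \in A_{shared} \\
0, & y_i \in A_{outlier}
\end{cases}
\qquad\text{(cf. Eq. 15)}

% k = 1: omega(x_i) = 1 for all source samples.
```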

III-D Weighted Source Classifier and Domain Classifier

Weighted Source Classifier

The major challenge in PDA is that the class space of the target domain is a subset of the source one [14], so source classifiers perform poorly on target-domain tasks due to the negative transfer caused by outlier classes.

Recall that the class weight represents the probability of a sample belonging to the shared classes. Using it as a weighting value, we can solve the above problem by paying less attention to the outlier classes and focusing on the shared ones. In addition, we use an extra weight to draw together samples of the same class in the source and target domains and to push outlier samples away. In summary, we propose a weighted source classifier whose loss is defined as follows:

(16)

where

(17)

Here, an information-entropy loss is employed in both domains so that samples are harder to misclassify; because the target samples are unlabeled, they are much more prone to becoming entangled with each other. It is worth noting that the class weight is initially set to 1 and is then updated regularly every few iterations.
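
A minimal sketch of such a weighted classification loss with an entropy term; since the bodies of Eqs. (16)-(17) and the extra pulling weight are not recoverable from the text, the sketch applies the entropy penalty to target predictions only and uses names and a trade-off coefficient of our own.

```python
import torch
import torch.nn.functional as F

def weighted_classifier_loss(source_logits, source_labels, target_logits,
                             class_weight, lambda_ent=0.1):
    """Sketch of a weighted source classification loss (cf. Eq. 16) plus an
    entropy penalty on unlabeled target predictions (cf. Eq. 17).

    class_weight: per-class weight omega, one entry per source class.
    lambda_ent:   trade-off coefficient (an assumption, not from the paper).
    """
    # Weight each source sample by the class weight of its label.
    sample_w = class_weight[source_labels]
    ce = F.cross_entropy(source_logits, source_labels, reduction="none")
    cls_loss = (sample_w * ce).sum() / sample_w.sum().clamp(min=1e-8)

    # Entropy of target predictions: low entropy keeps the unlabeled target
    # samples away from decision boundaries, as the text suggests.
    p_t = F.softmax(target_logits, dim=1)
    entropy = -(p_t * torch.log(p_t.clamp(min=1e-8))).sum(dim=1).mean()

    return cls_loss + lambda_ent * entropy
```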

Weighted Domain Adaptation Framework

In subsection B, we introduced a paradigm involving the feature extractor and the domain discriminator that learns domain-invariant features. Though this paradigm is effective, negative transfer arises if we apply it to partial domain adaptation directly, because there is still a mismatch between the source and target label spaces.

Therefore, we similarly use the class weights to enhance positive transfer and alleviate negative transfer: weights for samples in the shared domain are promoted and weights for samples in the outlier domain are decreased. The weighted adversarial domain adaptation framework for PDA can be defined as follows:

(18)

The higher the weight is, the more likely the sample comes from the shared classes.
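
Since Eq. (18) was lost in extraction, a plausible form of the weighted adversarial objective is sketched below, assuming the class weight ω(x_i) scales each source sample's term in the discriminator loss while target samples keep weight 1; the notation follows the earlier sketches and is our own.

```latex
L_{adv}(F, D) =
 -\frac{1}{n_s}\sum_{x_i \in \mathcal{D}_s} \omega(x_i)\,\log D(F(x_i))
 -\frac{1}{n_t}\sum_{x_j \in \mathcal{D}_t} \log\!\big(1 - D(F(x_j))\big)

\hat{D} = \arg\min_{D} L_{adv},
\qquad
\hat{F} = \arg\max_{F} L_{adv}
\quad\text{(implemented with a gradient reversal layer; cf. Eq. 18)}
```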

III-E Cluster Classifier

As mentioned in subsection B, the cluster classifier groups samples into shared, outlier and confused clusters. It helps the model recognize shared-domain samples by separating the clusters from each other.

The problem is that cluster labels are not originally available in the way class labels are. As a solution, we use the class weights as cluster labels: as indicated by Eq. 14 and Fig. 3, samples from the same cluster have the same weight value, so it is reasonable to use it as the cluster label. In particular, the cluster labels of target-domain samples are 1, because they ought to be clustered with the shared-domain samples.

Using the cross-entropy loss function, the loss of the cluster classifier is:

(19)

By minimizing this loss, the model reduces the distance among samples from the same cluster and extends the distance among samples from different clusters.

During training, if the best group number is two, the loss can be set as follows:

(20)

In Eq. 20, one quantity denotes the number of samples from the shared domain and another the number of samples from the outlier domain, and together they account for all source samples.

By this means, source shared and target samples can be clustered together. At the same time, outlier ones can be separated away, which decreases negative transfer greatly.

Moreover, if our SAPDA regards the situation as a standard domain adaptation setting, the source and target domains share the same cluster label 1. In this situation, the loss can be written as follows:

(21)
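
Since Eqs. (19)-(21) were lost in extraction, the following is only a rough sketch of one way such a cluster-classifier loss could be implemented, assuming a scalar "shared-cluster" score trained with binary cross-entropy against the class weight used as a (possibly soft) label, with target samples labeled 1; the k = 2 balancing factor of Eq. (20) could be emulated with the pos_weight argument, set from the shared/outlier sample counts mentioned in the text. All names are ours.

```python
import torch
import torch.nn.functional as F

def cluster_classifier_loss(source_cluster_logit, source_omega,
                            target_cluster_logit):
    """Binary cross-entropy between the cluster classifier's 'shared' score and
    the class weight omega used as a (soft) cluster label; target samples are
    labeled 1 so they are pulled toward the shared cluster (cf. Eqs. 19-21).

    source_cluster_logit, target_cluster_logit: (batch,) raw scores.
    source_omega: (batch,) per-sample class weights in [0, 1].
    """
    src = F.binary_cross_entropy_with_logits(source_cluster_logit, source_omega)
    tgt = F.binary_cross_entropy_with_logits(
        target_cluster_logit, torch.ones_like(target_cluster_logit))
    return src + tgt
```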

III-F Self-Adaptive Partial Domain Adaptation

A novel self-adaptive partial domain adaptation framework is proposed to handle the PDA task. This framework self-adaptively clusters the source domain into different groups to progressively measure the transferability of source classes at the sample level, weighting samples in the same group equally, and jointly learns domain-invariant features across domains. The complete algorithm is as follows:

Input: labeled source data and unlabeled target data
Output: predicted labels
Initialization: class weights set to 1 for all samples
for each training iteration do
     1) Extract features for one batch of source and target samples
     2) Classify each sample in the batch and use eq.(16) to calculate the weighted classification loss
     3) Use eq.(18) to calculate the loss for the domain classifier
     4) Use eq.(19)/(20)/(21) to calculate the loss for the cluster classifier
     5) Back-propagate the losses
     if the iteration index is a multiple of 500 then
         a) Update the class probability-weight vector using the source classifier's outputs on target samples
         b) Update the class weights using eq.(14)/(15), or set them to 1 for all samples
     end if
end for
Algorithm 1 SAPDA
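
For concreteness, a schematic PyTorch-style rendering of Algorithm 1 is sketched below; the modules, data iterators, optimizer and the weight re-estimation callback are assumed to be supplied by the caller, the loss forms follow the hedged sketches above rather than the exact Eqs. (16)-(21), and the 500-iteration update period is taken from the text.

```python
import torch
import torch.nn.functional as F

def train_sapda(extractor, classifier, discriminator, cluster_head,
                source_loader, target_loader, optimizer,
                num_source_classes, num_iterations=20000, update_period=500,
                reestimate_weights=None):
    """Schematic SAPDA training loop (Algorithm 1). All modules, iterators and
    the weight re-estimation callback are supplied by the caller; this is a
    sketch under stated assumptions, not the authors' implementation."""
    device = next(extractor.parameters()).device
    omega = torch.ones(num_source_classes, device=device)  # init: weight 1 for all classes

    for it in range(num_iterations):
        xs, ys = next(source_loader)      # labeled source batch
        xt = next(target_loader)          # unlabeled target batch
        xs, ys, xt = xs.to(device), ys.to(device), xt.to(device)

        fs, ft = extractor(xs), extractor(xt)       # 1) features
        w = omega[ys]                               # per-sample class weights

        # 2) weighted source classification loss (cf. Eq. 16)
        ce = F.cross_entropy(classifier(fs), ys, reduction="none")
        loss_cls = (w * ce).sum() / w.sum().clamp(min=1e-8)

        # 3) weighted adversarial loss (cf. Eq. 18); a gradient reversal layer
        #    is assumed to sit inside `discriminator`
        d_s, d_t = discriminator(fs).squeeze(1), discriminator(ft).squeeze(1)
        loss_adv = (w * F.binary_cross_entropy_with_logits(
                        d_s, torch.ones_like(d_s), reduction="none")).mean() + \
                   F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))

        # 4) cluster-classifier loss (cf. Eqs. 19-21); omega as soft cluster label
        c_s, c_t = cluster_head(fs).squeeze(1), cluster_head(ft).squeeze(1)
        loss_clu = F.binary_cross_entropy_with_logits(c_s, w) + \
                   F.binary_cross_entropy_with_logits(c_t, torch.ones_like(c_t))

        optimizer.zero_grad()
        (loss_cls + loss_adv + loss_clu).backward()  # 5) back-propagation
        optimizer.step()

        # periodically re-estimate the class weights (Sec. III-C, Steps 1-3)
        if reestimate_weights is not None and (it + 1) % update_period == 0:
            omega = reestimate_weights().to(device)
    return omega
```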

The parameters of the feature extractor, the source classifier, the domain discriminator and the cluster classifier are optimized jointly; a saddle-point solution is obtained by an end-to-end minimax optimization procedure:

(22)
Method Caltech-Office (10 classes → 5 classes)
C→A C→W C→D A→C A→W A→D W→C W→A W→D D→C D→A D→W Avg
AlexNet [18] 93.58 83.70 91.18 85.27 76.30 85.29 74.17 87.37 100.00 80.82 89.51 98.52 98.52
DaNN [9] 91.86 82.22 83.82 77.57 65.93 80.88 72.60 80.30 95.59 69.35 77.09 80.74 79.83
RTN [27] 91.86 93.33 80.88 80.99 69.63 70.59 59.08 74.73 100.00 59.08 70.02 91.11 78.44
ADDA [36] 93.15 94.07 97.06 85.27 87.41 89.71 86.82 92.08 100.00 89.90 93.79 98.52 92.31
IWAN [43] 94.22 97.78 98.53 89.90 87.41 88.24 90.24 95.29 100.00 91.61 94.43 98.52 93.85
PADA [4] 96.25 96.00 97.59 92.05 87.33 96.39 96.85 96.14 100.00 95.80 97.31 97.87 95.72
ETN  [5] 96.16 96.02 98.33 95.13 90.01 98.54 96.06 96.66 100.00 96.00 96.14 97.93 96.42
SAPDA 97.11 98.98 99.07 97.60 92.00 100.00 97.80 97.52 100.00 98.05 96.90 100.00 97.92
TABLE I: Accuracy of partial DA tasks on Caltech-Office (10 classes → 5 classes).
Method Office-31
A→W D→W W→D A→D D→A W→A Avg
ResNet  [12] 54.52 94.57 94.27 65.61 73.17 71.71 75.64
DAN  [26] 46.44 53.56 58.60 42.68 65.66 65.34 55.38
DaNN [9] 41.35 46.78 38.85 41.36 41.34 44.68 42.39
RTN  [27] 75.25 97.12 98.32 66.88 85.59 85.70 84.81
IWAN  [43] 76.27 98.98 100.00 78.98 89.46 81.73 87.57
SAN  [3] 81.82 98.64 100.00 81.28 80.58 83.09 87.27
PADA  [4] 86.54 99.32 100.00 82.27 92.69 95.41 92.69
ETN  [5] 94.52 100.00 100.00 95.03 96.21 94.54 96.73
SAPDA 96.61 100.00 100.00 97.45 96.49 95.83 97.73
TABLE II: Accuracy of partial domain adaptation tasks on Office-31
Method VisDA2017 Method Caltech-Office
S→R C→W C→A C→D Avg
ResNet  [12] 45.62 AlexNet  [18] 58.44 74.64 65.86 66.98
DaNN  [9] 51.01 ResNet  [12] 61.33 77.57 68.90 69.27
RTN  [27] 50.04 DaNN [9] 54.57 72.86 57.96 61.80
IWAN  [43] 52.18 RTN  [27] 71.02 81.32 62.35 71.56
SAN  [3] 52.06 DAN  [26] 42.37 70,75 47.04 53.39
PADA  [4] 53.53 SAN  [3] 88.33 83.87 85.54 85.83
ETN  [5] 57.09 PADA  [4] 89.07 89.34 88.54 88.93
SAPDA 59.87 SAPDA 89.83 92.93 90.45 91.07
TABLE III: Accuracy of partial domain adaptation tasks on VisDA2017 (12 classes → 6 classes) and Caltech-Office (256 classes → 10 classes)
Method Office-Home
Ar→Cl Ar→Pr Ar→Rw Cl→Ar Cl→Pr Cl→Rw Pr→Ar Pr→Cl Pr→Rw Rw→Ar Rw→Cl Rw→Pr Avg
ResNet-50 [12] 46.33 67.51 75.87 59.14 59.94 62.73 58.22 41.79 74.88 67.40 48.18 74.17 61.35
DaNN [9] 43.76 67.90 77.47 63.73 58.99 67.59 56.84 37.07 76.37 69.15 44.30 77.48 61.72
RTN [27] 49.31 57.70 80.07 63.54 63.47 73.38 65.11 41.73 75.32 63.18 43.57 80.50 63.07
IWAN [43] 53.94 54.45 78.12 61.31 47.95 63.32 54.17 52.02 81.28 76.46 56.75 82.90 63.56
SAN [3] 44.42 68.68 74.60 67.49 64.99 77.80 59.78 44.72 80.07 72.18 50.21 78.66 65.30
PADA [4] 51.95 67.00 78.74 52.16 53.78 59.03 52.61 43.22 78.79 73.73 56.60 77.09 62.06
MWPDA [15] 55.39 77.53 81.27 57.08 61.03 62.33 68.74 56.42 86.67 76.70 57.67 80.06 68.41
ETN  [5] 59.24 77.03 79.54 62.92 65.73 75.01 68.29 55.37 84.37 75.72 57.66 84.54 70.45
BUS [22] 60.62 83.16 88.39 71.75 72.79 83.40 75.45 61.59 86.53 79.25 62.80 86.05 75.98
SAPDA 63.81 82.55 85.66 72.34 73.07 82.66 77.64 62.90 88.64 80.15 63.55 86.29 76.61
TABLE IV: Accuracy of partial DA tasks on Office-Home (65 classes → 25 classes).

IV Experiment

To illustrate the performance of SAPDA, we conduct experiments on four benchmarks and compare with previous standard and partial domain adaptation methods.

IV-A Setup

The Office-31 [32] dataset is a classic domain adaptation benchmark. It involves three domains, DSLR, Amazon and Webcam, which we denote as D31, A31 and W31 respectively and use as source domains. There are 10 categories [10] shared by Caltech-256 [11] and Office-31; these 10 categories, denoted as W10, A10 and D10, are used as target domains. Moreover, Caltech-256 is also used as a source domain with A10, W10 and D10 as target domains in three further tasks.

Method Office-31
A→W D→W W→D A→D D→A W→A Avg
SAPDA w/o self-adaptive class weights evaluation mechanism 87.09 100.00 100.00 81.13 91.26 95.30 92.46
SAPDA w/o cluster classifier 91.07 100.00 100.00 96.45 95.02 95.42 96.33
SAPDA with shared, outlier and confused classes 92.03 97.53 98.56 94.37 92.06 94.02 94.76
SAPDA with shared and outlier classes 95.37 100.00 100.00 95.01 95.02 95.39 96.79
SAPDA 96.61 100.00 100.00 97.45 96.49 95.83 97.73
TABLE V: Accuracy of partial DA tasks of SAPDA and its variants on Office-31 (31 classes → 10 classes)
Method Office-Home
Ar→Cl Ar→Pr Ar→Rw Cl→Ar Cl→Pr Cl→Rw Pr→Ar Pr→Cl Pr→Rw Rw→Ar Rw→Cl Rw→Pr Avg
ResNet-50 [12] 34.9 50.0 58.0 37.4 41.9 46.2 38.5 31.2 60.4 53.9 41.2 59.9 46.1
DaNN [9] 45.6 59.3 70.1 47.0 58.5 60.9 46.1 43.7 68.5 63.2 51.8 76.8 57.6
JAN [36] 45.9 61.2 68.9 50.4 59.7 61.0 45.8 43.4 70.3 63.9 52.4 76.8 58.3
DAN [43] 45.6 67.7 73.9 57.7 63.8 66.0 54.9 40.0 74.5 66.2 49.1 77.9 61.4
CDAN+E [23] 50.7 70.6 76.0 57.6 70.0 70.0 57.4 50.9 77.3 70.9 56.7 81.6 65.8
DRCN [21] 50.6 72.4 76.8 61.9 69.5 71.3 60.4 48.6 76.8 72.9 56.1 81.4 66.6
SAPDA 51.5 71.3 76.9 59.0 71.3 71.6 58.6 51.2 77.6 71.5 59.1 82.6 66.9
TABLE VI: Accuracy of standard DA tasks on Office-Home (65 classes → 65 classes).

The Caltech-Office dataset uses the 10 classes shared by Caltech-256 and Office-31 as source domains, denoted as W10, D10, A10 and C10; the first 5 of these 10 categories, denoted as W5, D5, A5 and C5, are used as target domains.

The Office-Home dataset [37] is a more challenging dataset with a larger domain gap. It includes 65 categories across four domains: Artistic, Product images, Real-World and Clip Art. We use all 65 classes of the four domains as source domains Ar, Pr, Rw and Cl, and the first 25 classes of each domain as target domains.

The VisDA-2017 dataset is one of the most challenging datasets in domain adaptation; we evaluate the synthetic-to-real track here. Under our partial domain adaptation setting, the first 6 classes are chosen as the target domain and the Synthetic-12 → Real-6 task is conducted as S→R.

The proposed SAPDA is compared with existing standard DA and PDA methods. In all experiments, both standard DA and PDA methods are evaluated under the PDA setting. ResNet-50 is used as the backbone for all methods except AlexNet. Meanwhile, classic supervised baselines such as ResNet-50 are trained on the labeled source domain and tested on the unlabeled target domain.

Furthermore, extensive ablation experiments are carried out by assessing four variants of SAPDA: 1) SAPDA w/o self-adaptive class weights evaluation mechanism is the variant without the self-adaptive class weights evaluation, degenerating to PADA with a cluster classifier. 2) SAPDA w/o cluster classifier is the variant without the cluster classifier. 3) SAPDA with shared, outlier and confused classes is the variant that always clusters the source classes into shared, outlier and confused groups when assigning weights. 4) SAPDA with shared and outlier classes is the variant that always clusters the source classes into only shared and outlier groups.

Our implementation is based on PyTorch, and we fine-tune a pre-trained ResNet-50 [12]. Similar to DaNN, a bottleneck layer is added after the feature extractor; the newly added layers (the bottleneck layer, the domain discriminator and the cluster classifier) are trained from scratch. Mini-batch stochastic gradient descent is used during training, and we select the same learning rate as DaNN. The learning rate is annealed during training following the DaNN schedule, whose hyper-parameters are selected with importance-weighted cross-validation [34] and tuned for each dataset.
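
The schedule formula itself was lost in extraction; since the text says the learning rate follows DaNN, the commonly used DaNN annealing is sketched below as an assumption, with DaNN's default constants rather than values confirmed by this paper.

```python
def dann_lr_schedule(base_lr: float, progress: float,
                     alpha: float = 10.0, beta: float = 0.75) -> float:
    """DaNN-style annealing: lr = base_lr / (1 + alpha * progress) ** beta,
    where progress runs from 0 to 1 over training. Assumed, not confirmed here."""
    return base_lr / (1.0 + alpha * progress) ** beta
```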

IV-B Results

The classification results on Caltech-Office (10 → 5), Office-31, VisDA-2017 together with Caltech-Office (256 → 10), and Office-Home are shown in Tables I-IV respectively. We also report ablation experiments in Table V. The results indicate that our SAPDA outperforms all the standard DA and PDA methods.

We also make some insightful observations. (1) Supervised methods like AlexNet and ResNet perform better than standard DA methods under the PDA setting, which shows that negative transfer hurts accuracy when features from outlier source classes are learned by standard DA methods such as DaNN and DAN. (2) RTN utilizes the entropy minimization criterion to mitigate the problem; hence it improves over ResNet, but there is still negative transfer on most tasks. (3) Since the weighting mechanism can select the shared classes and promote their weights, PDA methods achieve better results than ResNet-50 and other standard DA methods. (4) Our SAPDA outperforms both standard DA and PDA methods, demonstrating that our self-adaptive weight mechanism can effectively use the confused group to avoid misjudging confused samples.

Fig. 4: Comparison of class weight histograms learned by (a) ResNet-50, (b) DaNN, (c) PADA and (d) SAPDA.

Fig. 5: Visualization of features learned by (a) ResNet, (b) DaNN, (c) PADA and (d) SAPDA.

We further examine the different components of SAPDA by comparing the results of its variants in Table V. (1) SAPDA outperforms SAPDA w/o self-adaptive class weights evaluation mechanism, proving that the self-adaptive class weights evaluation mechanism can select reasonable outlier classes, further weaken the negative impact of outlier data, and force the source classifier to focus on data pertaining to the target label space. (2) SAPDA outperforms SAPDA w/o cluster classifier, showing that the cluster classifier can gather different classes more tightly and partly avoid misclassification. (3) SAPDA with shared, outlier and confused classes gets the worst results on almost every task, especially D31→W10 and W31→D10. On these two tasks, because the gap between different classes is small, it is easy for the model to separate shared and outlier classes; but when we always cluster the source classes into shared, outlier and confused groups, neither the shared classes are weighted as 1 nor the outlier classes as 0. This causes negative transfer and reduces accuracy, which also illustrates the necessity of pushing the weights to their expected values. Meanwhile, on the more challenging S→R task this variant performs much better than SAPDA with shared and outlier classes, showing that in a harder task, arbitrarily converting the soft weights to binary values can easily cause the difficult-to-discern shared classes to be misjudged as outlier classes, or vice versa. This result also illustrates the necessity of the self-adaptive cluster-weight mechanism. (4) SAPDA with shared and outlier classes achieves results close to SAPDA on the Office-31 dataset, but does not perform well on the more difficult VisDA-2017, indicating that the self-adaptive cluster-weight mechanism plays a much more important role when the task is harder.

Moreover, we also apply our method to the standard domain adaptation problem, as shown in Table VI; in this setting our sample-weight evaluation mechanism acts as a measure of the degree of transferability, which also helps improve classification accuracy.

Hyper-parameter A→W A→D Avg on Office-31
0.01 95.77 96.90 96.54
0.02 95.74 97.30 96.22
0.05 95.81 97.22 96.98
0.1 96.61 97.45 97.73
0.5 96.61 96.90 97.02
1 95.77 97.22 96.87
TABLE VII: Accuracy of partial domain adaptation tasks on Office-31 under different hyper-parameter values

IV-C Analysis

Class Weight: Fig. 4(a)-(d) shows the class weight histograms for the task A (31 classes) → W (10 classes), learned by fine-tuned ResNet-50, DaNN, PADA and our SAPDA respectively. The blue bins are weights of shared classes and the red bins are weights of outlier classes.

Fig. 4(a) shows that ResNet can distinguish some target samples thanks to fine-tuning. Fig. 4(b) implies that negative transfer prevents the network from distinguishing shared and outlier classes. Fig. 4(c) illustrates that although PADA can identify the shared classes, their weights cannot reach 1 while the outlier weights cannot reach 0, which still causes a performance reduction. Fig. 4(d) shows that our SAPDA not only selects the shared classes correctly, but also promotes the weights of the shared classes to 1 and pushes the others down to 0, which greatly reduces negative transfer.

Fig. 6: The number of clusters w.r.t. iterations

Feature Visualization: Fig. 5(a)-(d) shows the feature visualization results using t-SNE embeddings [7] for ResNet, DaNN, PADA and SAPDA. The blue, green and red points represent source shared, source outlier and target samples respectively. From these four plots we make several observations: (1) ResNet-50 can only classify a few target samples into the correct categories thanks to fine-tuning, while DaNN can hardly assign target samples to the correct classes, implying that the mismatch between label spaces deteriorates accuracy. (2) PADA can cluster most target samples into the correct classes, but the boundary between shared and outlier classes is not distinct, which can still cause misclassification. (3) Our SAPDA not only assigns target samples to the correct classes, but also clusters the source outlier classes together while keeping them far from the other clusters. In this way, target samples can hardly be misclassified.

Number of Clusters: The task W31 → A10 achieves the worst performance among the six Office-31 tasks, which means it is the most difficult task for our framework. Hence, in Fig. 6 we use this task to show the number of clusters w.r.t. iterations. The left ordinate is the accuracy of the task over iterations, and the right ordinate is the number of cluster groups over iterations.

We make some interesting observations from Fig. 6. (1) Even though we allow the number of clusters to range from 1 to 3, the number actually selected by the network is only two or three under the PDA setting. This reflects that when it is somewhat hard to classify some mixed samples, the framework clusters the source data into three groups, whereas when the gap between shared and outlier classes is obvious, the framework can separate them easily and the number of cluster groups is two. (2) When the accuracy improves significantly, the network believes it can handle the task and the number of cluster groups decreases from three to two; once the accuracy drops, the network detects that some classes have been misclassified and the number of cluster groups increases from two to three. In this way, the misclassified classes are adjusted until all classes are arranged into the correct groups. This implies that our framework can self-adaptively correct the weights of source classes.

Fig. 7: Target test accuracy w.r.t. the number of target classes.

Target Class: We carry out experiments with different numbers of target classes. Fig. 7 shows that DaNN performs worse as the number of target categories decreases, which indicates the influence of negative transfer caused by the mismatch between label spaces. SAN's performance declines slowly and steadily, suggesting that SAN has the potential to reduce the effect of outlier classes. IWAN performs moderately compared with SAN. Our SAPDA generally performs better than the other methods; moreover, as the number of target classes decreases, our model achieves higher accuracy, showing that our mechanism can not only select out outlier classes but also improve performance.

Fig. 8: Target test accuracy w.r.t. iterations.

Sensitivity Analysis: To analyze the sensitivity of our model to its hyper-parameter, Table VII reports the influence of different values on the Office-31 dataset. The results are best when the value is 0.1; although other values also affect the results, the overall performance is relatively stable.

Convergence Performance: As shown in Fig. 8, compared with previous methods our SAPDA not only converges quickly but also converges to highly accurate solutions, implying the robustness and efficiency of our SAPDA.

V Conclusion

This paper presents an end-to-end Self-Adaptive Partial Domain Adaptation framework. It self-adaptively clusters source classes into different groups, with samples in the same group sharing the same weight. In this way, the weighted adversarial network progressively quantifies the transferability of source examples while simultaneously learning domain-invariant features across the source and target domains. Experiments demonstrate the effectiveness of our model and its superiority on several benchmarks.

References

  • [1] T. Caliński and J. Harabasz (1974) A dendrite method for cluster analysis. Communications in Statistics-theory and Methods 3 (1), pp. 1–27. Cited by: §II-B, §III-C, §III-C.
  • [2] Y. Cao, D. Guan, W. Huang, J. Yang, Y. Cao, and Y. Qiao (2019) Pedestrian detection with unsupervised multispectral feature learning using deep neural networks. information fusion 46, pp. 206–217. Cited by: §II-A.
  • [3] Z. Cao, M. Long, J. Wang, and M. I. Jordan (2018) Partial transfer learning with selective adversarial networks. pp. 2724–2732. Cited by: §I, §II-B, TABLE II, TABLE III, TABLE IV.
  • [4] Z. Cao, L. Ma, M. Long, and J. Wang (2018) Partial adversarial domain adaptation. pp. 135–150. Cited by: §I, §II-B, §III-C, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [5] Z. Cao, K. You, M. Long, J. Wang, and Q. Yang (2019) Learning to transfer examples for partial domain adaptation. pp. 2985–2994. Cited by: §I, §II-B, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [6] Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. V. Gool (2018) Domain adaptive faster r-cnn for object detection in the wild. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §II-A.
  • [7] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell (2014) Decaf: a deep convolutional activation feature for generic visual recognition. pp. 647–655. Cited by: §IV-C.
  • [8] W. D. Fisher (1958) On grouping for maximum homogeneity. Journal of the American statistical Association 53 (284), pp. 789–798. Cited by: §III-C.
  • [9] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. S. Lempitsky (2016) Domain-adversarial training of neural networks. Journal of Machine Learning Research 17, pp. 59:1–59:35. Cited by: §II-A, §III-B, TABLE I, TABLE II, TABLE III, TABLE IV, TABLE VI.
  • [10] B. Gong, Y. Shi, F. Sha, and K. Grauman (2012) Geodesic flow kernel for unsupervised domain adaptation. pp. 2066–2073. Cited by: §IV-A.
  • [11] G. Griffin, A. Holub, and P. Perona (2007) Caltech-256 object category dataset.. Technical report. Cited by: §IV-A.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. pp. 770–778. Cited by: TABLE II, TABLE III, TABLE IV, §IV-A, TABLE VI.
  • [13] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell (2018) Cycada: cycle-consistent adversarial domain adaptation. pp. 1989–1998. Cited by: §I.
  • [14] J. Hu, H. Tuo, C. Wang, L. Qiao, H. Zhong, Y. Jun, and Z. Jing (2020) Discriminative partial domain adversarial network. Note: In European Computer Vision Conference (ECCV) Cited by: §I, §II-B, §III-D.
  • [15] J. Hu, H. Tuo, C. Wang, L. Qiao, H. Zhong, and Z. Jing (2019) Multi-weight partial domain adaptation.. pp. 5. Cited by: §I, TABLE IV.
  • [16] J. Hu, H. Tuo, C. Wang, H. Zhong, H. Pan, and Z. Jing (2020) Unsupervised satellite image classification based on partial transfer learning. Aerospace Systems 3 (1), pp. 21–28. Cited by: §I.
  • [17] I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, A. Veit, et al. (2017) Openimages: a public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://github. com/openimages 2 (3), pp. 18. Cited by: §I.
  • [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks.. In Advances in Neural Information Processing Systems (NIPS). Cited by: TABLE I, TABLE III.
  • [19] D. Li, Y. Lu, W. Wang, Z. Lai, J. Zhou, and X. Li (2021) Discriminative invariant alignment for unsupervised domain adaptation. IEEE Transactions on Multimedia (), pp. 1–1. External Links: Document Cited by: §II-A.
  • [20] J. Li, K. Lu, Z. Huang, L. Zhu, and H. T. Shen (2018) Heterogeneous domain adaptation through progressive alignment. IEEE transactions on neural networks and learning systems 30 (5), pp. 1381–1391. Cited by: §I.
  • [21] S. Li, C. H. Liu, Q. Lin, Q. Wen, L. Su, G. Huang, and Z. Ding (2020) Deep residual correction network for partial domain adaptation. IEEE transactions on pattern analysis and machine intelligence. Cited by: TABLE VI.
  • [22] J. Liang, Y. Wang, D. Hu, R. He, and J. Feng (2020) A balanced and uncertainty-aware approach for partial domain adaptation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16, pp. 123–140. Cited by: TABLE IV.
  • [23] M. Long, Z. Cao, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In NeurIPS, Cited by: TABLE VI.
  • [24] M. Long, H. Zhu, J. Wang, and M. I. Jordan. (2016) Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems. Cited by: §II-A.
  • [25] M. Long, Y. Cao, J. Wang, and M. Jordan (2015) Learning transferable features with deep adaptation networks. pp. 97–105. Cited by: §II-A.
  • [26] M. Long, Y. Cao, J. Wang, and M. Jordan (2015) Learning transferable features with deep adaptation networks. pp. 97–105. Cited by: TABLE II, TABLE III.
  • [27] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2017) Deep transfer learning with joint adaptation networks. pp. 2208–2217. Cited by: TABLE I, TABLE II, TABLE III, TABLE IV.
  • [28] Z. Luo, Y. Zou, J. Hoffman, and L. Fei-Fei (2017) Label efficient learning of transferable representations across domains and tasks. arXiv preprint arXiv:1712.00123. Cited by: §II-A.
  • [29] F. Pan, I. Shin, F. Rameau, S. Lee, and I. Kweon (2020) Unsupervised intra-domain adaptation for semantic segmentation through self-supervision. Note: In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Cited by: §II-A.
  • [30] Y. Pan, T. Yao, Y. Li, Y. Wang, C. Ngo, and T. Mei (2019) Transferrable prototypical networks for unsupervised domain adaptation. pp. 2239–2247. Cited by: §I.
  • [31] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV) 115, pp. 211–252. Cited by: §I.
  • [32] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. pp. 213–226. Cited by: §I, §IV-A.
  • [33] S. Song, Z. Miao, H. Yu, J. Fang, K. Zheng, C. Ma, and S. Wang (2020) Deep domain adaptation based multi-spectral salient object detection. IEEE Transactions on Multimedia (), pp. 1–1. External Links: Document Cited by: §II-A.
  • [34] M. Sugiyama, M. Krauledat, and K.-R. Muller. (2007) Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research (JMLR) 8, pp. 985–1005. Cited by: §IV-A.
  • [35] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko (2015) Simultaneous deep transfer across domains and tasks. pp. 4068–4076. Cited by: §II-A.
  • [36] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. pp. 7167–7176. Cited by: §II-A, TABLE I, TABLE VI.
  • [37] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan (2017) Deep hashing network for unsupervised domain adaptation. pp. 5018–5027. Cited by: §IV-A.
  • [38] T. H. Vu, H. Jain, M. Bucher, M. Cord, and P. Perez (2019) ADVENT: adversarial entropy minimization for domain adaptation in semantic segmentation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II-A.
  • [39] C. Wang, H. Tuo, J. Wang, and L. Qiao (2019) Discriminative transfer learning via local and global structure preservation. Signal, Image and Video Processing 13 (4), pp. 753–760. Cited by: §II-A.
  • [40] Z. Wang, B. Du, and Y. Guo (2019) Domain adaptation with neural embedding matching. IEEE transactions on neural networks and learning systems 31 (7), pp. 2387–2397. Cited by: §I.
  • [41] F. Yang, K. Yan, S. Lu, H. Jia, D. Xie, Z. Yu, X. Guo, F. Huang, and W. Gao (2020) Part-aware progressive unsupervised domain adaptation for person re-identification. IEEE Transactions on Multimedia (), pp. 1–1. External Links: Document Cited by: §II-A.
  • [42] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson (2014) How transferable are features in deep neural networks?. arXiv preprint arXiv:1411.1792. Cited by: §I.
  • [43] J. Zhang, Z. Ding, W. Li, and P. Ogunbona (2018) Importance weighted adversarial nets for partial domain adaptation. pp. 8156–8164. Cited by: §I, §II-B, TABLE I, TABLE II, TABLE III, TABLE IV, TABLE VI.
  • [44] S. Zhang, H. Tuo, J. Hu, and Z. Jing (2021) Domain adaptive yolo for one-stage cross-domain detection. arXiv preprint arXiv:2106.13939. Cited by: §I.
  • [45] F. Zhao, S. Liao, G. Xie, J. Zhao, K. Zhang, and L. Shao (2020) Unsupervised domain adaptation with noise resistible mutual-training for person re-identification. In European Conference on Computer Vision, pp. 526–544. Cited by: §II-A.
  • [46] H. Zhong, H. Tuo, C. Wang, X. Ren, J. Hu, and L. Qiao (2019-09) Source-constraint adversarial domain adaptation. IEEE International Conference on Image Processing (ICIP), pp. 2486–2490. External Links: Document Cited by: §II-A.
  • [47] H. Zhong, C. Wang, H. Tuo, J. Hu, L. Qiao, and Z. Jing (2019) Transfer learning based on joint feature matching and adversarial networks. Journal of Shanghai Jiaotong University (Science) 24 (6), pp. 699–705. Cited by: §II-A.