Improving Pseudo Labels With Intra-Class Similarity for Unsupervised Domain Adaptation

07/25/2022
by   Jie Wang, et al.

Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a different but related fully-unlabeled target domain. To address the problem of domain shift, more and more UDA methods adopt pseudo labels of the target samples to improve the generalization ability on the target domain. However, inaccurate pseudo labels of the target samples may yield suboptimal performance with error accumulation during the optimization process. Moreover, once the pseudo labels are generated, how to remedy the generated pseudo labels is far from explored. In this paper, we propose a novel approach to improve the accuracy of the pseudo labels in the target domain. It first generates coarse pseudo labels by a conventional UDA method. Then, it iteratively exploits the intra-class similarity of the target samples for improving the generated coarse pseudo labels, and aligns the source and target domains with the improved pseudo labels. The accuracy improvement of the pseudo labels is made by first deleting dissimilar samples, and then using spanning trees to eliminate the samples with the wrong pseudo labels in the intra-class samples. We have applied the proposed approach to several conventional UDA methods as an additional term. Experimental results demonstrate that the proposed method can boost the accuracy of the pseudo labels and further lead to more discriminative and domain invariant features than the conventional baselines.



1 Introduction

It is known that machine learning benefits from manually-labeled data. However, manual labeling is often time-consuming and labor-intensive, and in some scenarios it is even impossible to obtain sufficient manual labels. Addressing this label-scarcity problem is a key task. One approach is domain adaptation, which aims to transfer knowledge from a label-rich source domain to a different but related target domain Pan and Yang (2010); Zhuang et al. (2020). Based on whether the target domain is human-labeled, domain adaptation can be divided into two categories Liang et al. (2021): semi-supervised domain adaptation Wang et al. (2019b) and unsupervised domain adaptation Fernando et al. (2013). In this paper, we focus on unsupervised domain adaptation (UDA), where the target domain has no manual labels. It is not only challenging but also finds applications in many real-world scenarios.

In the past decades, UDA has been widely studied Liang et al. (2019); Long et al. (2015); Li et al. (2019a); Wang and Breckon (2022); Li et al. (2019b). The most common approach is to find a common subspace in which the data distributions of the source and target domains are similar. The first issue in learning the subspace is to define a suitable measure of the distribution divergence between the two domains. A common measure is the maximum mean discrepancy (MMD) Gretton et al. (2006, 2012); Zhu et al. (2020); Xiao and Zhang (2021). By minimizing the distribution divergence as a regularizer, a common subspace can be found. For example, Pan et al. (2011) learns a domain-invariant projection while minimizing the marginal distribution divergence between the source and target domains.

Recently, a new branch of UDA research uses the pseudo labels of the target-domain data to align the distributions of the source and target domains Luo et al. (2020); Li et al. (2018); Wang and Breckon (2020), where the pseudo labels are usually obtained by a classifier trained on the source domain. For example, following Pan et al. (2011), Long et al. (2013) further reduces the marginal and conditional distribution divergences between the source and target domains iteratively with the pseudo labels of the target domain. Li et al. (2018) learns both domain-invariant and class-discriminative features with the pseudo labels. Li et al. (2019a) preserves the neighborhood relationship of samples and improves robustness against outliers by supervised locality preserving projection He and Niyogi (2003) with the pseudo labels. Wang et al. (2021) proposes a discriminative MMD to mitigate the degradation of feature discriminability incurred by MMD.


Figure 1: Motivation of the proposed TSRP. Most related works focus on learning domain invariance features while ignoring the intra-class similarity between target samples. On the contrary, TSRP aims to explore the intra-class similarity between the samples in the target domain to remedy pseudo labels, which in turn leads to better domain invariance features.

Although pseudo-label generation approaches have contributed significantly to UDA, two issues remain underexplored. First, the pseudo labels are mainly obtained through a good alignment between the source and target domains, while the effect of pseudo-label accuracy on performance has not been studied deeply. When the pseudo labels are generated by a classifier trained on the source domain, which is the common practice, some pseudo labels may be incorrect in each optimization iteration, as shown in Fig. 1. Due to error accumulation, the incorrect pseudo labels can greatly affect the final performance. Second, most methods mine the source domain to improve the accuracy of the pseudo labels in the target domain; however, to our knowledge, the intrinsic relationship between the samples in the target domain remains unexplored.

To address these two issues, we propose to mine the target-domain intra-class similarity to remedy the pseudo labels (TSRP), thereby improving their accuracy. A core idea of TSRP is to use target similarity to pick pseudo labels with high confidence (UTSP) via spanning trees Alpert et al. (1995). The selected highly-confident pseudo-labeled samples, together with the source data, are then used to train a strong classifier. The strong classifier corrects part of the wrongly-labeled target samples that have low-confidence pseudo labels; we call this the remedial process of the pseudo labels. In general, our method can be integrated into any method that generates the pseudo labels of the target domain using classifiers trained on the source domain.

Our contribution is summarized as follows:

  • We propose TSRP to improve the accuracy of the pseudo labels in the target domain. TSRP iteratively exploits the intra-class similarity of the samples in the target domain for improving the generated coarse pseudo labels, and aligns the source and target domains with the improved pseudo labels.

  • We propose UTSP to select highly-confident pseudo-labeled samples. UTSP first deletes dissimilar samples, and then uses spanning trees to eliminate the samples with the wrong pseudo labels in the intra-class samples.

  • We extended four UDA algorithms Li et al. (2018); Long et al. (2013); Fukunaga and Narendra (1975); Wang et al. (2017) with TSRP, and evaluated the effectiveness of TSRP by comparing the UDA algorithms with their TSRP extensions. Experimental results on several benchmark datasets show that TSRP can be used as a term of the UDA methods for improving their generalization ability. Moreover, we compared the proposed “DICD Li et al. (2018) +TSRP” algorithm with a number of representative UDA algorithms Pan et al. (2011); Wold et al. (1987); Gong et al. (2012); Long et al. (2014); Si et al. (2009); Xu et al. (2015); Wang et al. (2014); Ding and Fu (2016). Experimental results show that the integrated method performs better than the comparison methods.

The remainder of this paper is organized as follows. In Section 2, we review some related work. In Section 3, we propose TSRP to improve the accuracy of pseudo labels. The experimental results are reported in Section 4. Finally, we conclude this paper in Section 5.

2 Related work

Early works on UDA aim to align the marginal distributions of the source and target domains Gong et al. (2012); Ganin and Lempitsky (2015). Due to the lack of labeled target samples, even if the marginal distributions are perfectly aligned, there is no guarantee of a good classification result, since the conditional distribution of the target domain may be misaligned with that of the source domain. To overcome this issue, many UDA methods resort to pseudo labels of the target domain Long et al. (2013); Kang et al. (2019); Zhang et al. (2020). If the pseudo labels of the target samples can be properly obtained, then supervised learning can be applied to train a good classifier. There are two strategies to generate pseudo labels: hard labeling Long et al. (2013); Zhang et al. (2017); Li et al. (2018) and soft labeling Pei et al. (2018). Because the accuracy of the pseudo labels plays an important role in the quality of the classifier, we summarize pseudo-label generation and selection methods that focus on improving pseudo-label accuracy as follows.

In Saito et al. (2017), Saito et al. use three asymmetric classifiers to improve the accuracy of the pseudo labels, where two of the classifiers select confident pseudo labels, and the third learns a discriminative data representation for the target domain. Wang and Breckon (2020) explores the structural information of the target domain by structured prediction, and combines the nearest class prototype and structured prediction to promote the accuracy of pseudo labels. Tian et al. (2020) regards the samples of the same cluster in the target domain as a whole rather than as individuals, and assigns pseudo labels to the target cluster by class-centroid matching. Chen et al. (2019) proposed an easy-to-hard strategy which divides target samples into three categories, namely easy samples, hard samples, and incorrect-easy samples; it tends to generate pseudo labels for easy samples and tries to avoid hard samples. Since the easy-to-hard strategy may be biased toward easy classes, a confidence-aware pseudo-label selection strategy was proposed in Wang et al. (2019a), which selects samples from each class independently by the probability of the pseudo labels. Chen et al. (2019); Wang et al. (2019a) use the distances from the target samples to the centers of the source samples as the criterion to select highly confident pseudo labels. Wang and Breckon (2020); Tian et al. (2020) iteratively generate confident pseudo labels. However, none of these methods considers how to correct the falsely generated pseudo labels.


Figure 2: Architecture of the proposed framework with TSRP. It consists of the following four successive steps. First, an unsupervised domain adaptation (UDA) method is used to learn a domain-invariant feature. Then, a weak classifier is trained to obtain the pseudo labels of the target samples. Third, UTSP is proposed to pick the target samples with highly-confident pseudo labels. Finally, a strong classifier is trained with the source domain sample and the target samples with the highly-confident pseudo labels, which is used to remedy the pseudo labels.

Different from the above methods, in this paper, we propose to correct the falsely generated pseudo labels by exploring the intra-class similarity in the target domain.

3 Method

In the following subsections, we give the formulation of the problem and our motivation, and describe the proposed method in detail.

3.1 Framework

A UDA problem is formulated as follows. Given a manually labeled source domain $\mathcal{D}_s = \{(\mathbf{x}^s_i, y^s_i)\}_{i=1}^{n_s}$ and an unlabeled target domain $\mathcal{D}_t = \{\mathbf{x}^t_j\}_{j=1}^{n_t}$, where $\mathbf{x}^s_i \in \mathbb{R}^d$ and $\mathbf{x}^t_j \in \mathbb{R}^d$ represent $d$-dimensional feature vectors of the samples in the source domain and target domain respectively, $y^s_i$ is the manual label in the source domain, and $n_s$ and $n_t$ are the numbers of source and target samples respectively. The goal of UDA is to predict the labels of $\mathcal{D}_t$. In this paper, we focus on the setting where the source and target domains share the same object classes. Suppose there are $C$ classes in both domains.

As summarized in Section 2, many UDA approaches focus on obtaining a good alignment between the source and target domains to generate pseudo labels, leaving the negative effect of the falsely generated pseudo labels unsolved. Because the inaccurate pseudo labels could result in catastrophic error accumulation during the learning process Wang and Breckon (2020), intuitively, if we could increase the accuracy of the pseudo labels, then we may get a better alignment between the source domain and the target domain.

In this paper, we propose to steadily improve the accuracy of the pseudo labels in the target domain by iteratively exploiting the intra-class similarity between the target samples. As shown in Fig. 2, in each iteration the proposed method runs the following two steps in sequence. First, it employs a traditional UDA method to generate the crude pseudo labels of the target samples for the current iteration, given the set of improved pseudo labels from the previous iteration (see Section 3.2 for details). Then, it uses TSRP to improve the accuracy of the pseudo labels (see Section 3.3), where a key component named UTSP is presented in Section 3.4. The overall framework with the TSRP module is summarized in Algorithm 1.

Input: Labeled source samples $\mathcal{D}_s = \{X_s, Y_s\}$; target samples $\mathcal{D}_t = \{X_t\}$; trust parameter $\alpha$;
Output: Remedial pseudo labels $\hat{Y}_t$ of the target domain;
1 while not converged do
2       Get the projection matrix $W$ by a UDA method with $\mathcal{D}_s$ and $\mathcal{D}_t$;
3       $Z_s = W^\top X_s$;
4       $Z_t = W^\top X_t$;
5       Train the classifier $f$ with $\{Z_s, Y_s\}$;
6       $\hat{Y}_t = f(Z_t)$;
7       // TSRP start:
8      while not converged do
9             $(\mathcal{T}_h, \mathcal{T}_l) = \mathrm{UTSP}(Z_t, \hat{Y}_t)$;
10             Train a strong classifier $f_s$ with $\{Z_s, Y_s\}$ and $\mathcal{T}_h$;
11             Get the remedial pseudo labels by (1);
12             Update $\hat{Y}_t$ with the remedial pseudo labels and the labels of $\mathcal{T}_h$;
13       // TSRP end;
Algorithm 1 Proposed framework with TSRP module.
Input: Domain-invariant target features $Z_t$ and their corresponding pseudo labels $\hat{Y}_t$, which contain $C$ pseudo classes;
Output: Highly-confident samples $\mathcal{T}_h$, low-confident samples $\mathcal{T}_l$;
1 for $c = 1$ to $C$ do
2       Calculate the intra-class similarity matrix $S^{(c)}$ by (2);
3       Calculate the threshold $\tau$ by (4);
4       Calculate the adjacency matrix $A^{(c)}$ by (5);
5       Calculate the diagonal matrix $D^{(c)}$ by (6);
6       Pick the root node of the spanning tree, i.e. node $r$, by (7);
7       Get the highly-confident and low-confident samples by the spanning tree;
Algorithm 2 UTSP.
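The overall loop of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `uda_projection` and `utsp_split` are hypothetical placeholders for the employed UDA method and for UTSP (Algorithm 2), and 1-NN classifiers stand in for the weak and strong classifiers.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def tsrp_framework(Xs, ys, Xt, uda_projection, utsp_split,
                   outer_iters=10, inner_iters=2):
    """Sketch of Algorithm 1: iteratively refine target pseudo labels.

    uda_projection(Xs, ys, Xt, yt) -> projection matrix W of shape (d, k);
    utsp_split(Zt, yt) -> (hi, lo) index arrays of highly/low-confident samples.
    """
    yt = None  # no pseudo labels before the first outer iteration
    for _ in range(outer_iters):
        W = uda_projection(Xs, ys, Xt, yt)      # learn a shared subspace
        Zs, Zt = Xs @ W, Xt @ W                 # domain-invariant features
        weak = KNeighborsClassifier(n_neighbors=1).fit(Zs, ys)
        yt = weak.predict(Zt)                   # crude pseudo labels
        for _ in range(inner_iters):            # TSRP inner loop
            hi, lo = utsp_split(Zt, yt)         # confidence split (UTSP)
            strong = KNeighborsClassifier(n_neighbors=1).fit(
                np.vstack([Zs, Zt[hi]]), np.concatenate([ys, yt[hi]]))
            if len(lo):                         # remedy low-confidence labels, Eq. (1)
                yt[lo] = strong.predict(Zt[lo])
    return yt
```

Any supervised UDA subspace method can be dropped in for `uda_projection`; the loop only assumes it accepts the remedied pseudo labels from the previous iteration.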

3.2 UDA: Generating crude pseudo labels

An existing UDA algorithm is employed to learn a projection matrix $W$ that maps the samples from both domains into a shared latent subspace. A requirement is that $W$ should be learned in a supervised manner, where the remedial pseudo labels of the target samples obtained from the previous iteration of the framework are treated as the labels. Various advanced UDA algorithms meet this requirement, such as those Li et al. (2018); Long et al. (2013); Fukunaga and Narendra (1975); Wang et al. (2017) employed in the experiments. Then, the domain-invariant features of the source and target domains are obtained by $Z_s = W^\top X_s$ and $Z_t = W^\top X_t$. Finally, a classifier $f$ is trained with $\{Z_s, Y_s\}$ and used to classify the target features into $C$ classes. The predicted labels of the target samples are denoted as the crude pseudo labels $\hat{Y}_t$.

3.3 TSRP: Using target domain intra-class similarity to remedy pseudo labels

There may always be some incorrect crude pseudo labels in each iteration, especially in the early training stage. Therefore, when learning with the incorrect pseudo labels, the final performance may suffer from the cumulative errors of the iterative optimization process. If we could correct part of the incorrect pseudo labels in each iteration, then the performance might be improved steadily. A possible way to do so is to pick the pseudo labels with low confidence and correct them with a classifier stronger than the original one.

Motivated by the above analysis, TSRP aims to improve the accuracy of the crude pseudo labels by remedying the pseudo labels with low confidence. Specifically, TSRP first partitions the target samples into two sets, one with highly-confident pseudo labels, denoted as $\mathcal{T}_h$, and the other with low-confidence pseudo labels, denoted as $\mathcal{T}_l$, by exploring the intra-class similarity in the target domain (see Section 3.4). Then, it trains a strong classifier $f_s$ with $\mathcal{D}_s$ and $\mathcal{T}_h$, and uses the classifier to predict the labels of $\mathcal{T}_l$:

$\tilde{Y}_l = f_s(\mathcal{T}_l)$,   (1)

where $\tilde{Y}_l$ denotes the remedial pseudo labels of the target samples with low confidence. Finally, we take the remedial pseudo labels and the pseudo labels with high confidence together as the target-domain pseudo labels to train with in the next iteration.
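The remedial step amounts to retraining on the source data plus the highly-confident target samples and overwriting only the low-confidence labels. A minimal sketch, assuming 1-NN as the strong classifier (the classifier family is our illustrative choice, not fixed by the paper):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def remedy_low_confidence(Zs, ys, Zt, yt, hi, lo):
    """Train a strong classifier on source + highly-confident target
    samples, then overwrite the low-confidence pseudo labels (Eq. (1))."""
    strong = KNeighborsClassifier(n_neighbors=1).fit(
        np.vstack([Zs, Zt[hi]]), np.concatenate([ys, yt[hi]]))
    remedied = yt.copy()
    if len(lo):
        remedied[lo] = strong.predict(Zt[lo])
    return remedied
```

Note that the highly-confident labels are left untouched; only the low-confidence subset is relabeled, which limits how far a single bad iteration can propagate.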

3.4 UTSP: Using target intra-class similarity to pick pseudo labels with high confidence

UTSP aims to select the target samples whose pseudo labels are highly-confident. It consists of two steps—deleting and spanning tree. Its principle is illustrated in Fig. 3. We will present the details of the two steps in the following subsections with a summary in Algorithm 2.


Figure 3: Principle of UTSP. UTSP consists of two steps: (i) Deleting, which deletes the samples with low pairwise similarity scores as shown from Fig. (a) to Fig. (b), and (ii) spanning tree, which selects samples with highly-confident pseudo labels by spanning trees as shown from Fig. (b) to Fig. (c).

3.4.1 Deleting

The deleting step aims to delete the samples with small similarity within each pseudo class. In the following, we present the deleting step for the $c$-th pseudo class, $c = 1, \ldots, C$. It first calculates a pairwise intra-class similarity matrix $S^{(c)}$ of the target samples by, e.g., cosine similarity:

$S^{(c)}_{ij} = \frac{(\mathbf{z}^{t,c}_i)^\top \mathbf{z}^{t,c}_j}{\|\mathbf{z}^{t,c}_i\| \, \|\mathbf{z}^{t,c}_j\|}$,   (2)

where $S^{(c)}_{ij}$ denotes the cosine similarity between the $i$-th sample and the $j$-th sample of the $c$-th pseudo class, $n_c$ denotes the number of samples in the $c$-th pseudo class, and $\mathbf{z}^{t,c}_i$ denotes a target sample belonging to the $c$-th pseudo class. Because $S^{(c)}$ is a symmetric matrix, we only keep its upper triangular part, denoted as $U^{(c)}$. We sort the non-zero elements of $U^{(c)}$ in ascending order:

$s_1 \le s_2 \le \cdots \le s_m$,   (3)

where $m$ is the number of non-zero elements in $U^{(c)}$.

In order to select highly-confident pseudo labels in each category, the deleting step sets a similarity threshold $\tau$:

$\tau = s_{\lceil \alpha m \rceil}$,   (4)

where $\alpha \in (0, 1)$ is a trust parameter. Then, we can obtain a mask matrix $A^{(c)}$ as follows:

$A^{(c)}_{ij} = 1$ if $S^{(c)}_{ij} \ge \tau$ and $i \ne j$, and $A^{(c)}_{ij} = 0$ otherwise.   (5)

If we regard each sample as a node and take $A^{(c)}$ as the adjacency matrix of the undirected graph composed of the nodes, then we commonly observe that the samples belonging to the same ground-truth category have high pairwise similarity scores within the pseudo class, whereas the samples belonging to different ground-truth categories have low pairwise similarity scores. Here we regard the nodes with no neighbors as the samples with low-confident pseudo labels, and delete these nodes from the undirected graph. The process is illustrated in Fig. 3a to Fig. 3b.
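The deleting step can be sketched per pseudo class as below. This is our reading under an assumption: the similarity threshold of Eq. (4) is taken as the $\alpha$-quantile of the sorted pairwise cosine similarities, which matches the role of the trust parameter described in the text.

```python
import numpy as np

def deleting_step(Z, alpha):
    """Deleting step of UTSP for one pseudo class (sketch of Eqs. (2)-(5)).

    Z: (n, d) projected features of target samples sharing one pseudo label.
    alpha: trust parameter in (0, 1); larger values prune more edges.
    Returns (A, kept): 0/1 adjacency matrix and a mask of non-isolated nodes.
    """
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                               # pairwise cosine similarity, Eq. (2)
    iu = np.triu_indices(len(Z), k=1)
    vals = np.sort(S[iu])                       # ascending similarities, Eq. (3)
    tau = vals[int(alpha * (len(vals) - 1))]    # similarity threshold, Eq. (4)
    A = (S >= tau).astype(int)                  # keep only high-similarity edges, Eq. (5)
    np.fill_diagonal(A, 0)                      # a node is not its own neighbor
    kept = A.sum(axis=1) > 0                    # isolated nodes -> low confidence
    return A, kept
```

With a tight cluster plus one outlier, the outlier's similarities all fall below the threshold, so it ends up with no neighbors and is flagged as low confidence.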

3.4.2 Spanning tree

However, the samples of the $c$-th pseudo class may be mixed with samples from multiple ground-truth categories that are geometrically very similar to each other. If we only select highly-confident pseudo labels by the similarity threshold, some misclassified samples may survive the deleting step and be selected as highly-confident samples.

To further refine the samples selected by the deleting step, we explore an idea from spanning forests Alpert et al. (1995). Specifically, we first get a diagonal degree matrix $D^{(c)}$ from the adjacency matrix $A^{(c)}$ obtained in the deleting step:

$D^{(c)}_{ii} = \sum_{j=1}^{n_c} A^{(c)}_{ij}$.   (6)

The index of the largest element of $D^{(c)}$ can be computed as:

$r = \arg\max_i D^{(c)}_{ii}$.   (7)

The node with the maximum degree, i.e. the $r$-th node, represents the most confident sample in the pseudo class, since it has the maximum number of neighbors.

Then, we set the $r$-th node as the root of a spanning tree and find its leaf nodes. We regard the samples in the same spanning tree as highly-confident samples of the $c$-th pseudo class, and the samples that are not in the spanning tree as misclassified samples from other ground-truth categories. This process is illustrated in Fig. 3b to Fig. 3c.
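The spanning-tree step can be sketched as below, assuming the tree is grown by breadth-first search from the max-degree root over the adjacency matrix left by the deleting step; any tree that spans the root's connected component selects the same set of nodes.

```python
import numpy as np
from collections import deque

def spanning_tree_select(A):
    """Spanning-tree step of UTSP (sketch of Eqs. (6)-(7)).

    A: 0/1 adjacency matrix of one pseudo class after the deleting step.
    Returns a boolean mask marking the nodes reachable from the
    max-degree root, i.e. the samples kept as highly confident.
    """
    deg = A.sum(axis=1)                  # node degrees, diagonal of D in Eq. (6)
    root = int(np.argmax(deg))           # most confident sample, Eq. (7)
    in_tree = np.zeros(len(A), dtype=bool)
    in_tree[root] = True
    queue = deque([root])
    while queue:                         # BFS grows a spanning tree from the root
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            if not in_tree[v]:
                in_tree[v] = True
                queue.append(v)
    return in_tree
```

Nodes in other connected components, i.e. clusters of mutually similar samples that are not linked to the root's cluster, are excluded as likely misclassified samples.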

By pooling the highly-confident samples over all pseudo classes, we finally get the highly-confident set $\mathcal{T}_h$ and the low-confident set $\mathcal{T}_l$. Note that, in order to avoid a bad solution to class-imbalanced problems where a class with a small number of samples disappears after UTSP, we regard all samples of a pseudo class as highly-confident when the number of its samples falls below a threshold.

4 Experiments

In this section, we evaluate the performance of the proposed methods. The source code of TSRP is available at https://github.com/02Bigboy/TSRP.

4.1 Datasets and cross-domain tasks

We used common cross-domain datasets Li et al. (2018), including CMU-PIE Sim et al. (2002), MNIST LeCun et al. (1998), USPS Hull (1994), Office Gong et al. (2012); Long et al. (2013), Caltech256 Griffin et al. (2007), and COIL20 Nene et al. (1996). The datasets are described in Table 1. The domain adaptation tasks on the datasets are described as follows.

CMU-PIE is a large face dataset. It consists of more than 40,000 face images from 68 individuals. The face images vary widely due to variations in illumination, pose, and expression. In terms of pose, we chose five subsets as in Li et al. (2018): C05 (left pose), C07 (upward pose), C09 (downward pose), C27 (frontal pose), and C29 (right pose) to construct the cross-domain classification tasks. Following the experimental setting in Li et al. (2018), we randomly selected two subsets as the source domain and target domain respectively for each cross-domain task, which results in 20 cross-domain tasks in total, e.g. “C05→C07”, “C05→C09”, …, “C29→C27”, in the form of “source→target”.

Dataset Type Samples Classes Features
CMU-PIE Face 11554 68 1024
MNIST Digit 2000 10 256
USPS Digit 1800 10 256
AMAZON (A) Object 958 10 800
CALTECH (C) Object 1123 10 800
DSLR (D) Object 157 10 800
WEBCAM (W) Object 295 10 800
COIL20 Object 1440 20 1024
Table 1: Description of the visual cross-domain datasets in the experiments.

MNIST-USPS consists of two classical handwritten digit image datasets: USPS Hull (1994) and MNIST LeCun et al. (1998). To speed up the experimental comparisons, as in Long et al. (2013), we randomly chose 1,800 images from USPS and 2,000 images from MNIST, and rescaled all images to 16×16 pixels, which forms two cross-domain tasks, i.e. “USPS→MNIST” and “MNIST→USPS”.

Office+Caltech is one of the most commonly used datasets for unsupervised domain adaptation. It consists of four domains: Amazon (images downloaded from online merchants), Webcam (low-resolution images taken by a web camera), DSLR (high-resolution images taken by a digital SLR camera), and Caltech-256. Ten classes common to all four domains were used: backpack, bike, calculator, headphones, computer keyboard, laptop, computer monitor, computer mouse, coffee mug, and video projector. There are 2,533 images in total, with 8 to 151 images per category per domain. We extracted two kinds of features: the 800-dim SURF Gong et al. (2012) and the 4096-dim DeCAF6 Donahue et al. (2014). Similar to Li et al. (2018), we obtained 12 cross-domain tasks for each kind of feature, e.g. “C→A”, “C→W”, …, “D→W”, by choosing two domains from the data as the source and target respectively.

COIL20 consists of 1,440 grayscale images of 20 objects. Each object has 72 images of size 32×32, taken at pose intervals of 5 degrees through a full 360-degree rotation. Like Li et al. (2018), we split the dataset into two subsets, COIL1 and COIL2: COIL1 includes all images taken in the directions of [0°, 85°] and [180°, 265°], and COIL2 contains the images of [90°, 175°] and [270°, 355°]. The two subsets follow different but related distributions, since they consist of the same objects with different shooting angles. We randomly chose one as the source domain and the other as the target domain, forming two cross-domain tasks, i.e. “COIL1→COIL2” and “COIL2→COIL1”.

4.2 Experimental settings

TSRP can be used as a term of many UDA algorithms. In order to verify the effectiveness of TSRP, we integrated it into four algorithms, namely 1-nearest neighbor (NN) Fukunaga and Narendra (1975), joint distribution adaptation (JDA) Long et al. (2013), balanced distribution adaptation (BDA) Wang et al. (2017), and domain invariant and class discriminative feature learning (DICD) Li et al. (2018). NN is a standard machine learning method. JDA aligns both the marginal and conditional distributions of the source and target domains. BDA aligns the marginal and conditional distributions with different weights according to different tasks. DICD considers class discrimination to learn both domain-invariant and class-discriminative features. We denote the extended methods of the above four UDA algorithms as NN+TSRP, JDA+TSRP, BDA+TSRP, and DICD+TSRP respectively.

Our TSRP approach has two hyper-parameters: the trust parameter α and the number of inner iterations of TSRP. We fixed the number of inner iterations for all experiments. We set α = 0.9 for the Office+Caltech-256 (SURF) and CMU-PIE datasets, and α = 0.85 for the other datasets. The iteration number was set identically for JDA, BDA, and DICD. For a fair comparison, the parameters of the extended methods and the original methods were set the same. The classification accuracy on the target domain was used as the evaluation metric.

4.3 Experimental results

In this section, we compared the four UDA algorithms, i.e. NN, JDA, BDA, and DICD, with their TSRP extensions on the visual cross-domain tasks.

Table 2 shows the classification performance on the CMU-PIE dataset. From the table, we can see that, after incorporating TSRP, the performance of JDA, BDA, and DICD is significantly improved: DICD+TSRP, BDA+TSRP, and JDA+TSRP achieve clear absolute improvements in average accuracy over DICD, BDA, and JDA respectively. It is worth noting that DICD+TSRP obtains the best results on all cross-domain tasks of PIE compared to DICD. BDA+TSRP achieves the best results in 17 out of the 20 tasks compared to BDA, and JDA+TSRP in 15 tasks compared to JDA. We have also observed that NN+TSRP performs worse than NN. A likely cause is that the original PIE data in the source and target domains differ substantially; when NN is applied to the original data directly, the incorrectly generated pseudo labels may be the majority, and this majority is further reinforced after TSRP is applied.

Table 3 lists the classification accuracy of the comparison methods on the Office+Caltech-256 (SURF features) dataset. From the table, we see that all four UDA algorithms are improved after combining with TSRP. Specifically, among the 12 tasks, DICD+TSRP, BDA+TSRP, JDA+TSRP, and NN+TSRP outperform their original counterparts in 9, 8, 7, and 6 tasks respectively.

Tasks \ Methods NN NN+TSRP JDA JDA+TSRP BDA BDA+TSRP DICD DICD+TSRP
C05→C07 26.09 23.51 58.81 58.81 72.99
C05→C09 26.59 22.92 54.23 57.11 72.00
C05→C27 30.67 27.31 84.50 81.98 84.50 82.31 92.22
C05→C29 16.67 14.64 49.75 49.94 66.85
C07→C05 24.49 24.10 57.62 57.77 69.93
C07→C09 46.63 41.30 62.93 62.93 65.87
C07→C27 54.07 52.42 75.82 75.52 76.06 85.25
C07→C29 26.53 23.28 39.89 42.03 48.71
C09→C05 21.37 21.76 50.96 48.08 52.76 51.38 69.36
C09→C07 41.01 34.56 57.95 57.95 65.44
C09→C27 46.53 46.14 68.45 68.88 83.39
C09→C29 26.23 23.65 39.95 42.65 61.40
C27→C05 32.95 30.58 80.58 80.70 93.13
C27→C07 62.68 59.30 82.63 82.01 83.18 90.12
C27→C09 73.22 72.30 87.25 86.52 87.32 87.13 88.97
C27→C29 37.19 33.58 54.66 55.64 75.61
C29→C05 18.49 18.40 46.46 50.99 62.88
C29→C07 24.19 21.42 42.05 45.92 57.03
C29→C09 28.31 27.14 53.31 50.86 53.25 65.87
C29→C27 31.24 31.30 57.01 57.28 74.77
Average accuracy 35.46 32.48 60.24 61.28 73.09
Average improvement -2.98
Relative improvement -3.49
Table 2: Average classification accuracy (%) of the comparison methods on the target domains of the CMU-PIE tasks.
Tasks \ Methods NN NN+TSRP JDA JDA+TSRP BDA BDA+TSRP DICD DICD+TSRP
C→A (SURF) 23.70 23.49 44.78 46.45 46.14 48.33 47.29 47.81
C→W (SURF) 25.76 24.75 41.69 46.10 41.69 47.46 46.44 50.85
C→D (SURF) 25.48 24.84 45.22 49.04 47.13 49.04 49.68 50.96
A→C (SURF) 26.00 26.63 39.36 39.63 40.61 39.72 42.39 41.76
A→W (SURF) 29.83 30.17 37.97 43.39 40.00 39.72 45.08 49.15
A→D (SURF) 25.48 26.75 39.49 31.85 40.13 38.85 38.85 42.04
W→C (SURF) 19.86 18.25 31.17 31.52 32.06 33.04 33.57 32.95
W→A (SURF) 22.96 21.92 32.78 30.48 32.99 32.15 34.13 31.94
W→D (SURF) 59.24 59.87 89.17 89.81 89.17 90.45 89.81 89.81
D→C (SURF) 26.27 26.09 31.52 31.43 33.39 33.57 34.64 37.04
D→A (SURF) 28.50 29.33 33.09 32.78 33.72 34.03 34.45 35.28
D→W (SURF) 63.39 65.08 89.49 88.47 89.49 90.51 91.19 91.19
Average accuracy 31.37 31.43 46.31 46.75 47.21 48.07 48.96 50.06
Average improvement 0.06 0.44 0.86 1.10
Relative improvement 0.09 0.82 1.63 2.16
Table 3: Classification accuracy (%) on the Office+Caltech-256 (SURF features) tasks, where A = AMAZON, C = CALTECH, D = DSLR, and W = WEBCAM.

Tables 4 and 5 list the classification accuracy on the MNIST+USPS and COIL20 datasets. From the tables, we observe that JDA+TSRP, BDA+TSRP, and DICD+TSRP outperform their original counterparts on all tasks. In particular, DICD+TSRP achieves a notable relative improvement over DICD on COIL20.

Tasks \ Methods NN NN+TSRP JDA JDA+TSRP BDA BDA+TSRP DICD DICD+TSRP
USPS→MNIST 35.85 35.30 59.65 62.40 60.05 62.40 61.50 67.55
MNIST→USPS 64.44 70.72 67.28 72.44 69.89 74.06 73.28 73.39
Average accuracy 50.15 53.01 63.46 67.42 64.97 68.23 67.39 70.47
Average improvement 2.86 3.96 3.26 3.08
Relative improvement 5.74 10.84 9.30 9.45
Table 4: Classification accuracy (%) on the MNIST+USPS tasks.
Tasks \ Methods NN NN+TSRP JDA JDA+TSRP BDA BDA+TSRP DICD DICD+TSRP
COIL1→COIL2 84.72 88.75 93.75 95.14 93.89 96.53 94.58 95.69
COIL2→COIL1 83.33 83.19 92.64 94.86 93.33 95.56 93.47 98.06
Average accuracy 84.03 85.97 93.19 95.00 93.61 96.04 94.03 96.88
Average improvement 1.94 1.81 2.43 2.85
Relative improvement 12.17 26.53 38.04 47.67
Table 5: Classification accuracy (%) on the COIL20 tasks.

Table 6 shows the results on the Office+Caltech-256 (DeCAF6 features) dataset. We see from the table that the average accuracies of NN+TSRP, JDA+TSRP, BDA+TSRP, and DICD+TSRP are all higher than those of NN, JDA, BDA, and DICD respectively. In particular, BDA+TSRP and DICD+TSRP outperform their original counterparts on all tasks, and both JDA+TSRP and NN+TSRP outperform their counterparts in 11 out of 12 tasks. To better show the advantage of TSRP, we further plot the average accuracy of the comparison methods in Fig. 4 in ascending order. From Fig. 4, we observe an interesting phenomenon: although DICD considers more information than JDA and BDA, such as conditional distribution alignment and discriminative learning, JDA and BDA can still yield better results than DICD by generating more accurate pseudo labels. This indicates that the accuracy of the pseudo labels has an important impact on performance, and that it can be improved by TSRP. The result also shows that TSRP can help learn better domain-invariant features.


Figure 4: Performance summary of the domain adaptation baselines and their TSRP extensions on Office+Caltech-256 in terms of average classification accuracy.
Tasks \ Methods  NN  NN+TSRP  JDA  JDA+TSRP  BDA  BDA+TSRP  DICD  DICD+TSRP
C → A  85.70  89.77  90.61  91.02
C → W  66.10  83.73  84.07  92.20
C → D  74.52  86.62  87.90  93.63
A → C  70.35  82.28  83.17  86.02
A → W  57.29  78.64  78.64  81.36
A → D  64.97  80.25  84.71  83.44
W → C  60.37  83.53  82.90  83.53  83.97
W → A  62.53  90.19  90.50  89.67
W → D  98.73  100.00  100.00  100.00
D → C  52.09  49.24  85.13  85.22  86.11
D → A  62.73  91.44  91.54  92.17
D → W  89.15  98.98  98.98  98.98
Average accuracy  70.38  87.55  88.24  89.88
Average improvement
Relative improvement
Table 6: Classification accuracy (%) on the Office+Caltech-256 (DeCAF6 features) tasks, where A = Amazon, C = Caltech, D = DSLR, and W = Webcam.

4.4 Analysis

The results in Tables 2 to 6 illustrate the effectiveness and generality of TSRP, which in turn demonstrates that exploiting the intra-class similarity of the target domain can remedy the pseudo labels well. In this subsection, we conducted several analytical experiments to further verify the effectiveness of TSRP.

4.4.1 Effect of UTSP on performance

To demonstrate the effectiveness of UTSP in picking pseudo labels with high confidence, we compared the performance of TSRP using the proposed UTSP with that of a variant whose UTSP does not use the spanning tree, denoted as TSRP_N. The result is shown in Fig. 5. From the figure, we can see that TSRP without the spanning tree still leads to a small improvement, while adding the spanning tree produces significantly better results. Because the main job of UTSP is to select highly confident pseudo labels, the improved performance not only supports the effectiveness of UTSP, but also reflects the effectiveness of the selected highly confident pseudo labels in promoting the alignment of the two domains.
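As a concrete illustration of how a spanning tree can screen intra-class samples, the sketch below builds a minimum spanning tree (Prim's algorithm) over the target samples that share one pseudo label, and discards any sample that was attached to the tree by an unusually long edge. The function names and the mean-plus-two-standard-deviations cutoff are illustrative assumptions; the exact UTSP procedure is the one defined earlier in the paper.

```python
import numpy as np

def mst_edges(X):
    """Prim's algorithm: return the (n - 1) edges of the Euclidean
    minimum spanning tree as (parent, child, length) triples."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < best[2]):
                    best = (i, j, dist[i, j])
        edges.append(best)
        in_tree.append(best[1])
    return edges

def screen_by_mst(X, factor=2.0):
    """Keep the samples of one pseudo class whose attaching MST edge is
    not unusually long; 'factor' is an illustrative threshold, not the
    paper's setting."""
    edges = mst_edges(X)
    lengths = np.array([e[2] for e in edges])
    cutoff = lengths.mean() + factor * lengths.std()
    outliers = {j for i, j, d in edges if d > cutoff}
    return [k for k in range(len(X)) if k not in outliers]
```

In a toy run, a sample lying far from the tight cluster of its pseudo class attaches to the tree by a long edge and is removed, mimicking the elimination of samples with wrong pseudo labels.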

(a) Validation with JDA
(b) Validation with BDA
(c) Validation with DICD
Figure 5: Effect of UTSP on the performance of the Office+Caltech-256 datasets with respect to the optimization iterations. Different colors represent different domain adaptation tasks. The term “TSRP_N” denotes that UTSP does not contain the spanning tree step.

4.4.2 Effect of the highly confident pseudo labels on the classifier

(a) Validation with JDA
(b) Validation with BDA
(c) Validation with DICD
Figure 6: Effect of the strong classifier in TSRP on the performance of the Office+Caltech-256 datasets with respect to the optimization iterations. The term “TSRP_C” means that TSRP adopts the original classifier trained with the source data only, instead of the strong classifier.
Tasks \ Methods  PCA  GFK  TCA  TJM  DTSL  CDML  RTML  DICD  DICD+TSRP
Average accuracy
Table 7: Accuracy comparison of the proposed DICD+TSRP with representative UDA methods on the Office+Caltech-256 (DeCAF6 features) dataset, where A = Amazon, C = Caltech, D = DSLR, and W = Webcam. The best result is marked in bold. The runner-up result is underlined.
(a) JDA
(b) JDA+TSRP
(c) DICD
(d) DICD+TSRP
Figure 7: Visualization of the features produced by the comparison methods. Different colors represent different categories.

Figure 8: Effect of the hyperparameters on five domain adaptation tasks. Different colors represent different domain adaptation tasks.

To demonstrate the effectiveness of the selected highly confident pseudo labels in improving the classifier, we compared the strong classifier with the classifier trained on the source samples only (denoted as the “original classifier”). The result is shown in Fig. 6. From the figure, we see that the performance of the strong classifier is much better than that of the original classifier. This indicates that TSRP helps learn domain-invariant and discriminative features, and that, in turn, these features help TSRP refine the pseudo labels, forming a co-promotion process.
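The relabeling step described above can be sketched as follows: the strong classifier is trained on the union of the source samples and the high-confidence pseudo-labeled target samples, and then re-predicts the labels of the low-confidence target samples. A 1-nearest-neighbour classifier is used here purely for illustration; the paper does not tie TSRP to one particular classifier.

```python
import numpy as np

def nn_predict(train_X, train_y, test_X):
    """1-nearest-neighbour prediction (illustrative stand-in for the
    classifier used inside TSRP)."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=-1)
    return train_y[d.argmin(axis=1)]

def remedy_pseudo_labels(src_X, src_y, conf_X, conf_y, low_X):
    """Strong classifier: trained on the source data plus the target
    samples with high-confidence pseudo labels; it then remedies the
    pseudo labels of the low-confidence target samples."""
    train_X = np.vstack([src_X, conf_X])
    train_y = np.concatenate([src_y, conf_y])
    return nn_predict(train_X, train_y, low_X)
```

Because the high-confidence target samples sit on the target-domain side of the decision boundary, adding them to the training set moves the classifier toward the target distribution, which is the mechanism behind the gap between the strong and original classifiers in Fig. 6.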

From Fig. 6, we can also see that, no matter how many iterations are conducted, the baseline methods without TSRP are upper-bounded due to the low accuracy of their pseudo labels. In contrast, the proposed methods with TSRP can break through this limit.

4.4.3 Effects of hyper-parameters on performance

Our approach has two hyper-parameters: the trust parameter and the number of inner iterations. Here we take DICD+TSRP as an example to study how the two hyper-parameters affect the performance. The experiment was conducted on five domain adaptation tasks. The results are reported in Fig. 8. Specifically, Fig. 8a shows the effect of the number of inner iterations with the trust parameter fixed. From the figure, we see that our approach is not sensitive to this hyper-parameter. Fig. 8b shows the effect of the trust parameter. The result indicates that, although the proposed method is relatively sensitive to the trust parameter on each single task, there is a common setting at which it reaches the best performance on all tasks.

4.4.4 Visualization

To demonstrate how the proposed method improves the alignment of the source and target domains, we visualize the projected representations of JDA, JDA+TSRP, DICD and DICD+TSRP on an Office+Caltech-256 task (DeCAF6 features) in Fig. 7. Comparing Figs. 7a and 7b, we can see that JDA+TSRP aligns the conditional distributions better than JDA. Comparing Figs. 7c and 7d, we observe a similar phenomenon, which supports our claim on the advantage of TSRP.

4.5 Comparison with other domain adaptation methods

To further illustrate the effectiveness of TSRP, we compared DICD+TSRP with several other UDA methods: GFK Gong et al. (2012), TCA Pan et al. (2011), TJM Long et al. (2014), TSL Si et al. (2009), DTSL Xu et al. (2015), CDML Wang et al. (2014), RTML Ding and Fu (2016), and DICD Li et al. (2018). We also compared with a standard machine learning method, i.e., PCA Wold et al. (1987). For a fair comparison, the results of the comparison methods are taken from their public codes or the original papers.

Table 7 shows the classification accuracy of the comparison methods on the Office+Caltech-256 (DeCAF6 features) dataset. From the table, we can see that DICD+TSRP achieves the best performance in 10 out of 12 tasks, and ranks second in the other two. The results on the other datasets are listed in the supplementary material; the experimental conclusions there are similar to those on the Office+Caltech-256 (DeCAF6 features) dataset.

5 Conclusion

In this paper, we have proposed using the target-domain intra-class similarity to remedy pseudo labels (TSRP), which improves the accuracy of the coarse pseudo labels generated by a conventional UDA method and, in turn, the discriminative ability of the learned representation. Specifically, TSRP first exploits the intra-class similarity and spanning trees to pick the samples with highly confident pseudo labels. Then, it trains a strong classifier with both the source samples and the target samples whose pseudo labels are highly confident. Finally, it uses the strong classifier to remedy the pseudo labels of the target samples with low-confidence pseudo labels. Experimental results on extensive visual cross-domain tasks have shown that applying TSRP to conventional UDA methods improves the accuracy of the pseudo labels and further leads to more discriminative and domain-invariant features than the conventional UDA baselines.

References

  • C. J. Alpert, T. C. Hu, J. Huang, A. B. Kahng, and D. Karger (1995) Prim-dijkstra tradeoffs for improved performance-driven routing tree design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 14 (7), pp. 890–896. Cited by: §1, §3.4.2.
  • C. Chen, W. Xie, W. Huang, Y. Rong, X. Ding, Y. Huang, T. Xu, and J. Huang (2019) Progressive feature alignment for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 627–636. Cited by: §2.
  • Z. Ding and Y. Fu (2016) Robust transfer metric learning for image classification. IEEE Transactions on Image Processing 26 (2), pp. 660–670. Cited by: item 3, §4.5.
  • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell (2014) Decaf: a deep convolutional activation feature for generic visual recognition. In International conference on machine learning, pp. 647–655. Cited by: §4.1.
  • B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars (2013) Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Cited by: §1.
  • K. Fukunaga and P. M. Narendra (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE transactions on computers 100 (7), pp. 750–753. Cited by: item 3, §3.2, §4.2.
  • Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pp. 1180–1189. Cited by: §2.
  • B. Gong, Y. Shi, F. Sha, and K. Grauman (2012) Geodesic flow kernel for unsupervised domain adaptation. In 2012 IEEE conference on computer vision and pattern recognition, pp. 2066–2073. Cited by: item 3, §2, §4.1, §4.1, §4.5.
  • A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola (2012) A kernel two-sample test. The Journal of Machine Learning Research 13 (1), pp. 723–773. Cited by: §1.
  • A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola (2006) A kernel method for the two-sample-problem. Advances in neural information processing systems 19, pp. 513–520. Cited by: §1.
  • G. Griffin, A. Holub, and P. Perona (2007) Caltech-256 object category dataset. Cited by: §4.1.
  • X. He and P. Niyogi (2003) Locality preserving projections. Advances in neural information processing systems 16. Cited by: §1.
  • J. J. Hull (1994) A database for handwritten text recognition research. IEEE Transactions on pattern analysis and machine intelligence 16 (5), pp. 550–554. Cited by: §4.1, §4.1.
  • G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann (2019) Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4893–4902. Cited by: §2.
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §4.1, §4.1.
  • J. Li, M. Jing, K. Lu, L. Zhu, and H. T. Shen (2019a) Locality preserving joint transfer for domain adaptation. IEEE Transactions on Image Processing 28 (12), pp. 6103–6115. Cited by: §1, §1.
  • J. Li, K. Lu, Z. Huang, L. Zhu, and H. T. Shen (2019b) Heterogeneous domain adaptation through progressive alignment. IEEE Transactions on Neural Networks and Learning Systems 30 (5), pp. 1381–1391. Cited by: §1.
  • S. Li, S. Song, G. Huang, Z. Ding, and C. Wu (2018) Domain invariant and class discriminative feature learning for visual domain adaptation. IEEE Transactions on Image Processing 27 (9), pp. 4260–4273. Cited by: item 3, §1, §2, §3.2, §4.1, §4.1, §4.1, §4.1, §4.2, §4.5.
  • J. Liang, R. He, Z. Sun, and T. Tan (2019) Aggregating randomized clustering-promoting invariant projections for domain adaptation. IEEE transactions on pattern analysis and machine intelligence 41 (5), pp. 1027–1042. Cited by: §1.
  • J. Liang, D. Hu, and J. Feng (2021) Domain adaptation with auxiliary target domain-oriented classifier. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16632–16642. Cited by: §1.
  • M. Long, Y. Cao, J. Wang, and M. Jordan (2015) Learning transferable features with deep adaptation networks. In International conference on machine learning, pp. 97–105. Cited by: §1.
  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu (2013) Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE international conference on computer vision, pp. 2200–2207. Cited by: item 3, §1, §2, §3.2, §4.1, §4.1, §4.2.
  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu (2014) Transfer joint matching for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1410–1417. Cited by: item 3, §4.5.
  • Y. Luo, C. Ren, P. Ge, K. Huang, and Y. Yu (2020) Unsupervised domain adaptation via discriminative manifold embedding and alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 5029–5036. Cited by: §1.
  • S. A. Nene, S. K. Nayar, H. Murase, et al. (1996) Columbia object image library (coil-100). Cited by: §4.1.
  • S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang (2011) Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks 22 (2), pp. 199–210. Cited by: item 3, §1, §4.5.
  • S. J. Pan and Q. Yang (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359. Cited by: §1.
  • Z. Pei, Z. Cao, M. Long, and J. Wang (2018) Multi-adversarial domain adaptation. In Thirty-second AAAI conference on artificial intelligence, Cited by: §2.
  • K. Saito, Y. Ushiku, and T. Harada (2017) Asymmetric tri-training for unsupervised domain adaptation. In International Conference on Machine Learning, pp. 2988–2997. Cited by: §2.
  • S. Si, D. Tao, and B. Geng (2009) Bregman divergence-based regularization for transfer subspace learning. IEEE Transactions on Knowledge and Data Engineering 22 (7), pp. 929–942. Cited by: item 3, §4.5.
  • T. Sim, S. Baker, and M. Bsat (2002) The cmu pose, illumination, and expression (pie) database. In Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pp. 53–58. Cited by: §4.1.
  • L. Tian, Y. Tang, L. Hu, Z. Ren, and W. Zhang (2020) Domain adaptation by class centroid matching and local manifold self-learning. arXiv preprint arXiv:2003.09391. Cited by: §2.
  • H. Wang, W. Wang, C. Zhang, and F. Xu (2014) Cross-domain metric learning based on information theory. In Twenty-eighth AAAI conference on artificial intelligence, Cited by: item 3, §4.5.
  • J. Wang, Y. Chen, S. Hao, W. Feng, and Z. Shen (2017) Balanced distribution adaptation for transfer learning. In 2017 IEEE international conference on data mining (ICDM), pp. 1129–1134. Cited by: item 3, §3.2, §4.2.
  • Q. Wang and T. P. Breckon (2022) Cross-domain structure preserving projection for heterogeneous domain adaptation. Pattern Recognition 123, pp. 108362. Cited by: §1.
  • Q. Wang and T. Breckon (2020) Unsupervised domain adaptation via structured prediction based selective pseudo-labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 6243–6250. Cited by: §1, §2, §3.1.
  • Q. Wang, P. Bu, and T. P. Breckon (2019a) Unifying unsupervised domain adaptation and zero-shot visual recognition. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §2.
  • W. Wang, H. Li, Z. Ding, F. Nie, J. Chen, X. Dong, and Z. Wang (2021) Rethinking maximum mean discrepancy for visual domain adaptation. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §1.
  • W. Wang, H. Wang, Z. Zhang, C. Zhang, and Y. Gao (2019b) Semi-supervised domain adaptation via Fredholm integral based kernel methods. Pattern Recognition 85, pp. 185–197. Cited by: §1.
  • S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and intelligent laboratory systems 2 (1-3), pp. 37–52. Cited by: item 3, §4.5.
  • N. Xiao and L. Zhang (2021) Dynamic weighted learning for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15242–15251. Cited by: §1.
  • Y. Xu, X. Fang, J. Wu, X. Li, and D. Zhang (2015) Discriminative transfer subspace learning via low-rank and sparse representation. IEEE Transactions on Image Processing 25 (2), pp. 850–863. Cited by: item 3, §4.5.
  • J. Zhang, W. Li, and P. Ogunbona (2017) Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1859–1867. Cited by: §2.
  • Y. Zhang, Y. Zhang, Y. Wei, K. Bai, Y. Song, and Q. Yang (2020) Fisher deep domain adaptation. In Proceedings of the 2020 SIAM International Conference on Data Mining, pp. 469–477. Cited by: §2.
  • Y. Zhu, F. Zhuang, J. Wang, G. Ke, J. Chen, J. Bian, H. Xiong, and Q. He (2020) Deep subdomain adaptation network for image classification. IEEE transactions on neural networks and learning systems 32 (4), pp. 1713–1722. Cited by: §1.
  • F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He (2020) A comprehensive survey on transfer learning. Proceedings of the IEEE 109 (1), pp. 43–76. Cited by: §1.