
Sparsely-Labeled Source Assisted Domain Adaptation

Domain Adaptation (DA) aims to generalize a classifier learned on the source domain to the target domain. Existing DA methods usually assume that rich labels are available in the source domain. In practice, however, the source domain often contains a large amount of unlabeled data and only a few labeled samples, and how to transfer knowledge from such a sparsely-labeled source domain to the target domain remains a challenge that greatly limits the application of these methods in the wild. This paper proposes a novel Sparsely-Labeled Source Assisted Domain Adaptation (SLSA-DA) algorithm to address the challenge of limited labeled source samples. Specifically, because labels are scarce, projected clustering is conducted on both the source and target domains so that the discriminative structures of the data can be exploited. Label propagation is then adopted to progressively propagate the labels from the few labeled source samples to all unlabeled data, so that the cluster labels are revealed correctly. Finally, we jointly align the marginal and conditional distributions to mitigate the cross-domain mismatch, and optimize these three procedures iteratively. However, it is nontrivial to incorporate the three procedures into a unified optimization framework, since some of the variables to be optimized are only implicitly involved in their formulas, so the procedures cannot reinforce one another. Remarkably, we prove that the projected clustering and the conditional distribution alignment can be reformulated as different expressions, so that the implicit variables are exposed in different optimization steps. As such, the variables related to the three procedures can be optimized in a unified framework and facilitate each other, which improves recognition performance.



1 Introduction

Figure 1: The sparsely-labeled source assisted domain adaptation problem, where the samples in red bounding boxes are labeled source, and the others are unlabeled.

Domain Adaptation (DA) has received much attention in recent years because it offers the possibility of generalizing a classifier trained on one domain to another domain, where the data observed in the two domains usually come from different distributions Pan and Yang (2010). For example, in visual recognition, the data instances of the two domains usually originate from different environments, sensor types, resolutions, and view angles, so they follow very different distributions Li et al. (2018). It is impractical to annotate sufficient data for each domain, since labeling data is labor-intensive and expensive. Therefore, it is necessary to apply DA techniques to exploit features that are invariant across domains, so that well-labeled source knowledge can be transferred to the target domain and the labeling cost is reduced. Recently, DA has made remarkable progress in cross-domain hyperspectral image classification Deng et al. (2018), human action recognition Zhang and Hu (2019), etc.

However, the performance of traditional DA usually relies heavily on the quality or richness of the labels in the source domain, which restricts its application in the wild, as we still have to seek a better-labeled and higher-quality source domain Shu et al. (2019); Tan et al. (2019). Label quality and label sufficiency are both important in the context of DA, especially for deep learning DA frameworks Chen et al. (2020). In some real applications, it may not be easy for users to label data samples correctly and sufficiently, since they often struggle with a very complicated and large dataset. For example, some data points are ambiguous between different categories, or require a high level of professional expertise to annotate. Therefore, the data samples are easily partially mislabeled when users commit to annotating the whole dataset, or highly sparsely labeled when they label only a handful of samples to reduce the labeling cost as much as possible. Moreover, because the dataset is large, it is even more challenging to guarantee label quality or richness, especially for deep learning DA frameworks, which often require vast amounts of source domain data. Consequently, a poorly labeled dataset used as the source domain has a great impact on the training of DA models, since incorrect or unknown source knowledge will cause unexpected and heavy negative transfer Pan and Yang (2010). It is therefore essential to study the situation in which the given source domain is poorly labeled, either partially mislabeled (label quality) or highly sparsely labeled (label richness).

In practical applications, it may not be possible to access a significant amount of labeled data, especially given the dramatic increase in the data required by deep learning models. Therefore, it is essential to boost positive transfer to a new, unlabeled target domain using a poorly labeled source domain. Many examples in knowledge engineering can be found where this is truly beneficial. One example is sentiment classification, where the task is to automatically classify the reviews of a product or the scores of a visual image. For this classification task, we first need to collect many products or visual images and annotate them using the given reviews or scores. However, labeling them is very labor-intensive and tedious for users, since some data samples are ambiguous and show no significant differences between categories, especially the scores of visual images. Therefore, the label quality or richness is poor whether we commit to labeling them all or annotate only a handful of them to reduce the labeling cost.

After that, we would use this poorly labeled dataset as the source domain to train a classifier. Since the data distributions of different types of products or visual images can be very different, maintaining good classification performance would require recollecting a source domain and training a review/score-classification model for each kind of product or visual image. However, this data-labeling process can also be very expensive. To further reduce the effort of annotating reviews/scores for various products/visual images, we may want to adapt a classification model trained on some (poorly labeled) products/visual images so that it can be applied directly to make predictions for other types of products/visual images. As another example, consider the user data collected by different supermarkets (e.g., Walmart and Amazon), which is updated every day and often in large quantities, so it may also be impossible to guarantee label quality and richness. It is therefore very challenging to use the poorly labeled user data collected by one supermarket (e.g., Walmart) to infer the interests of users of another supermarket (e.g., Amazon). In such cases, the proposed sparsely-labeled source assisted domain adaptation can save a significant amount of labeling effort.

To this end, Weakly-Supervised Domain Adaptation (WSDA) was proposed to address the challenge that the source domain contains noise in labels, features, or both Shu et al. (2019). However, it focuses only on the label quality problem and does not explore the case in which the source labels are severely insufficient. A more realistic setting, Sparsely-Labeled Source Assisted Domain Adaptation (SLSA-DA), is therefore proposed in this paper to further reduce the labeling cost, where only a sparsely-labeled source domain is available without any target labels. Notably, this paper assumes that the target domain is completely unlabeled, which makes the problem harder, since previous DA work indicates that unsupervised DA Liang et al. (2019); Yang et al. (2018) is more challenging than semi-supervised DA Wang et al. (2019a); Pereira and Torres (2018). Moreover, this assumption makes the proposed model more general, since the case in which at least one labeled example of a class exists in the target domain is a special and simpler case. For example, on the Office-Home dataset, source labels are available correctly in WSDA setting, while only in SLSA-DA scenario. As shown in Fig. 1, there are numerous unlabeled samples but only a few labeled ones, and we need to use this sparsely-labeled source domain to assist recognition in the target domain.

To address the challenge of SLSA-DA, our aim is not only to overcome the label insufficiency in the source domain, but also to mitigate the domain shift between the source and target domains. It is essential for DA to study this new SLSA-DA scenario, which could achieve knowledge transfer at a lower labeling cost than most existing approaches. Specifically, SLSA-DA introduces two challenges. (1) It remains important to alleviate the influence of the distributional shift across domains, as in previous DA methods. (2) Moreover, it is nontrivial to train a well-structured classifier, since only limited source labels are available.

Because of the label scarcity in the SLSA-DA setting, we carry out semi-supervised projected clustering on the source domain using the few labeled source instances, and unsupervised projected clustering on the target domain, so that the discriminative structures of the data can be discovered, i.e., data samples from the same cluster are grouped tightly (Fig. 2 (a)). Although the cluster labels of the source domain can be made consistent with the ground-truth labels, those of the target domain are uncertain, since no supervised information is provided. Therefore, the label propagation method Nie et al. (2009) is adopted to propagate the limited source labels to the unlabeled source and target instances simultaneously, so that the target cluster labels are revealed as correctly as possible (Fig. 2 (b)). Once the labels are uncovered, we can jointly align the marginal and conditional distributions across domains using Maximum Mean Discrepancy (MMD) Gretton et al. (2006) and class-wise MMD Long et al. (2013) (Fig. 2 (c)). To refine the final recognition performance progressively and let the different steps facilitate each other, we conduct these three procedures iteratively for a few rounds.

However, it is nontrivial to integrate projected clustering, label propagation and distributional alignment into a unified optimization framework, since some of the variables to be optimized are only implicitly involved in their formulas, so the procedures cannot reinforce one another. To be specific, the construction of class-wise MMD implicitly contains the variables related to the cluster centroids, whereas those variables in the projected clustering should be implicit when we optimize the projection matrix. Existing DA models are usually formulated with label prediction and distributional alignment, and treat them as separate steps Ding et al. (2018). Therefore, they fail to take advantage of each other's merits. In contrast, this paper further considers projected clustering, so that the model is robust to label scarcity because it respects the discriminative structures of the data. Moreover, we prove that the class-wise MMD can be rewritten as a cluster-wise MMD when we optimize the variables related to the cluster centroids, while the projected clustering can be reformulated as intra-class scatter minimization Wang et al. (2014) when we optimize the shared projection matrix. Therefore, we can couple these three quantities together so that they benefit each other in an effective optimization procedure.

The main contributions of our work are twofold:

  • We introduce a new DA scenario, called Sparsely-Labeled Source Assisted Domain Adaptation, which is more realistic because it requires only a few labeled source data, yet it has been insufficiently explored so far.

  • We propose a unified framework to jointly seek cluster centroids, source and target labels, and domain-invariant features. Then, we construct an optimization strategy to solve the objective function efficiently.

The rest of the paper is organized as follows. The related works are reviewed in Section 2. In Section 3, we propose the model and SLSA-DA algorithm. The experimental evaluations are discussed in Section 4. Finally, we conclude this paper in Section 5.

Figure 2: The overview of the proposed approach.

2 Related Work

Traditional DA aims to employ previous labeled source domain data to boost the task in the target domain. However, they usually assume the source and target domains share an identical label space, known as Closed Set Domain Adaptation (CSDA). Recently, an increasing number of new domain adaptation scenarios have been proposed to compensate for different challenges in practical application, such as Partial Domain Adaptation (PDA), Open Set Domain Adaptation (OSDA), and Universal Domain Adaptation (UDA). PDA transferred a learner from a big source domain to a small target domain, and the label set of the source domain is supposed to be large enough to contain the target label set Cao et al. (2019). By contrast, OSDA was proposed to deal with the challenge that the target domain contains unknown classes, which are not observed in the source domain Baktashmotlagh et al. (2019). Furthermore, for a given source label set and a target label set, UDA required no prior knowledge on the label sets, where they may contain a common label set and hold a private label set respectively You et al. (2019).

All of the aforementioned works have shown great improvements in knowledge transfer thanks to the availability of a substantial amount of high-quality labeled data in the source domain. Recent research has therefore begun to pay attention to the weakly supervised DA scenario. For instance, Tan et al. (2019) proposed a Collaborative Distribution Alignment (CDA) method for Weakly Supervised Open-Set Domain Adaptation (WSOSDA), where both domains are partially labeled and not all classes are shared between the two domains. In contrast, Shu et al. (2019) proposed a Transferable Curriculum Learning (TCL) approach to address the challenge of sample noise in the source domain in Weakly Supervised Closed-Set Domain Adaptation (WSCSDA). However, their settings still require enough labeled instances in either the source or the target domain. To further reduce the intensive labeling expense, we propose a more realistic DA paradigm, called Sparsely-Labeled Source Assisted Domain Adaptation, which requires only a few source labels and for which satisfactory performance can be obtained through the proposed unified framework. To highlight the contributions of this paper and keep the model simple, SLSA-DA assumes that the source and target label sets are the same, and that the labeled source instances are sparsely located in each class. To the best of our knowledge, our work is the first attempt to deal with this sparsely-labeled WSCSDA scenario.

Recent DA methods follow a mainstream approach based on feature adaptation (FDA). FDA aims to extract a shared subspace in which the distributions of the source and target data are drawn close by explicitly minimizing a predefined distance metric, e.g., Bregman Divergence Si et al. (2010), Geodesic Distance Gong et al. (2012), Wasserstein Distance Shen et al. (2018) and Maximum Mean Discrepancy (MMD) Gretton et al. (2006). The most popular distance is MMD due to its simplicity and solid theoretical foundations Zhao et al. (2018). Pan et al. (2011) proposed Transfer Component Analysis (TCA) to align the marginal distributions across domains using MMD. Long et al. (2013) proposed class-wise MMD to further reduce the conditional distribution difference between the two domains. Furthermore, SCA Ghifary et al. (2017), JGSA Zhang et al. (2017) and VDA Tahmoresnezhad and Hashemi (2017) constructed the class scatter matrix of the source domain to preserve its discriminative information. This paper also uses MMD and class-wise MMD to jointly align the marginal and conditional distributions across the source and target domains. Moreover, we prove that the projected clustering process is equivalent to boosting intra-class compactness when the projection is optimized. Therefore, the features learned by the proposed model are simultaneously domain-invariant and discriminative.

It is noteworthy that the methods mentioned above rely on the strong assumption that rich labels are available in the source domain. Moreover, they optimize the target labels in a step separate from the domain-invariant feature learning, so the two may fail to benefit each other effectively Ding et al. (2018). In contrast, this paper incorporates projected clustering, label propagation and distributional alignment into a unified optimization framework, and jointly optimizes the cluster centroids, the source and target labels and the domain-invariant features, where only a few source labels are available.

3 Methodology

In this section, we present our proposed model and its optimization strategy in detail.

3.1 Problem Definition

We begin with the definitions of terminologies. (resp. ) denotes the source (resp. target) domain data, where (resp. ) is the number of samples and m is the dimension of data instance. In the proposed SLSA-DA setting, there are a few source labels while no target labels at all, i.e., , , where and the one-hot label ( is the number of classes).

Moreover, we assume that the source and target domains follow the same feature space and label space, while the marginal and conditional distributions are different due to the dataset shift. Our aim is to find a projection to map and into a shared subspace, where those two distributional differences could be explicitly reduced. Then their new representations are .

3.2 Projected Clustering

Projected clustering aims to jointly optimize the cluster centroids and cluster labels in an embedding space, so that data instances from the same cluster are grouped together Wang et al. (2014). Since only limited source labels are available in the SLSA-DA scenario, we propose to use semi-supervised projected clustering in the source domain and unsupervised projected clustering in the target domain. In this way, the discriminative structures of the data can be exploited with the limited source labels. The loss of projected clustering is defined as follows:


where , are the one-hot cluster labels for the source and target domains, respectively. According to Wang et al. (2014), the source and target cluster centroids could be computed as , , where . Eq.(1) means that each data point could be reconstructed by all cluster centroids and its cluster label. In addition, we enforce that the clustering results of the labeled source data are consistent with their initial labels .
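As a rough illustration of this reconstruction view, the sketch below computes centroids and the clustering loss for already-projected data in NumPy. The names `Z` (projected data, one column per sample), `G` (one-hot cluster indicators) and `F` (centroids), as well as the closed form `F = Z G (G^T G)^{-1}`, are assumptions made for the example rather than the paper's exact notation.

```python
import numpy as np

def cluster_centroids(Z, G):
    """Centroids of projected data Z (k x n) given one-hot labels G (n x C).

    Assumed closed form: F = Z G (G^T G)^{-1}, i.e. the mean of each cluster.
    Every cluster must contain at least one sample, otherwise G^T G is singular.
    """
    return Z @ G @ np.linalg.inv(G.T @ G)

def clustering_loss(Z, G, F):
    """Squared Frobenius reconstruction error ||Z - F G^T||_F^2."""
    return float(np.linalg.norm(Z - F @ G.T, "fro") ** 2)

# Toy data: 4 samples in a 2-D embedding, 2 clusters.
Z = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 0.0, 4.0, 4.0]])
labels = np.array([0, 0, 1, 1])
G = np.eye(2)[labels]            # one-hot indicator matrix, shape (4, 2)

F = cluster_centroids(Z, G)      # columns are the two cluster means
loss = clustering_loss(Z, G, F)  # sum of squared distances to the assigned centroid
```

Because each column of `F @ G.T` is the centroid assigned to the corresponding sample, the loss is the usual within-cluster sum of squares in the embedded space.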

3.3 Effective Label Propagation

Although the source cluster labels represent the true labels in the semi-supervised setting, the target cluster labels are uncertain, since no supervised information is provided. To address this issue, graph-based label propagation (GLP) Nie et al. (2009) is introduced to guide the clustering procedure on the target domain, so that the predicted cluster labels agree with the true labels as accurately as possible. Specifically, we propagate the labels from the labeled source data to the unlabeled source and target data, and the loss of label propagation is defined as follows:


where and represents the graph Laplacian matrix. Meanwhile, D denotes a diagonal matrix with the diagonal entries as the column sums of W. Specifically,
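To make the graph quantities concrete, here is a minimal NumPy sketch of a k-nearest-neighbour affinity matrix, the Laplacian `L = D - W`, and the standard smoothness term `tr(G^T L G)` that graph-based propagation methods typically penalise. The binary weights, the helper names and the use of this plain smoothness term are assumptions of the sketch; the exact loss and weighting used in the paper may differ.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix for samples in the rows of X.

    Binary weights are an assumption of this sketch; the paper's weighting may differ.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbour
    W = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]
    for i in range(n):
        W[i, nearest[i]] = 1.0
    return np.maximum(W, W.T)                # symmetrise

def graph_laplacian(W):
    """Unnormalised Laplacian L = D - W, with D the diagonal matrix of column sums."""
    D = np.diag(W.sum(axis=0))
    return D - W

def smoothness(G, L):
    """tr(G^T L G): small when connected samples share similar labels."""
    return float(np.trace(G.T @ L @ G))

X = np.array([[0.0], [0.1], [5.0], [5.1]])   # two well-separated groups
W = knn_graph(X, k=1)
L = graph_laplacian(W)
G_good = np.eye(2)[[0, 0, 1, 1]]             # labels follow the two groups
G_bad = np.eye(2)[[0, 1, 0, 1]]              # labels cut across the groups
```

A labelling that respects the graph (`G_good`) gives a smaller smoothness value than one that separates neighbours (`G_bad`), which is the behaviour a propagation loss relies on.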


3.4 Cross-Domain Feature Alignment

In order to align the domain-wise distributions between the source and target domains, the MMD is adopted to explicitly reduce their marginal distribution difference, and its loss is defined as follows:


where is the MMD matrix, and it is computed as follows:
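For reference, the marginal MMD matrix used by TCA-style methods can be built in a few lines. The sketch below assumes the standard construction (entries `1/n_s^2` for source pairs, `1/n_t^2` for target pairs and `-1/(n_s n_t)` for cross-domain pairs) and checks numerically that the trace form equals the squared distance between the domain means. The names `Zs`, `Zt` and `mmd_matrix` are hypothetical, and the data are random placeholders.

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Marginal MMD matrix M0 of shape (n_s + n_t, n_s + n_t), as in TCA-style methods."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

rng = np.random.default_rng(0)
Zs = rng.normal(size=(3, 5))             # projected source data, k x n_s
Zt = rng.normal(loc=1.0, size=(3, 7))    # projected target data, k x n_t
Z = np.hstack([Zs, Zt])

M0 = mmd_matrix(5, 7)
mmd_trace = float(np.trace(Z @ M0 @ Z.T))                             # trace form
mmd_direct = float(np.sum((Zs.mean(axis=1) - Zt.mean(axis=1)) ** 2))  # mean-difference form
```

The two values agree because `Z @ e` is exactly the difference between the source and target means, so `tr(Z M0 Z^T) = ||Z e||^2`.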


We further decrease the conditional distribution shift across domains by class-wise MMD, and the formula is as follows:


where and are the numbers of data samples from class in the source and target domains (, , ), then the class-wise is computed as follows:
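The class-wise counterpart can be sketched in the same way, following the usual JDA-style construction in which each class gets its own MMD matrix built from the samples (or pseudo-labelled samples) of that class. The function names and the handling of classes missing from one domain are assumptions of this sketch.

```python
import numpy as np

def classwise_mmd_matrix(ys, yt, c):
    """MMD matrix M_c for class c from source labels ys and (pseudo) target labels yt."""
    src = (ys == c).astype(float)
    tgt = (yt == c).astype(float)
    if src.sum() == 0 or tgt.sum() == 0:
        # Class absent in one domain: contribute nothing (a choice of this sketch).
        return np.zeros((len(ys) + len(yt),) * 2)
    e = np.concatenate([src / src.sum(), -tgt / tgt.sum()])
    return np.outer(e, e)

def conditional_mmd(Zs, Zt, ys, yt, num_classes):
    """Sum over classes of the squared distance between class means in the two domains."""
    Z = np.hstack([Zs, Zt])
    return float(sum(np.trace(Z @ classwise_mmd_matrix(ys, yt, c) @ Z.T)
                     for c in range(num_classes)))

Zs = np.array([[0.0, 2.0, 10.0]])     # 1-D embedding, 3 source samples
Zt = np.array([[3.0, 12.0, 14.0]])    # 3 target samples
ys = np.array([0, 0, 1])
yt = np.array([0, 1, 1])
value = conditional_mmd(Zs, Zt, ys, yt, num_classes=2)
```

In the toy example the class means are 1 vs. 3 for class 0 and 10 vs. 13 for class 1, so the conditional discrepancy is 4 + 9 = 13.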


3.5 Overall Objective Function

Finally, we formulate the proposed model by incorporating the above Eq.(2), Eq.(4), Eq.(6) as follows:


where , are trade-off parameters, and we constrain the subspace with such that the data on the subspace are statistically uncorrelated ( is the identity matrix and the data matrix X is pre-centralized). We further impose the constraint that is small to control the scale of A Zhang et al. (2017).

Remarkably, the proposed approach joins projected clustering, label propagation and distributional alignment in a unified framework. Thus, the three components can benefit each other to improve recognition of the unlabeled data in both domains.

With projected clustering, the discriminative structures of the data can be exploited effectively (i.e., data points belonging to the same cluster are grouped together), while only a few source labels are required. With label propagation, the cluster labels of the unlabeled data are revealed, in both the source and target domains. Domain-invariant features are feature representations for which data instances with the same semantics (i.e., category) from different domains are as similar as possible; such features perform poorly when different domains follow very different distributions (i.e., domain shift). Therefore, once the domain shift is mitigated, domain-invariant features can be leveraged effectively. Moreover, when the three components are jointly optimized, the discriminative and domain-invariant features yield a more effective graph between the source and target domains, so that the few source labels can be propagated to the unlabeled data more accurately. Meanwhile, when more accurate labels are assigned to the unlabeled data, more effective knowledge is transferred across the two domains, and better projected clustering performance is achieved in both domains. As such, the three procedures can promote each other in a unified optimization framework, and the proposed approach is more robust and effective than considering them separately.

However, there exist two difficulties when Eq.(8) is optimized. Firstly, the term contains label information, so we have to rewrite it in a form in which the variable F is involved when the cluster centroids are optimized. As mentioned before, the source and target cluster centroids in the embedded space could be computed as . Remarkably, is nothing but the sum of the distances between the means of the source and target embedded data from the same classes. Therefore, it is easy to verify that , where , which means that conditional distribution alignment amounts to cluster centroid calibration. We therefore expect the learned cluster centroids not only to make the embedded data points more separable and discriminative, but also to improve conditional distribution alignment when the cluster centroids are optimized.

Another challenge is how to make the form of agree with and when the shared projection is optimized. We also prove that , where , are the intra-class scatter matrices for the source and target domains, and could be computed as in previous work Wang et al. (2014). Similarly, we expect that not only are the marginal and conditional distributions of the source and target domains aligned, but their discriminative information is also respected when the shared projection is optimized. Therefore, projected clustering, label propagation and distributional alignment can be optimized simultaneously and facilitate each other.

Theorem 1. The projected clustering process can be rewritten as the class scatter matrix:


where , are the intra-class scatter matrices for the source and target domains.

Proof: Without loss of generality, we prove that . Firstly, we denote , where represents the mean of from class . As mentioned before, . Then, we have:




Thus, Eq.(9) is proved.
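The idea behind the proof can be sketched for a single domain as follows. The notation is assumed for illustration and may differ from the original: $Z = A^\top X$ is the projected data, $G$ the one-hot cluster indicator matrix, $F = Z G (G^\top G)^{-1}$ the matrix of centroids, $\mu_c$ the mean of the samples in class $c$, and $S_w$ the intra-class scatter matrix. Since column $i$ of $F G^\top$ is the centroid $A^\top \mu_c$ of the class that contains $x_i$, the reconstruction error splits class by class:

```latex
\begin{aligned}
\| A^\top X - F G^\top \|_F^2
  &= \sum_{c=1}^{C} \sum_{x_i \in \text{class } c} \| A^\top x_i - A^\top \mu_c \|_2^2 \\
  &= \operatorname{tr}\Big( A^\top \Big[ \sum_{c=1}^{C} \sum_{x_i \in \text{class } c}
       (x_i - \mu_c)(x_i - \mu_c)^\top \Big] A \Big) \\
  &= \operatorname{tr}\big( A^\top S_w A \big).
\end{aligned}
```

Applying the same argument to the source and target domains separately and summing the two traces gives the class-scatter form stated in the theorem, under the assumptions above.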

3.6 Optimization

Here an alternating optimization strategy is constructed to solve Eq.(8) as below. We first transform it into the augmented Lagrangian function by relaxing the non-negative constraint as follows:


where , , , are the Lagrange multipliers for constraints . When are fixed, Eq.(12) becomes:


where , and , are the intra-class scatter matrices for the source and target domains and could be computed as in previous work Wang et al. (2014). Here we rewrite the cluster projection as the class scatter matrix since the labels are uncovered when A is optimized. Then, the optimal solution A to Eq.(13) is formed by the eigenvectors of corresponding to the smallest eigenvalues.
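A projection step of this kind is typically solved as a generalized eigenproblem `P a = λ Q a`, keeping the eigenvectors with the smallest eigenvalues. The NumPy-only sketch below shows one way to do this through a Cholesky reduction. Here `P` is a random symmetric stand-in for whatever objective matrix the method assembles, and `Q = X X^T` is assumed as the constraint matrix for pre-centred data; both choices, and the function name, are assumptions of the example.

```python
import numpy as np

def smallest_generalized_eigvecs(P, Q, k):
    """Return the k eigenpairs of P a = lambda Q a with the smallest eigenvalues.

    P must be symmetric and Q symmetric positive definite. The problem is reduced
    to a standard symmetric one through the Cholesky factor of Q.
    """
    Lq = np.linalg.cholesky(Q)                 # Q = Lq Lq^T
    Lq_inv = np.linalg.inv(Lq)
    C = Lq_inv @ P @ Lq_inv.T                  # symmetric, same eigenvalues
    eigvals, V = np.linalg.eigh(C)             # eigenvalues in ascending order
    A = Lq_inv.T @ V[:, :k]                    # map back: a = Lq^{-T} v
    return eigvals[:k], A

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 40))
X = X - X.mean(axis=1, keepdims=True)          # pre-centred data, m x n
B = rng.normal(size=(6, 6))
P = B @ B.T + 0.1 * np.eye(6)                  # stand-in for the symmetric objective matrix
Q = X @ X.T                                    # constraint matrix, so that A^T Q A = I

eigvals, A = smallest_generalized_eigvecs(P, Q, k=2)
```

The returned columns of `A` satisfy `A.T @ Q @ A = I`, which matches the uncorrelatedness constraint on the subspace, and `k` plays the role of the subspace dimension.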

When are fixed, Eq.(12) becomes:


where we rewrite the distributional alignment as cluster centroids calibration since the labels are unknown. Thus, we obtain the partial derivative of J w.r.t., , by setting it to zero as:


Using the KKT conditions ( denotes the dot product of two matrices), we achieve the following equations for :


Following Ding et al. (2018, 2010), we obtain the updating rule:


where , , , , . Moreover, is the matrix obtained from an arbitrary matrix T by replacing its negative elements with 0. Similarly, is the matrix obtained from an arbitrary matrix T by replacing its positive elements with 0. Similarly,


where , , , , .

As for , we fix and Eq.(12) becomes:


Likewise, we obtain the following equations for :


Therefore, the updating rule for is as follows:


where , . Similarly, the updating rule for is as follows:


where , .
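Update rules of this multiplicative form follow the semi-NMF pattern of Ding et al. (2010), in which a matrix is split into positive and negative parts so that the non-negative variable stays non-negative. The sketch below shows that pattern on the plain problem `min ||Z - F G^T||_F^2` with `G >= 0` and `F` fixed. It is not the paper's update, which contains additional terms from label propagation and distribution alignment, and the convention that the negative part stores magnitudes is an assumption taken from Ding et al. (2010).

```python
import numpy as np

def pos(T):
    """Positive part (|T| + T) / 2: negative entries become 0."""
    return (np.abs(T) + T) / 2.0

def neg(T):
    """Negative part (|T| - T) / 2: positive entries become 0, magnitudes kept."""
    return (np.abs(T) - T) / 2.0

def update_G(Z, F, G, eps=1e-12):
    """One multiplicative update of G >= 0 for min ||Z - F G^T||_F^2 with F fixed."""
    ZtF = Z.T @ F            # n x C
    FtF = F.T @ F            # C x C
    numer = pos(ZtF) + G @ neg(FtF)
    denom = neg(ZtF) + G @ pos(FtF) + eps    # eps guards against division by zero
    return G * np.sqrt(numer / denom)

def loss(Z, F, G):
    """Squared Frobenius reconstruction error."""
    return float(np.linalg.norm(Z - F @ G.T, "fro") ** 2)

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 30))                 # data, k x n
F = rng.normal(size=(5, 3))                  # fixed centroids, k x C
G = rng.uniform(0.1, 1.0, size=(30, 3))      # strictly positive initialisation

before = loss(Z, F, G)
G_new = update_G(Z, F, G)
after = loss(Z, F, G_new)
```

Because every factor in the update is non-negative, `G_new` stays non-negative without an explicit projection, and the semi-NMF analysis guarantees that the objective does not increase for this simplified problem.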

To make these update rules clear, we summarize the algorithm to solve Eq.(12) in Algorithm 1.

Input: Sparsely labeled source data and the limited source labels, , .
    Unlabeled target data . Regularized parameters k, , ,
Output: Labels of source and target unlabeled data (i.e., and ) for and
Line 1: Compute by Eq.(5)
Line 2: Obtain A by Eq.(13) with and
Line 3: Obtain ,
Line 4: Propagate labels from to , i.e.,
Line 5: Propagate labels from to , i.e.,
For =1 to do
Line 6: Compute , , by Eq.(10) and Eq.(6)
Line 7: Update and by Eq.(13)
Line 8: Update and by Eq.(17) and Eq.(18)
Line 9: Update and by Eq.(21) and Eq.(22)
End for
Return One-hot labels
Algorithm 1 SLSA-DA

Computational Complexity

We analyze the computational complexity of Algorithm 1 using the notation. We denote as the number of iterations. The computational cost is detailed as follows: for solving the generalized eigen-decomposition problem, i.e. Line 7; for updating , , i.e. Line 8; for constructing the , i.e. Lines 6; for updating , , i.e. Line 9; for updating , , i.e. Line 6; In summary, the overall computational complexity of Algorithm 1 is . Moreover, the value of k is not greater than 200, not greater than 100, so . Therefore, it can be solved in polynomial time with respect to the number of samples.

4 Experiments

4.1 Datasets and Experimental Settings

To validate the effectiveness of our approach in both the DA and SLSA-DA scenarios, we conducted experiments on 4 benchmark datasets in cross-domain object recognition, i.e., Office10-Caltech10, Office-Home, ImageCLEF-DA, and Office31. Fig. 3 shows some sample images from the Office10-Caltech10 and Office-Home datasets, which follow very different distributions. The datasets are described as follows:

Office10-Caltech10 Gong et al. (2012) contains 4 real-world object domains, of which 3 come from the Office31 dataset (i.e., Amazon (A), Webcam (W) and DSLR (D)) and the last one comes from the Caltech256 dataset (Caltech (C)). The 10 classes shared by these 4 domains are selected to construct the DA dataset Office10-Caltech10, which has 2,533 images and 12 DA tasks, e.g., A→W, C→D and so on. Note that the arrow "→" denotes the direction from the source domain to the target domain. For example, W→D means that Webcam is the labeled source domain while DSLR is the unlabeled target domain.

Office-Home Venkateswara et al. (2017) was released recently as a more challenging dataset, crawled through several search engines and online image directories. It consists of 4 different domains, Artistic images (Ar), Clipart images (Cl), Product images (Pr) and Real-World images (Rw). In total, there are 15,500 images from 65 object categories, and 12 DA tasks.

ImageCLEF-DA Long et al. (2018) has 1800 images organized by selecting the 12 common classes shared by 3 public domains, Caltech-256 (C), ImageNet ILSVRC 2012 (I), and Pascal VOC 2012 (P), where 6 DA tasks can be created.

Office31 Saenko et al. (2010) is an increasingly popular benchmark for visual DA, which includes 3 real-world object domains, Amazon (A), Webcam (W) and DSLR (D), and has 4,652 images from 31 categories, then 6 DA tasks can be constructed.

Figure 3: Exemplars from (A) Amazon, (D) Dslr, (W) Webcam, (C) Caltech, (Ar) Art, (Cl) Clipart, (Pr) Product and (Rw) Real-World datasets

4.2 Experimental Results

TCA Pan et al. (2011) 46.5 35.3 44.6 38.6 38.3 38.9 30.2 28.7 90.4 33.3 33.4 88.8 45.6
JDA Long et al. (2013) 43.1 35.9 47.8 34.9 42.0 36.3 31.0 40.4 88.5 29.2 29.7 89.5 45.7
BDA Wang et al. (2017) 45.4 39.7 43.9 38.6 44.7 40.8 30.1 35.0 89.8 29.9 35.9 89.8 47.0
VDA Tahmoresnezhad and Hashemi (2017) 50.8 44.7 49.0 37.5 44.7 43.9 30.9 41.3 88.5 30.2 34.1 89.5 48.8
JGSA Zhang et al. (2017) 50.7 47.5 43.9 42.5 48.1 46.5 30.9 39.9 89.8 30.0 39.0 89.2 49.8
MEDA Wang et al. (2018) 55.4 54.6 57.3 44.4 55.3 40.1 34.3 41.8 86.6 33.6 43.1 86.4 52.7
Our 59.2 58.0 55.4 46.4 45.1 47.1 36.5 32.0 93.6 38.3 44.8 89.5 53.8
Table 1: Accuracy (%) on the Office10-Caltech10 dataset with SURF features in DA setting
JAN Long et al. (2017) 76.8 88.0 94.7 89.5 74.2 91.7 85.4 97.4 99.8 84.7 68.6 70.0 85.1
CDAN  Long et al. (2018) 76.7 90.6 97.0 90.5 74.5 93.5 93.1 98.2 100.0 89.8 70.1 68.0 86.8
CAN Zhang et al. (2018) 78.2 87.5 94.2 89.5 75.8 89.2 81.5 98.2 99.7 85.5 65.9 63.4 84.1
MADA Pei et al. (2018) 75.0 87.9 96.0 88.8 75.2 92.2 90.1 97.4 99.6 87.8 70.3 66.4 85.6
TCA Pan et al. (2011) 77.7 81.2 92.7 87.5 74.2 84.8 76.1 97.6 99.4 79.7 64.2 63.8 81.6
JDA Long et al. (2013) 77.0 81.3 95.2 91.2 76.8 84.3 83.3 98.0 99.8 81.7 68.2 69.0 83.8
BDA Wang et al. (2017) 76.0 79.7 94.8 91.5 76.2 82.2 80.8 96.4 99.6 79.9 67.6 67.2 82.7
VDA Tahmoresnezhad and Hashemi (2017) 77.3 83.3 94.3 91.5 77.0 87.2 84.3 98.6 100.0 82.5 68.7 69.8 84.5
JGSA Zhang et al. (2017) 77.0 83.5 95.5 91.7 77.3 88.8 86.7 97.9 99.8 83.9 69.6 71.3 85.3
MEDA Wang et al. (2018) 79.5 92.2 95.7 92.3 78.7 95.5 86.2 97.7 99.6 86.1 72.6 74.7 87.6
Our 79.2 90.0 94.8 91.5 78.3 93.8 86.0 98.6 99.8 88.4 74.5 71.9 87.2
Table 2: Accuracy (%) on the ImageCLEF-DA and Office31 datasets with ResNet50 features in DA setting
DA ArCl ArPr ArRw ClAr ClPr ClRw PrAr PrCl PrRw RwAr RwCl RwPr Avg.
JAN Long et al. (2017) 45.9 61.2 68.9 50.4 59.7 61.0 45.8 43.4 70.3 63.9 52.4 76.8 58.3
CDAN Long et al. (2018) 50.7 70.6 76.0 57.6 70.0 70.0 57.4 50.9 77.3 70.9 56.7 81.6 65.8
MDD Zhang et al. (2019) 54.9 73.7 77.8 60.0 71.4 71.8 61.2 53.6 78.1 72.5 60.2 82.3 68.1
TADA Wang et al. (2019b) 53.1 72.3 77.2 59.1 71.2 72.1 59.7 53.1 78.4 72.4 60.0 82.9 67.6
BSP Chen et al. (2019) 52.0 68.6 76.1 58.0 70.3 70.2 58.6 50.2 77.6 72.2 59.3 81.9 66.3
TAT Liu et al. (2019) 51.6 69.5 75.4 59.4 69.5 68.6 59.5 50.5 76.8 70.9 56.6 81.6 65.8
TCA Pan et al. (2011) 48.7 65.3 70.1 49.2 59.7 63.2 52.0 45.0 71.9 63.7 51.4 77.1 59.8
JDA Long et al. (2013) 50.9 67.7 70.9 51.3 64.4 64.9 54.6 47.7 73.3 64.9 53.7 78.3 61.9
BDA Wang et al. (2017) 47.8 59.3 67.7 49.0 62.0 61.4 50.1 46.0 70.7 61.8 51.5 74.5 58.5
VDA Tahmoresnezhad and Hashemi (2017) 51.2 69.3 72.2 53.6 66.1 66.9 56.0 48.8 74.5 65.8 54.1 79.5 63.2
JGSA Zhang et al. (2017) 51.4 69.2 72.6 51.8 67.3 67.0 55.9 48.7 75.6 64.4 53.3 78.5 63.0
MEDA Wang et al. (2018) 55.3 75.7 77.6 57.2 73.9 72.0 58.6 52.3 78.7 68.3 57.0 81.9 67.4
Our 58.1 77.4 78.7 61.6 72.5 72.5 62.5 54.4 79.1 70.1 59.6 82.6 69.1
Table 3: Accuracy (%) on the Office-Home dataset with ResNet50 features in DA setting

The proposed approach involves 4 parameters: projected clustering regularizer , projected scaling regularizer , subspace dimensions and iterations . For the parameters, we fix =5, =0.01, and the 20-nearest neighbor graph is adopted with Euclidean distance-based weight for simplicity. Specifically, we set =20, =0.05 on the Office10-Caltech10 and ImageCLEF-DA datasets, while =100, =0.1 on the Office-Home and Office31 datasets since they contain more categories. In the following section, we provide an empirical analysis of parameter sensitivity, which verifies that stable performance can be achieved under a wide range of values.
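The 20-nearest neighbor graph mentioned above can be sketched as follows. The text does not specify the exact form of the Euclidean distance-based weight, so the Gaussian weight `exp(-d^2 / (2 * sigma^2))`, the `sigma` parameter, the symmetrization rule and the function name `knn_graph` are all assumptions made for illustration.

```python
import math

def knn_graph(points, k=20, sigma=1.0):
    """Return a symmetric weight matrix (list of lists) for a k-NN graph.

    Assumption: edge weights use a Gaussian kernel on the Euclidean
    distance; the paper only states that the weight is distance-based.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Euclidean distances from point i to every other point.
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        # Keep the k closest neighbours (fewer if the dataset is small).
        for d, j in sorted(dists)[:k]:
            w = math.exp(-d * d / (2.0 * sigma * sigma))
            # Symmetrize: link i and j if either is among the other's k-NN.
            W[i][j] = max(W[i][j], w)
            W[j][i] = max(W[j][i], w)
    return W

# Toy usage with 2-D points and k=2 instead of the paper's k=20.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = knn_graph(points, k=2)
```

In this toy example the far-away point (5, 5) is linked only to the two points that it selects as neighbours, so `W[0][3]` stays zero while `W[1][3]` is small but nonzero.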

We adopt different types of features as inputs, either traditional shallow features or deep features. Specifically, the shallow SURF features Gong et al. (2012) with 800 dimensions are adopted for Office10-Caltech10. For Office-Home, ImageCLEF-DA and Office31, we use deep features extracted from a ResNet50 model pre-trained on ImageNet He et al. (2016), with a feature dimensionality of 2048. To construct an SLSA-DA scenario, we randomly choose 5 source instances from each class as labeled samples and treat the others as unlabeled. The random selection is repeated ten times and the average results are reported.
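The sparsely-labeled protocol described above (5 labeled source instances per class, ten random repetitions, averaged results) could be sketched as below. The function names and the `evaluate` callback are hypothetical placeholders; `evaluate` stands in for running a DA method with the given labeled indices and returning its accuracy.

```python
import random
from collections import defaultdict

def sample_labeled_indices(source_labels, per_class=5, seed=0):
    """Pick `per_class` random source indices from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(source_labels):
        by_class[label].append(idx)
    labeled = []
    for label in sorted(by_class):
        members = by_class[label]
        # Guard against classes with fewer than `per_class` samples.
        labeled.extend(rng.sample(members, min(per_class, len(members))))
    return sorted(labeled)

def mean_over_trials(source_labels, evaluate, trials=10, per_class=5):
    """Repeat the random selection `trials` times and average the scores."""
    scores = [
        evaluate(sample_labeled_indices(source_labels, per_class, seed=t))
        for t in range(trials)
    ]
    return sum(scores) / len(scores)

# Toy usage: 3 classes with 8 source samples each.
labels = [c for c in range(3) for _ in range(8)]
idx = sample_labeled_indices(labels)
```

All remaining source indices are treated as unlabeled; using a different seed per trial keeps the ten selections reproducible.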

Since no previous approaches have been proposed to tackle the SLSA-DA problem, we first compare the proposed approach with several state-of-the-art methods in the DA setting, where the labels of all source instances are available. Specifically, in the DA setting, we compare the proposed approach with both shallow DA methods (TCA Pan et al. (2011), JDA Long et al. (2013), BDA Wang et al. (2017), VDA Tahmoresnezhad and Hashemi (2017), JGSA Zhang et al. (2017), MEDA Wang et al. (2018)) and deep DA methods (JAN Long et al. (2017), CAN Zhang et al. (2018), MADA Pei et al. (2018), CDAN Long et al. (2018), MDD Zhang et al. (2019), TADA Wang et al. (2019b), BSP Chen et al. (2019), TAT Liu et al. (2019)). The performances of the different methods in the DA setting are shown in Table 1, Table 2 and Table 3. To be specific, Table 1 shows that our approach outperforms all 6 other methods on most DA tasks (8/12), and its average accuracy is 53.8%, a 1.1% improvement over the best baseline MEDA. From Table 2, it can be seen that our approach achieves the best result on only 2/12 DA tasks, but most of its results are very close to the highest ones. Besides, the average accuracy of our approach is only 0.4% lower than that of the best baseline MEDA (87.2% vs. 87.6%). From Table 3, it can be observed that our approach also attains the best performance on most DA tasks (8/12) and increases the average accuracy by 1.0% compared with the best baseline MDD (68.1% to 69.1%). Therefore, the competitive capability of our approach in the DA setting is validated against these state-of-the-art DA methods, both shallow and deep.
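As a worked check of the Table 1 summary, the sketch below recomputes the number of tasks on which our approach is strictly best and the gap in average accuracy to MEDA. The numbers are copied from the Table 1 rows in their printed column order; the variable names are illustrative.

```python
# Per-task accuracies from Table 1 (12 DA tasks, Avg. column omitted).
baselines = {
    "TCA":  [46.5, 35.3, 44.6, 38.6, 38.3, 38.9, 30.2, 28.7, 90.4, 33.3, 33.4, 88.8],
    "JDA":  [43.1, 35.9, 47.8, 34.9, 42.0, 36.3, 31.0, 40.4, 88.5, 29.2, 29.7, 89.5],
    "BDA":  [45.4, 39.7, 43.9, 38.6, 44.7, 40.8, 30.1, 35.0, 89.8, 29.9, 35.9, 89.8],
    "VDA":  [50.8, 44.7, 49.0, 37.5, 44.7, 43.9, 30.9, 41.3, 88.5, 30.2, 34.1, 89.5],
    "JGSA": [50.7, 47.5, 43.9, 42.5, 48.1, 46.5, 30.9, 39.9, 89.8, 30.0, 39.0, 89.2],
    "MEDA": [55.4, 54.6, 57.3, 44.4, 55.3, 40.1, 34.3, 41.8, 86.6, 33.6, 43.1, 86.4],
}
ours = [59.2, 58.0, 55.4, 46.4, 45.1, 47.1, 36.5, 32.0, 93.6, 38.3, 44.8, 89.5]

# A task counts as a win when our accuracy exceeds every baseline.
wins = sum(
    1
    for t, acc in enumerate(ours)
    if acc > max(rows[t] for rows in baselines.values())
)

avg_ours = round(sum(ours) / len(ours), 1)
avg_meda = round(sum(baselines["MEDA"]) / len(baselines["MEDA"]), 1)
gap = round(avg_ours - avg_meda, 1)

print(wins, avg_ours, avg_meda, gap)  # 8 53.8 52.7 1.1
```

The same procedure applied to the Avg. columns of Table 2 gives 87.2 for our approach and 87.6 for MEDA.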

To further demonstrate the superiority of our approach in the SLSA-DA scenario, we also test the behavior of other mainstream approaches on the SLSA-DA problem. Note that the deep DA methods integrate feature extraction and knowledge transfer into an end-to-end network and achieve promising results, whereas this paper adopts a two-stage mechanism to promote the transferability of deep ResNet50 features. Some recent work has shown that effective knowledge transfer is easier and faster to implement with this two-stage mechanism. Moreover, the promising results of deep DA methods mainly depend on adequate labeled data, so they may well fail to train a classifier when labels are as limited as in the SLSA-DA scenario. Therefore, in the SLSA-DA scenario, we only report comparisons with the two-stage methods (i.e., TCA Pan et al. (2011), JDA Long et al. (2013), BDA Wang et al. (2017), VDA Tahmoresnezhad and Hashemi (2017), JGSA Zhang et al. (2017), MEDA Wang et al. (2018)).

TCA(s) Pan et al. (2011) 39.5±3.1 36.5±2.3 38.1±2.9 53.1±2.6 54.7±2.6 54.0±3.1 79.4±2.4 75.8±4.3 80.5±4.0 79.5±1.7 75.7±2.7 81.8±3.9 62.4±3.0
JDA(s) Long et al. (2013) 37.5±3.0 33.8±2.3 35.1±2.8 47.1±3.6 47.1±2.9 47.9±2.2 68.9±4.4 67.3±4.4 76.9±4.1 73.4±3.1 70.1±2.8 81.7±2.9 57.2±3.2
BDA(s) Wang et al. (2017) 39.7±2.3 35.9±2.9 38.0±2.6 49.4±2.5 52.5±2.6 51.4±2.7 73.8±3.6 71.5±4.0 79.7±4.3 74.3±2.5 72.2±1.7 81.7±2.2 60.0±2.8
VDA(s) Tahmoresnezhad and Hashemi (2017) 39.1±2.7 34.5±2.5 36.1±3.1 47.8±3.6 49.2±2.3 49.7±2.7 68.2±3.7 65.1±4.3 78.6±3.3 72.9±3.2 68.2±4.0 80.9±3.2 57.5±3.2
JGSA(s) Zhang et al. (2017) 41.2±2.6 33.1±2.1 32.8±2.3 50.4±3.1 45.7±3.6 43.6±4.8 71.0±3.3 66.5±4.5 79.7±2.1 75.0±3.4 72.7±2.2 80.3±4.4 57.7±3.2
MEDA(s) Wang et al. (2018) 39.5±3.8 35.2±2.7 35.5±2.4 53.6±3.1 53.0±3.0 50.4±4.1 77.1±2.9 77.3±3.0 78.1±3.3 77.4±2.8 76.8±3.5 78.3±4.4 61.0±3.3
Our(s) 45.1±2.5 40.9±3.1 42.4±1.8 58.4±3.4 60.3±2.5 59.4±1.9 80.0±3.5 79.4±2.6 85.1±3.4 77.3±3.0 76.5±2.2 86.1±3.0 65.9±2.7

TCA(t) Pan et al. (2011) 33.9±3.5 27.8±6.1 33.5±4.1 31.5±1.5 30.2±2.7 30.9±2.3 27.9±2.6 28.0±1.3 73.4±5.8 31.4±1.4 32.6±1.9 79.6±2.8 38.4±3.0
JDA(t) Long et al. (2013) 33.4±3.7 28.0±5.7 31.5±4.4 28.8±3.0 32.1±2.2 31.2±3.4 28.2±3.0 28.3±6.4 69.4±6.5 29.8±2.6 31.7±2.6 79.3±2.7 37.6±3.9
BDA(t) Wang et al. (2017) 34.4±3.4 30.7±4.9 34.5±5.5 31.4±2.0 33.9±4.6 33.5±2.9 31.0±2.1