I Introduction
Statistical machine learning models rely heavily on the assumption that the training and test data are drawn from the same or similar distribution, i.e., that the data are independent and identically distributed (i.i.d.). However, in the real world, it is impossible to guarantee this assumption. Hence, in visual recognition tasks, a classifier or model usually does not work well because of the data bias between the distributions of the training and test data
[1],[2],[3],[4],[5],[6],[7]. The domain discrepancy constitutes a major obstacle to training predictive models across domains. For example, an object recognition model trained on labeled images may not generalize well to test images with variations in pose, occlusion, or illumination. In machine learning this problem is labeled as domain mismatch. Failing to model such a distribution shift may cause significant performance degradation. Also, models trained with only a limited number of labeled patterns are usually not robust for pattern recognition tasks. Furthermore, manually labeling sufficient training data for diverse application domains may be prohibitively expensive. However, by leveraging labeled data drawn from a sufficiently labeled source domain that describes content related to the target domain, establishing an effective model is possible. Therefore, the challenging objective is how to achieve knowledge transfer across domains such that the distribution mismatch is reduced. Underlying techniques for addressing this challenge, such as domain adaptation
[8],[9], which aims to learn domain-invariant models across the source and target domains, have been investigated. Domain adaptation (DA) [10],[11],[12], as one kind of transfer learning (TL), addresses the problem that data come from two related but different domains [13],[14]. Domain adaptation establishes knowledge transfer from the labeled source domain to the unlabeled target domain by exploring domain-invariant structures that bridge different domains with substantial distribution discrepancy. In terms of the accessibility of target data labels, domain adaptation methods can be divided into three categories: supervised [15],[16], semi-supervised [17],[18],[5], and unsupervised [19],[20],[21].
In this paper, we focus on unsupervised transfer learning, where the target data labels are unavailable in the transfer model learning phase. The unsupervised setting is more challenging due to the common data scarcity problem. In unsupervised transfer learning [22], the Maximum Mean Discrepancy (MMD) [23] is widely used and has achieved promising performance. MMD, which aims at minimizing the domain distribution discrepancy, is generally exploited to reduce the difference of conditional and marginal distributions across domains by utilizing the unlabeled domain data in a Reproducing Kernel Hilbert Space (RKHS). Also, in the framework of deep transfer learning [24]
, MMD-based adaptation layers are further integrated into deep neural networks to improve the transfer capability between the source and target domains [25]. MMD actually acts as a discrepancy metric or criterion to evaluate the distribution mismatch across domains and works well in aligning the global distribution. However, it only considers the domain discrepancy and generally ignores the intrinsic data structure of the target domain, e.g., the local structure illustrated in Fig. 1(b). It is known that geometric structure is indispensable for domain distance minimization, which can thus well exploit the internal local structure of the target data. Particularly, in unsupervised learning, the local structure of the target data often plays a more important role than the global structure. This originates from the manifold assumption that data with local similarity share similar labels. Motivated by the manifold assumption, a novel manifold criterion (MC) is proposed in our work; it is related to but very different from conventional manifold algorithms, in that MC actually acts as a generative transfer criterion for unsupervised domain adaptation.
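As a concrete reference point, the MMD mentioned above reduces, under a linear kernel, to the squared distance between the sample means of the two domains. A minimal sketch of this empirical estimate (the function and variable names are ours, not the paper's):

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Empirical MMD between two sample sets under a linear kernel.

    Xs: (n_s, d) source samples, Xt: (n_t, d) target samples.
    Returns the squared distance between the two sample means, which is
    the simplest (linear-kernel) instance of the MMD criterion.
    """
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(delta @ delta)

# Two sets drawn from shifted Gaussians: the mean shift dominates the MMD.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 3))
b = rng.normal(2.0, 1.0, size=(500, 3))
```

Kernelized variants replace the raw features with an implicit RKHS mapping, but the mean-matching structure is the same.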
Intuitively, we hold the assumption that if a new target domain can be automatically generated by using the source domain data, the domain transfer issue can be naturally addressed. To this end, a criterion that measures the generative effect can be explored. In this paper, considering the locality property of the target data, we require that the generated target data hold a local structure similar to that of the true target domain data. Naturally, motivated by the manifold assumption [26], an objective generative transfer metric, the manifold criterion (MC), is proposed. Suppose that two samples x_i and x_j in the target domain are close to each other; if the target sample generated from the source data as a counterpart of x_i is also close to x_j, we recognize that the generated intermediate domain data shares a similar distribution with the target domain. This is the basic idea of the generative transfer learning in this paper.
But how can the generative target domain be constructed? From the perspective of manifold learning, we expect that the new target data is generated under a locality structure preservation metric. This idea can be interpreted under the commonly investigated independent and identically distributed (i.i.d.) case, in which the affinity structure in the high-dimensional space can still be preserved in some projected low-dimensional subspace (i.e., manifold structure embedding). In general, the internal intrinsic structure can remain unchanged by using graph Laplacian regularization [27], which reflects the affinity of the raw data.
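The graph Laplacian regularization mentioned above is built from a nearest-neighbour affinity graph of the data. A minimal sketch of the construction (the names and the binary-weight choice are our illustrative assumptions):

```python
import numpy as np

def knn_affinity(X, k=2):
    """Binary k-nearest-neighbour affinity matrix W, symmetrised."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]  # skip the point itself
        W[i, idx] = 1.0
    return np.maximum(W, W.T)  # symmetrise

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W, D = diag(row sums)."""
    return np.diag(W.sum(axis=1)) - W

X = np.array([[0.0], [0.1], [5.0], [5.1]])
W = knn_affinity(X, k=1)
L = laplacian(W)
# The Laplacian quadratic form f^T L f equals 0.5 * sum_ij W_ij (f_i - f_j)^2,
# so it penalises functions that vary across graph edges.
```

This quadratic form is exactly what makes the Laplacian a locality-preservation regularizer.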
Specifically, with the proposed manifold criterion, a Manifold Criterion guided Transfer Learning (MCTL) method is proposed, which pursues a latent common subspace via a projection matrix for the source and target domains. In the common subspace, a generative transfer matrix is solved by leveraging the source domain data and the MC generative metric, yielding new generative data that holds a marginal distribution similar to that of the target data in an unsupervised manner. The findings and analysis show that the proposed manifold criterion can be used to reduce the local domain discrepancy.
Additionally, in the MCTL model, embedding a low-rank constraint (LRC) on the transfer matrix ensures that the data from the source domain can be well interpreted during generation, exhibiting an approximately block-diagonal property. With the LRC exploited, the local structure based MC can be guaranteed without distortion [28].
The idea of our MCTL is described in Fig. 2. In summary, the main contributions and novelty of this work are fourfold:

We propose an unsupervised manifold criterion generative transfer learning (MCTL) method, which aims to generate a new intermediate target domain that shares a similar distribution with the true target data by leveraging the source data as a basis. The proposed manifold criterion (MC) is modeled by a novel local generative discrepancy metric (LGDM) for local cross-domain discrepancy measurement, such that the local transfer can be effectively aligned.

In order to keep the global distribution consistency, a global generative discrepancy metric (GGDM), which offers a linear method to compare the high-order statistics of two distributions, is proposed to minimize the discrepancy between the generative target data and the true target data. Therefore, the local and global affinity structures across domains are simultaneously guaranteed.

For improving the correlation between the source data and the generative target data, LRC regularization on the transfer matrix is integrated into MCTL, such that the block-diagonal property can be utilized for preventing the domain transfer from distortion and negative transfer.

Under the MCTL framework, for a more generic case, a simplified version of MCTL (MCTLS for short) is proposed, which constrains the generative data to be strictly consistent with the target domain in a simple yet generic manner. Interestingly, with this constraint, the LGDM loss in MCTLS naturally degenerates into a generic manifold regularization.
The remainder of this paper is organized as follows. In Section II, we review the related work in transfer learning. In Section III, we present the preliminary idea of the proposed manifold criterion. In Section IV, the proposed MCTL method and optimization are formulated. In Section V, the simplified version of MCTL is introduced and preliminarily analyzed. In Section VI, the classification method is described. In Section VII, the experiments in crossdomain visual recognition are presented. The discussion is presented in Section VIII. Finally, the paper is concluded in Section IX.
II Related Work
II-A Shallow Transfer Learning
Many transfer learning methods have been proposed to tackle heterogeneous domain adaptation problems. Generally, these methods can be divided into three categories, as follows.
Classifier based approaches. A generic way is to directly learn a common classifier on auxiliary domain data by leveraging a few labeled target data. Yang et al. [29] proposed an adaptive SVM (A-SVM) to learn a new target classifier by supposing that f^T(x) = f^S(x) + Δf(x), where the classifier f^S(x) is trained with the labeled source samples and Δf(x) is a perturbation function. Bruzzone et al. [30] developed an approach to iteratively learn the SVM classifier by labeling the unlabeled target samples and simultaneously removing some labeled samples from the source domain. Duan et al. [8] proposed adaptive multiple kernel learning (A-MKL) for consumer video event recognition from annotated web videos. Also, a domain transfer MKL (DTMKL) [5] was proposed, which learns an SVM classifier and a kernel function simultaneously for classifier adaptation. Zhang et al. [31] proposed a robust classifier transfer method (EDA) based on ELM and manifold regularization for visual recognition.
Feature augmentation/transformation based approaches. Li et al. [32] proposed heterogeneous feature augmentation (HFA), which tends to learn a transformed feature space for domain adaptation. Kulis et al. [9] proposed an asymmetric regularized cross-domain transform (ARC-t) method for learning a transformation metric. In [33], Hoffman et al. proposed Max-Margin Domain Transforms (MMDT), in which a category-specific transformation is optimized for domain transfer. Gong et al. proposed the Geodesic Flow Kernel (GFK) [34] method, which integrates an infinite number of linear subspaces on the geodesic path to learn a domain-invariant feature representation. Gopalan et al. [35] proposed an unsupervised method (SGF) for low-dimensional subspace transfer, in which a group of subspaces along the geodesic between the source and target data is sampled and the source data is projected into these subspaces for discriminative classifier learning. An unsupervised feature transformation approach, Transfer Component Analysis (TCA) [11], was proposed to discover common features having the same marginal distribution by using the Maximum Mean Discrepancy (MMD) as a nonparametric discrepancy metric. MMD [23],[36],[37] is often used in transfer learning. Long et al. [38] proposed a Transfer Sparse Coding (TSC) approach to construct robust sparse representations by using the empirical MMD as the distance measure. The Transfer Joint Matching (TJM) method proposed by Long et al. [19] tends to learn a nonlinear transformation by minimizing the MMD-based distribution discrepancy.
Feature representation based approaches. Different from the methods above, domain adaptation is achieved by representing features across domains. Jhuo et al. [39] proposed the RDALR method, in which the source data is reconstructed with the target domain by using low-rank modeling. Similarly, Shao et al. [40] proposed the LTSL method, which pre-learns a subspace using PCA or LDA and then models a low-rank representation across domains. Zhang et al. [41],[42] proposed the Latent Sparse Domain Transfer (LSDT) and Discriminative Kernel Transfer Learning (DKTL) methods for visual adaptation, which jointly learn a subspace projection and a sparse reconstruction across domains. Further, Xu et al. [43] proposed the DTSL method, which combines low-rank and sparse constraints on the reconstruction matrix.
In this paper, the proposed method differs from the existing shallow transfer learning methods in that a generative transfer idea is pursued, which achieves domain adaptation by generating an intermediate domain with a distribution similar to that of the true target domain.
II-B Deep Transfer Learning
Deep learning, as a data-driven learning paradigm, has achieved great success in many fields [44],[45],[46],[47]. However, when solving domain data problems with deep learning technology, massive labeled training data are required; for small-size tasks, deep learning may not work well. Therefore, deep transfer learning methods have been studied.
Donahue et al. [48] proposed a deep transfer method for small-scale object recognition, in which the convolutional network (AlexNet) was trained on ImageNet. Similarly, Razavian et al. [49] also proposed to train a network on ImageNet as a high-level feature extractor. Tzeng et al. [44] proposed the DDC method, which simultaneously achieves knowledge transfer between domains and tasks by using a CNN. Long et al. [25] proposed a deep adaptation network (DAN) method by imposing an MMD loss on the high-level features across domains. Additionally, Long et al. [21] also proposed a residual transfer network (RTN), which tends to learn a residual classifier based on the softmax loss. Oquab et al. [46] proposed a CNN architecture for mid-level feature transfer, which is trained on a large annotated image set. Additionally, Hu et al. [24] proposed a non-CNN based deep transfer metric learning (DTML) method to learn a set of hierarchical nonlinear transformations for cross-domain visual recognition. Recently, GAN-inspired adversarial domain adaptation has been preliminarily studied. Tzeng et al. proposed the ADDA method [50] for adversarial domain adaptation, in which a CNN is used for adversarial discriminative feature learning; it achieves state-of-the-art performance.
In this work, although the proposed MCTL method is a shallow transfer learning paradigm, its competitive capability compared to these deep transfer learning methods has been validated on pre-extracted deep features.
II-C Differences Between MCTL and Other Reconstruction Transfer Methodologies
The proposed MCTL is partly related to reconstruction transfer methods, such as DTSL [43], LSDT [41], and LTSL [40], but is essentially different from them. These methods aim to learn a common subspace where a feature reconstruction matrix between domains is learned for adaptation, with sparse reconstruction and low-rank based constraints considered, respectively. Different from reconstruction transfer, the proposed MCTL is a generative transfer learning paradigm, which is partly inspired by the ideas of GAN [51] and manifold learning. The differences and relations are as follows.
Reconstruction Transfer. As the name implies, a reconstruction matrix is expected for domain correspondence. In LTSL, the subspace projection is pre-learned by off-the-shelf methods such as PCA, LDA, etc. Then the projected source data is used to reconstruct the projected target data via a low-rank constraint. The pre-learned subspace may be suboptimal, leading to a possible local optimum of the reconstruction matrix. Further, the LSDT method realizes domain adaptation by jointly exploiting cross-domain sparse reconstruction and a latent subspace. The DTSL method poses a hybrid regularization of sparsity and low-rank constraints for learning a more robust reconstruction transfer matrix. Reconstruction transfer always expresses the target domain by leveraging the source domain; however, this expression is often inaccurate due to the limited number of target domain samples available for calculating the reconstruction error loss, which decreases robustness.
Generative Transfer. The proposed MCTL method introduces a generative transfer learning concept, which aims to realize intermediate domain generation by constructing a Manifold Criterion loss. The motivation is that the domain adaptation problem can be solved by generating a similar domain that shares the same distribution with the true target domain. The essential differences of our work from reconstruction lie in that: (1) domain adaptation is recognized as a domain generation problem, instead of a domain alignment problem; (2) the manifold criterion loss is constructed for generation, instead of the least-square based reconstruction error loss. In addition, the GGDM based global domain discrepancy loss and the LRC regularization are also integrated into MCTL for global distribution discrepancy reduction and domain correlation enhancement, simultaneously.
Similarity and Relationship. The reconstruction transfer and generative transfer are similar and related in three aspects. (1) Both aim at pursuing a more similar domain with the target data by leveraging the source domain data. (2) Both are unsupervised transfer learning, which do not need the data label information in domain adaptation. (3) Both have similar model formulation and solvers for obtaining the domain correspondence matrix and transformation.
III Manifold Criterion Preliminary
Manifold learning, as a typical unsupervised learning method, has been widely used. The manifold hypothesis means that an intrinsic geometric low-dimensional structure is embedded in the high-dimensional feature space and that data with affinity structure own similar labels. However, the manifold hypothesis holds only for independent and identically distributed (i.i.d.) data. Therefore, we try to build a manifold criterion to measure the i.i.d. condition (i.e., domain discrepancy minimization) and guide transfer learning across domains through an intermediate domain. In this paper, the manifold hypothesis is used in the process of domain generation, as shown in Fig. 2
. Essentially different from manifold learning and regularization, we propose a novel manifold criterion (MC) that is utilized as a generative discrepancy metric. In semi-supervised learning (SSL), manifold regularization is often used, but under the i.i.d. condition. However, transfer learning differs from SSL in that the domain data do not satisfy the i.i.d. condition. In this paper, it should be pointed out that if the intermediate domain can be generated via the manifold criterion guided objective function, then the distributions of the generated intermediate domain and the true target domain are recognized to be matched. The idea of the manifold criterion is described in Fig. 2. We observe that a projection matrix is first learned for common subspace projection, and then a generative transfer matrix is learned for intrinsic structure preservation and distribution discrepancy minimization between the true target data and the target data generated from the source domain data. That is, if the generative data has an affinity structure similar to that of the true target domain, i.e., the manifold criterion is satisfied, we can conclude that the generative data shares a similar distribution with the target domain. Notably, different from reconstruction based domain adaptation methods, in this work we tend to generate an intermediate domain by leveraging the source domain, i.e., generative transfer instead of reconstruction transfer.
Moreover, Fig. 1 implies that MC (local) and MMD (global) can be jointly considered in transfer learning models. Frankly, the idea of this paper is intuitive, simple, and easy to follow. The key point lies in how to generate the intermediate domain data such that the generated data complies with the manifold assumption originating from the true target domain data. If the manifold criterion is satisfied (i.e., the generative discrepancy approaches zero), then domain adaptation or distribution alignment is completed, which is the principle of MCTL.
IV MCTL: Manifold Criterion Guided Transfer Learning
IV-A Notations
In this paper, the source and target domains are indicated by subscripts s and t. The training sets of the source and target domains are denoted as X_s ∈ R^{d×n_s} and X_t ∈ R^{d×n_t}. Φ(X_s)Z denotes the generative target domain, where Φ(·) denotes an implicit but generic transformation, d denotes the dimensionality, and n_s and n_t denote the number of samples in the source and target domains, respectively. Let X = [X_s, X_t]; then X ∈ R^{d×n}, where n = n_s + n_t. Let P be the basis transformation that maps the raw data space R^d to a latent subspace. Z ∈ R^{n_s×n_t} represents the generative transfer matrix, I denotes the identity matrix, and ||·||_F and ||·||_* denote the Frobenius norm and the nuclear norm, respectively. The superscript T denotes the transpose operator and Tr(·) denotes the matrix trace operator. In RKHS, the kernel Gram matrix is defined as K = Φ(X)^T Φ(X), where k(x_i, x_j) = φ(x_i)^T φ(x_j) is a kernel function. In the following sections, let K_s = Φ(X)^T Φ(X_s) and K_t = Φ(X)^T Φ(X_t); it is then easy to get that K ∈ R^{n×n}, K_s ∈ R^{n×n_s}, K_t ∈ R^{n×n_t}, and K = [K_s, K_t].
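Under the notation above, the Gram matrices can be computed directly. A small sketch with an RBF kernel (the kernel choice and variable names are our assumptions for illustration):

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """RBF kernel Gram matrix: k(x_i, y_j) = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
Xs = rng.normal(size=(4, 2))   # source samples (rows)
Xt = rng.normal(size=(3, 2))   # target samples (rows)
X = np.vstack([Xs, Xt])        # X = [X_s, X_t], n = n_s + n_t = 7

K = rbf_gram(X, X)    # n x n Gram matrix over all data
Ks = rbf_gram(X, Xs)  # n x n_s block: kernel between X and X_s
Kt = rbf_gram(X, Xt)  # n x n_t block: kernel between X and X_t
# Stacking the two blocks recovers K, since X = [X_s, X_t].
```

This block structure is what allows the projected data to be written purely in terms of K_s and K_t later on.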
IV-B Problem Formulation
In this section, the proposed MCTL method, illustrated in Fig. 2, is presented, in which we expect the generated intermediate target domain and the true target domain to share the same distribution in a common subspace. That is, the intermediate target domain is generated to share an approximated distribution with the true target domain by exploiting the proposed Manifold Criterion as the domain discrepancy metric. Specifically, two generative discrepancy metrics (LGDM vs. GGDM) are proposed for measuring the domain discrepancy locally and globally. Overall, the model is composed of three items. The first item is the MC-based LGDM loss, which measures the local domain discrepancy with the manifold criterion by exploiting the locality of the target data. The second item is the GGDM loss, which minimizes the global domain discrepancy of the marginal distributions between the generated intermediate target domain and the true target domain. The third item is the LRC regularization (low-rank constraint), which keeps the generalization ability of the transfer matrix Z. The detailed MCTL method is described as follows.
IV-B1 MC-based Local Generative Discrepancy Metric
The MC-based local generative discrepancy metric (LGDM) loss is used to enhance the distribution consistency between the source and target domains indirectly, by constraining the generative target data with the manifold criterion. For convenience, φ(x_i^g) = Φ(X_s)z_i (with z_i the i-th column of Z) is defined as a sample in the generative domain Φ(X_s)Z, and φ(x_j^t) is defined as a sample in Φ(X_t). We claim that the distribution consistency between the generative domain and the target domain is achieved, i.e., domain transfer is done, only if the two sets satisfy the following manifold criterion, which can be formulated as

min_Z Σ_{i,j} ||φ(x_i^g) − φ(x_j^t)||^2 W_ij,   (1)

where W ∈ R^{n_t×n_t} is the affinity matrix described as W_ij = 1 if x_i^t ∈ N_k(x_j^t) or x_j^t ∈ N_k(x_i^t), and W_ij = 0 otherwise, where N_k(x) represents the k nearest neighbors of sample x. The matrix D is a diagonal matrix with entries D_ii = Σ_j W_ij, i = 1, …, n_t. As claimed before, P = Φ(X)A, so the projected source and target data can be expressed as P^TΦ(X_s) = A^TK_s and P^TΦ(X_t) = A^TK_t. By substituting Z and the Gram matrices after projection (i.e., A^TK_s and A^TK_t) into Eq. (1), the MC-based LGDM loss can be further formulated as

L_LGDM(A, Z) = Tr(A^T K_s Z D Z^T K_s^T A) − 2Tr(A^T K_s Z W K_t^T A) + Tr(A^T K_t D K_t^T A).   (2)
From Eq. (2), the motivation is clearly demonstrated: to achieve local structure consistency (i.e., manifold consistency) between the generative target data and the true target data. The intrinsic difference between Eq. (2) and manifold embedding or regularization is that we aim to enforce the manifold assumption with a criterion, while conventional manifold learning relies on this assumption.
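The LGDM idea can be checked numerically: the weighted pairwise-distance form of the loss equals a trace form analogous to Eq. (2). This is an illustrative sketch in the raw feature space rather than the kernel space, with our own names:

```python
import numpy as np

def lgdm_loss(G, Xt, W):
    """MC-style local discrepancy: sum_ij ||g_i - x_j||^2 W_ij.

    G:  (n_t, d) generated target samples, Xt: (n_t, d) true target
    samples, W: (n_t, n_t) affinity of the target data. The loss is
    small when each generated sample stays close to the true-target
    neighbours selected by W.
    """
    d2 = ((G[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    return float((d2 * W).sum())

rng = np.random.default_rng(2)
Xt = rng.normal(size=(5, 3))
W = (rng.random((5, 5)) < 0.4).astype(float)
G = Xt + 0.1 * rng.normal(size=(5, 3))  # slightly perturbed "generated" domain
```

Expanding the squared norm gives the trace identity used below: the sum form equals Tr(G^T D_r G) − 2Tr(G^T W Xt) + Tr(Xt^T D_c Xt), with D_r and D_c the diagonal row-sum and column-sum matrices of W.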
IV-B2 Global Generative Discrepancy Metric Loss
In order to reduce the distribution mismatch between the generative target data and the true target data, a generic MMD-based global generative discrepancy metric (GGDM) is proposed by minimizing the discrepancy as follows:

min_Z || (1/n_t) Σ_i φ(x_i^g) − (1/n_t) Σ_j φ(x_j^t) ||^2,   (3)

where the two empirical means represent the distributions of the generated target domain and the true target domain, respectively. However, the model may not transfer knowledge directly, and it is unclear whether a test sample is from the source or target domain if there is no common subspace. We consider finding a latent common subspace for the source and target domains by using a projection matrix P. Therefore, by projecting the generated and true target data into the subspace, the GGDM loss after projection can be formulated as follows. Considering that the generated domain is Φ(X_s)Z, by substituting it into the equation, there is

min_Z || (1/n_t) P^T (Φ(X_s)Z − Φ(X_t)) 1 ||^2,   (4)

where 1 ∈ R^{n_t} represents an all-one column vector.

The projection matrix P is a linear transformation, which can be represented as some linear combination of the training data, i.e., P = Φ(X)A, where A denotes the linear combination coefficient matrix. Then the projected source data can be expressed as P^TΦ(X_s) = A^TK_s and the projected target data as P^TΦ(X_t) = A^TK_t. With the kernel trick, the inner product of the implicit transformation from raw space to RKHS is represented as the Gram matrix. As described in Section IV-A, let K_s = Φ(X)^TΦ(X_s) and K_t = Φ(X)^TΦ(X_t); the source and target domains can then be expressed simply as K_s and K_t, respectively. Therefore, the GGDM loss is formulated as

L_GGDM(A, Z) = (1/n_t^2) || A^T (K_s Z − K_t) 1 ||_F^2.   (5)
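The GGDM loss is, in essence, a mean-matching term after projection. A small sketch in the raw (linear) setting, with our own names; the paper's kernelized form follows the same structure:

```python
import numpy as np

def ggdm_loss(P, G, Xt):
    """Global discrepancy: squared distance between projected means.

    P: (d, m) projection, G: (n_t, d) generated samples, Xt: (n_t, d)
    true target samples. Equals || P^T (mean(G) - mean(Xt)) ||^2.
    """
    delta = P.T @ (G.mean(axis=0) - Xt.mean(axis=0))
    return float(delta @ delta)

rng = np.random.default_rng(3)
P = rng.normal(size=(3, 2))
Xt = rng.normal(size=(6, 3))
G = Xt + 1.0  # a uniformly shifted "generated" domain
```

Projecting the mean and taking the mean of the projected samples commute, which is why the loss can be written either way.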
IV-B3 LRC for Domain Correlation Enhancement
In domain transfer, the loss functions are designed for interpreting the generative target data and the true target data. Significantly, the generative target data plays a critical role in the proposed model. In this work, a general transfer matrix Z is used to bridge the source domain data and the generative data (intermediate result). Since structural consistency between different domains is our goal, it is natural to consider the low-rank structure of Z as a choice for enhancing the domain correlation. In our MCTL, the low-rank constraint (LRC), which is effective in capturing the global structure of different domain data, is finally used. The LRC regularization ensures that the data from different domains can be well interlaced during domain generation, which is significant for reducing the disparity of domain distributions. Furthermore, if the projected data lies on the same manifold, each sample in the target domain can be represented by its neighbors in the source domain. This requires that the generative transfer matrix Z is approximately block-wise. Therefore, LRC regularization is necessary. Considering the non-convexity of the rank function, whose minimization is NP-hard, the nuclear norm ||Z||_* is used as a rank approximation in this work.

IV-B4 Completed Model of MCTL
By reviewing the MC-based LGDM loss in Eq. (2), the GGDM loss in Eq. (5), and the LRC regularization, the objective function of our MCTL method is finally formulated as follows:

min_{A,Z} L_LGDM(A, Z) + λ1 L_GGDM(A, Z) + λ2 ||Z||_*,  s.t. A^T K A = I,   (6)

where λ1 and λ2 are the trade-off parameters. The rows of P^T are required to be orthogonal and normalized to unit norm for preventing trivial solutions by enforcing P^TP = I, which can be further rewritten as A^TKA = I, an equality constraint. Obviously, the model is non-convex with respect to the two variables, but it can be solved with a variable alternating strategy; the optimization algorithm is formulated below.
IV-C Optimization
There are two variables, A and Z, in the MCTL model (6); therefore, an efficient variable alternating optimization strategy is naturally considered, i.e., one variable is solved while the other is frozen. First, when Z is fixed, a general eigenvalue decomposition is used for solving A. Second, when A is fixed, the inexact augmented Lagrange multiplier (IALM) method and gradient descent are used to solve Z. In the following, the optimization details of the proposed method are presented.
By introducing an auxiliary variable B with the equality constraint Z = B, the problem (6) can be rewritten accordingly. Furthermore, with the augmented Lagrange function [52], the model can be written as

L(A, Z, B, Y) = L_LGDM(A, Z) + λ1 L_GGDM(A, Z) + λ2 ||B||_* + Tr(Y^T (Z − B)) + (μ/2) ||Z − B||_F^2,   (7)

where, once problem (6) is unfolded, the all-one vector 1 appears as the all-one matrix 11^T; Y denotes the Lagrange multiplier and μ > 0 is a penalty parameter.
In the following, we present how to optimize the three variables A, B, and Z in problem (7), based on eigenvalue decomposition, IALM, and gradient descent, step by step.
IV-C1 Update A
By freezing Z and B, A can be solved as

min_A Tr(A^T H A),  s.t. A^T K A = I,   (8)

where H = K_s Z D Z^T K_s^T − K_s Z W K_t^T − K_t W^T Z^T K_s^T + K_t D K_t^T + (λ1/n_t^2)(K_s Z − K_t) 11^T (K_s Z − K_t)^T collects the terms of problem (7) that depend on A. We can derive the solution of the iteration column-wise. To obtain the k-th column vector a_k of A, by setting the partial derivative of problem (8) with respect to a_k to zero under the constraint, there is

H a_k = η K a_k.   (9)

It is clear that a_k can be obtained by solving a generalized eigen-decomposition problem, and a_k is the eigenvector corresponding to the k-th smallest eigenvalue.
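The constrained trace minimization above is a generalized eigenproblem. A sketch using only numpy, via a whitening transform of the constraint matrix (the names are ours, and the paper's solver may differ in detail):

```python
import numpy as np

def min_trace_subspace(M, Kc, m):
    """Solve min_A tr(A^T M A) s.t. A^T Kc A = I via the generalised
    eigenproblem M a = eta * Kc a, keeping the m smallest eigenvalues.

    With Kc = U S U^T, set C = S^{-1/2} U^T so that C Kc C^T = I,
    solve the ordinary symmetric eigenproblem of C M C^T, and map the
    eigenvectors back by A = C^T V.
    """
    S, U = np.linalg.eigh(Kc)          # Kc must be symmetric positive definite
    C = (U / np.sqrt(S)).T             # C = S^{-1/2} U^T
    evals, V = np.linalg.eigh(C @ M @ C.T)
    A = C.T @ V[:, :m]                 # columns for the m smallest eigenvalues
    return A, evals[:m]

rng = np.random.default_rng(4)
Bm = rng.normal(size=(5, 5))
M = Bm @ Bm.T                          # symmetric PSD objective matrix
Kb = rng.normal(size=(5, 5))
Kc = Kb @ Kb.T + 5 * np.eye(5)         # symmetric positive definite constraint
A, evals = min_trace_subspace(M, Kc, m=2)
```

The returned A satisfies A^T Kc A = I and M A = Kc A diag(evals), which is exactly the generalized eigen-relation of Eq. (9).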
IV-C2 Update B
By freezing A and Z, the problem is solved with respect to B. After dropping the terms irrelevant to B, B in each iteration can be solved as

min_B λ2 ||B||_* + Tr(Y^T (Z − B)) + (μ/2) ||Z − B||_F^2.   (10)

It can be further rewritten as

min_B (λ2/μ) ||B||_* + (1/2) ||B − (Z + Y/μ)||_F^2.   (11)

Problem (11) can be efficiently solved using the singular value thresholding (SVT) operator [53], which contains two major steps. First, singular value decomposition (SVD) is conducted on the matrix Z + Y/μ to get UΣV^T, where Σ = diag(σ_1, …, σ_r) contains the singular values of rank r. Second, the optimal solution is obtained by thresholding the singular values as B = U diag((σ_1 − λ2/μ)_+, …, (σ_r − λ2/μ)_+) V^T, where (·)_+ denotes the positive-part operator.

IV-C3 Update Z
By freezing A and B, the problem is solved with respect to Z. By dropping those terms independent of Z in (7), there is

min_Z L_LGDM(A, Z) + λ1 L_GGDM(A, Z) + Tr(Y^T (Z − B)) + (μ/2) ||Z − B||_F^2.   (12)

We can see from problem (12) that it is hard to obtain a closed-form solution of Z. Therefore, the general gradient descent operator [54] is used, and the solution of Z in each iteration is presented as

Z ← Z − γ ∇_Z,   (13)

where γ is the step size and ∇_Z denotes the gradient, which is calculated as

∇_Z = 2 K_s^T A A^T K_s Z D − 2 K_s^T A A^T K_t W^T + (2λ1/n_t^2) K_s^T A A^T (K_s Z − K_t) 11^T + Y + μ (Z − B).   (14)
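The singular value thresholding operator used for the nuclear-norm subproblem (11) can be sketched generically as follows (the names are ours; this is the standard proximal operator of the nuclear norm):

```python
import numpy as np

def svt(M0, tau):
    """Singular value thresholding: prox of tau * ||.||_*.

    Shrinks each singular value of M0 by tau (clipping at zero), which
    solves min_B tau * ||B||_* + 0.5 * ||B - M0||_F^2.
    """
    U, s, Vt = np.linalg.svd(M0, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(5)
M0 = rng.normal(size=(6, 4))   # stands in for Z + Y/mu
B0 = svt(M0, tau=1.0)          # stands in for the B-update with tau = lambda2/mu
```

Larger thresholds zero out more singular values, which is how the update drives B toward low rank.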
In detail, the iterative optimization procedure of the proposed MCTL is summarized in Algorithm 1.
Algorithm 1 The Proposed MCTL
Input: X_s, X_t, λ1, λ2
Procedure:
1. Compute K, K_s, K_t, the affinity matrix W, and D
2. Initialize: Z = B = 0, Y = 0
3. While not converged do
3.1 Step 1: Fix Z and B, and update A by solving the eigenvalue decomposition problem (9).
3.2 Step 2: Fix A, and update Z and B using IALM:
3.2.1 Fix Z and update B by using the singular value thresholding (SVT) [53] operator on problem (11).
3.2.2 Fix B and update Z according to the gradient descent operator, i.e., Equation (13).
3.3 Update the multiplier Y: Y ← Y + μ(Z − B)
3.4 Update the penalty parameter μ
3.5 Check convergence
end while
Output: A and Z.
V MCTLS: Simplified Version of MCTL
As illustrated above, MCTL aims to minimize the distribution discrepancy between the generative target data and the true target data as far as possible by using the manifold criterion. In this section, considering generic manifold embedding, for model simplicity we derive a simplified version of MCTL (MCTLS for short), as illustrated in Fig. 3.
V-A Formulation of MCTLS
With the description of Fig. 3 (right), suppose an extreme case of domain generation, that is, the generated target data is strictly the same as the true target data, i.e., Φ(X_s)Z = Φ(X_t) (the generated domain coincides with the true target domain); then MCTLS is formulated as

min_{A,Z} Tr(A^T K_t L K_t^T A) + (λ1/n_t^2) ||A^T (K_s Z − K_t) 1||_F^2 + λ2 ||Z||_*,  s.t. A^T K A = I,   (15)

where L = D − W is the conventional Laplacian matrix. Also, the objective function (15) contains three items: the MC-based LGDM loss, the GGDM loss, and the LRC regularization. From the MC loss term in Equation (15), we observe a generic manifold regularization term with the Laplacian matrix. Therefore, the MC loss is degenerated into a conventional manifold constraint by imposing Φ(X_s)Z = Φ(X_t), which shows that the MCTLS constraint is harsher than that of MCTL.
V-B Optimization of MCTLS
MCTLS has a mechanism similar to that of MCTL; therefore, the MCTLS optimization is almost the same. With updating steps for A, B, and Z, the optimization procedure of the MCTLS method is illustrated as follows.
Update A. In the MCTLS model, by freezing Z and B, the derivative of the objective function (15) w.r.t. the columns of A is set to zero under the constraint, and there is

(K_t L K_t^T + (λ1/n_t^2)(K_s Z − K_t) 11^T (K_s Z − K_t)^T) a_k = η K a_k.   (16)

Therefore, A in each iteration can be obtained by solving a generalized eigenvalue decomposition problem, and a_k is the eigenvector corresponding to the k-th smallest eigenvalue.
Update B. The variable B can be effectively solved by the singular value thresholding (SVT) operator [53], similar to problem (11).
Update Z. The variable Z can be updated according to Section IV-C3 by using the gradient descent algorithm. The gradient with respect to Z can be expressed as

∇_Z = (2λ1/n_t^2) K_s^T A A^T (K_s Z − K_t) 11^T + Y + μ (Z − B).   (17)
VI Classification
For classification, the projected source and target data can be represented as A^TK_s and A^TK_t, respectively. Then, existing classifiers (e.g., SVM, the least square method [55], SRC [56]) can be trained on the domain-aligned and augmented training data with the corresponding labels, following the experimental setting of LSDT [41]. Notably, for the COIL-20, MSRC, and VOC 2007 experiments, in order to follow the same experimental setting as DTSL [43], the classifier is trained only on the projected source data with the source labels. Finally, classification of the unlabeled target test data is performed, and the recognition accuracy is reported and compared.
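The classification step can be sketched as a nearest-neighbour rule on the aligned features; the toy features and names here are our illustrative assumptions, not the paper's data or classifier of choice:

```python
import numpy as np

def nearest_neighbour_predict(Z_train, y_train, Z_test):
    """1-NN classification in the shared subspace."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(-1)
    return y_train[np.argmin(d2, axis=1)]

# Toy aligned features: two well-separated classes in a 2-D subspace.
rng = np.random.default_rng(6)
Zs = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(3, 0.1, (10, 2))])
ys = np.array([0] * 10 + [1] * 10)
Zt = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
yt = np.array([0] * 5 + [1] * 5)
pred = nearest_neighbour_predict(Zs, ys, Zt)
```

Once the subspace has aligned the domains, any off-the-shelf classifier trained on the projected source (plus any labeled target) data can be substituted here.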
VII Experiments
In this section, experiments on several benchmark datasets [57] are conducted to evaluate the proposed MCTL method, including (1) cross-domain object recognition [58],[59]: 4DA office data, 4DA-CNN office data, COIL-20 data, and MSRC-VOC 2007 datasets [38]; (2) cross-pose face recognition: Multi-PIE face dataset; (3) cross-domain handwritten digit recognition: USPS, SEMEION, and MNIST datasets. Several related transfer learning methods based on feature transformation and reconstruction, such as SGF [35], GFK [34], SA [60], LTSL [40], DTSL [43], and LSDT [41], are compared and discussed.

VII-A Cross-domain Object Recognition
For cross-domain object/image recognition, 5 benchmark datasets are used: several sample images from the 4DA office dataset are shown in Fig. 4, from the COIL-20 object dataset in Fig. 6, and from the MSRC and VOC 2007 datasets in Fig. 7.
Results on the 4DA Office dataset (Amazon, DSLR, Webcam [http://www.eecs.berkeley.edu/~mfritz/domainadaptation/] and Caltech 256 [http://www.vision.caltech.edu/Image_Datasets/Caltech256/]) [34]:
Four domains, Amazon (A), DSLR (D), Webcam (W), and Caltech (C), are included in the 4DA dataset, which contains 10 object classes. In our experiment, the configuration of [34] is followed: 20 samples per class are selected from Amazon and 8 samples per class from DSLR, Webcam, and Caltech when they are used as source domains; 3 samples per class are chosen when they are used as target training data, while the rest of the data in the target domains is used for testing. Note that 800-bin SURF features [34],[61] are extracted.
(Table I: recognition accuracies (%) on the 4DA tasks; compared methods include Naive Comb, HFA [15], ARC-t [9], MMDT [33], SGF [35], GFK [34], SA [60], and LSDT [41].)
(Table II: recognition accuracies (%) on the 4DA-CNN tasks (f7 features); compared methods include Source Only, Naive Comb, SGF [35], TCA, GFK [34], LTSL [40], and LSDT [41].)
(Table III: recognition accuracies (%); compared methods include SVM, TSL, RDALR [62], DTSL [43], LTSL [40], and LSDT [41].)
The recognition accuracies are reported in Table I, from which we observe that the proposed MCTL ranks second on average, slightly inferior to LTSL-LDA. The reason may be that the discriminative power of LDA helps improve performance, as LTSL-PCA achieves a lower accuracy; our MCTL also outperforms the other methods. Notably, the 4DA task is a challenging benchmark that attracts many competitive approaches for evaluation and comparison, so strong baselines have been established.
In the 4DA-CNN dataset, the CNN features are extracted by feeding the raw 4DA data (10 object classes) into a convolutional neural network (AlexNet, with 5 convolutional layers and 3 fully connected layers) well trained on ImageNet [63]. The features from the fully connected layers (i.e., DeCAF [48]) are explored; the feature dimensionality is 4096. In the experiments, a standard configuration and protocol is used following [34], and the features of the f7 layer are experimented with. The recognition accuracies using the f7-layer outputs for the 12 cross-domain tasks are shown in Table II, from which we can observe that the average recognition accuracy of the proposed method is the best, demonstrating the superiority of generative transfer learning. We can see that our MCTL outperforms LTSL-LDA; this may be because the CNN features already provide good discrimination, so discriminative learning does not bring significant gains. The compared methods in Table II are shallow transfer learning methods. It is interesting to compare with deep transfer learning methods, such as AlexNet [63], DDC [44], DAN [25], and RTN [21]. The comparison is described in Fig. 5, from which we can observe that our proposed method ranks second in average performance, inferior to the residual transfer network (RTN) but still better than the other three deep transfer learning models. The comparison shows that the proposed MCTL, as a shallow transfer learning method, is highly competitive.