Manifold Criterion Guided Transfer Learning via Intermediate Domain Generation

03/25/2019 · by Lei Zhang, et al. · Chongqing University, Nanjing University, Harbin Institute of Technology, Nanyang Technological University, The Chinese University of Hong Kong, Shenzhen

In many practical transfer learning scenarios, the feature distribution is different across the source and target domains (i.e. non-i.i.d.). Maximum mean discrepancy (MMD), as a domain discrepancy metric, has achieved promising performance in unsupervised domain adaptation (DA). We argue that MMD-based DA methods ignore the data locality structure, which, to some extent, would cause the negative transfer effect. The locality plays an important role in minimizing the nonlinear local domain discrepancy underlying the marginal distributions. For better exploiting the domain locality, a novel local generative discrepancy metric (LGDM) based intermediate domain generation learning called Manifold Criterion guided Transfer Learning (MCTL) is proposed in this paper. The merits of the proposed MCTL are four-fold: 1) the concept of manifold criterion (MC) is first proposed as a measure validating the distribution matching across domains, and domain adaptation is achieved if the MC is satisfied; 2) the proposed MC can well guide the generation of the intermediate domain sharing a similar distribution with the target domain, by minimizing the local domain discrepancy; 3) a global generative discrepancy metric (GGDM) is presented, such that both the global and local discrepancy can be effectively and positively reduced; 4) a simplified version of MCTL called MCTL-S is presented under a perfect domain generation assumption for a more generic learning scenario. Experiments on a number of benchmark visual transfer tasks demonstrate the superiority of the proposed manifold criterion guided generative transfer method, compared with other state-of-the-art methods. The source code is available at https://github.com/wangshanshanCQU/MCTL.


I Introduction

Statistical machine learning models rely heavily on the assumption that the data used for training and testing are drawn from the same or similar distributions, i.e. they are independent and identically distributed (i.i.d.). However, in the real world, it is difficult to guarantee this assumption. Hence, in visual recognition tasks, a classifier or model usually does not work well because of the data bias between the distributions of the training and test data [1],[2],[3],[4],[5],[6],[7]. The domain discrepancy constitutes a major obstacle in training predictive models across domains. For example, an object recognition model trained on labeled images may not generalize well to test images with variations in pose, occlusion, or illumination. In machine learning, this problem is referred to as domain mismatch. Failing to model such a distribution shift may cause significant performance degradation. Also, models trained with only a limited number of labeled patterns are usually not robust for pattern recognition tasks. Furthermore, manually labeling sufficient training data for diverse application domains may be prohibitive. However, by leveraging labeled data drawn from a sufficiently labeled source domain that describes related content to the target domain, establishing an effective model is possible. Therefore, the challenging objective is how to achieve knowledge transfer across domains such that the distribution mismatch is reduced. Techniques for addressing this challenge, such as domain adaptation [8],[9], which aims to learn domain-invariant models across the source and target domains, have been investigated.

Domain adaptation (DA)[10],[11],[12], as one kind of transfer learning (TL), addresses the problem in which data come from two related but different domains[13],[14]. Domain adaptation establishes knowledge transfer from the labeled source domain to the unlabeled target domain by exploring domain-invariant structures that bridge different domains with substantial distribution discrepancy. In terms of the accessibility of target data labels, domain adaptation methods can be divided into three categories: supervised[15],[16], semi-supervised[17],[18],[5] and unsupervised[19],[20],[21].

In this paper, we focus on unsupervised transfer learning, where the target data labels are unavailable during the transfer model learning phase. The unsupervised setting is more challenging due to the common data scarcity problem. In unsupervised transfer learning[22], Maximum Mean Discrepancy (MMD)[23] is widely used and has achieved promising performance. MMD, which aims at minimizing the domain distribution discrepancy, is generally exploited to reduce the difference of conditional and marginal distributions across domains by utilizing the unlabeled domain data in a Reproducing Kernel Hilbert Space (RKHS). Also, in the framework of deep transfer learning[24], MMD-based adaptation layers are further integrated into deep neural networks to improve the transferability between the source and target domains [25].
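To make the MMD criterion concrete, below is a minimal NumPy sketch (not from the authors' code) of the biased empirical estimate of the squared MMD between two sample sets under an RBF kernel; the bandwidth `sigma` and all variable names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise RBF kernel between the rows of A (m x d) and B (n x d).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2(Xs, Xt, sigma=1.0):
    # Biased empirical estimate of the squared MMD between two sample sets.
    Kss = rbf_kernel(Xs, Xs, sigma)
    Ktt = rbf_kernel(Xt, Xt, sigma)
    Kst = rbf_kernel(Xs, Xt, sigma)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()

# Example: two Gaussian blobs with shifted means give a clearly non-zero MMD.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 20))
Xt = rng.normal(0.5, 1.0, size=(120, 20))
print(mmd2(Xs, Xt))
```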

MMD actually acts as a discrepancy metric or criterion to evaluate the distribution mismatch across domains and works well in aligning the global distribution. However, it only considers the domain discrepancy and generally ignores the intrinsic data structure of the target domain, e.g. the local structure shown in Fig. 1(b). It is known that geometric structure is indispensable for domain distance minimization, and it can well exploit the internal local structure of the target data. Particularly, in unsupervised learning, the local structure of the target data often plays a more important role than the global structure. This originates from the manifold assumption that data with local similarity share similar labels. Motivated by the manifold assumption, a novel manifold criterion (MC) is proposed in our work, which is similar to but very different from conventional manifold algorithms in that MC acts as a generative transfer criterion for unsupervised domain adaptation.

Intuitively, we hold the assumption that if a new target domain can be automatically generated by using the source domain data, the domain transfer issue can be naturally addressed. To this end, a criterion that measures the generative effect can be explored. In this paper, considering the locality property of the target data, we wish that the generated target data hold a similar local structure to the true target domain data. Naturally, motivated by the manifold assumption[26], an objective generative transfer metric, the manifold criterion (MC), is proposed. Suppose that two samples $\mathbf{x}_i^t$ and $\mathbf{x}_j^t$ in the target domain are close to each other; if the target sample $\hat{\mathbf{x}}_i$ generated by using the source data is also close to $\mathbf{x}_j^t$, we recognize that the generated intermediate domain data shares a similar distribution with the target domain. This is the basic idea of our generative transfer learning in this paper.

But how can the generative target domain be constructed? From the perspective of manifold learning, we expect the new target data to be generated using a locality structure preservation metric. This idea can be interpreted under the commonly investigated independent and identically distributed (i.i.d.) case, in which the affinity structure in the high-dimensional space can still be preserved in a projected low-dimensional subspace (i.e. manifold structure embedding). In general, the internal intrinsic structure can remain unchanged by using graph Laplacian regularization[27], which reflects the affinity of the raw data.

Specifically, with the proposed manifold criterion, a Manifold Criterion guided Transfer Learning (MCTL) method is proposed, which pursues a latent common subspace via a projection matrix for the source and target domains. In the common subspace, a generative transfer matrix is solved by leveraging the source domain data and the MC generative metric, so that the newly generated data holds a marginal distribution similar to that of the target data, in an unsupervised manner. The findings and analysis show that the proposed manifold criterion can be used to reduce the local domain discrepancy.

Additionally, in the MCTL model, the embedding of a low-rank constraint (LRC) on the transfer matrix ensures that the data from the source domain can be well interpreted during generation, exhibiting an approximately block-diagonal property. With the LRC exploited, the local structure based MC can be guaranteed as we wish without distortion[28].

Fig. 1: Motivation of MCTL. The lines represent the classification boundary of source domain. The centroid represents the geometric center of all data points.

The idea of our MCTL is described in Fig. 2. In summary, the main contributions and novelty of this work are fourfold:

  • We propose an unsupervised manifold criterion guided transfer learning (MCTL) method, which aims to generate a new intermediate target domain that holds a similar distribution to the true target data by leveraging the source data as a basis. The proposed manifold criterion (MC) is modeled by a novel local generative discrepancy metric (LGDM) for local cross-domain discrepancy measurement, such that the local transfer can be effectively aligned.

  • In order to keep the global distribution consistency, a global generative discrepancy metric (GGDM), that offers a linear method to compare the high-order statistics of two distributions, is proposed to minimize the discrepancy between the generative target data and the true target data. Therefore, the local and global affinity structures across domains are simultaneously guaranteed.

  • For improving the correlation between the source data and the generative target data, LRC regularization on the transfer matrix is integrated in MCTL, such that the block-diagonal property can be utilized for preventing the domain transfer from distortion and negative transfer.

  • Under the MCTL framework, for a more generic case, a simplified version of MCTL (i.e. MCTL-S) is proposed, which constrains the generated data to be strictly consistent with the target domain in a simple yet generic manner. Interestingly, with this constraint, the LGDM loss in MCTL-S naturally degenerates into a generic manifold regularization.

The remainder of this paper is organized as follows. In Section II, we review the related work in transfer learning. In Section III, we present the preliminary idea of the proposed manifold criterion. In Section IV, the proposed MCTL method and optimization are formulated. In Section V, the simplified version of MCTL is introduced and preliminarily analyzed. In Section VI, the classification method is described. In Section VII, the experiments in cross-domain visual recognition are presented. The discussion is presented in Section VIII. Finally, the paper is concluded in Section IX.

II Related Work

II-A Shallow Transfer Learning

Many transfer learning methods have been proposed to tackle heterogeneous domain adaptation problems. Generally, these methods can be divided into the following three categories.

Classifier based approaches. A generic way is to directly learn a common classifier on auxiliary domain data by leveraging a few labeled target data. Yang et al.[29] proposed an adaptive SVM (A-SVM) to learn a new target classifier by supposing that $f^t(\mathbf{x})=f^s(\mathbf{x})+\Delta f(\mathbf{x})$, where the classifier $f^s(\mathbf{x})$ is trained with the labeled source samples and $\Delta f(\mathbf{x})$ is the perturbation function. Bruzzone et al.[30] developed an approach that iteratively learns the SVM classifier by labeling the unlabeled target samples and simultaneously removing some labeled samples from the source domain. Duan et al.[8] proposed adaptive multiple kernel learning (AMKL) for consumer video event recognition from annotated web videos. Duan et al. also proposed a domain transfer MKL (DTMKL)[5], which learns an SVM classifier and a kernel function simultaneously for classifier adaptation. Zhang et al.[31] proposed a robust classifier transfer method (EDA), modeled with ELM and manifold regularization, for visual recognition.

Feature augmentation/transformation based approaches. Li et al.[32] proposed heterogeneous feature augmentation (HFA), which tends to learn a transformed feature space for domain adaptation. Kulis et al.[9] proposed an asymmetric regularized cross-domain transform (ARC-t) method for learning a transformation metric. In [33], Hoffman et al. proposed Max-Margin Domain Transforms (MMDT), in which a category-specific transformation is optimized for domain transfer. Gong et al. proposed the Geodesic Flow Kernel (GFK)[34] method, which integrates an infinite number of linear subspaces on the geodesic path to learn a domain-invariant feature representation. Gopalan et al.[35] proposed an unsupervised method (SGF) for low-dimensional subspace transfer, in which a group of subspaces along the geodesic between the source and target data is sampled and the source data is projected into these subspaces for discriminative classifier learning. An unsupervised feature transformation approach, Transfer Component Analysis (TCA)[11], was proposed to discover common features having the same marginal distribution by using Maximum Mean Discrepancy (MMD) as a non-parametric discrepancy metric. MMD[23],[36],[37] is often used in transfer learning. Long et al.[38] proposed a Transfer Sparse Coding (TSC) approach to construct robust sparse representations by using the empirical MMD as the distance measure. The Transfer Joint Matching (TJM) approach proposed by Long et al.[19] tends to learn a nonlinear transformation by minimizing the MMD based distribution discrepancy.

Feature representation based approaches. Different from the methods above, domain adaptation is achieved by representing features across domains. Jhuo et al.[39] proposed the RDALR method, in which the source data is reconstructed with the target domain by low-rank modeling. Similarly, Shao et al. [40] proposed the LTSL method, which pre-learns a subspace using PCA or LDA and then models low-rank representation across domains. Zhang et al. [41],[42] proposed the Latent Sparse Domain Transfer (LSDT) and Discriminative Kernel Transfer Learning (DKTL) methods for visual adaptation, by jointly learning a subspace projection and sparse reconstruction across domains. Further, Xu et al. [43] proposed the DTSL method, which combines low-rank and sparse constraints on the reconstruction matrix.

In this paper, the proposed method differs from the existing shallow transfer learning methods in that a generative transfer idea is introduced, which tends to achieve domain adaptation by generating an intermediate domain that has a similar distribution to the true target domain.

II-B Deep Transfer Learning

Deep learning, as a data-driven learning method, has achieved great success in many fields[44],[45],[46],[47]. However, when solving domain adaptation problems with deep learning technology, massive labeled training data are required. For small-size tasks, deep learning may not work well. Therefore, deep transfer learning methods have been studied.

Donahue et al.[48] proposed a deep transfer method for small-scale object recognition, in which the convolutional network (AlexNet) was trained on ImageNet. Similarly, Razavian et al.[49] also proposed to train a network on ImageNet as a high-level feature extractor. Tzeng et al.[44] proposed the DDC method, which simultaneously achieves knowledge transfer between domains and tasks by using a CNN. Long et al.[25] proposed the deep adaptation network (DAN) method by imposing an MMD loss on the high-level features across domains. Additionally, Long et al.[21] also proposed the residual transfer network (RTN), which tends to learn a residual classifier based on the softmax loss. Oquab et al.[46] proposed a CNN architecture for mid-level feature transfer, which is trained on a large annotated image set. Additionally, Hu et al.[24] proposed a non-CNN based deep transfer metric learning (DTML) method to learn a set of hierarchical nonlinear transformations for cross-domain visual recognition.

Recently, GAN-inspired adversarial domain adaptation has been preliminarily studied. Tzeng et al. proposed the ADDA method [50] for adversarial domain adaptation, in which a CNN is used for adversarial discriminative feature learning, and it achieves state-of-the-art performance.

In this work, although the proposed MCTL method is a shallow transfer learning paradigm, its competitive capability compared with these deep transfer learning methods has been validated on pre-extracted deep features.

II-C Differences Between MCTL and Other Reconstruction Transfer Methodologies

The proposed MCTL is partly related to reconstruction transfer methods, such as DTSL[43], LSDT[41] and LTSL[40], but essentially different from them. These methods aim to learn a common subspace in which a feature reconstruction matrix between domains is learned for adaptation, with sparse reconstruction and low-rank based constraints considered, respectively. Different from reconstruction transfer, the proposed MCTL is a generative transfer learning paradigm, which is partly inspired by the idea of GAN[51] and manifold learning. The differences and relations are as follows.

Reconstruction Transfer. As the name implies, a reconstruction matrix is expected for domain correspondence. In LTSL, the subspace projection is pre-learned by off-the-shelf methods such as PCA or LDA; the projected source data is then used to reconstruct the projected target data under a low-rank constraint. The pre-learned subspace may be suboptimal, leading to a possible local optimum of the reconstruction matrix. Further, the LSDT method realizes domain adaptation by simultaneously learning a latent subspace and a cross-domain sparse reconstruction. The DTSL method poses a hybrid regularization of sparsity and low-rank constraints for learning a more robust reconstruction transfer matrix. Reconstruction transfer always expresses the target domain by leveraging the source domain; however, this expression is not accurate due to the limited number of target domain samples available for calculating the reconstruction error loss, and the robustness is decreased.

Generative Transfer. The proposed MCTL method introduces a generative transfer learning concept, which aims to realize intermediate domain generation by constructing a manifold criterion loss. The motivation is that the domain adaptation problem can be solved by generating a similar domain that shares the same distribution with the true target domain. The essential differences of our work from reconstruction transfer lie in that: (1) domain adaptation is treated as a domain generation problem, instead of a domain alignment problem; (2) the manifold criterion loss is constructed for generation, instead of the least-square based reconstruction error loss. In addition, the GGDM based global domain discrepancy loss and the LRC regularization are also integrated in MCTL for global distribution discrepancy reduction and domain correlation enhancement, simultaneously.

Similarity and Relationship. The reconstruction transfer and generative transfer are similar and related in three aspects. (1) Both aim at pursuing a more similar domain with the target data by leveraging the source domain data. (2) Both are unsupervised transfer learning, which do not need the data label information in domain adaptation. (3) Both have similar model formulation and solvers for obtaining the domain correspondence matrix and transformation.

Fig. 2: Illustration of the proposed Manifold Criterion Guided Transfer Learning (MCTL). (a) represents the source domain, which is used to generate an intermediate target domain shown in (b) that is similar to the true target domain shown in (c). The intermediate domain generation is carried out by the learned generative matrix $\mathbf{Z}$ based on the manifold criterion (MC) in an unsupervised manner. MC interprets the distribution discrepancy, which implies that if the local discrepancy is minimized, the distribution consistency is achieved. Further, a projection matrix $\mathbf{P}$ is learned for domain feature embedding. Notably, $\phi(\cdot)$ is used as the implicit mapping function of the data, which can be kernelized in implementation with the inner product.

III Manifold Criterion Preliminary

Manifold learning[20],[27], as a typical unsupervised learning method, has been widely used. The manifold hypothesis states that an intrinsic geometric low-dimensional structure is embedded in the high-dimensional feature space and that data with an affinity structure share similar labels. The manifold hypothesis, however, holds for data that are independent and identically distributed (i.i.d.). Therefore, we attempt to build a manifold criterion to measure this condition (i.e. domain discrepancy minimization) and to guide transfer learning across domains through an intermediate domain.

In this paper, the manifold hypothesis is used in the process of generating the intermediate domain, as shown in Fig. 2. Essentially different from manifold learning and regularization, we propose a novel manifold criterion (MC) that is utilized as a generative discrepancy metric. In semi-supervised learning (SSL), manifold regularization is often used, but under the i.i.d. condition. Transfer learning differs from SSL in that the domain data do not satisfy the i.i.d. condition. In this paper, it should be figured out that if the intermediate domain can be generated via the manifold criterion guided objective function, then the distributions of the generated intermediate domain and the true target domain are recognized to be matched.

The idea of the manifold criterion is described in Fig. 2. A projection matrix is first learned for common subspace projection, and then a generative transfer matrix is learned for intrinsic structure preservation and distribution discrepancy minimization between the true target data and the target data generated from the source domain data. That is, if the generated data has a similar affinity structure to the true target domain, i.e. the manifold criterion is satisfied, we can conclude that the generated data shares a similar distribution with the target domain. Notably, different from reconstruction based domain adaptation methods, in this work we tend to generate an intermediate domain by leveraging the source domain, i.e. generative transfer instead of reconstruction transfer.

Moreover, Fig. 1 implies that MC (local) and MMD (global) can be jointly considered in transfer learning models. Frankly, the idea of this paper is intuitive, simple, and easy to follow. The key point lies in how to generate the intermediate domain data such that the generated data complies with the manifold assumption originating from the true target domain data. If the manifold criterion is satisfied (i.e. distribution matching is achieved), then domain adaptation or distribution alignment is completed, which is the principle of MCTL.
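As a rough, self-contained illustration of checking whether generated data "complies with the manifold assumption" of the true target data, the sketch below compares the k-nearest-neighbor graphs of the true and generated target samples and reports the fraction of preserved neighbors; this preservation score and all names are our own illustrative construction, not part of the MCTL formulation.

```python
import numpy as np

def knn_indices(X, k):
    # Indices of the k nearest neighbors of each row of X (excluding itself).
    d = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def neighbor_preservation(X_true, X_gen, k=5):
    # Fraction of k-NN pairs of the true target data that are also k-NN pairs
    # of the generated data (closer to 1 means local structure is preserved).
    nn_true = knn_indices(X_true, k)
    nn_gen = knn_indices(X_gen, k)
    hits = [len(set(a) & set(b)) for a, b in zip(nn_true, nn_gen)]
    return float(np.mean(hits)) / k

rng = np.random.default_rng(1)
X_true = rng.normal(size=(50, 10))
X_gen = X_true + 0.05 * rng.normal(size=(50, 10))   # a well-generated surrogate
print(neighbor_preservation(X_true, X_gen, k=5))    # close to 1
print(neighbor_preservation(X_true, rng.normal(size=(50, 10)), k=5))  # much lower
```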

IV MCTL: Manifold Criterion Guided Transfer Learning

IV-A Notations

In this paper, the source and target domains are indicated by subscripts $s$ and $t$, respectively. The training sets of the source and target domains are denoted as $\mathbf{X}_s\in\mathbb{R}^{d\times n_s}$ and $\mathbf{X}_t\in\mathbb{R}^{d\times n_t}$. $\phi(\mathbf{X}_s)\mathbf{Z}$ denotes the generated target domain, where $\phi(\cdot)$ denotes an implicit but generic transformation, $d$ denotes the dimensionality, and $n_s$ and $n_t$ denote the number of samples in the source and target domains, respectively. Let $\mathbf{X}=[\mathbf{X}_s,\mathbf{X}_t]$, then $\phi(\mathbf{X})=[\phi(\mathbf{X}_s),\phi(\mathbf{X}_t)]$, where $n=n_s+n_t$. Let $\mathbf{P}$ be the basis transformation that maps the raw data space $\mathbb{R}^d$ to a latent subspace. $\mathbf{Z}$ represents the generative transfer matrix, $\mathbf{I}$ denotes the identity matrix, and $\|\cdot\|_F$ and $\|\cdot\|_*$ denote the Frobenius norm and the nuclear norm, respectively. The superscript $T$ denotes the transpose operator and $\mathrm{Tr}(\cdot)$ denotes the matrix trace operator.

In the RKHS, the kernel Gram matrix is defined as $\mathbf{K}=\phi(\mathbf{X})^T\phi(\mathbf{X})\in\mathbb{R}^{n\times n}$, where $k(\mathbf{x}_i,\mathbf{x}_j)=\phi(\mathbf{x}_i)^T\phi(\mathbf{x}_j)$ is a kernel function. In the following sections, let $\mathbf{K}_s=\phi(\mathbf{X})^T\phi(\mathbf{X}_s)\in\mathbb{R}^{n\times n_s}$ and $\mathbf{K}_t=\phi(\mathbf{X})^T\phi(\mathbf{X}_t)\in\mathbb{R}^{n\times n_t}$, and it is easy to see that $\mathbf{K}=[\mathbf{K}_s,\mathbf{K}_t]$.
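For concreteness, the following sketch (illustrative names, with an RBF kernel as an assumed choice) shows how such Gram matrices can be assembled from source and target data stored column-wise:

```python
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    # Gram matrix between the columns of A (d x m) and B (d x n).
    sq = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2 * A.T @ B
    return np.exp(-sq / (2 * sigma**2))

d, ns, nt = 20, 30, 25
rng = np.random.default_rng(0)
Xs = rng.normal(size=(d, ns))        # source data, one sample per column
Xt = rng.normal(size=(d, nt))        # target data
X = np.hstack([Xs, Xt])              # X = [Xs, Xt], n = ns + nt

K = rbf_gram(X, X)                   # n x n kernel Gram matrix
K_s = rbf_gram(X, Xs)                # n x ns block associated with the source
K_t = rbf_gram(X, Xt)                # n x nt block associated with the target
assert np.allclose(K, np.hstack([K_s, K_t]))
```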

IV-B Problem Formulation

In this section, the proposed MCTL method, illustrated in Fig. 2, is presented, in which the same distribution between the generated intermediate target domain and the true target domain under a common subspace is what we expect. That is, the intermediate target domain is generated to share approximately the same distribution as the true target domain by exploiting the proposed manifold criterion as the domain discrepancy metric. Specifically, two generative discrepancy metrics (LGDM and GGDM) are proposed for measuring the domain discrepancy locally and globally. Overall, the model is composed of three items. The first item is the MC based LGDM loss, which measures the local domain discrepancy with the manifold criterion by exploiting the locality of the target data. The second item is the GGDM loss, which minimizes the global domain discrepancy of the marginal distributions between the generated intermediate target domain and the true target domain. The third item is the LRC regularization (low-rank constraint), which keeps the generalization of the transfer matrix $\mathbf{Z}$. The detailed MCTL method is described as follows.

IV-B1 MC based Local Generative Discrepancy Metric

The MC based local generative discrepancy metric (LGDM) loss is used to enhance the distribution consistency between the source and target domains indirectly, by constraining the generated target data with the manifold criterion. For convenience, $\hat{\mathbf{x}}_i$ is defined as a sample in the generated target domain $\phi(\mathbf{X}_s)\mathbf{Z}$ and $\mathbf{x}_j^t$ is defined as a sample in the true target domain. We claim that the distribution consistency between the generated and the true target domain is achieved, i.e. domain transfer is done, only if the two sets satisfy the following manifold criterion, which can be formulated as

$$\min_{\mathbf{Z}}\ \frac{1}{2}\sum_{i=1}^{n_t}\sum_{j=1}^{n_t}\left\|\hat{\mathbf{x}}_i-\mathbf{x}_j^t\right\|_2^2\,W_{ij} \qquad (1)$$

where $\mathbf{W}\in\mathbb{R}^{n_t\times n_t}$ is the affinity matrix described as

$$W_{ij}=\begin{cases}1, & \text{if }\ \mathbf{x}_i^t\in\mathcal{N}_k(\mathbf{x}_j^t)\ \text{or}\ \mathbf{x}_j^t\in\mathcal{N}_k(\mathbf{x}_i^t)\\ 0, & \text{otherwise}\end{cases}$$

and $\mathcal{N}_k(\mathbf{x}_j^t)$ represents the $k$ nearest neighbors of sample $\mathbf{x}_j^t$. The matrix $\mathbf{D}$ is a diagonal matrix with entries $D_{ii}=\sum_j W_{ij}$, $i=1,\dots,n_t$. As claimed before, $\mathbf{P}=\phi(\mathbf{X})\mathbf{A}$, so the projected source data and target data can be expressed as $\mathbf{P}^T\phi(\mathbf{X}_s)=\mathbf{A}^T\mathbf{K}_s$ and $\mathbf{P}^T\phi(\mathbf{X}_t)=\mathbf{A}^T\mathbf{K}_t$. By substituting the generated data $\phi(\mathbf{X}_s)\mathbf{Z}$ and the Gram matrices after projection (i.e. $\mathbf{K}_s$ and $\mathbf{K}_t$) into Eq. (1), the MC based LGDM loss can be further formulated as

$$\mathcal{L}_{LGDM}=\frac{1}{2}\,\mathrm{Tr}\!\left(\mathbf{A}^T\mathbf{K}_s\mathbf{Z}\mathbf{D}\mathbf{Z}^T\mathbf{K}_s^T\mathbf{A}-2\,\mathbf{A}^T\mathbf{K}_s\mathbf{Z}\mathbf{W}\mathbf{K}_t^T\mathbf{A}+\mathbf{A}^T\mathbf{K}_t\mathbf{D}\mathbf{K}_t^T\mathbf{A}\right) \qquad (2)$$

From Eq. (2), the motivation is clearly demonstrated: to achieve local structure consistency (i.e. manifold consistency) between the generated target data and the true target data. The intrinsic difference between Eq. (2) and manifold embedding or regularization is that we aim to enforce the manifold assumption as a criterion, while conventional manifold learning relies on this assumption.
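A minimal sketch of the LGDM idea under the k-NN affinity and the trace form given above; here the generated target samples are handled directly as columns of a matrix rather than through the kernelized parameterization, and all names are illustrative assumptions:

```python
import numpy as np

def knn_affinity(Xt, k=5):
    # Symmetric 0/1 affinity over target samples (columns of Xt) from k nearest neighbors.
    nt = Xt.shape[1]
    d = np.sum(Xt**2, 0)[:, None] + np.sum(Xt**2, 0)[None, :] - 2 * Xt.T @ Xt
    np.fill_diagonal(d, np.inf)
    W = np.zeros((nt, nt))
    for j in range(nt):
        W[j, np.argsort(d[j])[:k]] = 1.0
    return np.maximum(W, W.T)

def lgdm_loss(Xt_gen, Xt, W):
    # sum_{ij} ||xhat_i - x_j||^2 W_ij, written in trace form with D = diag(W 1);
    # it penalizes generated samples that drift away from the true local neighborhoods.
    D = np.diag(W.sum(1))
    return float(np.trace(Xt_gen @ D @ Xt_gen.T - 2 * Xt_gen @ W @ Xt.T + Xt @ D @ Xt.T))

rng = np.random.default_rng(0)
Xt = rng.normal(size=(10, 40))
W = knn_affinity(Xt, k=5)
print(lgdm_loss(Xt + 0.01 * rng.normal(size=Xt.shape), Xt, W))  # typically small: structure preserved
print(lgdm_loss(rng.normal(size=Xt.shape), Xt, W))              # typically larger: structure lost
```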

IV-B2 Global Generative Discrepancy Metric Loss

In order to reduce the distribution mismatch between the generated target data and the true target data, a generic MMD based global generative discrepancy metric (GGDM) is proposed, which minimizes the discrepancy as follows:

$$\mathcal{L}_{GGDM}=\left\|\mathbb{E}_{\hat{\mathbf{x}}\sim\mathcal{D}_{\hat{t}}}[\hat{\mathbf{x}}]-\mathbb{E}_{\mathbf{x}\sim\mathcal{D}_{t}}[\mathbf{x}]\right\|_2^2 \qquad (3)$$

where $\mathcal{D}_{\hat{t}}$ and $\mathcal{D}_t$ denote the distributions of the generated target domain and the true target domain, respectively. However, the model may not transfer knowledge directly, and it is unclear where a test sample is from (source or target domain) if there is no common subspace. We consider finding a latent common subspace for the source and target domains by using a projection matrix $\mathbf{P}$. Therefore, by projecting the generated and the true target data onto the subspace, the GGDM loss after projection can be formulated as follows. Considering that the generated target data is $\phi(\mathbf{X}_s)\mathbf{Z}$, by substituting it into the equation, there is

$$\mathcal{L}_{GGDM}=\left\|\frac{1}{n_t}\mathbf{P}^T\phi(\mathbf{X}_s)\mathbf{Z}\mathbf{1}-\frac{1}{n_t}\mathbf{P}^T\phi(\mathbf{X}_t)\mathbf{1}\right\|_2^2 \qquad (4)$$

where $\mathbf{1}\in\mathbb{R}^{n_t}$ represents a column vector of all ones.

The projection matrix $\mathbf{P}$ is a linear transformation, which can be represented as a linear combination of the training data, i.e. $\mathbf{P}=\phi(\mathbf{X})\mathbf{A}$, where $\mathbf{A}$ denotes the linear combination coefficient matrix. Then the projected source data can be expressed as $\mathbf{P}^T\phi(\mathbf{X}_s)=\mathbf{A}^T\mathbf{K}_s$ and the projected target data can be expressed as $\mathbf{P}^T\phi(\mathbf{X}_t)=\mathbf{A}^T\mathbf{K}_t$. With the kernel trick, the inner product of the implicit transformation is represented as a Gram matrix, from the raw space to the RKHS. As described in Section IV-A, letting $\mathbf{K}_s=\phi(\mathbf{X})^T\phi(\mathbf{X}_s)$ and $\mathbf{K}_t=\phi(\mathbf{X})^T\phi(\mathbf{X}_t)$, the projected source domain and target domain can be expressed simply as $\mathbf{A}^T\mathbf{K}_s$ and $\mathbf{A}^T\mathbf{K}_t$, respectively. Therefore, the GGDM loss is formulated as

$$\mathcal{L}_{GGDM}=\left\|\frac{1}{n_t}\mathbf{A}^T\mathbf{K}_s\mathbf{Z}\mathbf{1}-\frac{1}{n_t}\mathbf{A}^T\mathbf{K}_t\mathbf{1}\right\|_2^2 \qquad (5)$$
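Assuming the kernelized mean-difference form above, the GGDM term can be evaluated as in the following sketch (shapes and names are illustrative assumptions: A is n x p, Z is ns x nt):

```python
import numpy as np

def ggdm_loss(A, K_s, K_t, Z):
    # || (1/nt) A^T K_s Z 1  -  (1/nt) A^T K_t 1 ||_2^2 in the projected subspace.
    nt = K_t.shape[1]
    mean_gen = A.T @ K_s @ Z @ np.ones(nt) / nt    # projected mean of generated target data
    mean_true = A.T @ K_t @ np.ones(nt) / nt       # projected mean of true target data
    return float(np.sum((mean_gen - mean_true) ** 2))

rng = np.random.default_rng(0)
n, ns, nt, p = 55, 30, 25, 5
K = rng.normal(size=(n, n)); K = K @ K.T           # a synthetic PSD Gram matrix
K_s, K_t = K[:, :ns], K[:, ns:]
A = rng.normal(size=(n, p))
Z = rng.normal(size=(ns, nt))
print(ggdm_loss(A, K_s, K_t, Z))
```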

IV-B3 LRC for Domain Correlation Enhancement

In domain transfer, the loss functions are designed for relating the generated target data and the true target data. Significantly, the generated target data plays a critical role in the proposed model. In this work, a general transfer matrix $\mathbf{Z}$ is used to bridge the source domain data and the generated data (intermediate result). Since structural consistency between different domains is our goal, it is natural to consider a low-rank structure of $\mathbf{Z}$ as a choice for enhancing the domain correlation. In our MCTL, the low-rank constraint (LRC), which is effective in exposing the global structure of data from different domains, is finally used. The LRC regularization ensures that the data from different domains can be well interlaced during domain generation, which is significant for reducing the disparity of the domain distributions. Furthermore, if the projected data lie on the same manifold, each sample in the target domain can be represented by its neighbors in the source domain. This requires the generative transfer matrix $\mathbf{Z}$ to be approximately block-wise. Therefore, LRC regularization is necessary. Considering the non-convexity of the rank function, whose minimization is NP-hard, the nuclear norm is used as a rank approximation in this work.

IV-B4 Completed Model of MCTL

By combining the MC based LGDM loss in Eq. (2), the GGDM loss in Eq. (5), and the LRC regularization, the objective function of our MCTL method is finally formulated as follows:

$$\min_{\mathbf{A},\mathbf{Z}}\ \mathcal{L}_{LGDM}+\lambda_1\,\mathcal{L}_{GGDM}+\lambda_2\left\|\mathbf{Z}\right\|_{*}\quad \mathrm{s.t.}\ \ \mathbf{A}^T\mathbf{K}\mathbf{A}=\mathbf{I} \qquad (6)$$

where $\lambda_1$ and $\lambda_2$ are the trade-off parameters. The projection directions in $\mathbf{P}$ are required to be orthogonal and normalized to unit norm for preventing trivial solutions, by enforcing $\mathbf{P}^T\mathbf{P}=\mathbf{I}$, which, with $\mathbf{P}=\phi(\mathbf{X})\mathbf{A}$, can be further rewritten as the equality constraint $\mathbf{A}^T\mathbf{K}\mathbf{A}=\mathbf{I}$. Obviously, the model is non-convex with respect to the two variables, but it can be solved with a variable alternating strategy, and the optimization algorithm is formulated below.

IV-C Optimization

There are two variables, $\mathbf{A}$ and $\mathbf{Z}$, in the MCTL model (6); therefore, an efficient alternating optimization strategy is naturally considered, i.e. one variable is solved while the other is frozen. First, when $\mathbf{Z}$ is fixed, a general eigenvalue decomposition is used for solving $\mathbf{A}$. Second, when $\mathbf{A}$ is fixed, the inexact augmented Lagrange multiplier (IALM) method and gradient descent are used to solve $\mathbf{Z}$. In the following, the optimization details of the proposed method are presented.

By introducing an auxiliary variable $\mathbf{J}$ with the constraint $\mathbf{Z}=\mathbf{J}$, problem (6) can be rewritten in a separable form. Furthermore, with the augmented Lagrange function[52], the model can be written as

(7)

where $\mathbf{1}$ represents an all-one matrix instead of an all-one column vector as problem (6) is unfolded, $\mathbf{Y}$ denotes the Lagrange multiplier and $\mu$ is a penalty parameter.

In the following, we present how to optimize the three variables $\mathbf{A}$, $\mathbf{J}$, and $\mathbf{Z}$ in problem (7) based on eigenvalue decomposition, IALM and gradient descent, step by step.

IV-C1 Update $\mathbf{A}$

By freezing $\mathbf{Z}$ and $\mathbf{J}$, $\mathbf{A}$ can be solved as

(8)

We can derive the solution of this subproblem column-wise. To obtain the $i$-th column vector $\mathbf{a}_i$ of $\mathbf{A}$, setting the partial derivative of problem (8) with respect to $\mathbf{a}_i$ to zero yields

(9)

It is clear that $\mathbf{a}_i$ can be obtained by solving an eigen-decomposition problem, and $\mathbf{a}_i$ is the eigenvector corresponding to the $i$-th smallest eigenvalue.
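For instance, once the subproblem is reduced to a symmetric eigen-decomposition, the columns of the solution can be taken as the eigenvectors of the smallest eigenvalues, as in this generic sketch (the matrix M is a placeholder, not the exact matrix of Eq. (9)):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(60, 60))
M = M @ M.T                            # symmetric placeholder for the matrix arising from Eq. (9)

p = 5                                  # subspace dimensionality
eigvals, eigvecs = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
A = eigvecs[:, :p]                     # eigenvectors of the p smallest eigenvalues as columns of A
print(eigvals[:p])
```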

IV-C2 Update $\mathbf{J}$

By freezing $\mathbf{A}$ and $\mathbf{Z}$, the problem is solved with respect to $\mathbf{J}$. After dropping the terms irrelevant to $\mathbf{J}$, $\mathbf{J}$ in the current iteration can be solved as

(10)

which can be further rewritten as

(11)

Problem (11) can be efficiently solved using the singular value thresholding (SVT) operator [53], which contains two major steps. First, singular value decomposition (SVD) is conducted on the matrix in problem (11), giving $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\boldsymbol{\Sigma}=\mathrm{diag}(\sigma_1,\dots,\sigma_r)$ contains the singular values and $r$ is the rank. Second, the optimal solution is then obtained by soft-thresholding the singular values as $\mathbf{J}^{*}=\mathbf{U}\,\mathrm{diag}\big((\sigma_i-\tau)_{+}\big)\mathbf{V}^T$, where $\tau$ is the shrinkage threshold and $(\cdot)_{+}$ denotes the positive part operator.
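A minimal sketch of the SVT operator described above; the input matrix and the shrinkage threshold `tau` are placeholders (in IALM the threshold would typically depend on the regularization weight and the penalty parameter):

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: SVD of M, then soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # (sigma_i - tau)_+
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 25))
J = svt(M, tau=1.0)
# Singular values are shrunk (small ones set to zero), encouraging a low-rank J.
print(np.linalg.svd(M, compute_uv=False)[:3], "->", np.linalg.svd(J, compute_uv=False)[:3])
```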

IV-C3 Update $\mathbf{Z}$

By freezing $\mathbf{A}$ and $\mathbf{J}$, the problem is solved with respect to $\mathbf{Z}$. By dropping those terms independent of $\mathbf{Z}$ in (7), there is

(12)

We can see from problem (12) that it is hard to obtain a closed-form solution of $\mathbf{Z}$. Therefore, the general gradient descent operator[54] is used, and the solution of $\mathbf{Z}$ in the $k$-th iteration is given as

$$\mathbf{Z}_{k+1}=\mathbf{Z}_{k}-\eta\,\nabla_{\mathbf{Z}} \qquad (13)$$

where $\eta$ is the step size and $\nabla_{\mathbf{Z}}$ denotes the gradient of (12) with respect to $\mathbf{Z}$, which is calculated as

(14)

In detail, the iterative optimization procedure of the proposed MCTL is summarized in Algorithm 1.

Algorithm 1 The Proposed MCTL
Input: $\mathbf{X}_s$, $\mathbf{X}_t$, $\lambda_1$, $\lambda_2$
Procedure:
1. Compute the kernel Gram matrices $\mathbf{K}$, $\mathbf{K}_s$, $\mathbf{K}_t$,
             the affinity matrix $\mathbf{W}$ and the diagonal matrix $\mathbf{D}$
2. Initialize: $\mathbf{Z}=\mathbf{J}=\mathbf{0}$
3. While not converged do
     3.1 Step 1: Fix $\mathbf{Z}$ and $\mathbf{J}$, and update $\mathbf{A}$ by solving the
         eigenvalue decomposition problem (9).
     3.2 Step 2: Fix $\mathbf{A}$, and update $\mathbf{Z}$ using IALM:
          3.2.1. Fix $\mathbf{Z}$ and update $\mathbf{J}$ by using the singular value
          thresholding (SVT) [53] operator on problem (11).
          3.2.2. Fix $\mathbf{J}$ and update $\mathbf{Z}$ according to the gradient
          descent operator, i.e. Equation (13).
     3.3 Update the multiplier $\mathbf{Y}$:
             $\mathbf{Y}\leftarrow\mathbf{Y}+\mu(\mathbf{Z}-\mathbf{J})$
     3.4 Update the parameter $\mu$:
             $\mu\leftarrow\min(\rho\mu,\ \mu_{\max})$
     3.5 Check convergence
end while
Output: $\mathbf{A}$ and $\mathbf{Z}$.

V MCTL-S: Simplified Version of MCTL

As illustrated above, MCTL aims to make the distribution discrepancy between the generated target data and the true target data as small as possible by using the manifold criterion. In this section, considering generic manifold embedding and for model simplicity, we derive a simplified version of MCTL (MCTL-S for short), as illustrated in Fig. 3.

V-A Formulation of MCTL-S

As described in Fig. 3 (right), suppose an extreme case of domain generation, that is, the generated target data is strictly the same as the true target data, i.e. $\phi(\mathbf{X}_s)\mathbf{Z}$ coincides with $\phi(\mathbf{X}_t)$; then MCTL-S is formulated as

(15)

where $\mathbf{L}=\mathbf{D}-\mathbf{W}$ is the conventional Laplacian matrix. The objective function (15) also contains three items: the MC based LGDM loss, the GGDM loss, and the LRC regularization. From the MC-S loss term in Equation (15), we observe a generic manifold regularization term with the Laplacian matrix. Therefore, the MC loss degenerates into a conventional manifold constraint under the assumption $\phi(\mathbf{X}_s)\mathbf{Z}=\phi(\mathbf{X}_t)$, which shows that the MCTL-S model is harsher than the MCTL model.
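The degenerated term is the standard graph-Laplacian regularizer; a minimal sketch, assuming a symmetric k-NN-style affinity W over target samples and projected target features F stored column-wise:

```python
import numpy as np

def laplacian_regularizer(F, W):
    # (1/2) * sum_{ij} ||f_i - f_j||^2 W_ij  =  Tr(F L F^T), with L = D - W.
    D = np.diag(W.sum(1))
    L = D - W
    return float(np.trace(F @ L @ F.T))

rng = np.random.default_rng(0)
nt, p = 40, 5
W = (rng.random((nt, nt)) < 0.1).astype(float)
W = np.maximum(W, W.T); np.fill_diagonal(W, 0.0)    # symmetric, hollow affinity
F = rng.normal(size=(p, nt))                        # projected target features
print(laplacian_regularizer(F, W))
```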

The experimental results in Tables VIII and IX also show that both the harsher MCTL-S model and MCTL can achieve good performance. This demonstrates that manifold criterion based intermediate domain generation is a very effective scheme for transfer learning.

Fig. 3: Difference between MCTL (left) and MCTL-S (right). In MCTL, there is an error between the true target domain $\phi(\mathbf{X}_t)$ and the generated target domain $\phi(\mathbf{X}_s)\mathbf{Z}$. In MCTL-S, the generated domain $\phi(\mathbf{X}_s)\mathbf{Z}$ is supposed to coincide with the true target domain $\phi(\mathbf{X}_t)$.

V-B Optimization of MCTL-S

MCTL-S has a similar mechanism to MCTL; therefore, the MCTL-S optimization is almost the same as that of MCTL. With two updating steps for $\mathbf{A}$ and $\mathbf{Z}$, the optimization procedure of the MCTL-S method is illustrated as follows.

Update $\mathbf{A}$. In the MCTL-S model, by freezing $\mathbf{Z}$ and $\mathbf{J}$ and setting the derivative of the objective function (15) w.r.t. $\mathbf{A}$ to zero, there is

(16)

Therefore, $\mathbf{a}_i$ in each iteration can be obtained by solving an eigenvalue decomposition problem, and $\mathbf{a}_i$ is the eigenvector corresponding to the $i$-th smallest eigenvalue.

Update $\mathbf{J}$. The variable $\mathbf{J}$ can be effectively solved by the singular value thresholding (SVT) operator[53], in a way similar to problem (11).

Update $\mathbf{Z}$. The variable $\mathbf{Z}$ can be updated according to Section IV-C3 by using the gradient descent algorithm. The gradient with respect to $\mathbf{Z}$ can be expressed as

(17)

VI Classification

For classification, the projected source data and target data can be represented as $\mathbf{A}^T\mathbf{K}_s$ and $\mathbf{A}^T\mathbf{K}_t$, respectively. Then, existing classifiers (e.g. SVM, the least squares method[55], SRC[56]) can be trained on the domain-aligned and augmented training data with the source labels, following the experimental setting of LSDT[41]. Notably, for the COIL-20, MSRC and VOC 2007 experiments, in order to follow the same experimental setting as DTSL[43], the classifier is trained only on the projected source data $\mathbf{A}^T\mathbf{K}_s$ with its labels. Finally, classification of the unlabeled target test data, i.e. $\mathbf{A}^T\mathbf{K}_t$, is performed, and the recognition accuracy is reported and compared.
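A minimal sketch of this classification stage under the stated setting: project the source and target data through the learned coefficient matrix, train an off-the-shelf classifier (here a linear SVM from scikit-learn, as one possible choice) on the projected labeled source data, and evaluate on the projected target data; all variable values are random placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, ns, nt, p = 55, 30, 25, 5
K_s = rng.normal(size=(n, ns))       # Gram blocks from the learned kernel (placeholders)
K_t = rng.normal(size=(n, nt))
A = rng.normal(size=(n, p))          # learned projection coefficients
y_s = rng.integers(0, 3, size=ns)    # source labels
y_t = rng.integers(0, 3, size=nt)    # target labels (used only for evaluation)

Fs = (A.T @ K_s).T                   # projected source features, one row per sample
Ft = (A.T @ K_t).T                   # projected target features

clf = LinearSVC().fit(Fs, y_s)       # train on projected (labeled) source data
print("target accuracy:", clf.score(Ft, y_t))
```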

Fig. 4: Some example images from the 4DA datasets (Caltech 256, Amazon, DSLR, Webcam)

VII Experiments

In this section, experiments on several benchmark datasets[57] are conducted to evaluate the proposed MCTL method, including (1) cross-domain object recognition[58],[59]: the 4DA office data, 4DA-CNN office data, COIL-20 data, and MSRC-VOC 2007 datasets [38]; (2) cross-pose face recognition: the Multi-PIE face dataset; and (3) cross-domain handwritten digit recognition: the USPS, SEMEION and MNIST datasets. Several related transfer learning methods based on feature transformation and reconstruction, such as SGF[35], GFK[34], SA[60], LTSL[40], DTSL[43], and LSDT[41], are compared and discussed.

VII-A Cross-domain Object Recognition

For cross-domain object/image recognition, 5 benchmark datasets are used. Several sample images from the 4DA office dataset are shown in Fig. 4, sample images from the COIL-20 object dataset are shown in Fig. 6, and sample images from the MSRC and VOC 2007 datasets are shown in Fig. 7.

Results on the 4DA Office dataset (Amazon, DSLR, Webcam (http://www.eecs.berkeley.edu/~mfritz/domainadaptation/) and Caltech 256 (http://www.vision.caltech.edu/Image_Datasets/Caltech256/))[34]:

Four domains, Amazon (A), DSLR (D), Webcam (W), and Caltech (C), are included in the 4DA dataset, which contains 10 object classes. In our experiment, the configuration of [34] is followed: 20 samples per class are selected from Amazon and 8 samples per class from DSLR, Webcam and Caltech when they are used as source domains; 3 samples per class are chosen when they are used as target training data, while the rest of the target domain data are used for testing. Note that 800-bin SURF features [34],[61] are extracted.

Fig. 5: Comparison with deep transfer learning methods
Fig. 6: Some examples from COIL-20 dataset

TABLE I: Recognition accuracy (%) of different domain adaptation methods in the 4DA setting (compared methods: Naive Comb, HFA[15], ARC-t[9], MMDT[33], SGF[35], GFK[34], SA[60], LTSL-PCA[40], LTSL-LDA[40], LSDT[41])
TABLE II: Recognition accuracy (%) of different domain adaptation methods on the f7 layer in the 4DA-CNN setting (compared methods: SourceOnly, Naive Comb, SGF[35], TCA, GFK[34], LTSL[40], LSDT[41])
TABLE III: Recognition accuracy (%) of different domain adaptation methods on COIL-20 (compared methods: SVM, TSL, RDALR[62], DTSL[43], LTSL[40], LSDT[41])
Fig. 7: Some examples from the MSRC and VOC 2007 datasets

The recognition accuracies are reported in Table I, from which we observe that the proposed MCTL ranks second in average accuracy, slightly inferior to LTSL-LDA. The reason may be that the discriminative ability of LDA helps improve the performance, since LTSL-PCA achieves a clearly lower accuracy, and our MCTL also outperforms the other methods. Notably, the 4DA task is a challenging benchmark that has attracted many competitive approaches for evaluation and comparison, so strong baselines have already been established.

Results on the 4DA-CNN dataset (Amazon, DSLR, Webcam and Caltech 256)[63],[61]:

For the 4DA-CNN dataset, the CNN features are extracted by feeding the raw 4DA data (10 object classes) into a convolutional neural network (AlexNet, with 5 convolutional layers and 3 fully connected layers) well trained on ImageNet[63]. The features from the 6th and 7th layers (i.e. DeCAF features [48]) are explored, with a feature dimensionality of 4096. In the experiments, the standard configuration and protocol of [34] are used. In this paper, the features of the 7th layer (f7) are used. The recognition accuracies using the f7 layer outputs for 12 cross-domain tasks are shown in Table II, from which we observe that the average recognition accuracy of the proposed method is the best. The superiority of generative transfer learning is thus demonstrated. We can also see that our MCTL outperforms LTSL-LDA in this setting, possibly because the CNN features are already discriminative, so discriminative subspace learning brings little additional benefit.

The methods compared in Table II are shallow transfer learning methods. It is also interesting to compare with deep transfer learning methods, such as AlexNet[63], DDC[44], DAN[25] and RTN[21]. The comparison is shown in Fig. 5, from which we observe that our proposed method ranks second in average performance, inferior to the residual transfer network (RTN) but still better than the other three deep transfer learning models. This comparison shows that the proposed MCTL, as a shallow transfer learning method, is highly competitive.
