I. Introduction
Transfer learning has been a hot topic in the machine learning community. It aims to solve the classifier learning problem of a target domain which has only limited and weak label information [1], with the help of a source domain which has sufficient labels. The difficulty of using two domains for the problem of one domain is that their distributions are significantly different. Many works have been proposed to learn from two domains with different distributions for the classification problem in the target domain [2, 3, 4, 5, 6, 7]. However, the performance of these works is not satisfying. Their shortcomings are due to ignoring the label information of the target domain, ignoring the local connections among the data points of both the source and target domains, or ignoring the differences among the source domain data points for the learning problem of the target domain.

In this paper, we propose a novel transfer learning method to solve these problems. We map the data of the two domains to one common space by a linear transformation, and match the distributions of the two domains in this common space. In the common space, we match the distributions by learning weighting factors for the source domain data points. The distribution matching framework is shown in Fig. 1. To use the labels of the target domain, we propose to minimize the classification errors of the data points of both the source and target domains. To this end, we learn a classifier in the common space using the labels of the data points of both domains, and then adapt this common classifier to each domain by adding a domain-specific adaptive function to it. The learning framework of the source and target domain classifiers is given in Fig. 2. Moreover, we also propose to use local reconstruction information to regularize the learning of both the weights of the source domain data points and the classifier of the target domain. The learning problem is constructed by minimizing an objective function with regard to the linear transformation matrix, the common classifier parameter, and the adaptation parameters.
We design an iterative learning algorithm to solve this problem.
II. Proposed Transfer Learning Method
II-A. Modeling
We suppose the source domain training set is $\{(x_i^S, y_i^S)\}_{i=1}^{n_S}$, where $x_i^S \in \mathbb{R}^d$ is the feature vector of $d$ dimensions of the $i$th data point, and $y_i^S$ is its label. The target domain training set is $\{(x_j^T, y_j^T)\}_{j=1}^{n_T}$, where $x_j^T \in \mathbb{R}^d$ is the feature vector of the $j$th data point, and $y_j^T$ is its label. Only the first $l$ target domain data points are labeled. We map the data of both domains to a common space by a transformation matrix $W \in \mathbb{R}^{d \times m}$,

$$x \mapsto W^\top x. \quad (1)$$
We present the distribution of the source domain in the common space as the weighted mean of the vectors of its data points,

$$\mu_S = \sum_{i=1}^{n_S} \delta_i W^\top x_i^S, \quad (2)$$

where $\delta_i$ is the weighting factor of the $i$th source domain data point. We also present the distribution of the target domain as the mean of its data points in the common space,

$$\mu_T = \frac{1}{n_T} \sum_{j=1}^{n_T} W^\top x_j^T. \quad (3)$$

Naturally, we hope the distributions of the two domains are as close to each other as possible, so we propose to minimize the squared $\ell_2$-norm distance between them with regard to both $W$ and $\delta = [\delta_1, \ldots, \delta_{n_S}]^\top$,

$$\min_{W, \delta}\; \left\| \mu_S - \mu_T \right\|_2^2 = \min_{W, \delta}\; \left\| \sum_{i=1}^{n_S} \delta_i W^\top x_i^S - \frac{1}{n_T} \sum_{j=1}^{n_T} W^\top x_j^T \right\|_2^2. \quad (4)$$
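As an illustrative sketch of the distribution-matching term in (4) — the array names and shapes below are our own assumptions, not the paper's notation — the distance can be computed as:

```python
import numpy as np

def distribution_distance(X_s, X_t, W, delta):
    """Squared distance between the weighted source mean and the target
    mean in the common space (illustrative sketch of Eq. (4)).
    X_s: (n_s, d) source features; X_t: (n_t, d) target features;
    W: (d, m) transformation matrix; delta: (n_s,) source weights."""
    mu_s = (delta[:, None] * (X_s @ W)).sum(axis=0)  # weighted source mean
    mu_t = (X_t @ W).mean(axis=0)                    # unweighted target mean
    return float(np.sum((mu_s - mu_t) ** 2))         # squared l2 distance
```

When the two empirical distributions coincide and the source weights are uniform, the distance is zero, which is the behavior the matching term rewards.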
We design a linear classifier in the common space of (1),

$$f(x) = u^\top W^\top x, \quad (5)$$

where $u$ is the parameter vector of the common classifier $f$. Then we adapt it to the two domains by adding adaptive functions to the common classifier, obtaining the source domain classifier $f_S$ and the target domain classifier $f_T$,

$$f_S(x) = f(x) + g_S(x),\quad g_S(x) = v_S^\top W^\top x, \qquad f_T(x) = f(x) + g_T(x),\quad g_T(x) = v_T^\top W^\top x, \quad (6)$$

where $g_S$ is the source domain adaptive function and $v_S$ is its parameter vector, and $g_T$ is the target domain adaptive function and $v_T$ is its parameter vector. To measure the classification errors of the two classifiers over the training sets, we use the popular hinge loss, and minimize it to learn the parameters,

$$\min_{u, v_S, v_T}\; \sum_{i=1}^{n_S} \delta_i \big[1 - y_i^S f_S(x_i^S)\big]_+ + \sum_{j=1}^{l} \big[1 - y_j^T f_T(x_j^T)\big]_+, \quad (7)$$

where $[z]_+ = \max(0, z)$.
In this classification error minimization problem, we also use the source domain data point weighting factors to weight the classification error terms.
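The weighted error terms of (7) can be sketched as follows for one domain; the names (`scores`, `labels`, `delta`) are illustrative, with labels in $\{-1, +1\}$:

```python
import numpy as np

def weighted_hinge_loss(scores, labels, delta):
    """Sum of per-point hinge losses max(0, 1 - y * f(x)), each term
    weighted by its factor delta_i (sketch of one sum in Eq. (7))."""
    margins = np.maximum(0.0, 1.0 - labels * scores)
    return float(np.sum(delta * margins))
```

A correctly classified point with margin at least one contributes nothing, so down-weighting a source point (small delta) directly shrinks its influence on the learned classifiers.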
We denote the neighborhood set of the $i$th source domain data point as $N_i$, and the reconstruction coefficients of $x_i^S$ are solved by the following minimization problem,

$$\min_{a_i}\; \left\| x_i^S - \sum_{j \in N_i} a_{ij} x_j^S \right\|_2^2, \quad (8)$$

where $a_{ij}$ are the coefficients for the reconstruction of $x_i^S$ from its neighbors in $N_i$. Then we use them to regularize the learning of the source domain weighting factors,

$$\min_{\delta}\; \sum_{i=1}^{n_S} \Big( \delta_i - \sum_{j \in N_i} a_{ij} \delta_j \Big)^2. \quad (9)$$
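The least-squares problem of (8) can be sketched as a plain unconstrained fit; the names are illustrative, and any additional constraints the method may place on the coefficients are omitted here:

```python
import numpy as np

def reconstruction_coefficients(x_i, X_nbrs):
    """Solve min_a || x_i - X_nbrs^T a ||^2 by ordinary least squares
    (sketch of Eq. (8)). X_nbrs is (k, d), one neighbor per row."""
    a, *_ = np.linalg.lstsq(X_nbrs.T, x_i, rcond=None)
    return a
```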
Similarly, we also have the neighborhood reconstruction coefficients $b_{jk}$ for the target domain data set, where $M_j$ denotes the neighborhood set of the $j$th target domain data point, and we use them to regularize the classification responses of the target domain classifier,

$$\min_{f_T}\; \sum_{j=1}^{n_T} \Big( f_T(x_j^T) - \sum_{k \in M_j} b_{jk} f_T(x_k^T) \Big)^2. \quad (10)$$
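The regularizers of (9) and (10) share one quadratic form, which can be sketched as follows; the names are illustrative, and `A` stacks the reconstruction coefficients into an n-by-n matrix with zeros outside each neighborhood:

```python
import numpy as np

def local_reconstruction_penalty(vals, A):
    """sum_i (vals_i - sum_j A_ij * vals_j)^2: penalize how far each
    value (a weighting factor or a classifier response) departs from
    the reconstruction by its neighbors (sketch of Eqs. (9)-(10))."""
    return float(np.sum((vals - A @ vals) ** 2))
```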
The overall minimization problem of the transfer learning framework combines the problems of (4), (7), (9), and (10), together with the squared $\ell_2$ norms of the classifier parameter vectors to prevent overfitting,

$$\min_{W, \delta, u, v_S, v_T}\; \|\mu_S - \mu_T\|_2^2 + C_1 \Big( \sum_{i=1}^{n_S} \delta_i \big[1 - y_i^S f_S(x_i^S)\big]_+ + \sum_{j=1}^{l} \big[1 - y_j^T f_T(x_j^T)\big]_+ \Big) + C_2 \sum_{i=1}^{n_S} \Big( \delta_i - \sum_{j \in N_i} a_{ij} \delta_j \Big)^2 + C_3 \sum_{j=1}^{n_T} \Big( f_T(x_j^T) - \sum_{k \in M_j} b_{jk} f_T(x_k^T) \Big)^2 + C_4 \big( \|u\|_2^2 + \|v_S\|_2^2 + \|v_T\|_2^2 \big), \quad (11)$$

where $C_1, \ldots, C_4$ are tradeoff parameters. In this minimization problem, we impose $W$ to be orthogonal, i.e., $W^\top W = I$, impose a lower bound and an upper bound on each element of $\delta$, and an additional constraint on $\delta$ so that the summation of all its elements is one.
II-B. Optimization
We rewrite the source domain and target domain classifiers as linear functions of the input feature vectors,

$$f_S(x) = (u + v_S)^\top W^\top x, \qquad f_T(x) = (u + v_T)^\top W^\top x. \quad (12)$$
Then we have the following minimization problem,
(13)  
To solve this problem, we use the iterative optimization method to update the variables one by one.
II-B1. Solving w and W
We first solve w by setting the derivative of the objective with regard to w to zero, and we have
(14) 
Then we substitute it into (13), and considering the optimization of $W$, we have
(15)  
This problem can be easily solved by the eigendecomposition method.
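Assuming the problem takes the standard form of minimizing a trace of a quadratic form under the orthogonality constraint — an assumption on our part, since the exact matrix depends on the omitted objective — the eigendecomposition step can be sketched as:

```python
import numpy as np

def solve_orthogonal_W(M, m):
    """Minimize Tr(W^T M W) subject to W^T W = I by taking the
    eigenvectors of the symmetric matrix M associated with its m
    smallest eigenvalues (M and m are illustrative names)."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, :m]                 # columns: m smallest eigenvectors
```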
II-B2. Updating the Adaptation Parameters
To update the two adaptation parameter vectors, we consider the following minimization problem,
(16)  
To solve this problem, we use the subgradient algorithm to update the two parameter vectors iteratively,
(17) 
The subgradient functions of the objective with regard to the two parameter vectors are
(18)  
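A generic subgradient-descent loop of the kind used here can be sketched as follows; the objective and its subgradient are left abstract, since the exact updates depend on the omitted equations, and all names are illustrative:

```python
import numpy as np

def subgradient_descent(v0, subgrad, steps=100, eta0=0.1):
    """Iterate v <- v - eta_t * g(v) with the diminishing step size
    eta_t = eta0 / sqrt(t), as is standard for subgradient methods on
    non-differentiable objectives such as the hinge loss."""
    v = np.asarray(v0, dtype=float).copy()
    for t in range(1, steps + 1):
        v = v - (eta0 / np.sqrt(t)) * subgrad(v)
    return v
```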
II-B3. Updating the Weighting Factors
To solve the weighting factor vector $\delta$, we have the following minimization problem,
(19)  
This problem is a linearly constrained quadratic programming problem, and we solve it by using the active set algorithm.
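As a lightweight stand-in for the active set algorithm (not the paper's solver), the sum-to-one and non-negativity constraints can be enforced by Euclidean projection onto the probability simplex; the sorting-based projection below assumes bounds of [0, 1], which may differ from the paper's exact bounds:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1},
    using the standard sorting-based algorithm."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0                   # cumulative sums minus target
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1] # last index kept active
    theta = css[rho] / (rho + 1.0)             # shared shift
    return np.maximum(v - theta, 0.0)
```

Within a projected-gradient loop, one would take a gradient step on the quadratic objective and then apply this projection to restore feasibility.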
III. Experiments
III-A. Data Sets
In the experiments, we use three benchmark data sets: the 20Newsgroup corpus data set, the Amazon review data set, and the Spam email data set.

The 20Newsgroup corpus data set is a data set of newspaper documents. It contains documents of 20 classes, and the classes are organized in a hierarchical structure: a class usually has two or more subclasses. For example, in the class of car, there are two subclasses, which are motorcycle and auto. To split this data set into a source domain and a target domain, for each class we keep one subclass in the source domain, while putting the other subclass into the target domain. We follow the source/target domain splitting of the NG14 data set of [2]. In this data set, there are 6 classes, and for each class, one subclass is in the source domain, and another subclass is in the target domain. For each domain, the number of data points is 2,400. The bag-of-words features of each document are used as the original features.

The Amazon review data set is a data set of reviews of products. It contains reviews of three types of products, which are books, DVD, and music. The reviews belong to two classes, which are positive and negative. We treat the reviews of books as the source domain, and those of DVD as the target domain. For each domain, we have 2,000 positive reviews and 2,000 negative reviews. Again, we use the bag-of-words features as the features of the reviews.

The Spam email data set is a set of emails of different individuals. In this data set, there are emails of three different individuals' inboxes, and we treat each individual as a domain. In each individual's inbox, there are 2,500 emails, and the emails are classified into two different classes, which are normal email and spam email. We randomly choose one individual as the source domain, and another one as the target domain.
III-B. Results
In the experiments, we use cross validation. For each data set, we use each domain as the target domain in turn, and randomly choose another domain as the source domain. The classification accuracies of the compared methods over the three benchmark data sets are reported in Table I. The proposed method outperforms all the compared methods over the three benchmark data sets. In the experiments over the 20Newsgroup data set, the proposed method outperforms the other methods significantly.
TABLE I: Classification accuracies of the compared methods over the three benchmark data sets.

Methods  20Newsgroup  Amazon  Spam

Proposed  0.6210  0.7812  0.8641 
Chen et al. [2]  0.5815  0.7621  0.8514 
Chu et al. [3]  0.5471  0.7642  0.8354 
Ma et al. [4]  0.5164  0.7255  0.8012 
Xiao and Guo [5]  0.5236  0.7462  0.8294 
Li et al. [6]  0.5615  0.7134  0.8122 
IV. Conclusions
In this paper, we proposed a novel transfer learning method. Instead of learning a common representation and classifier directly for both the source and target domains, we proposed to learn a common space and classifier, and then adapt them to the source and target domains. We proposed to weight the source domain data points in the subspace to match the distributions of the two domains, and to regularize the weighting factors of the source domain data points and the classification responses of the target domain data points by the local reconstruction coefficients. The minimization problem of our method is based on these features, and we solve it by an iterative algorithm. Experiments show its advantages over some other methods. In the future, we will extend the proposed algorithm to various applications, such as computational mechanics [8, 9, 10, 11, 12, 13], multimedia [14, 15, 16, 17, 18, 19, 20, 21, 22], medical imaging [23, 24, 25, 26, 27, 28, 29, 30, 31], bioinformatics [32, 33, 34, 35], material science [36, 37, 38], high-performance computing [39, 40, 41, 42, 43], malicious website detection [44, 45, 46, 47], biometrics [48, 49, 50, 51], etc. We will also consider using some other models to represent and construct the classifier, such as Bayesian networks [52, 53, 54].

References
 [1] M. Tan, Z. Hu, B. Wang, J. Zhao, and Y. Wang, “Robust object recognition via weakly supervised metric and template learning,” Neurocomputing, vol. 181, pp. 96–107, 2016.
[2] B. Chen, W. Lam, I. Tsang, and T.-L. Wong, “Discovering low-rank shared concept space for adapting text mining models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1284–1297, 2013.
 [3] W. S. Chu, F. D. L. Torre, and J. F. Cohn, “Selective transfer machine for personalized facial action unit detection,” in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, 2013, pp. 3515–3522.
[4] Z. Ma, Y. Yang, N. Sebe, and A. G. Hauptmann, “Knowledge adaptation with partially-shared features for event detection using few exemplars,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, no. 9, pp. 1789–1802, 2014.
[5] M. Xiao and Y. Guo, “Feature space independent semi-supervised domain adaptation via kernel matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.
[6] W. Li, L. Duan, D. Xu, and I. Tsang, “Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, pp. 1134–1148, 2014.

[7] H. Shao, S. Chen, J.-y. Zhao, W.-c. Cui, and T.-s. Yu, “Face recognition based on subset selection via metric learning on manifold,” Frontiers of Information Technology & Electronic Engineering, vol. 16, pp. 1046–1058, 2015.
[8] S. Wang, Y. Zhou, J. Tan, J. Xu, J. Yang, and Y. Liu, “Computational modeling of magnetic nanoparticle targeting to stent surface under high gradient field,” Computational Mechanics, vol. 53, no. 3, pp. 403–412, 2014.
[9] Y. Zhou, W. Hu, B. Peng, and Y. Liu, “Biomarker binding on an antibody-functionalized biosensor surface: the influence of surface properties, electric field, and coating density,” The Journal of Physical Chemistry C, vol. 118, no. 26, pp. 14586–14594, 2014.
 [10] Y. Liu, J. Yang, Y. Zhou, and J. Hu, “Structure design of vascular stents,” Multiscale simulations and mechanics of biological materials, pp. 301–317, 2013.
 [11] B. Peng, Y. Liu, Y. Zhou, L. Yang, G. Zhang, and Y. Liu, “Modeling nanoparticle targeting to a vascular surface in shear flow through diffusive particle dynamics,” Nanoscale research letters, vol. 10, no. 1, pp. 1–9, 2015.
 [12] J. Xu, J. Yang, N. Huang, C. Uhl, Y. Zhou, and Y. Liu, “Mechanical response of cardiovascular stents under vascular dynamic bending,” Biomedical engineering online, vol. 15, no. 1, p. 1, 2016.
[13] Y. Zhou, S. Sohrabi, J. Tan, and Y. Liu, “Mechanical properties of nanoworm assembled by DNA and nanoparticle conjugates,” Journal of Nanoscience and Nanotechnology, vol. 16, no. 6, pp. 5447–5456, 2016.
[14] R.-Z. Liang, W. Xie, W. Li, X. Du, J. J.-Y. Wang, and J. Wang, “Semi-supervised structured output prediction by local linear regression and subgradient descent,” arXiv preprint arXiv:1606.02279, 2016.

[15] R.-Z. Liang, L. Shi, H. Wang, J. Meng, J. J.-Y. Wang, Q. Sun, and Y. Gu, “Optimizing top precision performance measure of content-based image retrieval by learning similarity function,” in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016.
[16] M. Ding and G. Fan, “Multilayer joint gait-pose manifolds for human gait motion modeling,” IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2413–2424, 2015.

[17] ——, “Articulated and generalized Gaussian kernel correlation for human pose estimation,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 776–789, 2016.
[18] H. Wang and J. Wang, “An effective image representation method using kernel classification,” in Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on. IEEE, 2014, pp. 853–858.
[19] F. Lin, J. Wang, N. Zhang, J. Xiahou, and N. McDonald, “Multi-kernel learning for multivariate performance measures optimization,” Neural Computing and Applications, pp. 1–13, 2016.

[20] X. Liu, J. Wang, M. Yin, B. Edwards, and P. Xu, “Supervised learning of sparse context reconstruction coefficients for data representation and classification,” Neural Computing and Applications, pp. 1–9, 2015.
[21] J. Wang, H. Wang, Y. Zhou, and N. McDonald, “Multiple kernel multivariate performance learning using cutting plane algorithm,” in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on. IEEE, 2015, pp. 1870–1875.
[22] J. Wang, Y. Zhou, K. Duan, J. J.-Y. Wang, and H. Bensmail, “Supervised cross-modal factor analysis for multiple modal data classification,” in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on. IEEE, 2015, pp. 1882–1888.
 [23] D. R. King, W. Li, J. J. Squiers, R. Mohan, E. Sellke, W. Mo, X. Zhang, W. Fan, J. M. DiMaio, and J. E. Thatcher, “Surgical wound debridement sequentially characterized in a porcine burn model with multispectral imaging,” Burns, vol. 41, no. 7, pp. 1478–1487, 2015.

[24] W. Li, W. Mo, X. Zhang, J. J. Squiers, Y. Lu, E. W. Sellke, W. Fan, J. M. DiMaio, and J. E. Thatcher, “Outlier detection and removal improves accuracy of machine learning approach to multispectral burn diagnostic imaging,” Journal of Biomedical Optics, vol. 20, no. 12, pp. 121305–121305, 2015.
[25] J. E. Thatcher, W. Li, Y. Rodriguez-Vaqueiro, J. J. Squiers, W. Mo, Y. Lu, K. D. Plant, E. Sellke, D. R. King, W. Fan, et al., “Multispectral and photoplethysmography optical imaging techniques identify important tissue characteristics in an animal model of tangential burn excision,” Journal of Burn Care & Research, vol. 37, no. 1, pp. 38–52, 2016.
 [26] W. Li, W. Mo, X. Zhang, Y. Lu, J. J. Squiers, E. W. Sellke, W. Fan, J. M. DiMaio, and J. E. Thatcher, “Burn injury diagnostic imaging device’s accuracy improved by outlier detection and removal,” in SPIE Defense+ Security. International Society for Optics and Photonics, 2015, pp. 947 206–947 206.
 [27] J. J. Squiers, W. Li, D. R. King, W. Mo, X. Zhang, Y. Lu, E. W. Sellke, W. Fan, J. M. DiMaio, and J. E. Thatcher, “Multispectral imaging burn wound tissue classification system: a comparison of test accuracies of several common machine learning algorithms,” 2016.
[28] C. Hu, L. Cheng, J. Sepulcre, K. A. Johnson, G. E. Fakhri, Y. M. Lu, and Q. Li, “A spectral graph regression model for learning brain connectivity of Alzheimer's disease,” PLOS ONE, 2015.
 [29] C. Hu, J. Sepulcre, K. A. Johnson, G. E. Fakhri, Y. M. Lu, and Q. Li, “Matched signal detection on graphs: Theory and application to brain imaging data classification,” NeuroImage, vol. 125, pp. 587–600, 2016.
 [30] J.J. Chen and W.C. Su, “Integrating iso/iwa1 practices with an automatic patient image quantity examination approach to achieve the patient safety goal in a nuclear medicine department,” in Internet of Things (iThings), 2014 IEEE International Conference on, and Green Computing and Communications (GreenCom), IEEE and Cyber, Physical and Social Computing (CPSCom), IEEE. IEEE, 2014, pp. 332–335.
 [31] W.C. Su, S.C. Yeh, S.H. Lee, and H.C. Huang, “A virtual reality lowerback pain rehabilitation approach: System design and user acceptance analysis,” in International Conference on Universal Access in HumanComputer Interaction. Springer, 2015, pp. 374–382.
[32] W. Xie, M. Kantarcioglu, W. S. Bush, D. Crawford, J. C. Denny, R. Heatherly, and B. A. Malin, “SecureMA: protecting participant privacy in genetic association meta-analysis,” Bioinformatics, p. btu561, 2014.

[33] W. Li, H. Liu, P. Yang, and W. Xie, “Supporting regularized logistic regression privately and efficiently,” PLoS ONE, vol. 11, no. 6, p. e0156479, 2016.
[34] W. Cai, X. Zhou, and X. Cui, “Optimization of a GPU implementation of multi-dimensional RF pulse design algorithm,” in Bioinformatics and Biomedical Engineering (iCBBE), 2011 5th International Conference on. IEEE, 2011, pp. 1–4.
 [35] W. Cai and L. L. Gouveia, “Modeling and simulation of maximum power point tracker in ptolemy,” Journal of Clean Energy Technologies, vol. 1, no. 1, 2013.
[36] J. Zhuge, L. Zhang, R. Wang, R. Huang, D.-W. Kim, D. Park, and Y. Wang, “Random telegraph signal noise in gate-all-around silicon nanowire transistors featuring Coulomb-blockade characteristics,” Applied Physics Letters, vol. 94, no. 8, p. 083503, 2009.
[37] Z. Kang, L. Zhang, R. Wang, and R. Huang, “A comparison study of silicon nanowire transistor with Schottky-barrier source/drain and doped source/drain,” in VLSI Technology, Systems, and Applications, 2009. VLSI-TSA’09. International Symposium on. IEEE, 2009, pp. 133–134.
[38] L. Zhang, H. Li, Y. Guo, K. Tang, J. Woicik, J. Robertson, and P. C. McIntyre, “Selective passivation of GeO2/Ge interface defects in atomic layer deposited high-k MOS structures,” ACS Applied Materials & Interfaces, vol. 7, no. 37, pp. 20499–20506, 2015.
[39] C.-Y. Wang, D.-Y. Peng, L. Xu, and X.-S. Yi, “Gradual gray-watermark embedding algorithm in the wavelet domain,” Journal of Computer Applications, vol. 6, p. 025, 2007.
 [40] W. Luo, L. Xu, Z. Zhan, Q. Zheng, and S. Xu, “Federated cloud security architecture for secure and agile clouds,” in High Performance Cloud Auditing and Applications. Springer New York, 2014, pp. 169–188.
[41] L. Meng and J. Johnson, “High performance implementation of the inverse TFT,” in Proceedings of the 2015 International Workshop on Parallel Symbolic Computation, ser. PASCO ’15, 2015, pp. 87–94.
[42] ——, “High performance implementation of the TFT,” in Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ser. ISSAC ’14, 2014, pp. 328–334.
[43] ——, Automatic Parallel Library Generation for General-Size Modular FFT Algorithms, 2013, pp. 243–256.
 [44] L. Xu, Z. Zhan, S. Xu, and K. Ye, “An evasion and counterevasion study in malicious websites detection,” in Communications and Network Security (CNS), 2014 IEEE Conference on. IEEE, 2014, pp. 265–273.
[45] ——, “Cross-layer detection of malicious websites,” in Proceedings of the Third ACM Conference on Data and Application Security and Privacy. ACM, 2013, pp. 141–152.
 [46] S. Xu, W. Lu, L. Xu, and Z. Zhan, “Adaptive epidemic dynamics in networks: Thresholds and control,” ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 8, no. 4, p. 19, 2014.
[47] S. Xu, W. Lu, and L. Xu, “Push- and pull-based epidemic spreading in networks: Thresholds and deeper insights,” ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 7, no. 3, p. 32, 2012.
 [48] X. Wang and C. Kambhamettu, “Leveraging appearance and geometry for kinship verification,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 5017–5021.
 [49] ——, “A new approach for face recognition under makeup changes,” in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2015, pp. 423–427.
[50] X. Wang, V. Ly, G. Guo, and C. Kambhamettu, “A new approach for 2D-3D heterogeneous face recognition,” in Multimedia (ISM), 2013 IEEE International Symposium on. IEEE, 2013, pp. 301–304.
 [51] X. Wang and C. Kambhamettu, “Gender classification of depth images based on shape and texture analysis,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE. IEEE, 2013, pp. 1077–1080.
[52] X. Fan, C. Yuan, and B. Malone, “Tightening bounds for Bayesian network structure learning,” in Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI-2014), 2014.
[53] X. Fan, B. Malone, and C. Yuan, “Finding optimal Bayesian network structures with constraints learned from data,” in Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI-2014), 2014.
[54] X. Fan and C. Yuan, “An improved lower bound for Bayesian network structure learning,” in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI-2015), 2015.