I Introduction
In traditional machine learning and pattern classification methods, there is a strong assumption that all the data are drawn from the same distribution. However, this assumption may not hold in many real-world scenarios. For example, when the training samples are difficult or expensive to obtain, or when the distribution of the samples changes over time, we have to borrow knowledge from a different but highly related domain. Therefore, how to transfer knowledge from a different but related domain has become increasingly important. During the past two decades, transfer learning has emerged as a framework to solve this problem, and it has received growing attention in the machine learning and data mining communities. As discussed in [1], feature matching based methods are the most widely used transfer learning approaches; they aim to learn a shared feature representation that minimizes the distribution discrepancy between the source and target domains [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. Among them, learning a cross-domain transformation that maps the target features into the source domain [7, 10, 8, 9, 6, 11] is of particular importance. Apart from this, the parameter transfer approach is another line of work that has attracted much attention. It assumes that the transferred knowledge is encoded in the parameters of the classification model [13, 14, 15, 16]. Therefore, the source model and target model should share some parameters or a prior distribution over the model parameters. Based on this assumption, parameter transfer approaches can adapt the learned source hyperplane to the target domain with a small number of target samples. Figure 1 compares the transform-based methods and the parameter transfer methods: transform-based methods map the features to adapt to the learned hyperplane, while parameter transfer approaches adjust the learned hyperplane to adapt to the shifted features. Although a large number of transform-based methods and parameter transfer methods have been proposed to address the knowledge transfer problem, few works have tried to combine the two. In this paper, we perform parameter transfer based on a projective model, specifically under the framework of the extreme learning machine. As a special feedforward neural network, the extreme learning machine (ELM), first proposed by Huang
[17], which determines its input weights randomly, has become a very popular classifier due to its fast learning speed, satisfactory performance and little human intervention [18]. Since its first appearance, various extensions have been proposed to make the original ELM model more efficient and suitable for specific applications. Chen et al. optimized the input weights of ELM by generalized Hebbian learning and intrinsic plasticity learning [20]. Based on the manifold regularization framework, Huang et al. extended ELM to semi-supervised and unsupervised learning in [19]. To handle the imbalanced data problem, Zong et al. extended the traditional ELM to the weighted ELM (WELM) in [21]. Considering the computational cost and spatial requirements, an online version of ELM has also been proposed and studied [22]. Besides, several multi-layer ELM frameworks [23, 24] have been put forward recently to learn deep representations of the original data.
In this paper, we mainly focus on the parameter transfer approach based on the ELM algorithm. We would like to learn a high-quality ELM classifier using a small number of labeled target domain samples and a large number of source domain samples. To achieve this goal, we assume that the source domain classification hyperplane and the target domain classification hyperplane can be bridged by a projection matrix, i.e., the target domain parameters can be represented as a projection matrix multiplied with the source domain parameters. In this way, the parameter transfer ELM model can be learned by jointly optimizing the ELM model parameters and the projection matrix. Furthermore, the $\ell_{2,1}$-norm of the source domain parameters is incorporated into the objective function, which leads to selecting useful features in the source domain during model training. For ease of notation, the proposed parameter transfer ELM is referred to as PTELM.
The contributions of this paper are fourfold. Firstly, we are among the first to exploit a projective model for parameter transfer, especially under the framework of the ELM. Secondly, unlike most existing works, which learn the transformations by minimizing the distribution discrepancy or maximizing some similarity metric between the source and target feature spaces, the proposed PTELM jointly learns the projection matrix and the model parameters by directly minimizing the classification error. Thirdly, the $\ell_{2,1}$-norm is imposed on the source domain hyperplane; in this respect, the learned source model tends to select informative features for knowledge transfer. Lastly, we demonstrate that the proposed parameter transfer ELM can also be regarded as a special transform-based domain adaptation method.
II Related Works
Recently, some researchers have focused their attention on domain adaptation ELM. Zhang et al. proposed a domain adaptation ELM to address the sensor drift problem in E-nose systems [25]. In [5], a unified subspace transfer framework based on ELM was proposed, which learns a subspace by jointly minimizing the maximum mean discrepancy (MMD) and the maximum margin criterion (MMC). Uzair et al. [26] proposed a blind domain adaptation ELM with the extreme learning machine autoencoder (ELM-AE), which does not need target domain samples for training. In [4], Zhang et al. proposed an ELM-based domain adaptation (EDA) for visual knowledge transfer and extended EDA to multi-view learning. In EDA, manifold regularization was incorporated into the objective function, and the authors minimize the $\ell_{2,1}$-norm of the hyperplane and the prediction error simultaneously. Besides, a parameter transfer based transfer learning ELM (TLELM) has also been proposed in [16], which regularizes the difference between the source and target parameters. In addition, Salaken et al. [27] surveyed the available literature in the field of ELM based transfer learning methods.
Among the parameter transfer approaches, the majority of related works incorporate the source model information into the target by regularizing the difference of the parameters between the source and the target domain [13, 14, 15, 16]. The representative method is the adaptive SVM (A-SVM) [13], which learns from the source domain parameters by directly regularizing the distance between the learned model and the source model. After that, Aytar et al. [14] proposed two new parameter transfer SVMs, which extend and relax the A-SVM. Li et al. [16] proposed a transfer learning ELM by introducing the same regularizer as the A-SVM into the ELM.
III Preliminaries
III-A A Brief Review of ELM
Consider a supervised learning problem where the training set with $N$ samples and the corresponding targets are given as $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$. Here $\mathbf{x}_i \in \mathbb{R}^n$ is the $n$-dimensional input data and $\mathbf{t}_i \in \mathbb{R}^m$ is its associated one-hot label. The ELM network learns a decision rule in the following two stages. In the first stage, it randomly generates the input weights $\mathbf{W}$ and bias $\mathbf{b}$, and maps the original data from the input space into the $L$-dimensional feature space $\mathbf{h}(\mathbf{x}) = g(\mathbf{W}\mathbf{x} + \mathbf{b})$, where $L$ is the number of hidden nodes and $g(\cdot)$ is the activation function. In this respect, the only free parameter of the ELM is the output weights $\boldsymbol{\beta} \in \mathbb{R}^{L \times m}$. In the second stage, the ELM solves the output weights by minimizing the sum of squared prediction errors and the norm of the output weights simultaneously, which leads to

$$\min_{\boldsymbol{\beta},\, \boldsymbol{\xi}} \ \frac{1}{2}\|\boldsymbol{\beta}\|_F^2 + \frac{C}{2}\sum_{i=1}^{N}\|\boldsymbol{\xi}_i\|^2 \quad \text{s.t.} \quad \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} = \mathbf{t}_i^T - \boldsymbol{\xi}_i^T, \quad i = 1, \dots, N \qquad (1)$$

where $\boldsymbol{\xi}_i$ is the prediction error with respect to the $i$-th training sample, and the first term of the objective function is the regularization term that prevents the network from overfitting. By substituting the constraint into the objective function, problem (1) can be simplified to the unconstrained optimization problem

$$\min_{\boldsymbol{\beta}} \ \mathcal{L}_{\mathrm{ELM}} = \frac{1}{2}\|\boldsymbol{\beta}\|_F^2 + \frac{C}{2}\|\mathbf{T} - \mathbf{H}\boldsymbol{\beta}\|_F^2 \qquad (2)$$

where $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1); \dots; \mathbf{h}(\mathbf{x}_N)] \in \mathbb{R}^{N \times L}$ and $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^T$. The optimal solution can then be determined analytically by setting the derivative of $\mathcal{L}_{\mathrm{ELM}}$ with respect to $\boldsymbol{\beta}$ to zero, i.e.

$$\frac{\partial \mathcal{L}_{\mathrm{ELM}}}{\partial \boldsymbol{\beta}} = \boldsymbol{\beta} - C\,\mathbf{H}^T(\mathbf{T} - \mathbf{H}\boldsymbol{\beta}) = 0 \qquad (3)$$

Then, the output weights can be solved efficiently by

$$\boldsymbol{\beta}^{*} = \left(\mathbf{H}^T\mathbf{H} + \frac{\mathbf{I}}{C}\right)^{-1}\mathbf{H}^T\mathbf{T} \qquad (4)$$

where $\mathbf{I}$ is the identity matrix and $C$
is the regularization coefficient. With this closed-form solution, the ELM model is remarkably efficient and tends to reach a global optimum.
III-B Notations and Definitions
We summarize the frequently used notations and definitions below.
Notations: For a matrix $\mathbf{A} \in \mathbb{R}^{p \times q}$, let the $i$-th row of $\mathbf{A}$ be denoted by $\mathbf{a}^i$. The Frobenius norm of the matrix is defined as

$$\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^{p}\sum_{j=1}^{q} a_{ij}^2} = \sqrt{\sum_{i=1}^{p}\|\mathbf{a}^i\|_2^2} \qquad (5)$$

The $\ell_{2,1}$-norm of a matrix, first introduced in [28] as a rotation-invariant norm that ensures row sparsity, has been widely used for feature selection and as a structured sparsity regularizer [29, 30, 2, 4]. It is defined as

$$\|\mathbf{A}\|_{2,1} = \sum_{i=1}^{p}\|\mathbf{a}^i\|_2 = \sum_{i=1}^{p}\sqrt{\sum_{j=1}^{q} a_{ij}^2} \qquad (6)$$
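As a quick numerical illustration (ours, not the paper's), the two norms in Eqs. (5) and (6) can be computed directly. The $\ell_{2,1}$-norm sums the Euclidean lengths of the rows, which is why penalizing it drives entire rows of a matrix toward zero:

```python
import numpy as np

def frobenius_norm(A):
    # ||A||_F = square root of the sum of all squared entries, Eq. (5)
    return np.sqrt(np.sum(A ** 2))

def l21_norm(A):
    # ||A||_{2,1} = sum over rows of the row-wise l2 norms, Eq. (6)
    return np.sum(np.sqrt(np.sum(A ** 2, axis=1)))

A = np.array([[3.0, 4.0],
              [0.0, 0.0],   # an all-zero row costs nothing under either norm
              [0.0, 5.0]])

print(frobenius_norm(A))  # sqrt(9 + 16 + 25) = sqrt(50)
print(l21_norm(A))        # 5 + 0 + 5 = 10
```

Note that the Frobenius norm spreads its penalty over all entries, while the $\ell_{2,1}$-norm charges each nonzero row its full length, making row-wise sparsity the cheaper solution.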
Definition 1. Domain. A domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ is composed of a feature space $\mathcal{X}$ and a marginal distribution $P(X)$. $\mathcal{D}_s$ and $\mathcal{D}_t$ represent the source and target domains respectively, which are sampled from different but related distributions. Generally, $\mathcal{X}_s = \mathcal{X}_t$ and $P_s(X) \neq P_t(X)$.
Definition 2. Transfer Learning. Consider a given source domain $\mathcal{D}_s$ and target domain $\mathcal{D}_t$ with $P_s(X) \neq P_t(X)$. Data in the target domain are insufficient to learn a high-quality classification model. Transfer learning aims to learn a satisfactory classifier for the target domain with the incorporation of the source domain information.
Definition 3. Parameter Transfer. Let $\theta_s$ and $\theta_t$ be the model parameters learned from the two domains:

$$\theta_s = \arg\min_{\theta} \sum_{i} \mathcal{L}(\mathbf{x}_i^s, y_i^s; \theta) + R(\theta) \qquad (7)$$

where the first term is the loss function and the second term is the parameter regularization. $\theta_s$ is the classification model learned from the source domain $\mathcal{D}_s$, and $\theta_t$ is the classification model learned from the target domain $\mathcal{D}_t$. Based on the assumption that $\theta_s$ and $\theta_t$ should share some parameters or a prior distribution, parameter transfer learning aims to transfer knowledge from $\theta_s$ to improve the target domain classification model:

$$\theta_t = \arg\min_{\theta} \sum_{j} \mathcal{L}(\mathbf{x}_j^t, y_j^t; \theta) + R(\theta) + \Omega(\theta, \theta_s) \qquad (8)$$

The last term $\Omega(\theta, \theta_s)$ incorporates the information of $\theta_s$ into $\theta_t$.
Most parameter transfer approaches seek to leverage the target model by penalizing the discrepancy between $\theta_t$ and $\theta_s$, i.e., $\Omega(\theta, \theta_s) = \mu\|\theta - \theta_s\|^2$.
This penalty, which directly regularizes the distance between $\theta_t$ and $\theta_s$, is sometimes too strict: when $\mu$ is large enough, it forces $\theta_t \to \theta_s$. In order to relax this constraint, in this paper we propose a projective-model based parameter transfer approach to bridge the source and target domain parameters.
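The collapse onto the source parameters can be seen in a tiny sketch (our illustration with synthetic data, using ridge regression in place of an ELM or SVM): the closed-form minimizer of $\|\mathbf{X}\mathbf{w} - \mathbf{y}\|^2 + \mu\|\mathbf{w} - \mathbf{w}_s\|^2$ is pulled onto the source weights $\mathbf{w}_s$ as $\mu$ grows.

```python
import numpy as np

def transfer_ridge(X, y, w_s, mu):
    """Solve min_w ||Xw - y||^2 + mu * ||w - w_s||^2 in closed form:
    w = (X^T X + mu I)^{-1} (X^T y + mu w_s)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + mu * np.eye(d), X.T @ y + mu * w_s)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w_s = np.array([0.8, -1.5, 0.0])   # hypothetical source-domain weights

for mu in (0.0, 1.0, 1e6):
    w = transfer_ridge(X, y, w_s, mu)
    print(mu, np.round(w, 3))
# as mu grows, w is pulled toward w_s, reproducing theta_t -> theta_s
```

With a very large `mu` the learned weights are numerically indistinguishable from `w_s`, which is exactly the over-strict behavior the projective model is designed to relax.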
Definition 4. Projective-Model based Parameter Transfer. Given a projection matrix $\mathbf{A}$, projective-model based parameter transfer assumes that the source domain parameters $\boldsymbol{\beta}_s$ and target domain parameters $\boldsymbol{\beta}_t$ can be bridged by $\boldsymbol{\beta}_t = \mathbf{A}\boldsymbol{\beta}_s$.
IV Proposed Method
In this section, we present the proposed projective-model based parameter transfer ELM and its learning algorithm.
IV-A Problem Formulation
Suppose we have a source domain with $n_s$ labeled samples $\{(\mathbf{x}_i^s, \mathbf{t}_i^s)\}_{i=1}^{n_s}$, and a target domain with $n_t$ labeled samples $\{(\mathbf{x}_j^t, \mathbf{t}_j^t)\}_{j=1}^{n_t}$. Generally, in a domain adaptation algorithm, $n_t$ is a very small number and $n_t \ll n_s$. Denote by $\boldsymbol{\beta}_s$ and $\boldsymbol{\beta}_t$ the source and target ELM model parameters to be optimized. As discussed above, we bridge the source domain parameters and the target parameters by a projection matrix $\mathbf{A}$, i.e., $\boldsymbol{\beta}_t = \mathbf{A}\boldsymbol{\beta}_s$. Our goal is to learn the ELM classification hyperplane and the projection matrix jointly. In this respect, the objective function can be formulated as

$$\min_{\boldsymbol{\beta}_s,\, \mathbf{A},\, \boldsymbol{\xi}} \ \sum_{i=1}^{n_s}\|\boldsymbol{\xi}_i^s\|^2 + \lambda\sum_{j=1}^{n_t}\|\boldsymbol{\xi}_j^t\|^2 + \sigma_1\|\boldsymbol{\beta}_s\|_{2,1} + \sigma_2\|\mathbf{A}\boldsymbol{\beta}_s\|_F^2 \qquad (9)$$

$$\text{s.t.} \quad \mathbf{h}(\mathbf{x}_i^s)\boldsymbol{\beta}_s = (\mathbf{t}_i^s)^T - (\boldsymbol{\xi}_i^s)^T, \quad \mathbf{h}(\mathbf{x}_j^t)\mathbf{A}\boldsymbol{\beta}_s = (\mathbf{t}_j^t)^T - (\boldsymbol{\xi}_j^t)^T \qquad (10)$$

where $\mathbf{h}(\mathbf{x}_i^s)$, $\mathbf{t}_i^s$ and $\boldsymbol{\xi}_i^s$ denote the output of the hidden layer, the one-hot label vector and the prediction error with respect to the $i$-th sample from the source domain. Similarly, $\mathbf{h}(\mathbf{x}_j^t)$, $\mathbf{t}_j^t$ and $\boldsymbol{\xi}_j^t$ denote the output of the hidden layer, the one-hot label vector and the prediction error with respect to the $j$-th sample from the target domain. As can be seen, there are four terms altogether in the objective function, and they are intuitive to understand. The first two terms simultaneously minimize the training error in the source and target domains, and the last two terms prevent the source and target ELM models from overfitting. $\lambda$, $\sigma_1$ and $\sigma_2$ are trade-off parameters used to balance the contributions of the four terms. The merits that distinguish our proposal from other related works are twofold. On the one hand, different from the traditional parameter transfer approach [16], we bridge the source domain and the target domain parameters by a projection matrix. On the other hand, the $\ell_{2,1}$-norm instead of the Frobenius norm is imposed on the source domain hyperplane as a regularizer. With this penalty, row sparsity is obtained; benefiting from this property, our model tends to select the informative features in the source domain for knowledge transfer. By substituting the constraints into the objective function, the optimization problem (9) can easily be reformulated as the equivalent unconstrained problem

$$\min_{\boldsymbol{\beta}_s,\, \mathbf{A}} \ \mathcal{L} = \|\mathbf{H}_s\boldsymbol{\beta}_s - \mathbf{T}_s\|_F^2 + \lambda\|\mathbf{H}_t\mathbf{A}\boldsymbol{\beta}_s - \mathbf{T}_t\|_F^2 + \sigma_1\|\boldsymbol{\beta}_s\|_{2,1} + \sigma_2\|\mathbf{A}\boldsymbol{\beta}_s\|_F^2 \qquad (11)$$

where $\mathbf{H}_s \in \mathbb{R}^{n_s \times L}$ and $\mathbf{H}_t \in \mathbb{R}^{n_t \times L}$ denote the hidden layer outputs of the source and target ELM models, $\boldsymbol{\beta}_s \in \mathbb{R}^{L \times m}$ and $\boldsymbol{\beta}_t = \mathbf{A}\boldsymbol{\beta}_s$ denote the output weights of the source and target ELM models, and $\mathbf{T}_s \in \mathbb{R}^{n_s \times m}$ and $\mathbf{T}_t \in \mathbb{R}^{n_t \times m}$ denote the label matrices of the source and target domain samples. Here, $L$ denotes the number of hidden nodes in the ELM model, and $m$ is the number of classes of the source and target domains.
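A minimal sketch (our notation; the weighting of the four terms follows our reading of Eq. (11), and the trade-off values and toy shapes are placeholders) of evaluating the unconstrained objective on random data:

```python
import numpy as np

def ptelm_objective(beta_s, A, Hs, Ts, Ht, Tt, lam, s1, s2):
    """Objective (11): source fit + lam * target fit
    + s1 * ||beta_s||_{2,1} + s2 * ||A beta_s||_F^2."""
    src_err = np.sum((Hs @ beta_s - Ts) ** 2)          # ||Hs b - Ts||_F^2
    tgt_err = np.sum((Ht @ A @ beta_s - Tt) ** 2)      # ||Ht A b - Tt||_F^2
    l21 = np.sum(np.sqrt(np.sum(beta_s ** 2, axis=1))) # row-sparsity term
    frob = np.sum((A @ beta_s) ** 2)                   # target-weight norm
    return src_err + lam * tgt_err + s1 * l21 + s2 * frob

# toy shapes: L=20 hidden nodes, m=3 classes, n_s=30, n_t=6 samples
rng = np.random.default_rng(1)
L, m, ns, nt = 20, 3, 30, 6
Hs, Ht = rng.normal(size=(ns, L)), rng.normal(size=(nt, L))
Ts, Tt = rng.normal(size=(ns, m)), rng.normal(size=(nt, m))
beta_s, A = rng.normal(size=(L, m)), np.eye(L)
print(ptelm_objective(beta_s, A, Hs, Ts, Ht, Tt, lam=1.0, s1=0.1, s2=0.1))
```

At $\boldsymbol{\beta}_s = 0$ the regularizers vanish and the objective reduces to $\|\mathbf{T}_s\|_F^2 + \lambda\|\mathbf{T}_t\|_F^2$, a useful sanity check for any implementation.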
IV-B Learning Algorithm
As can be seen in problem (11), our goal is to jointly learn the output weights $\boldsymbol{\beta}_s$ of the source ELM model and the projection matrix $\mathbf{A}$. The target ELM model parameters can then be easily obtained by $\boldsymbol{\beta}_t = \mathbf{A}\boldsymbol{\beta}_s$. However, with two free parameters to be solved, this optimization problem cannot be solved directly like problem (2). Therefore, we adopt the coordinate descent method to alternately optimize the two free parameters.
(1) Fix $\mathbf{A}$ and optimize $\boldsymbol{\beta}_s$: In the first step, we fix the projection matrix $\mathbf{A}$; the subproblem can then be solved by setting the derivative of the objective function to zero. We have

$$\frac{\partial \mathcal{L}}{\partial \boldsymbol{\beta}_s} = 2\mathbf{H}_s^T(\mathbf{H}_s\boldsymbol{\beta}_s - \mathbf{T}_s) + 2\lambda\mathbf{A}^T\mathbf{H}_t^T(\mathbf{H}_t\mathbf{A}\boldsymbol{\beta}_s - \mathbf{T}_t) + 2\sigma_1\mathbf{D}\boldsymbol{\beta}_s + 2\sigma_2\mathbf{A}^T\mathbf{A}\boldsymbol{\beta}_s = 0 \qquad (12)$$

Note that $\|\boldsymbol{\beta}_s\|_{2,1}$ is non-smooth at zero; therefore, we compute its subgradient instead: $\partial\|\boldsymbol{\beta}_s\|_{2,1}/\partial\boldsymbol{\beta}_s = 2\mathbf{D}\boldsymbol{\beta}_s$, where $\mathbf{D}$ is a diagonal subgradient matrix with the $i$-th diagonal element

$$d_{ii} = \frac{1}{2\|\boldsymbol{\beta}_s^i\|_2 + \epsilon} \qquad (13)$$

Here, $\boldsymbol{\beta}_s^i$ denotes the $i$-th row of $\boldsymbol{\beta}_s$, and $\epsilon$ is a very small constant that prevents the denominator from being zero. With the matrix $\mathbf{D}$ fixed, $\boldsymbol{\beta}_s$ can be solved from Eq. (12) as

$$\boldsymbol{\beta}_s = \left(\mathbf{H}_s^T\mathbf{H}_s + \lambda\mathbf{A}^T\mathbf{H}_t^T\mathbf{H}_t\mathbf{A} + \sigma_1\mathbf{D} + \sigma_2\mathbf{A}^T\mathbf{A}\right)^{-1}\left(\mathbf{H}_s^T\mathbf{T}_s + \lambda\mathbf{A}^T\mathbf{H}_t^T\mathbf{T}_t\right) \qquad (14)$$
Note that the subgradient matrix $\mathbf{D}$ depends on the unsolved parameter $\boldsymbol{\beta}_s$. Thus, we employ an alternate optimization strategy to solve for $\boldsymbol{\beta}_s$ according to Eq. (13) and Eq. (14): in each iteration, one of $\mathbf{D}$ and $\boldsymbol{\beta}_s$ is updated with the other fixed. The procedure is summarized in Algorithm 1. It is worth noting that the iterative procedure terminates once the number of iterations reaches the maximum or the objective value converges. The convergence of this algorithm can be proved similarly to [30].
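This inner loop can be sketched in numpy as follows (our reconstruction of Eqs. (12) to (14); the variable names, iteration count, and toy data are ours, not the paper's):

```python
import numpy as np

def update_beta_s(A, Hs, Ts, Ht, Tt, lam, s1, s2, n_iter=20, eps=1e-8):
    """Step (1): alternate the subgradient matrix D (Eq. (13)) and the
    closed-form beta_s (Eq. (14)) with the projection matrix A fixed."""
    L = Hs.shape[1]
    B = Ht @ A                       # target hidden outputs after projection
    beta = np.zeros((L, Ts.shape[1]))
    D = np.eye(L)                    # initial subgradient matrix
    for _ in range(n_iter):
        lhs = Hs.T @ Hs + lam * B.T @ B + s1 * D + s2 * A.T @ A
        rhs = Hs.T @ Ts + lam * B.T @ Tt
        beta = np.linalg.solve(lhs, rhs)            # Eq. (14)
        row_norms = np.sqrt(np.sum(beta ** 2, axis=1))
        D = np.diag(1.0 / (2.0 * row_norms + eps))  # Eq. (13)
    return beta

# toy data: L=15 hidden nodes, m=3 classes, n_s=40 source, n_t=6 target
rng = np.random.default_rng(0)
L, m, ns, nt = 15, 3, 40, 6
Hs, Ht = rng.normal(size=(ns, L)), rng.normal(size=(nt, L))
Ts = np.eye(m)[rng.integers(0, m, ns)]   # random one-hot labels
Tt = np.eye(m)[rng.integers(0, m, nt)]
A = np.eye(L)                            # e.g. an initial projection
beta_s = update_beta_s(A, Hs, Ts, Ht, Tt, lam=1.0, s1=0.1, s2=0.1)
print(beta_s.shape)  # (15, 3)
```

In practice a fixed small iteration budget suffices here, since the reweighted least-squares loop converges quickly.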
(2) Fix $\boldsymbol{\beta}_s$ and optimize $\mathbf{A}$: With $\boldsymbol{\beta}_s$ fixed, the subproblem can be solved easily by setting the derivative of Eq. (11) with respect to $\mathbf{A}$ to zero. We get

$$\frac{\partial \mathcal{L}}{\partial \mathbf{A}} = 2\lambda\mathbf{H}_t^T(\mathbf{H}_t\mathbf{A}\boldsymbol{\beta}_s - \mathbf{T}_t)\boldsymbol{\beta}_s^T + 2\sigma_2\mathbf{A}\boldsymbol{\beta}_s\boldsymbol{\beta}_s^T = 0 \qquad (15)$$

which leads to

$$\mathbf{A} = \lambda\left(\lambda\mathbf{H}_t^T\mathbf{H}_t + \sigma_2\mathbf{I}\right)^{-1}\mathbf{H}_t^T\mathbf{T}_t\boldsymbol{\beta}_s^T\left(\boldsymbol{\beta}_s\boldsymbol{\beta}_s^T\right)^{\dagger} \qquad (16)$$

where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudoinverse, since $\boldsymbol{\beta}_s\boldsymbol{\beta}_s^T$ is rank-deficient when $m < L$.
The overall learning algorithm is summarized in Algorithm 2. With the randomly initialized input parameters, the hidden layer outputs of the source and target ELM models, represented as $\mathbf{H}_s$ and $\mathbf{H}_t$, can be calculated beforehand. In each iteration, we update $\boldsymbol{\beta}_s$ with the current $\mathbf{A}$, and then update $\mathbf{A}$ with the newly calculated $\boldsymbol{\beta}_s$. Owing to the closed-form solutions in each iteration, the learning algorithm converges after several iterations.
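The projection update of step (2) can be sketched as below (our reconstruction of Eqs. (15) and (16); the pseudoinverse choice and the toy data are our assumptions, since the extracted equation is garbled). The usage snippet checks that the returned matrix satisfies the stationarity condition:

```python
import numpy as np

def update_A(beta_s, Ht, Tt, lam, s2):
    """Step (2): closed-form update of the projection matrix A.
    beta_s @ beta_s.T is rank-deficient when m < L, so the
    Moore-Penrose pseudoinverse is used (our choice)."""
    L = Ht.shape[1]
    # M = lam * (lam Ht^T Ht + s2 I)^{-1} Ht^T Tt
    M = lam * np.linalg.solve(lam * Ht.T @ Ht + s2 * np.eye(L), Ht.T @ Tt)
    return M @ beta_s.T @ np.linalg.pinv(beta_s @ beta_s.T)

# toy data: L=10 hidden nodes, m=3 classes, n_t=8 target samples
rng = np.random.default_rng(1)
L, m, nt = 10, 3, 8
Ht = rng.normal(size=(nt, L))
Tt = rng.normal(size=(nt, m))
beta_s = rng.normal(size=(L, m))
lam, s2 = 1.0, 0.1

A = update_A(beta_s, Ht, Tt, lam, s2)
# residual of the stationarity condition, Eq. (15); should vanish
grad = lam * Ht.T @ (Ht @ A @ beta_s - Tt) @ beta_s.T \
       + s2 * A @ beta_s @ beta_s.T
print(np.max(np.abs(grad)))  # ~0 up to floating-point error
```

Alternating this update with the $\boldsymbol{\beta}_s$ update of step (1) gives the outer loop of Algorithm 2.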
IV-C Relationship to Transform-based Methods
Most existing domain adaptation methods perform knowledge transfer by learning a cross-domain transformation $\mathbf{W}$ [9, 10, 31, 11], which maps the source domain data $\mathbf{X}_s$ into the target domain by applying $\mathbf{W}\mathbf{X}_s$. Instead, our proposed PTELM transforms the source domain hyperplane into the target domain by $\boldsymbol{\beta}_t = \mathbf{A}\boldsymbol{\beta}_s$. In fact, the proposed PTELM can also be regarded as a transform-based method. In Eq. (11), if we implicitly define $\tilde{\mathbf{H}}_t = \mathbf{H}_t\mathbf{A}$ and set $\sigma_2$ to zero, the objective function can be reformulated as

$$\min_{\boldsymbol{\beta}_s,\, \mathbf{A}} \ \|\mathbf{H}_s\boldsymbol{\beta}_s - \mathbf{T}_s\|_F^2 + \lambda\|\tilde{\mathbf{H}}_t\boldsymbol{\beta}_s - \mathbf{T}_t\|_F^2 + \sigma_1\|\boldsymbol{\beta}_s\|_{2,1} \qquad (17)$$

Similar to the cross-domain transformation approaches, the rewritten objective function jointly learns a transformation matrix that transforms the target features into the source domain, together with the classification hyperplane. The differences between our proposal and the other transform-based methods are threefold. First, our proposed PTELM transforms the target into the source by a column transformation, while the majority of transform-based methods align the source and target by applying a row transformation to the source data. Second, the PTELM learns the transformation directly from the prediction error, while other related works take the distribution discrepancy or a similarity metric as the guideline. Lastly, the PTELM learns the transformation and an ELM classifier simultaneously, while many transform-based methods learn only the transformation and then utilize other classifiers (e.g. KNN) for classification.
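The equivalence of the two views rests on associativity, $\mathbf{H}_t(\mathbf{A}\boldsymbol{\beta}_s) = (\mathbf{H}_t\mathbf{A})\boldsymbol{\beta}_s$: adapting the hyperplane and adapting the features give identical predictions. A tiny numerical check (our illustration, with random stand-ins for the learned quantities):

```python
import numpy as np

rng = np.random.default_rng(2)
L, m, nt = 12, 4, 9
Ht = rng.normal(size=(nt, L))      # target hidden-layer outputs
A = rng.normal(size=(L, L))        # stand-in for a learned projection matrix
beta_s = rng.normal(size=(L, m))   # stand-in for source output weights

# parameter-transfer view: adapt the hyperplane, keep the features
scores_param = Ht @ (A @ beta_s)
# transform-based view: adapt the features, keep the hyperplane
scores_feat = (Ht @ A) @ beta_s

print(np.allclose(scores_param, scores_feat))  # True: the two views coincide
```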
V Experiments
In this section, we evaluate our proposed PTELM method on several challenging real-world datasets. The source code of PTELM is released online at https://github.com/BoyuanJiang/PTELM.
V-A Datasets and Setup
Two types of domain adaptation problems are considered: object recognition and text categorization. A summary of the properties of each domain considered in our experiments is provided in Table I.
Caltech-Office dataset. This dataset [8] consists of the Office [7] and Caltech256 [32] datasets. It contains images from four different domains: Amazon (product images downloaded from amazon.com), Webcam (low-resolution images taken by a webcam), Dslr (high-resolution images taken by a digital SLR camera) and Caltech. 10 common categories are extracted from all four domains, with each category consisting of 8 to 151 samples, and 2533 images in total. Several factors (such as image resolution, lighting condition, noise, background and viewpoint) cause the shift of each domain. Figure 2 highlights the differences among these domains with example images from the keyboard and headphone categories. We consider the SURF-BoW image features (SURF in short) provided by [8], which encode the images as 800-bin histograms with a codebook trained from a subset of Amazon images using SURF descriptors [33]. These histograms are then normalized to zero mean and unit variance in each dimension.
Multilingual Reuters Collection dataset. This dataset [34, 35] (available at http://ama.liglab.fr/~amini/DataSets/Classification/Multiview/ReutersMutliLingualMultiView.htm), which is collected by sampling from the Reuters RCV1 and RCV2 collections, contains feature characteristics of 111,740 documents originally written in five different languages together with their translations (i.e., English, French, German, Italian, and Spanish), over a common set of 6 categories (i.e., C15, CCAT, E21, ECAT, GCAT, and M11). Documents belonging to more than one of the 6 categories are assigned the label of their smallest category. There are 12-30K documents per language, and 11-34K documents per category. All documents are represented as a bag of words, and TF-IDF features are then extracted.
Baselines. We compare with the following baselines and competing methods that are well suited to domain shift scenarios:

SVM_s: support vector machine trained on the source domain.

SVM_t: support vector machine trained on the target domain.

ELM_s: extreme learning machine trained on the source domain.

ELM_t: extreme learning machine trained on the target domain.

GFK: Geodesic Flow Kernel [8].

MMDT: Max-Margin Domain Transform [9].

CDLS: Cross-Domain Landmark Selection [31].
V-B Cross-Domain Object Recognition
For our first experiment, we use the Caltech-Office domain adaptation benchmark dataset to evaluate our method on real-world computer vision adaptation tasks.
V-B1 Experiment Setup
Following the setup of [8, 7, 9], the number of labeled source samples selected per class for amazon, webcam, dslr and caltech is 20, 8, 8, and 8, respectively. When these serve as the target domain, 3 labeled target samples per class are used instead. For fair comparison, we use the same 20 random train/test splits provided by the authors of [9] (downloaded from https://people.eecs.berkeley.edu/~jhoffman/domainadapt/) and report results averaged across them.
For our method, we fix the trade-off parameters $\lambda$, $\sigma_1$ and $\sigma_2$ across all runs. The number of hidden nodes of the ELM networks is set to 500 in all experiments. For the other baseline methods, we use the recommended parameters.
V-B2 Results
We report the mean and standard deviation of classification accuracies for all methods on the Office-Caltech dataset in Table II. Each result in the same column is based on the same 20 random trials. As can be seen, our proposed method outperforms all other methods in 7 out of the 12 individual domain shifts and achieves the highest average accuracy, 52.7%, over all 12 domain shift experiments. It is worth noticing that our PTELM typically outperforms the other competing methods when amazon serves as the source or target domain. We believe the reason is that the domain discrepancies of amazon-webcam and amazon-dslr are much more significant than those of the other domain shifts, as indicated by the larger performance gap between ELM_s and ELM_t on these shifts. This suggests that our approach is more effective at dealing with large domain shifts.
We also visualize the effectiveness of the proposed PTELM via confusion matrices. Figure 3 plots the confusion matrices of ELM_s, PTELM and ELM_t on the amazon-webcam domain shift experiment. By inspecting the confusion matrix of ELM_s, which is trained with 20 labeled source samples per class, we find that the source-only model is heavily confused about several classes. This also reveals the large domain shift between amazon and webcam and explains the performance discrepancy between ELM_s and ELM_t. On the other hand, the confusion matrix of ELM_t, which is trained with 3 labeled target samples per class, is also somewhat confused. In contrast, as can be seen in Figure 3(b), the off-diagonal elements of the PTELM confusion matrix are close to zero, which demonstrates that our PTELM method can effectively utilize source and target domain samples together to train a high-quality classifier.
V-C Cross-Domain Text Categorization
For the second experiment, we utilize the Multilingual Reuters Collection dataset to evaluate our method in the context of text categorization.
V-C1 Experiment Setup
In this dataset, documents written in different languages can be viewed as different domains. We take Spanish as the target domain, and the other four languages (English, French, German and Italian) as individual source domains. Therefore, there are four combinations in total. For each category, we randomly sample 100 labeled training documents from the source domain and $m$ labeled training documents from the target domain, where $m$ = 5, 10, 15 and 20, respectively. The remaining documents in the target domain are used as the test set (the splits we used can be downloaded from https://github.com/BoyuanJiang/PTELM/tree/master/DataSplits). Note that the dimensionality of the original TF-IDF features is up to 11,547; in order to compare our method fairly with the other competing methods, we perform principal component analysis (PCA, using the randomized singular value decomposition algorithm as the SVD solver for efficiency) for dimensionality reduction, and the dimensionality after PCA is 40. In this experiment, we also fix the trade-off parameters $\lambda$, $\sigma_1$ and $\sigma_2$, and the number of hidden nodes is set to a different value from the previous experiment.
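The dimensionality reduction step can be sketched as follows (our illustration with random stand-in data; plain SVD is used here for clarity, whereas the experiments use a randomized SVD solver for efficiency on the 11,547-dimensional features):

```python
import numpy as np

def pca_reduce(X, n_components=40):
    """Project features onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                        # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # scores in reduced space

X = np.random.default_rng(3).normal(size=(100, 200))  # stand-in for TF-IDF
Z = pca_reduce(X, n_components=40)
print(Z.shape)  # (100, 40)
```

With a real TF-IDF matrix, scikit-learn's `PCA(n_components=40, svd_solver="randomized")` would be the natural drop-in replacement at this scale.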
V-C2 Results
We report the means and standard deviations of all methods on the Multilingual Reuters Collection dataset for two values of $m$ in Table III. It is obvious that our proposed PTELM method beats the other competing methods under both settings. It is interesting to note that the GFK algorithm works worse than ELM_t and SVM_t. A possible explanation is that GFK was put forward for unsupervised domain adaptation and therefore does not utilize the given target labels for training.
We also plot the means and standard deviations of all methods over different numbers of labeled target samples ($m$ = 5, 10, 15 and 20) in Figure 4, excluding SVM_s and ELM_s, as these two methods perform much worse than the others. From the figure, it can be seen that the performance of all methods improves as the number of labeled target samples increases, and our method performs best in most cases. It is worth noting that MMDT performs slightly better than our method, and much better than the other methods, for the smallest $m$, which suggests that MMDT is more suitable when very few labeled target samples are available. Another key insight from the figure is that our method is more stable than the competing methods, with lower standard deviations.
V-D Parameter Sensitivity
In this section, we investigate the sensitivity of the four parameters involved in our method: the three trade-off parameters $\lambda$, $\sigma_1$, $\sigma_2$ and the number of hidden nodes $L$. Due to space limitations, we only choose amazon-webcam from the Office-Caltech dataset and Italian-Spanish from the Multilingual Reuters Collection dataset for accuracy evaluation. Each time, only one parameter is allowed to change with the other parameters fixed. The results are shown in Figure 5, and we give a brief analysis here. $\lambda$ is the trade-off parameter that balances the contributions of the source and target domains: when $\lambda$ is smaller than 1, the model learns more from the source domain; on the contrary, when $\lambda$ is larger than 1, the target domain counts more. Therefore, a reasonable value of $\lambda$ is close to 1, as can be seen in Figure 5(a). $\sigma_1$ and $\sigma_2$ are two penalty terms that prevent the model from overfitting the source and target domain data; as can be seen in Figure 5(b) and (c), reasonable choices lie in a moderate range. The number of hidden nodes $L$ is highly related to the feature dimensionality, and a reasonable value is about 500 in our experiments.
VI Conclusion and Future Work
In this paper, we presented a novel approach for parameter transfer under the ELM framework, which explicitly bridges the source domain parameters and the target domain parameters by a projection matrix. In order to select informative source domain features for knowledge transfer, the $\ell_{2,1}$-norm regularizer was applied to the source parameters. Additionally, an effective alternate optimization method was introduced to jointly learn the projection matrix and the model parameters. Experiments on several challenging datasets showed that the proposed PTELM outperforms the non-transfer ELM and SVM by a large margin, and also achieves better performance than the other representative methods.
In the future, we plan to extend our proposal in two aspects: (1) extending PTELM to a multiple-source domain adaptation method; (2) reformulating the model by transforming the source and target parameters into a shared parameter space via two different projection matrices.
References
 [1] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
 [2] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer joint matching for unsupervised domain adaptation,” pp. 1410–1417, 2014.
 [3] M. Long, J. Wang, G. Ding, S. J. Pan, and S. Y. Philip, “Adaptation regularization: A general framework for transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
 [4] L. Zhang and D. Zhang, “Robust visual knowledge transfer via extreme learning machinebased domain adaptation,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4959–4973, 2016.
 [5] Y. Liu, L. Zhang, P. Deng, and Z. He, “Common subspace learning via crossdomain extreme learning machine,” Cognitive Computation, pp. 1–9, 2017.
 [6] J. Hoffman, E. Rodner, J. Donahue, B. Kulis, and K. Saenko, “Asymmetric and category invariant feature transformations for domain adaptation,” International journal of computer vision, vol. 109, no. 12, pp. 28–41, 2014.
 [7] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” Computer Vision–ECCV 2010, pp. 213–226, 2010.
 [8] B. Gong, Y. Shi, F. Sha, and K. Grauman, “Geodesic flow kernel for unsupervised domain adaptation,” pp. 2066–2073, 2012.
 [9] J. Hoffman, E. Rodner, J. Donahue, T. Darrell, and K. Saenko, “Efficient learning of domaininvariant image representations,” international conference on learning representations, 2013.
 [10] B. Kulis, K. Saenko, and T. Darrell, “What you saw is not what you get: Domain adaptation using asymmetric kernel transforms,” pp. 1785–1792, 2011.

 [11] B. Sun, J. Feng, and K. Saenko, “Return of frustratingly easy domain adaptation,” AAAI Conference on Artificial Intelligence, pp. 2058–2065, 2016.  [12] C. Chen, Z. Chen, B. Jiang, and X. Jin, “Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation,” arXiv preprint arXiv:1808.09347, 2018.
 [13] J. Yang, R. Yan, and A. G. Hauptmann, “Adapting svm classifiers to data with shifted distributions,” pp. 69–76, 2007.
 [14] Y. Aytar and A. Zisserman, “Tabula rasa: Model transfer for object category detection,” pp. 2252–2259, 2011.
 [15] T. Tommasi, F. Orabona, and B. Caputo, “Learning categories from few examples with multi model knowledge transfer,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 5, pp. 928–941, 2014.
 [16] X. Li, W. Mao, and W. Jiang, “Extreme learning machine based transfer learning for data classification,” Neurocomputing, vol. 174, pp. 203–210, 2016.
 [17] G.B. Huang, Q.Y. Zhu, and C.K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1, pp. 489–501, 2006.
 [18] G.B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2, pp. 513–529, 2012.
 [19] G. Huang, S. Song, J. N. Gupta, and C. Wu, “Semisupervised and unsupervised extreme learning machines,” IEEE transactions on cybernetics, vol. 44, no. 12, pp. 2405–2417, 2014.
 [20] C. Chen, X. Jin, B. Jiang, and L. Li, “Optimizing extreme learning machine via generalized hebbian learning and intrinsic plasticity learning,” Neural Processing Letters, pp. 1–17, 2018.
 [21] W. Zong, G.B. Huang, and Y. Chen, “Weighted extreme learning machine for imbalance learning,” Neurocomputing, vol. 101, pp. 229–242, 2013.
 [22] N.Y. Liang, G.B. Huang, P. Saratchandran, and N. Sundararajan, “A fast and accurate online sequential learning algorithm for feedforward networks,” IEEE Transactions on neural networks, vol. 17, no. 6, pp. 1411–1423, 2006.
 [23] H. Zhou, G.B. Huang, Z. Lin, H. Wang, and Y. C. Soh, “Stacked extreme learning machines,” IEEE transactions on cybernetics, vol. 45, no. 9, pp. 2013–2025, 2015.
 [24] G.B. Huang, Z. Bai, L. L. C. Kasun, and C. M. Vong, “Local receptive fields based extreme learning machine,” IEEE Computational Intelligence Magazine, vol. 10, no. 2, pp. 18–29, 2015.
 [25] L. Zhang and D. Zhang, “Domain adaptation extreme learning machines for drift compensation in enose systems,” IEEE Transactions on instrumentation and measurement, vol. 64, no. 7, pp. 1790–1801, 2015.
 [26] M. Uzair and A. Mian, “Blind domain adaptation with augmented extreme learning machine features,” IEEE transactions on cybernetics, vol. 47, no. 3, pp. 651–660, 2017.
 [27] S. M. Salaken, A. Khosravi, T. Nguyen, and S. Nahavandi, “Extreme learning machine based transfer learning algorithms: A survey,” Neurocomputing, vol. 267, pp. 516–524, 2017.
 [28] C. H. Q. Ding, D. Zhou, X. He, and H. Zha, “R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization,” pp. 281–288, 2006.
 [29] Q. Gu, Z. Li, and J. Han, “Joint feature selection and subspace learning,” pp. 1294–1299, 2011.
 [30] F. Nie, H. Huang, X. Cai, and C. H. Q. Ding, “Efficient and robust feature selection via joint ℓ2,1-norms minimization,” pp. 1813–1821, 2010.
 [31] Y. H. Tsai, Y. Yeh, and Y. F. Wang, “Learning crossdomain landmarks for heterogeneous domain adaptation,” pp. 5081–5090, 2016.
 [32] G. Griffin, A. Holub, and P. Perona, “Caltech256 object category dataset,” 2007.
 [33] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,” Computer vision–ECCV 2006, pp. 404–417, 2006.
 [34] M.R. Amini, N. Usunier, and C. Goutte, “Learning from multiple partially observed views  an application to multilingual text categorization,” in NIPS 22, 2009.
 [35] N. Ueffing, M. Simard, S. Larkin, and H. Johnson, “Nrc’s portage system for wmt 2007,” ACL 2007, pp. 185–188, 2007.