) as a promising nonlinear dimensionality reduction (NDR) method for high-dimensional data manifolds. Its basic assumption is that high-dimensional input data samples lie on or close to a low-dimensional smooth manifold embedded in the ambient Euclidean space. For example, when a camera is rotated around an object at a fixed radius, the images of the object can be viewed as a one-dimensional curve embedded in a high-dimensional Euclidean space whose dimension equals the number of pixels in the image. Under the manifold assumption, manifold learning methods aim to extract the intrinsic degrees of freedom underlying the high-dimensional input data samples by preserving local or global geometric characteristics of the manifold from which the data samples are drawn. In recent years, various manifold learning algorithms have been proposed, such as locally linear embedding (LLE) [2, 3], ISOMAP [4, 5], Laplacian eigenmap (LE), diffusion maps (DM), local tangent space alignment (LTSA), and Riemannian manifold learning. They have achieved great success in finding meaningful low-dimensional embeddings for high-dimensional data manifolds. Meanwhile, manifold learning also has many important applications in real-world problems, such as human motion detection, human face recognition, classification and compressed representation of hyperspectral imagery, dynamic shape and appearance classification, and visual tracking [21, 22, 23].
However, a main drawback of manifold learning methods is that they learn the low-dimensional representations of the high-dimensional input data samples implicitly. No explicit mapping from the input data manifold to the output embedding is obtained after training. Therefore, to obtain the low-dimensional representations of newly arriving samples, the learning procedure has to be rerun with all previous samples and the new samples as inputs. Such a strategy is extremely time-consuming for sequentially arriving data, which greatly limits the application of manifold learning methods to many practical problems, such as classification, target detection, and visual tracking.
To address the lack of explicit mappings, many linear projection based methods have been proposed for manifold learning by assuming that there exists a linear projection between the high-dimensional input data samples and their low-dimensional representations, such as Locality Preserving Projections (LPP) [24, 25], Neighborhood Preserving Embedding (NPE), Neighborhood Preserving Projections (NPP), Orthogonal Locality Preserving Projections (OLPP), Orthogonal Neighborhood Preserving Projections (ONPP) [29, 30], and Graph Embedding. Although these methods have been successful in many problems, the linearity assumption may still be too restrictive.
On the other hand, several kernel-based methods have also been proposed to give nonlinear but implicit mappings for manifold learning (see, e.g., [32, 33, 34, 35]). These methods reformulate the manifold learning methods as kernel learning problems and then use existing kernel extrapolation techniques to find the location of new data samples in the low-dimensional space. The performance of these methods depends on the choice of the kernel functions, and their computational complexity is extremely high for very large data sets.
In this paper, an explicit nonlinear mapping for manifold learning is proposed for the first time, based on the assumption that there exists a polynomial mapping from the high-dimensional input data samples to their low-dimensional representations. The proposed mapping has the following main features.
The mapping is explicit, so it is straightforward to locate any new data samples in the low-dimensional space. This is different from traditional manifold learning methods such as LLE, LE, and ISOMAP, in which the mapping is implicit and it is not clear how new data samples can be embedded in the low-dimensional space. Compared with kernel-based mappings, the proposed mapping does not depend on a specific kernel when finding the low-dimensional representations of new data samples.
The mapping is nonlinear. In contrast to the linear projection-based methods, which find a linear projection from the high-dimensional input samples to their low-dimensional representations, the proposed mapping provides a nonlinear polynomial mapping from the input space to the reduced space. A polynomial mapping is better suited to data samples lying on nonlinear manifolds. Meanwhile, our analysis and experiments show that the computational complexity of the proposed mapping is similar to that of the linear projection-based methods.
Combining this explicit nonlinear mapping with existing manifold learning methods (e.g. LLE, LE, Isomap) can give explicit manifold learning algorithms. In this paper, we concentrate on the LLE manifold learning method and propose an explicit nonlinear manifold learning algorithm called Neighborhood Preserving Polynomial Embedding (NPPE) algorithm. Experiments on both synthetic and real-world data have been conducted to illustrate the validity and effectiveness of the proposed mapping.
The remainder of the paper is organized as follows. Section 2 gives a brief review of existing manifold learning methods, including those based on linear projections and kernel-based nonlinear mappings. Details of the explicit nonlinear mapping for manifold learning are presented in Section 3, whilst the NPPE algorithm is given in Section 4. In Section 5, experiments are conducted on both synthetic and real-world data sets to demonstrate the validity of the proposed algorithm. Conclusions are given in Section 6.
2 Related Works
In this section, we briefly review existing manifold learning algorithms including those based on linear projections and out-of-sample nonlinear extensions for learned manifolds.
For convenience of presentation, the main notations used in this paper are summarized in Table I. Throughout this paper, all data samples are in the form of column vectors. Matrices are expressed using normal capital letters and data vectors are represented using lowercase letters. The superscript of a data vector is the index of its component.
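Table I: Main notations
| Symbol | Meaning |
| $\mathbb{R}^n$ | $n$-dimensional Euclidean space where input samples lie |
| $\mathbb{R}^m$ | $m$-dimensional Euclidean space, $m < n$, where the low-dimensional embeddings lie |
| $x_i$ | $x_i = [x_i^1, \dots, x_i^n]^T$, the $i$-th input sample in $\mathbb{R}^n$, $i = 1, \dots, N$ |
| $\mathcal{X}$ | $\mathcal{X} = \{x_1, \dots, x_N\}$, the set of input samples |
| $X$ | $X = [x_1 \ \cdots \ x_N]$, $n \times N$ matrix of input samples |
| $y_i$ | $y_i = [y_i^1, \dots, y_i^m]^T$, low-dimensional representation of $x_i$ obtained by manifold learning, $i = 1, \dots, N$ |
| $\mathcal{Y}$ | $\mathcal{Y} = \{y_1, \dots, y_N\}$, the set of low-dimensional representations |
| $Y$ | $Y = [y_1 \ \cdots \ y_N]$, $m \times N$ matrix of low-dimensional representations |
| $I_k$ | Identity matrix of size $k$ |
| $\|\cdot\|_2$ | 2-norm, where $\|v\|_2 = \sqrt{\sum_{i=1}^{n} (v^i)^2}$ for an $n$-dimensional vector $v$ |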
2.1 Manifold Learning Methods
According to the geometric characteristics which are preserved, existing manifold learning methods can be cast into two categories: local or global approaches.
preserves local pairwise Euclidean distances among data samples. Maximum Variance Unfolding (MVU) also preserves pairwise Euclidean distances in each local neighborhood, but it maximizes the variance of the low-dimensional representations at the same time. Local Tangent Space Alignment (LTSA)  keeps the local tangent structure. Diffusion Maps  preserves local pairwise diffusion distances from high-dimensional data to the low-dimensional representations. Laplacian Eigenmap (LE)  preserves the local adjacency relationship.
As global approaches, Isometric Feature Mapping (ISOMAP) [4, 5] preserves the pairwise geodesic distances among the high-dimensional data samples and their low-dimensional representations. Hessian Eigenmaps (HLLE)  extends ISOMAP to more general cases where the set of intrinsic degrees of freedom may be non-convex. In Riemannian Manifold Learning (RML) , the coordinates of data samples in the tangential space are preserved to be their low-dimensional representations.
2.2 Linear Projections for Manifold Learning
Manifold learning algorithms based on linear projections assume that there exists a linear projection which maps the high-dimensional samples into a low-dimensional space, that is,
$$y = U^T x, \qquad (1)$$
where $x \in \mathbb{R}^n$ is a high-dimensional sample, $y \in \mathbb{R}^m$ is its low-dimensional representation, and $U \in \mathbb{R}^{n \times m}$ is the projection matrix. Denote by $u_i$ the $i$-th column of $U$. Then from a geometric point of view, data samples in $\mathbb{R}^n$ are projected into an $m$-dimensional linear subspace spanned by $u_1, \dots, u_m$. The low-dimensional representation $y$ is the coordinate of $x$ in this subspace with respect to the basis $\{u_1, \dots, u_m\}$.
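2.2.1 LPP
Locality Preserving Projections (LPP) [24, 25] provides a linear mapping for Laplacian Eigenmaps (LE) by applying (1) to the training procedure of LE. The LE method aims to find a set of low-dimensional representations $\{y_i\}$ which best preserve the adjacency relationship among the high-dimensional inputs $\{x_i\}$: if $x_i$ and $x_j$ are “close” to each other, then $y_i$ and $y_j$ should also be so. This property is achieved by solving the following constrained optimization problem
$$\min_{y_1, \dots, y_N} \ \sum_{i,j} W_{ij} \, \|y_i - y_j\|_2^2 \quad \text{s.t.} \quad \sum_{i} D_{ii} \, y_i y_i^T = I_m,$$
where the penalty weights are given by the heat kernel $W_{ij} = \exp(-\|x_i - x_j\|_2^2 / t)$ and $D_{ii} = \sum_j W_{ij}$.
Applying the linear projection assumption (1), that is, $y_i = U^T x_i$, the problem becomes
$$\min_{U} \ \mathrm{Tr}\left(U^T X L X^T U\right) \quad \text{s.t.} \quad U^T X D X^T U = I_m, \qquad (5)$$
where $L = D - W$, and $D$ is the diagonal matrix whose $i$-th entry is $D_{ii}$. This optimization problem leads to a generalized eigenvalue problem
$$X L X^T u_i = \lambda_i \, X D X^T u_i,$$
and the optimal solutions $u_1, \dots, u_m$ are the eigenvectors corresponding to the $m$ smallest eigenvalues.
Once $u_1, \dots, u_m$ are computed, the linear projection matrix provided by LPP is given by $U = [u_1 \ \cdots \ u_m]$. For any new data sample $x$ from the high-dimensional space $\mathbb{R}^n$, LPP finds its low-dimensional representation simply by $y = U^T x$.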
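The LPP training step described above can be sketched numerically as follows. This is a minimal illustration, not the reference implementation: it assumes a dense heat-kernel graph over all sample pairs (rather than a nearest-neighbor graph), samples stored as columns of `X`, and a small ridge term `reg` added for numerical stability. The function name `lpp_fit` and its parameters are hypothetical.

```python
import numpy as np

def lpp_fit(X, m, t=1.0, reg=1e-9):
    """LPP-style projection sketch. X is (n, N) with samples as columns."""
    n, N = X.shape
    # Pairwise squared distances and heat-kernel weights (dense graph assumed).
    sq = np.sum(X ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    W = np.exp(-d2 / t)
    D = np.diag(W.sum(axis=1))
    L = D - W                       # graph Laplacian
    A = X @ L @ X.T                 # left-hand matrix of the eigenproblem
    B = X @ D @ X.T                 # right-hand (constraint) matrix
    B = B + reg * np.trace(B) / n * np.eye(n)   # ridge term: an assumption
    # Reduce A u = lambda B u to a symmetric standard problem via B = C C^T.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    S = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((S + S.T) / 2.0)   # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :m]    # eigenvectors of the m smallest eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 60))
U = lpp_fit(X, m=2)
y_new = U.T @ rng.normal(size=5)    # embedding of an unseen sample
```

Embedding a new sample costs a single matrix-vector product, which is the property that motivates explicit mappings.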
2.2.2 NPP and NPE
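The linear projection mapping for Locally Linear Embedding (LLE) is independently provided by Neighborhood Preserving Embedding (NPE) and Neighborhood Preserving Projections (NPP). Similarly to LPP, NPE and NPP apply the linear projection assumption (1) to the training process of LLE and reformulate the optimization problem in LLE so as to compute the linear projection matrix.
During the training procedure of LLE, a set of linear reconstruction weights $R_{ij}$ is first computed by solving a convex optimization problem
$$\min_{R} \ \sum_{i} \Big\| x_i - \sum_{j \in \mathcal{N}_i} R_{ij} x_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}_i} R_{ij} = 1, \quad i = 1, \dots, N,$$
where $\mathcal{N}_i$ is the index set of the $k$ nearest neighbors of $x_i$. Then LLE aims to preserve $R_{ij}$ from $\mathbb{R}^n$ to $\mathbb{R}^m$. This is achieved by solving the following optimization problem
$$\min_{y_1, \dots, y_N} \ \sum_{i} \Big\| y_i - \sum_{j \in \mathcal{N}_i} R_{ij} y_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{i} y_i y_i^T = I_m.$$
Applying the linear projection assumption (1), the problem becomes
$$\min_{U} \ \mathrm{Tr}\left(U^T X M X^T U\right) \quad \text{s.t.} \quad U^T X X^T U = I_m, \qquad (9)$$
where $M = (I_N - R)^T (I_N - R)$ with $R = (R_{ij})$. The optimal solutions $u_1, \dots, u_m$ are the eigenvectors of the following generalized eigenvalue problem corresponding to the $m$ smallest eigenvalues
$$X M X^T u_i = \lambda_i \, X X^T u_i.$$
After finding the linear projection matrix $U = [u_1 \ \cdots \ u_m]$, any new data sample $x$ from the high-dimensional space can be easily mapped into the lower dimensional space by $y = U^T x$.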
2.2.3 OLPP and ONPP
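Orthogonal Locality Preserving Projections (OLPP) and Orthogonal Neighborhood Preserving Projections (ONPP) [29, 30] are the same as LPP and NPE (or NPP), respectively, except that the linear projection matrix is restricted to be orthogonal. This is achieved by replacing the constraints (5) and (9) with $U^T U = I_m$. Then the optimization problems in OLPP and ONPP become
$$\min_{U_{\mathrm{OLPP}}} \ \mathrm{Tr}\left(U_{\mathrm{OLPP}}^T X L X^T U_{\mathrm{OLPP}}\right) \quad \text{s.t.} \quad U_{\mathrm{OLPP}}^T U_{\mathrm{OLPP}} = I_m,$$
$$\min_{U_{\mathrm{ONPP}}} \ \mathrm{Tr}\left(U_{\mathrm{ONPP}}^T X M X^T U_{\mathrm{ONPP}}\right) \quad \text{s.t.} \quad U_{\mathrm{ONPP}}^T U_{\mathrm{ONPP}} = I_m.$$
Unlike in the cases of LPP and NPE (or NPP), these two optimization problems lead to standard eigenvalue problems, which are much easier to solve numerically than a generalized eigenvalue problem. The column vectors of $U_{\mathrm{OLPP}}$ are given by the eigenvectors of $X L X^T$ corresponding to the $m$ smallest eigenvalues. The same result holds for $U_{\mathrm{ONPP}}$ by replacing $X L X^T$ with $X M X^T$. The reader is referred to the original papers on OLPP and ONPP [29, 30] for details of these two algorithms.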
2.3 Out-of-Sample Nonlinear Extensions for Manifold Learning
Besides linear projections for manifold learning, several out-of-sample nonlinear extensions have also been proposed in order to obtain low-dimensional representations of unseen data samples from the learned manifold. These methods are based on kernel functions and extrapolation techniques. A common strategy is to reformulate manifold learning methods as kernel learning problems. Extrapolation techniques are then employed to find the location of newly arriving samples in the low-dimensional space from the learned manifold. Bengio et al. [32, 36] proposed a unified framework for extending LLE, ISOMAP and LE, in which these methods are seen as learning eigenfunctions of operators defined from data-dependent kernels. The data-dependent kernels are implicitly defined by LLE, ISOMAP and LE and are used together with the Nyström formula to extrapolate the embedding of a manifold learned from finite training samples to new samples (see [32, 36]). Chin and Suter investigated the equivalence between MVU and Kernel Principal Component Analysis (KPCA), by which extending MVU to new samples is reduced to extending a kernel matrix. In their work, the kernel matrix is generated from an unknown kernel eigenfunction which is approximated using Gaussian basis functions. A framework for efficient kernel extrapolation, based on a matrix approximation theorem and an extension of the representer theorem, has also been proposed. Under this framework, LLE was reformulated and the issue of extending LLE to new data samples was addressed.
3 Explicit Nonlinear Mappings for Manifold Learning
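In this section, we propose an explicit nonlinear mapping for manifold learning, based on the assumption that there is a polynomial mapping between the high-dimensional data samples and their lower dimensional representations. Precisely, given input samples $x_1, \dots, x_N \in \mathbb{R}^n$ and their low dimensional representations $y_1, \dots, y_N \in \mathbb{R}^m$, we assume that there exists a polynomial mapping which maps $x_i$ to $y_i$, that is, the $k$-th component $y_i^k$ of $y_i$ is a polynomial of degree $p$ with respect to $x_i$ in the following manner:
$$y_i^k = \sum_{\substack{l_1, \dots, l_n \ge 0 \\ 1 \le l_1 + \cdots + l_n \le p}} (v_k)^{l} \, (x_i^1)^{l_1} (x_i^2)^{l_2} \cdots (x_i^n)^{l_n}, \qquad (12)$$
where $l_1, \dots, l_n$ are all integers. The superscript $l$ stands for the $n$-tuple indexing array $(l_1, \dots, l_n)$ and $v_k$ is the vector of polynomial coefficients, which collects all the coefficients in (12):
$$v_k = \big[ \, (v_k)^{l} \, \big]^T_{1 \le l_1 + \cdots + l_n \le p}.$$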
By assuming the polynomial mapping relationship, we aim to find a polynomial approximation to the unknown mapping from the high-dimensional data samples into their low-dimensional embedding space. Compared with the linear projection assumption used previously, a polynomial mapping provides high-order approximation to the unknown nonlinear mapping and therefore is more accurate for data samples lying on nonlinear manifolds.
In order to apply this explicit nonlinear mapping to manifold learning algorithms, we need two definitions from matrix analysis .
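The Kronecker product of an $r \times s$ matrix $A = (a_{ij})$ and a $t \times q$ matrix $B$ is defined as
$$A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1s} B \\ \vdots & \ddots & \vdots \\ a_{r1} B & \cdots & a_{rs} B \end{bmatrix},$$
which is an $rt \times sq$ matrix.
The Hadamard product of two $r \times s$ matrices $A = (a_{ij})$ and $B = (b_{ij})$ is defined as
$$(A \odot B)_{ij} = a_{ij} \, b_{ij}.$$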
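As a small numerical illustration of the two products, the following sketch uses NumPy's `np.kron` for the Kronecker product and elementwise `*` for the Hadamard product. The example matrices and the vector `x` are arbitrary choices made here for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

K = np.kron(A, B)   # Kronecker product: every entry of A scales a copy of B
H = A * B           # Hadamard product: entrywise multiplication

x = np.array([1.0, 2.0, 3.0])
x_kron = np.kron(x, x)   # all 9 pairwise products x[i] * x[j], cross terms included
x_had = x * x            # only the 3 squares x[i] ** 2, no cross terms
```

For a vector of length n, the Kronecker square has n² entries while the Hadamard square has n, which is the source of the cost difference between the two constructions.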
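Recently, it was proved that most manifold learning methods, including LLE, LE, and ISOMAP, can be cast into the framework of spectral embedding. Under this framework, finding the low-dimensional representations of the high-dimensional data samples is reduced to solving the following optimization problem
$$\min_{y_1, \dots, y_N} \ \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \, \|y_i - y_j\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{N} D_i \, y_i y_i^T = I_m, \qquad (14)$$
where $W_{ij}$, $i, j = 1, \dots, N$, are positive weights (taken here to be symmetric, $W_{ij} = W_{ji}$) which can be defined by using the input data samples and $D_i = \sum_{j} W_{ij}$.
Applying the polynomial assumption (12) to the above general model of manifold learning gives a general manifold learning algorithm with an explicit nonlinear mapping. Denote by $X_p^{(i)}$ the vector of monomials of $x_i$ built from Kronecker products,
$$X_p^{(i)} = \begin{bmatrix} \overbrace{x_i \otimes x_i \otimes \cdots \otimes x_i}^{p} \\ \vdots \\ x_i \otimes x_i \\ x_i \end{bmatrix}, \qquad (18)$$
so that the polynomial (12) can be written in the inner-product form $y_i^k = v_k^T X_p^{(i)}$ (with the coefficients of repeated monomials merged), and substitute (12) into (14). Then the objective function becomes
$$\frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \sum_{k=1}^{m} \left( v_k^T X_p^{(i)} - v_k^T X_p^{(j)} \right)^2.$$
This is equivalent to
$$\sum_{k=1}^{m} v_k^T \Big( \sum_{i,j=1}^{N} X_p^{(i)} \left( D_i \delta_{ij} - W_{ij} \right) X_p^{(j)T} \Big) v_k = \sum_{k=1}^{m} v_k^T X_p (D - W) X_p^T v_k,$$
where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ otherwise, $X_p = [X_p^{(1)} \ \cdots \ X_p^{(N)}]$, $W = (W_{ij})$, and $D$ is a diagonal matrix whose $i$-th diagonal entry is $D_i$. In the same way, the constraint in (14) becomes $v_k^T X_p D X_p^T v_l = \delta_{kl}$ for $k, l = 1, \dots, m$.
By the Rayleigh-Ritz Theorem, the optimal solutions $v_1, \dots, v_m$ are the eigenvectors of the following generalized eigenvalue problem corresponding to the $m$ smallest eigenvalues
$$X_p (D - W) X_p^T v_k = \lambda_k \, X_p D X_p^T v_k. \qquad (23)$$
Once $v_1, \dots, v_m$ are computed, the explicit nonlinear mapping from the high-dimensional data samples to the low-dimensional embedding space can be given as
$$y = [v_1 \ \cdots \ v_m]^T X_p^{(x)}, \qquad (24)$$
where $x$ is a high-dimensional data sample, $X_p^{(x)}$ is the vector built from $x$ as in (18), and $y$ is its low-dimensional representation. For a newly arriving sample $x_{\mathrm{new}}$, its location in the low-dimensional embedding manifold can be simply obtained by
$$y_{\mathrm{new}} = [v_1 \ \cdots \ v_m]^T X_p^{(\mathrm{new})}, \qquad (25)$$
where $X_p^{(\mathrm{new})}$ is defined in the same way as in (18).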
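To make the evaluation of the explicit mapping on a new sample concrete, here is a minimal sketch. It assumes that the polynomial feature vector is the stack of Kronecker powers of the sample, ordered from degree p down to degree 1, and that the learned coefficient vectors are stored as columns of a matrix `V`. The helper names `kron_poly_features` and `map_sample`, and the toy `V`, are hypothetical.

```python
import numpy as np

def kron_poly_features(x, p):
    """Stack the Kronecker powers of x, from degree p down to degree 1."""
    powers = [x]
    for _ in range(p - 1):
        powers.append(np.kron(powers[-1], x))   # next Kronecker power of x
    return np.concatenate(powers[::-1])         # highest degree first

def map_sample(V, x, p):
    """Apply the explicit polynomial mapping: y = V^T * features(x)."""
    return V.T @ kron_poly_features(x, p)

x = np.array([1.0, 2.0, 3.0])
phi = kron_poly_features(x, p=2)   # length 3**2 + 3 = 12

# Toy coefficient matrix (not learned): first output picks x[2],
# second output picks the cross term x[1] * x[2].
V = np.zeros((phi.size, 2))
V[-1, 0] = 1.0
V[5, 1] = 1.0
y = map_sample(V, x, p=2)          # -> [3.0, 6.0]
```

The feature length is n + n² + ... + nᵖ, so this construction is only practical for small n and p.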
In the next section, we will use a method similar to that of LLE to define the weights $W_{ij}$, so that the geometry of the neighborhood of each data point can be captured.
4 Neighborhood Preserving Polynomial Embedding
In this section, we propose a new manifold learning algorithm with an explicit nonlinear mapping, named Neighborhood Preserving Polynomial Embedding (NPPE), which is obtained by defining the weights $W_{ij}$ in a way similar to the LLE method and combining them with the explicit nonlinear mapping of Section 3.
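Consider a data set $\{x_1, \dots, x_N\}$ from the high-dimensional space $\mathbb{R}^n$. NPPE starts with finding a set of linear reconstruction weights which can best reconstruct each data point by its $k$-nearest neighbors (k-NNs). This step is identical to that of LLE [2, 3]. The weights $R_{ij}$, which are defined to be nonzero only if $x_j$ is among the $k$-NNs of $x_i$, are computed by solving the following optimization problem
$$\min_{R} \ \sum_{i=1}^{N} \Big\| x_i - \sum_{j} R_{ij} x_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{j} R_{ij} = 1, \quad i = 1, \dots, N.$$
The weights $R_{ij}$ represent the linear coefficients for reconstructing the sample $x_i$ from its neighbors $\{x_j\}$, whilst the constraint $\sum_j R_{ij} = 1$ means that $x_i$ is approximated by an affine combination of its neighbors (the weights sum to one but are not required to be non-negative). The weight matrix, $R = (R_{ij})$, has a closed-form solution given by
$$r_i = \frac{G_i^{-1} \mathbf{1}}{\mathbf{1}^T G_i^{-1} \mathbf{1}},$$
where $r_i$ is a column vector formed by the $k$ non-zero entries in the $i$-th row of $R$ and $\mathbf{1}$ is a column vector of all ones. The $(j, l)$-th entry of the $k \times k$ matrix $G_i$ is $(x_j - x_i)^T (x_l - x_i)$, where $x_j$ and $x_l$ are among the $k$-NNs of $x_i$.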
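The closed-form weight computation can be sketched as below, following the standard LLE recipe: for each sample, build the local Gram matrix of its centred neighbors, solve a linear system against the all-ones vector, and normalise so the weights sum to one. The small regularisation term `reg` is an assumption added here because the Gram matrix is singular whenever the number of neighbors exceeds the data dimension. The function name `lle_weights` is hypothetical.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights. X is (n, N), samples as columns; returns (N, N)."""
    n, N = X.shape
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbor
    R = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(d2[i])[:k]         # indices of the k nearest neighbors
        Z = X[:, idx] - X[:, [i]]           # neighbors centred on sample i
        G = Z.T @ Z                         # local k-by-k Gram matrix
        G = G + (reg * np.trace(G) / k + 1e-12) * np.eye(k)  # assumed regulariser
        w = np.linalg.solve(G, np.ones(k))  # proportional to G^{-1} 1
        R[i, idx] = w / w.sum()             # enforce the sum-to-one constraint
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))
R = lle_weights(X, k=6)
```

Each row of `R` has at most k non-zero entries, so a sparse matrix format would be the natural choice for larger data sets.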
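NPPE aims to preserve the reconstruction weights $R_{ij}$ from the high-dimensional input data samples to their low-dimensional representations under the polynomial mapping assumption. This is achieved by solving the following optimization problem
$$\min_{y_1, \dots, y_N} \ \sum_{i=1}^{N} \Big\| y_i - \sum_{j} R_{ij} y_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{N} y_i y_i^T = I_m,$$
where each $y_i$ satisfies (12). This problem is of the form (14) with $W_{ij} = R_{ij} + R_{ji} - \sum_{l} R_{li} R_{lj}$, for which $D_i = \sum_j W_{ij} = 1$ and $D - W = (I_N - R)^T (I_N - R)$.
By the result in Section 3, the explicit nonlinear mapping can be obtained by solving (23), and the low-dimensional representations $y_i$ of $x_i$ can be computed by applying (24) to $x_i$. For a newly arriving sample $x_{\mathrm{new}}$, its low-dimensional representation $y_{\mathrm{new}}$ is simply given by (25).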
We conclude this section by summarizing the NPPE algorithm in Algorithm 1.
4.2 Computational Complexity and Simplified NPPE
In the training procedure of NPPE, the computational complexity of generating is . Computing and takes and operations, respectively, since there are only non-zero entries in each column of and is a diagonal matrix. The computational complexity of the final eigen-decomposition is , which is the most time-consuming step.
In the procedure of locating new samples with NPPE, generating takes operations and computing takes operations.
From the above analysis, it can be seen that, as the polynomial order $p$ increases, the overall computational complexity increases exponentially with $p$, which would be extremely time-consuming when the data dimension $n$ is very high. To address this issue, we simplify NPPE by removing the cross terms. This is achieved by replacing the Kronecker product in (18) with the Hadamard product:
$$X_p^{(i)} = \begin{bmatrix} \overbrace{x_i \odot x_i \odot \cdots \odot x_i}^{p} \\ \vdots \\ x_i \odot x_i \\ x_i \end{bmatrix}.$$
With this strategy, the computational complexity of generating is reduced to , whilst the computational complexity computing is reduced to . The Simplified NPPE (SNPPE) is summarized in Algorithm 2.
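The simplified variant can be sketched end to end as follows. This is an illustrative reading of the description above rather than the authors' Algorithm 2: it assumes Hadamard (elementwise) power features stacked from degree p down to 1, a precomputed reconstruction weight matrix `R` whose rows sum to one, the LLE-type cost matrix (I − R)ᵀ(I − R), and a small ridge term `reg` for numerical stability. The names `hadamard_poly_features`, `snppe_fit` and `snppe_map` are hypothetical.

```python
import numpy as np

def hadamard_poly_features(X, p):
    """Stack elementwise powers X**p, ..., X**2, X. X is (n, N) -> (n*p, N)."""
    return np.vstack([X ** q for q in range(p, 0, -1)])

def snppe_fit(X, R, m, p=2, reg=1e-8):
    """Return the coefficient matrix V of shape (n*p, m)."""
    Xp = hadamard_poly_features(X, p)
    N = X.shape[1]
    IR = np.eye(N) - R
    M = IR.T @ IR                          # LLE-type cost matrix
    A = Xp @ M @ Xp.T
    B = Xp @ Xp.T                          # constraint matrix
    d = B.shape[0]
    B = B + reg * np.trace(B) / d * np.eye(d)   # ridge term: an assumption
    # Generalized symmetric eigenproblem A v = lambda B v via Cholesky whitening.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    S = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((S + S.T) / 2.0)   # ascending eigenvalues
    return C_inv.T @ vecs[:, :m]

def snppe_map(V, X_new, p=2):
    """Embed samples given as columns of X_new (shape (n, number_of_samples))."""
    return V.T @ hadamard_poly_features(X_new, p)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 40))
N = X.shape[1]
# Toy weights for demonstration only: each sample is the mean of two others.
R = np.zeros((N, N))
for i in range(N):
    R[i, (i - 1) % N] = 0.5
    R[i, (i + 1) % N] = 0.5
V = snppe_fit(X, R, m=2, p=2)
Y = snppe_map(V, X, p=2)
```

The feature vector here has length n·p instead of n + n² + ... + nᵖ, so both training and out-of-sample mapping scale linearly in the polynomial degree.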
Finally, the computational complexity of SNPPE, linear methods and kernel methods on computing is summarized in Table II. The computational complexity of different kernel methods varies. Here we only state the computational complexity of the common step of computing the inner products. It is obvious that the total complexity in computing is not less than this value.
In this subsection, we briefly explain why NPPE or SNPPE has a better performance than its linear counterparts for nonlinearly distributed data sets.
Let $f$ be a nonlinear map from a manifold $\mathcal{M} \subset \mathbb{R}^n$ to $\mathbb{R}^m$ such that $y = f(x)$, where $f$ is at least $p$th-order differentiable. For simplicity, and without loss of generality, we may assume that $m = 1$ and that $f(0) = 0$. Then the Taylor expansion of $f$ at zero is given by
$$f(x) = \nabla f(0)^T x + \frac{1}{2} x^T H_f(0) \, x + \cdots, \qquad (31)$$
where $\nabla f$ and $H_f$ are the gradient and Hessian of $f$, respectively. From (31), it can be seen that the linear methods only use the first-order approximation provided by $\nabla f(0)^T x$ to approximate the nonlinear mapping $f$, while the proposed polynomial mapping contains the extra high-order terms. Therefore, the explicit nonlinear mapping based on the polynomial assumption gives a better approximation to the true nonlinear mapping than the explicit linear one.
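The approximation argument can be checked on a one-dimensional toy function. The sketch below fits polynomials without a constant term (matching the assumption that the map vanishes at zero) to an arbitrarily chosen smooth function, exp(x) − 1 on [−1, 1], and compares the least-squares error of the degree-1 and degree-2 fits. The target function and the helper `fit_rmse` are assumptions made for illustration.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)
f = np.exp(x) - 1.0          # smooth nonlinear target with f(0) = 0

def fit_rmse(deg):
    """RMSE of the least-squares fit using monomials x, x**2, ..., x**deg."""
    Phi = np.stack([x ** q for q in range(1, deg + 1)], axis=1)
    coef, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    return float(np.sqrt(np.mean((Phi @ coef - f) ** 2)))

rmse_linear = fit_rmse(1)      # first-order (linear) approximation
rmse_quadratic = fit_rmse(2)   # adds the second-order term
```

Because the linear model is nested in the quadratic one, the quadratic fit can never be worse in the least-squares sense, and it is strictly better whenever the target has a non-zero second-order component.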
5 Experimental Tests
In this section, experiments on both synthetic and real-world data sets are conducted to illustrate the validity and effectiveness of the proposed NPPE algorithm. In Section 5.1, NPPE is tested on recovering geometric structures of surfaces embedded in $\mathbb{R}^3$. In Section 5.2, NPPE is applied to locating newly arriving data samples in the learned low-dimensional space. In Section 5.3, NPPE is used to extract intrinsic degrees of freedom underlying two image manifolds. In the experiments, the simplified version of NPPE is implemented and compared with NPP and ONPP (which apply the linear and orthogonal linear projection mapping to the training procedure for LLE, respectively) as well as the kernel extrapolation (KE) method.
There are two parameters in the NPPE algorithm, the number of nearest neighbors and the polynomial degree . is usually set to be of the number of training samples, and the experimental tests show that NPPE is stable around this number. The choice of depends on the dimension . When is small, can be large to make NPPE more accurate. When is large, should be small to make NPPE computationally efficient. Experiments show that NPPE with is already accurate enough.
5.1 Learning Surfaces in $\mathbb{R}^3$ with NPPE
In the first experiment, NPPE, NPP, ONPP and LLE are applied to the task of unfolding surfaces embedded in . The surfaces are the SwissRoll, SwissHole, and Gaussian, all of which are generated by the Matlab Demo available at http://www.math.umn.edu/~wittman/mani/. On each manifold, data samples are randomly generated for training. The number of nearest neighbors is and the polynomial degree . The experimental results are shown in Fig. 1. In each sub-figure, stands for the generating data such that , where is the nonlinear mapping that embeds in . It can be seen from Fig. 1 that NPPE outperforms all the other three methods, even the LLE method itself. NPP and ONPP fail to unfold these nonlinear manifolds (except for ONPP on Gaussian).
Furthermore, in order to estimate the similarity between the learned low-dimensional representations and the generating data, the residual variance is computed, where is the standard linear correlation coefficient taken over all entries of and . The lower is, the more similar and are. The estimation results are shown in Fig. 1(d). It can be seen that the embedding given by NPPE is the most similar one.
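A minimal sketch of this similarity measure is given below. It assumes the usual convention that the residual variance equals one minus the squared linear correlation coefficient, computed over all entries of the two arrays, and that the two arrays are already aligned entry by entry. The function name `residual_variance` and the test data are hypothetical.

```python
import numpy as np

def residual_variance(Y, T):
    """1 - r**2, with r the linear correlation over all entries of Y and T."""
    r = np.corrcoef(np.ravel(Y), np.ravel(T))[0, 1]
    return 1.0 - r ** 2

rng = np.random.default_rng(0)
T = rng.normal(size=(2, 100))                 # stand-in for the generating data
rv_affine = residual_variance(3.0 * T + 1.0, T)             # scaled, shifted copy
rv_noise = residual_variance(rng.normal(size=(2, 100)), T)  # unrelated data
```

The measure is invariant to a common scaling and shift of all entries, so an embedding that matches the generating data up to such a transformation scores close to zero, while unrelated data score close to one.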
5.2 Locating New Data Samples with NPPE
In the second experiment, we apply NPPE, NPP, ONPP and KE to locating new coming samples in the learned low-dimensional space. First, data samples which evenly distribute on the SwissRoll manifold are generated. Then samples are randomly selected as the training data to learn the mapping relationship from to by NPPE, NPP, ONPP and KE. The learned mappings are used to provide the low-dimensional representations for the rest samples. The time cost of computing the low-dimensional representations of the testing samples is also recorded. Experimental results are shown in Fig. 2. It can be seen that NPPE not only gives the best locating result but also has much lower time cost than KE. NPP and ONPP are faster for computation but fail to give the correct embedding result. The same experiment is also conducted on data samples randomly selected from SwissRoll. The results are shown in Fig. 3. NPPE also outperforms the other three methods.
To further validate the performance of NPPE, we randomly generate samples on the SwissRoll manifold, for training and for testing. The experimental procedure is the same as the preceding one. Time cost versus number of testing samples is shown in Fig. 4(a). The residual variances between the generating data of the testing samples and their low-dimensional representations given by the four methods are illustrated in Fig. 4(b). The experimental results show that NPPE is more accurate than the other three methods, at a computational cost similar to that of NPP and ONPP. Note that, in all the above experiments, the time cost of KE increases linearly with the number of testing samples, whilst that of NPP, ONPP and NPPE remains almost constant as the number of testing samples increases.
5.3 Learning Image Manifolds with NPPE
In the last experiment, NPPE is applied to extract intrinsic degrees of freedom underlying two image manifolds, the lleface  and usps-0.
The lleface consists of face images of the same person at resolution , and the two intrinsic degrees of freedom underlying the face images are rotation of the head and facial emotion. We randomly select samples as the training data and samples as the testing data. The number of nearest neighbors is set to be . The experimental results are shown in Fig. 5. The training and testing results are shown on the left and right columns, respectively, in Fig. 5. training samples and testing samples are randomly selected and attached to the learned embedding. It can be seen that NPPE and NPP have successfully recovered the underlying structure of lleface, while the result given by KE is not satisfactory. The rotation degree is not extracted by the learned embedding with KE. Time cost on locating new data samples by these three methods is shown in Fig. 7(a). The time cost of NPPE is higher than that of NPP but lower than that of KE, which supports the analysis of computational complexity in Section 4.2.
The usps-0 data set consists of images of handwritten digit ‘0’ at resolution , and the two underlying intrinsic degrees of freedom are the line width and the shape of ‘0’. samples are randomly selected as training data and samples are chosen to be testing data. The number of nearest neighbors is set to be . Fig. 6 illustrates the experimental results. Training and testing results are shown on the left and right columns, respectively. training samples and testing samples are randomly selected and shown in the learned embedding. It can be seen that NPPE has successfully recovered the underlying structure, while it is hard to see the changes of line width and shape in the embedding given by KE and ONPP. Time cost on locating new data samples by these three methods is shown in Fig. 7(b). The time cost of NPPE is higher than ONPP but much lower than KE.
6 Conclusion
In this paper, an explicit nonlinear mapping for manifold learning is proposed for the first time. Based on the assumption that there is a polynomial mapping from the high-dimensional input samples to their low-dimensional representations, an explicit polynomial mapping is obtained by applying this assumption to a generic model of manifold learning. Furthermore, the NPPE algorithm is a nonlinear dimensionality reduction technique with an explicit nonlinear mapping, which tends to preserve not only the locality but also the nonlinear geometry of the high-dimensional data samples. NPPE provides convincing embedding results and, at the same time, locates newly arriving data samples in the reduced low-dimensional space simply and quickly. Experimental tests on both synthetic and real-world data have validated the effectiveness of the proposed NPPE algorithm.
Experiment on locating new samples for uniformly distributed SwissRoll data. (a) Training data and their generating data. (b) Time cost versus number of testing samples. (c) Locating results by NPPE. (d) Locating results by NPP. (e) Locating results by ONPP. (f) Locating results by KE. In (c)-(f), stands for the training result.
-  H.S. Seung and D.D. Lee, “The manifold ways of perception,” Science, vol. 290, no. 5500, pp. 2268-2269, Dec. 2000.
-  S.T. Roweis and L.K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323-2326, Dec. 2000.
-  L.K. Saul and S.T. Roweis, “Think globally, fit locally: unsupervised learning of low dimensional manifolds,” J. Machine Learning Research, vol. 4, pp. 119-155, 2003.
-  J.B. Tenenbaum, V. de Silva, and J.C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319-2323, Dec. 2000.
-  V. de Silva and J. Tenenbaum, “Global versus local methods in nonlinear dimensionality reduction,” Proc. Advances in Neural Information Processing Systems, vol. 15, pp. 705-712, 2003.
-  I.T. Jolliffe, Principal Component Analysis. Springer, 1989.
-  M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
-  T. Cox and M. Cox, Multidimensional Scaling. Chapman and Hall, 1994.
-  L. Yang, “Alignment of overlapping locally scaled patches for multidimensional scaling and dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 3, pp. 438-490, Mar, 2008.
-  K.Q. Weinberger and L.K. Saul, “Unsupervised learning of image manifolds by semidefinite programming,” Int. J. Comput. Vision, vol. 70, pp. 77-90, 2006.
-  Z. Zhang and H. Zha, “Principal manifolds and nonlinear dimension reduction via local tangent space alignment,” SIAM J. Sci. Comput., vol. 26, no. 1, pp. 313-338, 2004.
-  M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput.. vol. 15, no. 6, pp. 1373-1396, Jun. 2003.
-  T. Lin, and H. Zha, “Riemannian manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 5, pp. 796-809, May. 2008.
-  R.R. Coifman, and S. Lafon, “Diffusion maps,” Appl. Comput. Harmonic Anal., vol. 21, pp. 5-30, 2006.
-  D. Donoho, and C. Grimes, “Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data,” Proc. Nat. Acad. Sci., vol. 100, pp. 5591-5596, 2003.
-  J. Lee and M. Verleysen, Nonlinear Dimensionality Reduction. Springer, 2007.
-  L. Wang and D. Suter, “Learning and matching of dynamics shape manifolds for human action recognition,” IEEE Trans. Image Process., vol. 16, no. 6, pp. 1646-1661, Jun. 2007.
-  J. Chen, R. Wang, S. Yan, S. Shan, X. Chen, and W. Gao, “Enhancing human face detection by resampling examples through manifolds,” IEEE Trans. Syst. Man Cybern. Part A, vol. 37, no. 6, pp. 1017-1028, Nov. 2007.
-  C.M. Bachmann, T.L. Ainsworth, and R.A. Fusina, “Exploiting manifold geometry in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 3, pp. 441-454, Mar. 2005.
-  A. Elgammal and C.S. Lee, “Nonlinear manifold learning for dynamic shape and dynamic appearance,” Comput. Vis. Image Underst., vol. 106, no. 1, pp. 31-46, Apr. 2007.
-  Q. Wang, G. Xu, and H. Ai, “Learning object intrinsic structure for robust visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2003, vol. 2, pp. 227-233.
-  H. Qiao, P. Zhang, B. Zhang, and S. Zheng, “Learning an intrinsic variable preserving manifold for dynamic visual tracking”, IEEE. Trans. Syst. Man. Cybern. Part B, in press, 2009.
-  H. Qiao, P. Zhang, B. Zhang, and S. Zheng, “Tracking feature extraction based on manifold learning framework,” J. Exp. Theor. Artif. Intell., in press, 2009.
-  X. He and P. Niyogi, “Locality preserving projections”, in Proc. Advances Neural Inf. Process. Syst., 2003.
-  X. He, S. Yan, Y. Hu, P. Niyogi, and H.J. Zhang, “Face recognition using Laplacianfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328-340, Mar. 2005.
-  X. He, D. Cai, S. Yan, H.J. Zhang, “Neighborhood preserving embedding,” in: Proc. IEEE Int. Conf. Comput. Vis., 2005, vol. 2, pp. 1208-1213.
-  Y. Pang, L. Zhang, Z. Liu, N. Yu, and H. Li, “Neighborhood preserving projections (NPP): A novel linear dimension reduction method”, in Proc. ICIC (1), 2005, pp.117-125.
-  D. Cai, X. He, J. Han, and H.J. Zhang, “Orthogonal Laplacianfaces for face recognition,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3608-3614, Nov. 2006.
-  E. Kokiopoulou, and Y. Saad, “Orthogonal neighborhood preserving projections”, Proc. Fifth IEEE Int’l Conf. Data Mining, Nov. 2005.
-  E. Kokiopoulou, and Y. Saad, “Orthogonal neighborhood preserving projections: A projection-based dimensionality reduction technique”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, Dec. 2007.
-  S. Yan, D. Xu, B.Y. Zhang, H.J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40-51, Jan. 2007.
-  Y. Bengio, J.F. Paiement, P. Vincent, O. Delalleau, N.L. Roux, and M. Ouimet, “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering,” in Proc. Advances Neural Inf. Process. Syst., 2003, vol. 16, pp. 177-184.
-  S.V.N. Vishwanathan, K.M. Borgwardt, O. Guttman, and A. Smola, “Kernel extrapolation,” Neurocomputing, vol. 69, no. 7-9, pp. 721-729, Mar. 2006.
-  M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: a geometric framework for learning from labelled and unlabelled examples,” J. Mach. Learn. Res., vol. 7, pp. 2399-2434, Dec. 2006.
-  T. Chin, and D. Suter, “Out-of-sample extrapolation of learned manifolds”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, Sep. 2008.
-  Y. Bengio, O. Delalleau, N. Le Roux, J.-F. Paiement, P. Vincent, and M. Ouimet, “Learning eigenfunctions links spectral embedding and kernel PCA,” Neural Computation, vol. 16, no. 10, pp. 2197-2219, 2004.
-  M. Law, and A. Jain, “Incremental nonlinear dimensionality reduction by manifold learning”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no.3, Mar, 2006.
-  C. Baker, The Numerical Treatment of Integral Equations. Clarendon Press, Oxford, 1977.
-  B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
-  J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Revised Ed., Wiley, 1999.