1 Introduction
High-dimensional data is inherently difficult to explore and analyze, owing to the “curse of dimensionality” that renders many statistical and machine learning techniques inadequate. In this context,
nonlinear spectral dimensionality reduction (NLSDR) has proved to be an indispensable tool. Manifold learning based NLSDR methods, such as Isomap [1] and Local Linear Embedding (LLE) [2], assume that the distribution of the data in the high-dimensional observed space is not uniform; in reality, the data lies near a nonlinear low-dimensional manifold embedded in the high-dimensional space. If applied directly to streaming data, NLSDR methods have to recompute the entire manifold each time a new point arrives from the stream. This guarantees the best possible quality of the learned manifold given the data, but quickly becomes computationally prohibitive. To alleviate the computational burden, landmark [3] and general out-of-sample extension methods [4] have been proposed, but these techniques are still too expensive for practical applications. Recent streaming adaptations of NLSDR methods instead rely on exact learning from a smaller batch of observations, followed by approximate mapping of the subsequent stream of observations [5]. Extensions to cases where the observations are sampled from multiple, possibly intersecting, manifolds have been proposed as well [6].
However, existing streaming manifold learning methods [5, 6] assume that the underlying generative distribution is stationary over the stream, and are unable to detect when the distribution “drifts” or abruptly “shifts” away from the base, resulting in incorrect low-dimensional mappings (see Fig. 1). We develop a methodology to identify such changes (drifts and shifts) in the stream properties and inform the streaming algorithm to update the base model.
We employ a Gaussian Process (GP) [7] based adaptation of Isomap [1], a widely used NLSDR method, to process high-throughput streams. The use of a GP is enabled by a novel kernel that measures the relationship between a pair of observations along the manifold, rather than in the original high-dimensional space. We prove that the low-dimensional representations inferred using the GP based method, GP-Isomap, are equivalent to the representations obtained using state-of-the-art streaming Isomap methods [5, 6]. Additionally, we show empirically, on synthetic and real datasets, that the predictive variance associated with the GP predictions is an effective indicator of changes (either gradual drifts or sudden shifts) in the underlying generative distribution, and can be employed to inform the algorithm to “relearn” the core manifold.
2 Problem Statement and Preliminaries
We first formulate the NLSDR problem, provide background on Isomap, and discuss its out-of-sample and streaming extensions [8, 5, 6, 9]. Additionally, we provide a brief introduction to Gaussian Process (GP) regression.
2.1 Nonlinear Spectral Dimensionality Reduction
Given high-dimensional data $\mathcal{Y} = \{\mathbf{y}_i\}_{i=1}^{n}$, where $\mathbf{y}_i \in \mathbb{R}^D$, the NLSDR problem is concerned with finding its corresponding low-dimensional representation $\mathcal{X} = \{\mathbf{x}_i\}_{i=1}^{n}$, such that $\mathbf{x}_i \in \mathbb{R}^d$, where $d \ll D$.
NLSDR methods assume that the data lies along a low-dimensional manifold embedded in a high-dimensional space, and exploit the global (Isomap [1], Minimum Volume Embedding [10]) or local (LLE [2], Laplacian Eigenmaps [11]) properties of the manifold to map each $\mathbf{y}_i$ to its corresponding $\mathbf{x}_i$.
The Isomap algorithm [1] maps each $\mathbf{y}_i$ to its low-dimensional representation $\mathbf{x}_i$ in such a way that the geodesic distance along the manifold between any two points, $\mathbf{y}_i$ and $\mathbf{y}_j$, is as close as possible to the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$. The geodesic distance is approximated by computing the shortest path between the two points using the nearest neighbor graph, and is stored in the geodesic distance matrix $\mathbf{G} \in \mathbb{R}^{n \times n}$, where $g_{ij}$ is the geodesic distance between the points $\mathbf{y}_i$ and $\mathbf{y}_j$; $\mathbf{G}^2$ denotes the matrix of squared geodesic distances. The Isomap algorithm recovers $\mathcal{X}$ by applying classical Multi-Dimensional Scaling (MDS) on $\mathbf{G}^2$. Let $\mathbf{B}$ be the inner product matrix between the different $\mathbf{x}_i$. $\mathbf{B}$ can be retrieved as $\mathbf{B} = -\frac{1}{2}\mathbf{H}\mathbf{G}^2\mathbf{H}$ by assuming $\sum_{i=1}^{n}\mathbf{x}_i = \mathbf{0}$, where $h_{ij} = \delta_{ij} - \frac{1}{n}$ and $\delta_{ij}$ is the Kronecker delta. Isomap uncovers $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]^T$ such that $\mathbf{X}\mathbf{X}^T$ is as close to $\mathbf{B}$ as possible. This is achieved by setting $\mathbf{X} = \big[\sqrt{\lambda_1}\,\mathbf{q}_1, \dots, \sqrt{\lambda_d}\,\mathbf{q}_d\big]$, where $\lambda_1, \dots, \lambda_d$ are the $d$ largest eigenvalues of $\mathbf{B}$ and $\mathbf{q}_1, \dots, \mathbf{q}_d$ are the corresponding eigenvectors.
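As a concrete illustration, the pipeline just described (neighborhood graph, shortest-path geodesics, double mean centering, top-$d$ eigenpairs) can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under default assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(Y, k=6, d=2):
    """Minimal Isomap sketch: k-NN graph -> geodesic distances -> classical MDS."""
    n = len(Y)
    # pairwise Euclidean distances
    D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    # keep edges only to each point's k nearest neighbors (inf = no edge)
    W = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]
        W[i, nbrs] = D[i, nbrs]
    W = np.minimum(W, W.T)                 # symmetrize -> undirected graph
    G = shortest_path(W, directed=False)   # geodesic distance matrix
    # classical MDS: double mean centering of squared geodesics, B = -1/2 H G^2 H
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (G ** 2) @ H
    lam, Q = np.linalg.eigh(B)
    top = np.argsort(lam)[::-1][:d]
    # embedding X = [sqrt(lam_1) q_1, ..., sqrt(lam_d) q_d]
    X = Q[:, top] * np.sqrt(np.maximum(lam[top], 0))
    return X, G
```

On data sampled along a one-dimensional curve, the first embedding coordinate recovers the ordering of the points along the curve.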
To measure the error between the true underlying low-dimensional representation and the one uncovered by an NLSDR method, Procrustes analysis [12] is typically used. Procrustes analysis aligns two matrices, $\mathbf{X}$ and $\mathbf{Y}$, by finding the optimal translation $\mathbf{t}$, rotation $\mathbf{R}$, and scaling $s$ that minimize the Frobenius norm between the two aligned matrices, i.e.,

$$\epsilon_{Proc}(\mathbf{X}, \mathbf{Y}) = \min_{\mathbf{R}, \mathbf{t}, s} \big\|\mathbf{X} - s\mathbf{Y}\mathbf{R} - \mathbf{e}\mathbf{t}^T\big\|_F$$
The above optimization problem has a closed-form solution obtained by performing Singular Value Decomposition (SVD) of $\mathbf{Y}^T\mathbf{H}\mathbf{X}$ [12]. Consequently, one of the properties of Procrustes analysis is that $\epsilon_{Proc}(\mathbf{X}, \mathbf{Y}) = 0$ when $\mathbf{X} = s\mathbf{Y}\mathbf{R} + \mathbf{e}\mathbf{t}^T$, i.e., when one of the matrices is a scaled, translated and/or rotated version of the other, a property we leverage in this work.

2.2 Streaming Isomap
Given that the Isomap algorithm has a complexity of $O(n^3)$, where $n$ is the size of the data, recomputing the manifold for every new point is computationally too expensive to be practical in a streaming setting. Incremental techniques have been proposed [9, 5] that can efficiently process new streaming points without significantly affecting the quality of the embedding.
The S-Isomap algorithm relies on the observation that a stable manifold can be learnt using only a fraction of the stream (denoted as the batch dataset $\mathcal{B}$), and the remaining part of the stream (denoted as the stream dataset $\mathcal{S}$) can be mapped to the manifold at a significantly lower cost. This can be justified by considering the convergence of the eigenvectors and eigenvalues of $\mathbf{B}$ as the number of points in the batch increases [13]. In particular, the bound on the convergence error for a similar NLSDR method, kernel PCA, is shown to be inversely proportional to the batch size [13]. Similar arguments can be made for Isomap by considering the equivalence between Isomap and kernel PCA [14, 8]. This relationship has also been shown empirically for multiple data sets [5].
The S-Isomap algorithm computes the low-dimensional representation $\mathbf{x}_{n+1}$ for each new point $\mathbf{y}_{n+1}$ by solving a least-squares problem, formulated by matching the dot products of the new point with the low-dimensional embedding $\mathbf{X}$ of the points in the batch dataset $\mathcal{B}$, computed using Isomap, to the vector of normalized squared geodesic distances $\mathbf{f}$. The least-squares problem has the following form:

$$\mathbf{X}\,\mathbf{x}_{n+1} = \mathbf{f} \quad (1)$$

where^{1}

$$f_i = \frac{1}{2}\Big(\frac{1}{n}\sum_{j=1}^{n} g_{ij}^2 - g_{i,n+1}^2\Big) \quad (2)$$

^{1} Note that the Incremental Isomap algorithm [9] has a slightly different formulation of $\mathbf{f}$.
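A minimal sketch of this out-of-sample mapping, assuming the batch embedding and geodesic distances are already available (the exact normalization used in [5] may differ slightly from the form above):

```python
import numpy as np

def s_isomap_map(X, G, g_new):
    """Map a new point onto a learnt manifold via the least-squares
    formulation of eqs. (1)-(2) (sketch).

    X     : (n, d) batch embedding computed by Isomap
    G     : (n, n) geodesic distance matrix of the batch
    g_new : (n,)   geodesic distances from the new point to the batch points
    """
    # f_i = 1/2 ( mean_j g_ij^2 - g_{i,n+1}^2 )
    f = 0.5 * ((G ** 2).mean(axis=1) - g_new ** 2)
    # least-squares solution of X x = f
    x_new, *_ = np.linalg.lstsq(X, f, rcond=None)
    return x_new
```

On a centered one-dimensional embedding, this recovers the coordinate of the new point exactly.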
2.3 Handling Multiple Manifolds
In the ideal case, when manifolds are densely sampled and sufficiently separated, clustering can be performed before applying NLSDR techniques [16, 17], by choosing an appropriate local neighborhood size so as not to include points from other manifolds and still be able to capture the local geometry of the manifold. However, if the manifolds are close or intersecting, such methods typically fail.
The S-Isomap++ [6] algorithm overcomes the limitations of the S-Isomap algorithm and extends it to deal with multiple manifolds. It uses the notion of multi-scale SVD [29] to define tangent manifold planes at each data point, computed at the appropriate scale, and to compute similarity in a local neighborhood. Additionally, it includes a novel manifold tangent clustering algorithm that uses these tangent planes to cluster manifolds that are close and, in certain scenarios, intersecting. After initially clustering the high-dimensional batch dataset, the algorithm applies NLSDR on each manifold individually and eventually “stitches” them together in a global ambient space, by defining transformations that map points from the individual low-dimensional manifolds to the global space. However, S-Isomap++ can only detect manifolds that it encounters in its batch learning phase, and not those it might encounter in the streaming phase.
2.4 Gaussian Process Regression
Let us assume that we are learning a probabilistic regression model to obtain the prediction at a given test input, $\mathbf{x}_*$, using a nonlinear, latent function, $f$. Assuming^{2} additive Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$, the observed output, $y$, is related to the input as:

$$y = f(\mathbf{x}) + \varepsilon \quad (3)$$

^{2} For vector-valued outputs, i.e., $\mathbf{y} \in \mathbb{R}^d$, one can consider $d$ independent models.
Given a training set of inputs, $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]^T$, and corresponding outputs, $\mathbf{y} = [y_1, \dots, y_n]^T$, the Gaussian Process Regression (GPR) model assumes a GP prior on the latent function values, i.e., $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, where $m(\mathbf{x})$ is the mean of $f(\mathbf{x})$ and $k(\mathbf{x}, \mathbf{x}')$ is the covariance between any two evaluations of $f$, i.e., $f(\mathbf{x})$ and $f(\mathbf{x}')$. Here we use a zero-mean function ($m(\mathbf{x}) = 0$), though other functions could be used as well. The GP prior states that any finite collection of the latent function evaluations is jointly Gaussian, i.e.,

$$\mathbf{f} = [f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)]^T \sim \mathcal{N}(\mathbf{0}, \mathbf{K}) \quad (4)$$
where the $(i,j)$-th entry of the covariance matrix, $\mathbf{K}$, is given by $k(\mathbf{x}_i, \mathbf{x}_j)$. The GPR model uses (3) and (4) to obtain the predictive distribution at a new test input, $\mathbf{x}_*$, as a Gaussian distribution with the following mean and variance:

$$\mu_* = \mathbf{k}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} \quad (5)$$

$$\sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_* \quad (6)$$

where $\mathbf{k}_*$ is a vector with the $i$-th value as $k(\mathbf{x}_*, \mathbf{x}_i)$.
The kernel function, $k(\cdot, \cdot)$, specifies the covariance between function values, $f(\mathbf{x}_i)$ and $f(\mathbf{x}_j)$, as a function of the corresponding inputs, $\mathbf{x}_i$ and $\mathbf{x}_j$. A popular choice, used in this work, is the squared exponential kernel:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\Big(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2l^2}\Big) \quad (7)$$

where $\sigma_f^2$ is the signal variance and $l$ is the length scale. The quantities $\sigma_f$, $l$, and $\sigma_n$ (from (3)) are the hyperparameters of the model, and can be estimated by maximizing the marginal log-likelihood of the observed data ($\mathbf{X}$ and $\mathbf{y}$) under the GP prior assumption.

One can observe that the predictive mean, $\mu_*$ in (5), can be written as an inner product, i.e.,

$$\mu_* = \mathbf{k}_*^T\boldsymbol{\alpha} \quad (8)$$

where $\boldsymbol{\alpha} = (\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y}$. We will utilize this form in subsequent proofs.
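For concreteness, equations (5)–(8) can be sketched directly in NumPy; hyperparameters are fixed here rather than estimated by marginal-likelihood maximization:

```python
import numpy as np

def gp_predict(X, y, X_star, sigma_f=1.0, ell=1.0, sigma_n=0.1):
    """GP regression with a squared-exponential kernel: eqs. (5)-(8) sketch."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f ** 2 * np.exp(-d2 / (2 * ell ** 2))
    K = k(X, X) + sigma_n ** 2 * np.eye(len(X))
    alpha = np.linalg.solve(K, y)   # eq. (8): the mean is the inner product k_*^T alpha
    K_star = k(X_star, X)
    mean = K_star @ alpha           # eq. (5)
    # eq. (6): k(x*, x*) = sigma_f^2 for the squared-exponential kernel
    var = sigma_f ** 2 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, var
```

Note how the predictive variance grows toward $\sigma_f^2$ far from the training inputs, which is precisely the behavior GP-Isomap later exploits for drift detection.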
3 Methodology
The proposed GP-Isomap algorithm follows a two-phase strategy (similar to S-Isomap and S-Isomap++): exact manifolds are learnt from an initial batch $\mathcal{B}$, and a computationally inexpensive mapping procedure subsequently processes the remainder of the stream. To handle multiple manifolds, the batch data is first clustered via manifold tangent clustering or other standard techniques. Exact Isomap is applied on each cluster, and the resulting low-dimensional representations of the clusters are then “stitched” together to obtain the low-dimensional representation of the input data. The difference from past methods is the mapping procedure, which uses GPR to obtain predictions for the low-dimensional mapping (see (5)). At the same time, the associated predictive variance (see (6)) is used to detect changes in the underlying distribution.
The overall GP-Isomap algorithm is outlined in Algorithm 1 and takes a batch dataset, $\mathcal{B}$, and the streaming data, $\mathcal{S}$, as inputs, along with other parameters. The processing is split into two phases: a batch learning phase (Lines 1–15) and a streaming phase (Lines 16–32), which are described in the remainder of this section.
3.1 Kernel Function
The proposed GP-Isomap algorithm uses a novel geodesic distance based kernel function, defined as:

$$k(\mathbf{y}_i, \mathbf{y}_j) = \sigma_f^2 \exp\Big(\frac{b_{ij}}{2l^2}\Big) \quad (9)$$

where $b_{ij}$ is the $(i,j)$-th entry of the normalized geodesic distance matrix, $\mathbf{B}$, as discussed in Sec. 2.1, $\sigma_f^2$ is the signal variance (whose value we fix as 1 in this work), and $l$ is the length scale hyperparameter. Thus the kernel matrix can be written as the matrix exponential:

$$\mathbf{K} = \sigma_f^2 \exp\Big(\frac{\mathbf{B}}{2l^2}\Big) \quad (10)$$
This kernel function plays a key role in using the GPR model for mapping streaming points onto the learnt manifold: it measures similarity along the low-dimensional manifold, instead of in the original space ($\mathbb{R}^D$), as is typically done in GPR based solutions.
The matrix $\mathbf{B}$ is positive semi-definite^{3} (PSD), due to the double mean centering applied to the squared geodesic distance matrix. Consequently, writing $\mathbf{B} = \sum_{i=1}^{d}\lambda_i\,\mathbf{q}_i\mathbf{q}_i^T$, the kernel matrix

$$\mathbf{K} = \sigma_f^2\Big(\mathbf{I} + \sum_{i=1}^{d}\big(e^{\lambda_i/2l^2} - 1\big)\mathbf{q}_i\mathbf{q}_i^T\Big) \quad (11)$$

is positive definite.

^{3} In fact, $\mathbf{B}$ is not always guaranteed to be PSD; [25] use an additive constant to make it PSD. The identity matrix introduced by the exponentiation in (11) functions similarly to an additive constant.
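Since $\mathbf{B}$ is symmetric, the matrix exponential in (10) can be computed from its eigendecomposition, which is exactly the rank-$d$ structure exploited in the appendix lemmas. A minimal sketch:

```python
import numpy as np

def geodesic_kernel(G, ell=1.0, sigma_f=1.0):
    """Geodesic kernel matrix K = sigma_f^2 expm(B / (2 l^2)) (sketch).

    Computes the matrix exponential through the eigendecomposition of the
    double-centered matrix B from Sec. 2.1."""
    n = G.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (G ** 2) @ H                 # double mean centering
    lam, Q = np.linalg.eigh(B)
    # expm(B/c) = Q diag(exp(lam/c)) Q^T for symmetric B
    K = sigma_f ** 2 * (Q * np.exp(lam / (2 * ell ** 2))) @ Q.T
    return K
```

The result matches `scipy.linalg.expm` applied to $\mathbf{B}/2l^2$ and is positive definite, since every eigenvalue is mapped through the (strictly positive) exponential.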
3.2 Batch Learning
The batch learning phase consists of the following tasks:
3.2.1 Clustering.
The first step in the batch phase involves clustering the batch dataset $\mathcal{B}$ into individual clusters, which represent the manifolds. In case $\mathcal{B}$ contains a single cluster, the algorithm can correctly detect it. (Line 1)
3.2.2 Dimension Reduction.
Subsequently, full Isomap is executed on each of the individual clusters to get low-dimensional representations of the data points belonging to each individual cluster. (Lines 3–5)
3.2.3 Hyperparameter Estimation.
The geodesic distance matrix for the points in the $p$-th manifold and the corresponding low-dimensional representation are fed to the GP model for each of the manifolds, to perform hyperparameter estimation, which outputs the estimated hyperparameters for each manifold. (Lines 6–8)
3.2.4 Learning Mapping to Global Space.
The low-dimensional embeddings uncovered for each of the manifolds can be of different dimensionalities. Consequently, a mapping to a unified global space is needed. To learn this mapping, a support set is formulated, which contains the pairs of nearest points and pairs of farthest points between each pair of manifolds. Subsequently, MDS is executed on this support set to uncover its low-dimensional representation. Individual scaling and translation factors, which map points from each of the individual manifolds to the global space, are learnt by solving a least-squares problem involving the support set. (Lines 9–15)
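The per-manifold factors admit a closed-form least-squares fit. The sketch below assumes a scalar scale plus a translation vector per manifold (the paper does not spell out the exact parametrization, so this is one plausible choice):

```python
import numpy as np

def fit_scale_translation(X_local, X_global):
    """Least-squares scalar scale s and translation t such that
    s * X_local + t approximates X_global on the support set (sketch)."""
    mu_l, mu_g = X_local.mean(axis=0), X_global.mean(axis=0)
    Xl, Xg = X_local - mu_l, X_global - mu_g
    s = (Xl * Xg).sum() / (Xl ** 2).sum()   # closed-form least-squares scale
    t = mu_g - s * mu_l                      # translation aligns the centroids
    return s, t
```

When the global coordinates really are a scaled and translated copy of the local embedding, the factors are recovered exactly.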
3.3 Stream Processing
In the streaming phase, each sample in the stream set $\mathcal{S}$ is embedded using each of the GP models to evaluate the prediction, along with its variance (Lines 22–24). Provided the smallest variance is within the allowed threshold, the corresponding manifold is chosen to embed the sample into, using its scaling and translation factors (Lines 25–28); otherwise the sample is added to the unassigned set (Lines 29–31). When the size of the unassigned set exceeds a threshold, we add its points to the batch dataset and relearn the base manifolds (Lines 18–20). The assimilation of the new points into the batch may be done more efficiently in an incremental manner.
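The per-sample assignment logic can be sketched as follows. The `models` interface (`predict`, `.s`, `.t`) is hypothetical and stands in for the per-manifold GP models and mapping factors of Algorithm 1:

```python
import numpy as np

def process_stream_point(y, models, gamma, unassigned):
    """Assign a streaming sample to the manifold whose GP model predicts it
    with the smallest variance, or defer it when no model is confident.

    models     : per-manifold objects exposing predict(y) -> (embedding, var)
                 plus scale .s and translation .t into the global space
    gamma      : variance threshold for accepting an assignment
    unassigned : list collecting deferred (possibly drifted) samples
    """
    preds = [m.predict(y) for m in models]
    p = int(np.argmin([v for _, v in preds]))
    x, v = preds[p]
    if v <= gamma:                        # confident: embed into global space
        return models[p].s * x + models[p].t, p
    unassigned.append(y)                  # possible drift/shift: defer the point
    return None, None
```

Once `unassigned` grows beyond a size threshold, its contents are merged into the batch and the base manifolds are relearnt.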
3.4 Complexity
The runtime complexity of our proposed algorithm is dominated by the GP regression step as well as the Isomap execution step, both of which have $O(n^3)$ complexity, where $n$ is the size of the batch dataset $\mathcal{B}$. This is similar to the S-Isomap and S-Isomap++ algorithms, which also have a runtime complexity of $O(n^3)$. The stream processing step is $O(n)$ for each incoming streaming point. The space complexity of GP-Isomap is dominated by the $O(n^2)$ cost of storing the kernel matrix, since each sample of the stream set is processed separately. Thus, neither the space requirement nor the runtime complexity grows with the size of the stream, which makes the algorithm appealing for handling high-volume streams.
4 Theoretical Analysis
In this section, we first state the main result and subsequently prove it using results from lemmas stated later in Appendix 0.A.
Theorem 4.1.
The prediction $\mathbf{x}_{GP}$ of our proposed approach, GP-Isomap, is equivalent to the prediction $\mathbf{x}_{ISO}$ of S-Isomap, i.e., the Procrustes error $\epsilon_{Proc}(\mathbf{x}_{GP}, \mathbf{x}_{ISO})$ between $\mathbf{x}_{GP}$ and $\mathbf{x}_{ISO}$ is $0$.
Proof.
The prediction of GP-Isomap is given by (8). Using Lemma 6, we have that, for each output dimension $i = 1, \dots, d$,

$$\boldsymbol{\alpha}_i = \frac{\sqrt{\lambda_i}}{\sigma_f^2}\,e^{-\lambda_i/2l^2}\,\mathbf{q}_i \quad (12)$$

The term $\mathbf{k}_*$ for GP-Isomap, using our novel kernel function, evaluates to

$$\mathbf{k}_* = \sigma_f^2 \exp\Big(\frac{\mathbf{f}}{2l^2}\Big) \quad (13)$$

where $\mathbf{f}$ is formed, as in (2), from the vector of squared geodesic distances of the new point $\mathbf{y}_{n+1}$ to the points in the batch $\mathcal{B}$, and the exponential is applied element-wise.
Considering the above equation element-wise, the $i$-th term of $\mathbf{k}_*$ equates to $\sigma_f^2\,e^{f_i/2l^2}$. Using the Taylor series expansion, we have

$$e^{x} = 1 + x + \frac{x^2}{2!} + \cdots \approx 1 + x \quad \text{for small } x \quad (14)$$
Rewriting (2), we have

$$f_i = c_i - \frac{1}{2}\,g_{i,n+1}^2 \quad (16)$$

where $c_i = \frac{1}{2n}\sum_{j} g_{ij}^2$ is a constant with respect to $\mathbf{y}_{n+1}$, since it depends only on squared geodesic distance values associated with the batch dataset $\mathcal{B}$, while $\mathbf{y}_{n+1}$ is part of the stream dataset $\mathcal{S}$.
We now consider only the first dimension of the predictions for GP-Isomap and S-Isomap, and demonstrate their equivalence via the Procrustes error; the analysis for the remaining dimensions follows a similar line of reasoning.
Thus for the first dimension, using (16), the S-Isomap prediction is

$$x_{ISO}^{(1)} = \frac{1}{\sqrt{\lambda_1}}\,\mathbf{q}_1^T\mathbf{f} \quad (17)$$
Similarly, using Lemma 6, (13) and (14), the first dimension of the GP-Isomap prediction is given by

$$x_{GP}^{(1)} = \mathbf{k}_*^T\boldsymbol{\alpha}_1 \approx \sqrt{\lambda_1}\,e^{-\lambda_1/2l^2}\Big(\mathbf{q}_1^T\mathbf{e} + \frac{1}{2l^2}\,\mathbf{q}_1^T\mathbf{f}\Big) \quad (18)$$
We can observe that $x_{GP}^{(1)}$ is a scaled and translated version of $x_{ISO}^{(1)}$. Similarly, for each of the remaining dimensions ($2, \dots, d$), the GP-Isomap prediction can be shown to be a scaled and translated version of the corresponding S-Isomap prediction. These individual scaling and translation factors can be represented together by single collective scaling and translation factors. Consequently, the Procrustes error $\epsilon_{Proc}(\mathbf{x}_{GP}, \mathbf{x}_{ISO})$ is 0 (refer Sect. 2.1). ∎
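The Procrustes error used as the equivalence metric here (and defined in Sec. 2.1) can be computed directly; the following is a minimal sketch of the standard SVD-based solution:

```python
import numpy as np

def procrustes_error(X, Y):
    """Normalized residual after optimally translating, rotating and scaling
    Y onto X (orthogonal Procrustes; sketch of the metric of Sec. 2.1)."""
    Xc = X - X.mean(axis=0)              # optimal translation: match centroids
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Yc.T @ Xc)  # optimal rotation from this SVD
    R = U @ Vt
    s = S.sum() / (Yc ** 2).sum()        # optimal scaling
    return np.linalg.norm(Xc - s * Yc @ R) ** 2 / (Xc ** 2).sum()
```

The error is 0 exactly when one matrix is a scaled, rotated and/or translated version of the other, which is the property invoked in the proof above.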
5 Results and Analysis
In this section^{4}, we demonstrate the ability of the predictive variance within GP-Isomap to identify changes in the underlying distribution of the data stream, on synthetically generated datasets as well as on benchmark sensor data sets.

^{4} All synthetic datasets (refer Fig. 1), figures and code are available here.
5.1 Results on Synthetic Data Sets
Swiss roll datasets are typically used for evaluating manifold learning algorithms. To evaluate our method on sudden concept drift, we use the Euler Isometric Swiss Roll dataset [5], consisting of four Gaussian patches of randomly chosen points, which are embedded into a high-dimensional space using a nonlinear function. The points from each of the Gaussian modes are divided equally into training and test sets. To test incremental concept drift, we use the single patch dataset, which consists of a single patch borrowed from the above as the training data, along with a uniform distribution of points for testing. Figures 1, 2 and 3 demonstrate our results on these datasets.
5.1.1 Gaussian patches on Isometric Swiss Roll
To evaluate our method on sudden concept drift, we trained our GP-Isomap model using three of the four training sets of the four patches dataset. Subsequently, we initially stream points randomly from the test sets of only these three classes, and later stream points from the test set of the fourth class, keeping track of the predictive variance all the while. Fig. 2 demonstrates the sudden increase (see red line) in the variance of the stream once the streaming points come from the fourth class, i.e., the unknown mode. Thus GP-Isomap is able to detect concept drift correctly. The bottom panel of Fig. 1 demonstrates the performance of S-Isomap++ on this dataset: it fails to map the streaming points of the unknown mode correctly, given that it had not encountered the unknown mode in its batch training phase.
To test our proposed approach for detecting incremental concept drift, we train our model using the single patch dataset and subsequently observe how the variance of the stream behaves on the test streaming dataset. The top panel of Fig. 1 shows how the variance increases smoothly as the stream gradually drifts away from the Gaussian patch. This shows that GP-Isomap handles incremental drift correctly. In Sect. 4, we proved the equivalence between the predictions of S-Isomap and GP-Isomap when using our novel kernel. In Fig. 3, we show empirically, via the Procrustes error (PE), that the prediction of S-Isomap indeed matches that of GP-Isomap, irrespective of the size of the batch used. The PE for GP-Isomap with a Euclidean distance based kernel remains high irrespective of the batch size, which clearly demonstrates the unsuitability of that kernel for learning mappings in the low-dimensional space.
5.2 Results on Sensor Data Set
The Gas Sensor Array Drift (GSAD) dataset [18] is a benchmark dataset available to research communities for developing strategies to deal with concept drift. It comprises measurements from 16 chemical sensors used to discriminate between 6 gases (class labels) at various concentrations. We demonstrate the performance of our proposed method on this dataset.
The data was first mean normalized. Data points from the first five classes were divided into training and test sets. We train our model using the training data from four of these five classes. While testing, we first stream points randomly from the test sets of these four classes, and later stream points from the test set of the fifth class. Figure 4 demonstrates our results. Our model can clearly detect the concept drift due to the unknown fifth class by tracking the variance of the stream, using its running average (red line).
6 Related Works
Processing data streams efficiently using standard approaches is challenging in general, given that streams require real-time processing and cannot be stored permanently. Any form of analysis, including detecting concept drift, requires adequate summarization that can deal with these inherent constraints and approximate the characteristics of the stream well. Sampling based strategies, including random sampling [19, 20] as well as decision-tree based approaches [21], have been used in this context. To identify concept drift, maintaining statistical summaries over a streaming “window” is a typical strategy [22, 23, 24]. However, none of these are applicable in the setting of learning a latent representation of the data, e.g., manifolds, in the presence of changes in the stream distribution.

We discussed limitations of existing incremental and streaming solutions that have been developed specifically in the context of manifold learning, in particular for the Isomap algorithm, in Section 2. Coupling Isomap with GP regression (GPR) has been explored in the past [25, 26], though not in the context of streaming data. For instance, a Mercer kernel-based Isomap technique has been proposed [25]. Similarly, [26] presented an emulator pipeline using Isomap to determine a low-dimensional representation, whose output is fed to a GPR model. The use of GPR for detecting concept drift is novel, even though the Bayesian nonparametric approach of [27], primarily intended for anomaly detection, comes close to our work in a single manifold setting. Their choice of a Euclidean distance (in the original $\mathbb{R}^D$ space) based kernel for the covariance matrix can result in high Procrustes error, as shown in Fig. 3. Additionally, their approach does not scale, given that it does not use any approximation to process the new streaming points “cheaply”.

7 Conclusions
We have proposed a streaming Isomap algorithm (GP-Isomap) that can be used to learn nonlinear low-dimensional representations of high-dimensional data arriving in a streaming fashion. We prove that using a GPR formulation to map incoming data instances onto an existing manifold is equivalent to using existing geometric strategies [5, 6]. Moreover, by utilizing a small batch for exact learning of the Isomap as well as for training the GPR model, the method scales linearly with the size of the stream, thereby ensuring its applicability to practical problems. The Bayesian inference of the GPR model allows us to estimate the variance associated with the mapping of the streaming instances. The variance is shown to be a strong indicator of changes in the underlying stream properties on a variety of data sets. By utilizing the variance, one can devise retraining strategies that include expanding the batch data set. While we have focused on the Isomap algorithm in this paper, similar formulations can be applied to other NLSDR methods such as LLE [2], and will be explored as future research.

References
 [1] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
 [2] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
 [3] Vin D Silva and Joshua B Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In Advances in NIPS, pages 721–728, 2003.
 [4] Yiming Wu and Kap Luk Chan. An extended isomap algorithm for learning multiclass manifold. In Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on, volume 6, pages 3429–3433. IEEE, 2004.
 [5] Frank Schoeneman, Suchismit Mahapatra, Varun Chandola, Nils Napp, and Jaroslaw Zola. Error metrics for learning reliable manifolds from streaming data. In Proceedings of 2017 SDM, pages 750–758. SIAM, 2017.
 [6] Suchismit Mahapatra and Varun Chandola. S-Isomap++: Multi-manifold learning from streaming data. In 2017 IEEE International Conference on Big Data. IEEE, 2017.
 [7] Christopher KI Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in NIPS, pages 682–688, 2001.

 [8] Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas L Roux, and Marie Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Advances in NIPS, pages 177–184, 2004.
 [9] Martin HC Law and Anil K Jain. Incremental nonlinear dimensionality reduction by manifold learning. IEEE Transactions on PAMI, 28(3):377–391, 2006.
 [10] Kilian Q Weinberger, Benjamin Packer, and Lawrence K Saul. Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In AISTATS, 2005.
 [11] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in NIPS, 2002, pages 585–591, 2002.
 [12] Ian L Dryden and Kanti V Mardia. Statistical shape analysis, volume 4. Wiley, 1998.

 [13] John Shawe-Taylor and Christopher Williams. The stability of kernel principal components analysis and its relation to the process eigenspectrum. In Advances in NIPS, pages 383–390, 2003.
 [14] Jihun Ham, Daniel D Lee, Sebastian Mika, and Bernhard Schölkopf. A kernel view of the dimensionality reduction of manifolds. In Proceedings of the twenty-first ICML. ACM, 2004.
 [15] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301):13–30, 1963.
 [16] Marzia Polito and Pietro Perona. Grouping and dimensionality reduction by locally linear embedding. In Advances in NIPS, pages 1255–1262, 2002.

 [17] Mingyu Fan, Hong Qiao, Bo Zhang, and Xiaoqin Zhang. Isometric multi-manifold learning for feature extraction. In IEEE 12th ICDM, pages 241–250. IEEE, 2012.
 [18] Alexander Vergara, Shankar Vembu, Tuba Ayhan, Margaret A Ryan, Margie L Homer, and Ramón Huerta. Chemical gas sensor drift compensation using classifier ensembles. Sensors and Actuators B: Chemical, 166:320–329, 2012.
 [19] Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37–57, 1985.
 [20] Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. On random sampling over joins. In ACM SIGMOD Record, volume 28, pages 263–274. ACM, 1999.
 [21] Pedro Domingos and Geoff Hulten. Mining highspeed data streams. In Proceedings of the sixth ACM SIGKDD international conference on KDD, pages 71–80. ACM, 2000.

 [22] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
 [23] Hosagrahar Visvesvaraya Jagadish, Nick Koudas, S Muthukrishnan, Viswanath Poosala, Kenneth C Sevcik, and Torsten Suel. Optimal histograms with quality guarantees. In VLDB, volume 98, 1998.
 [24] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794–1813, 2002.
 [25] Heeyoul Choi and Seungjin Choi. Kernel isomap. Electronics letters, 40(25):1612–1613, 2004.

 [26] Wei Xing, Akeel A Shah, and Prasanth B Nair. Reduced dimensional Gaussian process emulators of parametrized partial differential equations based on Isomap. In Proceedings of the Royal Society of London A, volume 471. The Royal Society, 2015.
 [27] Oren Barkan, Jonathan Weill, and Amir Averbuch. Gaussian process regression for out-of-sample extension. In Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on, pages 1–6. IEEE, 2016.
 [28] William H Press, Saul A Teukolsky, William T Vetterling, and Brian P Flannery. Numerical recipes in C, volume 2. Cambridge university press Cambridge, 1996.
 [29] Anna V Little, Jason Lee, YoonMo Jung, and Mauro Maggioni. Estimation of intrinsic dimensionality of samples from noisy lowdimensional manifolds in high dimensions with multiscale svd. In IEEE/SP 15th Workshop on SSP’09, pages 85–88. IEEE, 2009.
Appendix 0.A Appendix
Lemma 1.
The matrix exponential $e^{\mathbf{M}}$ for a rank-$1$, symmetric matrix $\mathbf{M}$ is given by
$$e^{\mathbf{M}} = \mathbf{I} + (e^{\lambda} - 1)\,\mathbf{q}\mathbf{q}^T$$
where $\mathbf{q}$ is the first eigenvector of $\mathbf{M}$ such that $\|\mathbf{q}\|_2 = 1$ and $\lambda$ is the corresponding eigenvalue.
Proof.
Given $\mathbf{M}$ is symmetric and of rank one, it can be written as $\mathbf{M} = \lambda\,\mathbf{q}\mathbf{q}^T$, so that $\mathbf{M}^k = \lambda^k\,\mathbf{q}\mathbf{q}^T$ for $k \geq 1$. Thus

$$e^{\mathbf{M}} = \sum_{k=0}^{\infty}\frac{\mathbf{M}^k}{k!} = \mathbf{I} + \Big(\sum_{k=1}^{\infty}\frac{\lambda^k}{k!}\Big)\mathbf{q}\mathbf{q}^T = \mathbf{I} + (e^{\lambda} - 1)\,\mathbf{q}\mathbf{q}^T \quad (19)$$

∎
Lemma 2.
The matrix exponential $e^{\mathbf{M}}$ for a rank-$d$, symmetric matrix $\mathbf{M}$ is given by
$$e^{\mathbf{M}} = \mathbf{I} + \sum_{i=1}^{d}(e^{\lambda_i} - 1)\,\mathbf{q}_i\mathbf{q}_i^T$$
where $\lambda_1, \dots, \lambda_d$ are the $d$ largest eigenvalues of $\mathbf{M}$ and $\mathbf{q}_1, \dots, \mathbf{q}_d$ are the corresponding eigenvectors, such that $\mathbf{q}_i^T\mathbf{q}_j = \delta_{ij}$.
Proof.
Let $\mathbf{M}$ be an $n \times n$ real matrix. The exponential $e^{\mathbf{M}}$ is given by $\sum_{k=0}^{\infty}\mathbf{M}^k/k!$, where $\mathbf{M}^0 = \mathbf{I}$ is the identity. A real, symmetric $\mathbf{M}$ has real eigenvalues and mutually orthogonal eigenvectors, i.e., $\mathbf{q}_i^T\mathbf{q}_j = \delta_{ij}$. Given that $\mathbf{M}$ has rank $d$, we have $\mathbf{M} = \sum_{i=1}^{d}\lambda_i\,\mathbf{q}_i\mathbf{q}_i^T$, and hence $\mathbf{M}^k = \sum_{i=1}^{d}\lambda_i^k\,\mathbf{q}_i\mathbf{q}_i^T$ for $k \geq 1$. Thus

$$e^{\mathbf{M}} = \mathbf{I} + \sum_{k=1}^{\infty}\frac{1}{k!}\sum_{i=1}^{d}\lambda_i^k\,\mathbf{q}_i\mathbf{q}_i^T = \mathbf{I} + \sum_{i=1}^{d}(e^{\lambda_i} - 1)\,\mathbf{q}_i\mathbf{q}_i^T \quad (20)$$

∎
Lemma 3.
The inverse of the Gaussian kernel $\mathbf{K} = \sigma_f^2\,e^{\mathbf{M}/2l^2}$ for a rank-$1$, symmetric matrix $\mathbf{M}$ is given by
$$\mathbf{K}^{-1} = \frac{1}{\sigma_f^2}\Big(\mathbf{I} + \big(e^{-\lambda/2l^2} - 1\big)\mathbf{q}\mathbf{q}^T\Big)$$
where $\mathbf{q}$ is the first eigenvector of $\mathbf{M}$, i.e., $\|\mathbf{q}\|_2 = 1$, and $\lambda$ is the corresponding eigenvalue.
Proof.
By Lemma 1, $\mathbf{K} = \sigma_f^2\big(\mathbf{I} + (e^{\lambda/2l^2} - 1)\mathbf{q}\mathbf{q}^T\big)$. Multiplying this by the claimed inverse and using $\mathbf{q}^T\mathbf{q} = 1$, the coefficient of $\mathbf{q}\mathbf{q}^T$ in the product is $(e^{\lambda/2l^2} - 1) + (e^{-\lambda/2l^2} - 1) + (e^{\lambda/2l^2} - 1)(e^{-\lambda/2l^2} - 1) = 0$, so the product equals $\mathbf{I}$. ∎
Lemma 4.
The inverse of the Gaussian kernel $\mathbf{K} = \sigma_f^2\,e^{\mathbf{M}/2l^2}$ for a rank-$d$, symmetric matrix $\mathbf{M}$ is given by
$$\mathbf{K}^{-1} = \frac{1}{\sigma_f^2}\Big(\mathbf{I} + \sum_{i=1}^{d}\big(e^{-\tilde{\lambda}_i} - 1\big)\mathbf{q}_i\mathbf{q}_i^T\Big)$$
where $\lambda_1, \dots, \lambda_d$ are the $d$ largest eigenvalues of $\mathbf{M}$ and $\mathbf{q}_1, \dots, \mathbf{q}_d$ are the corresponding eigenvectors, such that $\mathbf{q}_i^T\mathbf{q}_j = \delta_{ij}$.
Proof.
Using the result of the previous lemma iteratively, once for each of the $d$ orthogonal eigen-directions, we get the required result

$$\mathbf{K}^{-1} = \frac{1}{\sigma_f^2}\prod_{i=1}^{d}\Big(\mathbf{I} + \big(e^{-\tilde{\lambda}_i} - 1\big)\mathbf{q}_i\mathbf{q}_i^T\Big) = \frac{1}{\sigma_f^2}\Big(\mathbf{I} + \sum_{i=1}^{d}\big(e^{-\tilde{\lambda}_i} - 1\big)\mathbf{q}_i\mathbf{q}_i^T\Big) \quad (23)$$

where $\tilde{\lambda}_i = \lambda_i/2l^2$ and the cross terms vanish by orthonormality. ∎
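Lemma 4 is easy to verify numerically. The following sketch constructs a random rank-$d$ symmetric $\mathbf{M}$ and checks that the claimed inverse satisfies $\mathbf{K}\mathbf{K}^{-1} = \mathbf{I}$ (the sizes and eigenvalues below are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, ell, sf2 = 6, 2, 1.3, 2.0
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))      # d orthonormal eigenvectors
lam = np.array([1.5, 0.7])                        # d positive eigenvalues
M = (Q * lam) @ Q.T                               # rank-d symmetric matrix
c = 2 * ell ** 2
# K = sigma_f^2 (I + sum_i (e^{lam_i/c} - 1) q_i q_i^T)      (Lemma 2)
K = sf2 * (np.eye(n) + (Q * (np.exp(lam / c) - 1)) @ Q.T)
# K^{-1} per Lemma 4
K_inv = (np.eye(n) + (Q * (np.exp(-lam / c) - 1)) @ Q.T) / sf2
assert np.allclose(K @ K_inv, np.eye(n))
```

The same construction of `K` agrees with `scipy.linalg.expm(M / c)` scaled by $\sigma_f^2$, confirming Lemma 2 along the way.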
Lemma 5.
The solution $\boldsymbol{\alpha}$ of the Gaussian Process regression system $\mathbf{K}\boldsymbol{\alpha} = \mathbf{x}$, for the scenario when $\mathbf{M}$ has rank $1$ and is symmetric, with $\mathbf{x} = \sqrt{\lambda}\,\mathbf{q}$, is given by
$$\boldsymbol{\alpha} = \frac{\sqrt{\lambda}}{\sigma_f^2}\,e^{-\lambda/2l^2}\,\mathbf{q}$$
Proof.
Applying Lemma 3, $\boldsymbol{\alpha} = \mathbf{K}^{-1}\mathbf{x} = \frac{1}{\sigma_f^2}\big(\mathbf{I} + (e^{-\lambda/2l^2} - 1)\mathbf{q}\mathbf{q}^T\big)\sqrt{\lambda}\,\mathbf{q} = \frac{\sqrt{\lambda}}{\sigma_f^2}\,e^{-\lambda/2l^2}\,\mathbf{q}$, using $\mathbf{q}^T\mathbf{q} = 1$. ∎
Lemma 6.
The solution of the Gaussian Process regression system $\mathbf{K}\boldsymbol{\alpha} = \mathbf{X}$, for the scenario when $\mathbf{M}$ has rank $d$ and is symmetric, is given, dimension-wise, by
$$\boldsymbol{\alpha}_i = \frac{\sqrt{\lambda_i}}{\sigma_f^2}\,e^{-\lambda_i/2l^2}\,\mathbf{q}_i, \quad i = 1, \dots, d$$
Proof.
Assuming the intrinsic dimensionality of the low-dimensional manifold to be $d$ implies that the inverse of the Gaussian kernel is as defined in (23). In this case $\mathbf{X} = \big[\sqrt{\lambda_1}\,\mathbf{q}_1, \dots, \sqrt{\lambda_d}\,\mathbf{q}_d\big]$ (refer Sect. 2.1), where $\mathbf{q}_i^T\mathbf{q}_j = \delta_{ij}$.
Each of the $d$ dimensions of $\mathbf{X}$ can be processed independently, similar to the previous lemma. For the $i$-th dimension, we have

$$\boldsymbol{\alpha}_i = \mathbf{K}^{-1}\sqrt{\lambda_i}\,\mathbf{q}_i = \frac{\sqrt{\lambda_i}}{\sigma_f^2}\Big(\mathbf{q}_i + \big(e^{-\lambda_i/2l^2} - 1\big)\mathbf{q}_i\Big) \quad (25)$$

Thus we get the result

$$\boldsymbol{\alpha}_i = \frac{\sqrt{\lambda_i}}{\sigma_f^2}\,e^{-\lambda_i/2l^2}\,\mathbf{q}_i \quad (26)$$

∎