Learning Manifolds from Non-stationary Streaming Data

04/24/2018 · Suchismit Mahapatra, et al. · University at Buffalo

Streaming adaptations of manifold learning based dimensionality reduction methods, such as Isomap, typically assume that the underlying data distribution is stationary. Such methods are not equipped to detect or handle sudden changes or gradual drifts in the distribution generating the stream. We prove that a Gaussian Process Regression (GPR) model that uses a manifold-specific kernel function and is trained on an initial batch of sufficient size can closely approximate the state-of-the-art streaming Isomap algorithm. The predictive variance obtained from the GPR prediction is then shown to be an effective detector of changes in the underlying data distribution. Results on several synthetic and real data sets show that the resulting algorithm effectively learns lower-dimensional representations of high-dimensional data in a streaming setting, while identifying shifts in the generative distribution.

1 Introduction

High-dimensional data is inherently difficult to explore and analyze, owing to the “curse of dimensionality” that renders many statistical and machine learning techniques inadequate. In this context, non-linear spectral dimensionality reduction (NLSDR) has proved to be an indispensable tool. Manifold learning based NLSDR methods, such as Isomap [1], Local Linear Embedding (LLE) [2], etc., assume that the distribution of the data in the high-dimensional observed space is not uniform; rather, the data lies near a non-linear low-dimensional manifold embedded in the high-dimensional space.

If applied directly to streaming data, NLSDR methods have to recompute the entire manifold each time a new point arrives from the stream. This guarantees the best possible quality of the learned manifold given the data, but quickly becomes computationally prohibitive. To alleviate the computational problem, landmark [3] and general out-of-sample extension methods [4] have been proposed. These techniques are still computationally expensive for practical applications. Recent streaming adaptations of NLSDR methods have relied on exact learning from a smaller batch of observations followed by approximate mapping of the subsequent stream of observations [5]. Extensions to cases when the observations are sampled from multiple and possibly intersecting manifolds have been proposed as well [6].

However, existing streaming manifold learning methods [5, 6] assume that the underlying generative distribution is stationary over the stream, and are unable to detect when the distribution “drifts” or abruptly “shifts” away from the base, resulting in incorrect low-dimensional mappings (See Fig. 1). We develop a methodology to identify such changes (drifts and shifts) in the stream properties and inform the streaming algorithm to update the base model.

Figure 1: Impact of changes in the data distribution on streaming NLSDR. In the top panel, the true data lies on a 2D manifold (top-left) and the observed data is obtained by applying the swiss-roll transformation to the 2D data (top-middle). The streaming algorithm [5] uses a batch of samples from a 2D Gaussian (black), and maps streaming points sampled from a uniform distribution (gray). The streaming algorithm performs well on mapping the batch points to the low-dimensional space but fails on the streaming points that “drift” away from the batch (top-right). In the bottom panel, the streaming algorithm [6] uses a batch of samples from three 2D Gaussians (black). The stream points are sampled from the three Gaussians and a new Gaussian (gray). The streaming algorithm performs well on mapping the batch points to the low-dimensional space but fails on the streaming points that are “shifted” from the batch (bottom-right).

We employ a Gaussian Process (GP) [7] based adaptation of Isomap [1], a widely used NLSDR method, to process high throughput streams. The use of GP is enabled by a novel kernel that measures the relationship between a pair of observations along the manifold, and not in the original high-dimensional space. We prove that the low-dimensional representations inferred using the GP based method, GP-Isomap, are equivalent to the representations obtained using the state-of-the-art streaming Isomap methods [5, 6]. Additionally, we empirically show, on synthetic and real datasets, that the predictive variance associated with the GP predictions is an effective indicator of changes (either gradual drifts or sudden shifts) in the underlying generative distribution, and can be employed to inform the algorithm to “re-learn” the core manifold.

2 Problem Statement and Preliminaries

We first formulate the NLSDR problem, provide background on Isomap, and discuss its out-of-sample and streaming extensions [8, 5, 6, 9]. Additionally, we provide a brief introduction to Gaussian Process (GP) regression.

2.1 Non-linear Spectral Dimensionality Reduction

Given high-dimensional data $\mathbf{Y} = \{\mathbf{y}_i\}_{i=1}^{n}$, where $\mathbf{y}_i \in \mathbb{R}^D$, the NLSDR problem is concerned with finding the corresponding low-dimensional representation $\mathbf{X} = \{\mathbf{x}_i\}_{i=1}^{n}$, such that $\mathbf{x}_i \in \mathbb{R}^d$, where $d \ll D$.

NLSDR methods assume that the data lies along a low-dimensional manifold embedded in a high-dimensional space, and exploit the global (Isomap [1], Minimum Volume Embedding [10]) or local (LLE [2], Laplacian Eigenmaps [11]) properties of the manifold to map each $\mathbf{y}_i$ to its corresponding $\mathbf{x}_i$.

The Isomap algorithm [1] maps each $\mathbf{y}_i$ to its low-dimensional representation $\mathbf{x}_i$ in such a way that the geodesic distance along the manifold between any two points, $\mathbf{y}_i$ and $\mathbf{y}_j$, is as close to the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ as possible. The geodesic distance is approximated by computing the shortest path between the two points using the $k$-nearest neighbor graph and is stored in the geodesic distance matrix $\mathbf{G} = \{g_{ij}\}$, where $g_{ij}$ is the geodesic distance between the points $\mathbf{y}_i$ and $\mathbf{y}_j$; $\tilde{\mathbf{G}} = \{g_{ij}^2\}$ contains the squared geodesic distance values. The Isomap algorithm recovers $\mathbf{x}_i$ by applying classical Multi-Dimensional Scaling (MDS) on $\tilde{\mathbf{G}}$. Let $\mathbf{B}$ be the inner product matrix between the different $\mathbf{x}_i$. $\mathbf{B}$ can be retrieved as $\mathbf{B} = -\frac{1}{2}\mathbf{H}\tilde{\mathbf{G}}\mathbf{H}$ by assuming $\sum_i \mathbf{x}_i = 0$, where $\mathbf{H} = \{h_{ij}\}$ and $h_{ij} = \delta_{ij} - \frac{1}{n}$, where $\delta_{ij}$ is the Kronecker delta. Isomap uncovers $\mathbf{X}$ such that $\mathbf{X}^\top\mathbf{X}$ is as close to $\mathbf{B}$ as possible. This is achieved by setting $\mathbf{X} = \{\sqrt{\lambda_1}\,\mathbf{q}_1, \ldots, \sqrt{\lambda_d}\,\mathbf{q}_d\}$, where $\lambda_1, \ldots, \lambda_d$ are the $d$ largest eigenvalues of $\mathbf{B}$ and $\mathbf{q}_1, \ldots, \mathbf{q}_d$ are the corresponding eigenvectors.
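As an illustration, the following is a minimal NumPy sketch of the classical MDS step described above; the geodesic distances are approximated with off-the-shelf k-NN graph and shortest-path routines, and the variable names are ours, not the paper's.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_embedding(Y, n_neighbors=10, d=2):
    """Minimal Isomap sketch: k-NN graph -> geodesic distances -> classical MDS."""
    # k-nearest-neighbor graph with Euclidean edge weights
    knn = kneighbors_graph(Y, n_neighbors=n_neighbors, mode="distance")
    # approximate geodesic distances as shortest paths in the graph
    G = shortest_path(knn, method="D", directed=False)
    G2 = G ** 2                                   # squared geodesic distances
    n = G2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix, h_ij = delta_ij - 1/n
    B = -0.5 * H @ G2 @ H                         # inner-product matrix
    evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]             # d largest eigenvalues
    return evecs[:, idx] * np.sqrt(evals[idx])    # rows are the embeddings x_i
```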

To measure the error between the true, underlying low-dimensional representation and that uncovered by NLSDR methods, Procrustes analysis [12] is typically used. Procrustes analysis involves aligning two matrices, $\mathbf{A}$ and $\mathbf{B}$, by finding the optimal translation $\mathbf{t}$, rotation $\mathbf{R}$, and scaling $s$ that minimize the Frobenius norm between the two aligned matrices, i.e.,

$$\epsilon_{Proc}(\mathbf{A}, \mathbf{B}) = \min_{\mathbf{R},\, \mathbf{t},\, s} \big\| s\mathbf{A}\mathbf{R} + \mathbf{1}\mathbf{t}^\top - \mathbf{B} \big\|_F$$

The above optimization problem has a closed-form solution obtained by performing Singular Value Decomposition (SVD) of $\mathbf{A}^\top\mathbf{B}$ [12]. Consequently, one of the properties of Procrustes analysis is that $\epsilon_{Proc}(\mathbf{A}, \mathbf{B}) = 0$ when one of the matrices is a scaled, translated and/or rotated version of the other, which we leverage in this work.
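For concreteness, here is a small NumPy sketch of the SVD-based Procrustes alignment used as the error metric; it is our illustration of the standard solution, not the exact implementation used in the paper.

```python
import numpy as np

def procrustes_error(A, B):
    """Align A to B by optimal translation, rotation and scaling; return residual."""
    # center both configurations (optimal translation)
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # optimal rotation from the SVD of the cross-covariance matrix
    U, S, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt
    # optimal scaling
    s = S.sum() / (A0 ** 2).sum()
    # Frobenius-norm residual after alignment
    return np.linalg.norm(s * A0 @ R - B0, "fro")
```

When one configuration is a scaled, rotated and/or translated copy of the other, this residual is zero, which is the property exploited in the equivalence argument of Section 4.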

2.2 Streaming Isomap

Given that the Isomap algorithm has a complexity of $O(n^3)$, where $n$ is the size of the data, recomputing the manifold is computationally too expensive or impractical in a streaming setting. Incremental techniques have been proposed in the past [9, 5], which can efficiently process new streaming points without significantly affecting the quality of the embedding.

The S-Isomap algorithm relies on the observation that a stable manifold can be learnt using only a fraction of the stream (the batch dataset), and the remaining part of the stream (the stream dataset) can be mapped to the manifold in a significantly less costly manner. This can be justified by considering the convergence of eigenvectors and eigenvalues of the inner product matrix $\mathbf{B}$ as the number of points in the batch increases [13]. In particular, the bounds on the convergence error for a similar NLSDR method, kernel PCA, are shown to be inversely proportional to the batch size [13]. Similar arguments can be made for Isomap by considering the equivalence between Isomap and kernel PCA [14, 8]. This relationship has also been shown empirically on multiple data sets [5].

The S-Isomap algorithm computes the low-dimensional representation $\mathbf{x}_{n+1}$ for each new point $\mathbf{y}_{n+1}$ by solving a least-squares problem, formulated by matching the dot products of the new point with the low-dimensional embedding of the points in the batch dataset (computed using Isomap) to the normalized squared geodesic distance vector. The least-squares problem has the following form:

$$\mathbf{X}^\top \mathbf{x}_{n+1} = \mathbf{f}, \tag{1}$$

where¹ the $i$-th entry of $\mathbf{f}$ is obtained by double mean centering of the squared geodesic distances, i.e.,

$$f_i = \frac{1}{2}\Big(\frac{1}{n}\sum_{j} g_{ij}^2 + \frac{1}{n}\sum_{j} g_{j,n+1}^2 - \frac{1}{n^2}\sum_{j,k} g_{jk}^2 - g_{i,n+1}^2\Big),$$

and $g_{i,n+1}$ denotes the geodesic distance between the new point $\mathbf{y}_{n+1}$ and the batch point $\mathbf{y}_i$.

¹ Note that the Incremental Isomap algorithm [9] has a slightly different formulation for $\mathbf{f}$.

The S-Isomap algorithm assumes that the data stream is drawn from a uniformly sampled, unimodal distribution and that both the stream and the batch datasets are generated from it. Additionally, it assumes that the manifold has stabilized, i.e., the batch is large enough. Under these assumptions, the expectation of the squared geodesic distances for points in the batch dataset is close to that for points in the stream dataset; the line of reasoning for this follows from [15]. Thus (1) simplifies to

$$f_i \approx \frac{1}{2}\Big(\frac{1}{n}\sum_{j} g_{ij}^2 - g_{i,n+1}^2\Big). \tag{2}$$
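A minimal NumPy sketch of this mapping step, under the simplified form (2) as reconstructed above, might look as follows; the geodesic distances from the new point to the batch are assumed to be available (e.g., from the shortest-path computation in the earlier sketch), and the names are ours.

```python
import numpy as np

def s_isomap_map(X_batch, G2_batch, g2_new):
    """Map a new point onto the learnt manifold (simplified form (2)).

    X_batch : (n, d) low-dimensional batch embedding from Isomap (rows are x_i)
    G2_batch: (n, n) squared geodesic distances within the batch
    g2_new  : (n,)   squared geodesic distances of the new point to the batch
    """
    c = G2_batch.mean(axis=1)          # per-point batch constants (1/n) sum_j g_ij^2
    f = 0.5 * (c - g2_new)             # right-hand side of the least-squares problem
    # solve x_i . x = f_i for all batch points i, in the least-squares sense
    x_new, *_ = np.linalg.lstsq(X_batch, f, rcond=None)
    return x_new
```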

2.3 Handling Multiple Manifolds

In the ideal case, when manifolds are densely sampled and sufficiently separated, clustering can be performed before applying NLSDR techniques [16, 17], by choosing an appropriate local neighborhood size so as not to include points from other manifolds and still be able to capture the local geometry of the manifold. However, if the manifolds are close or intersecting, such methods typically fail.

The S-Isomap++ [6] algorithm overcomes limitations of the S-Isomap algorithm and extends it to deal with multiple manifolds. It uses the notion of multi-scale SVD [29] to define tangent manifold planes at each data point, computed at the appropriate scale, and to compute similarity in a local neighborhood. Additionally, it includes a manifold tangent clustering algorithm that uses these tangent planes to cluster manifolds which are close and, in certain scenarios, intersecting. After initially clustering the high-dimensional batch dataset, the algorithm applies NLSDR on each manifold individually and eventually “stitches” them together in a global ambient space by defining transformations which map points from the individual low-dimensional manifolds to the global space. However, S-Isomap++ can only detect manifolds which it encounters in its batch learning phase, and not those which it might encounter in the streaming phase.

2.4 Gaussian Process Regression

Let us assume that we are learning a probabilistic regression model to obtain the prediction at a given test input, $\mathbf{x}_*$, using a non-linear, latent function, $f$. Assuming² a scalar output, the observed output, $y$, is related to the input as:

$$y = f(\mathbf{x}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^2) \tag{3}$$

² For vector-valued outputs, i.e., $\mathbf{y} \in \mathbb{R}^d$, one can consider $d$ independent models.

Given a training set of inputs, $\mathbf{X} = \{\mathbf{x}_i\}_{i=1}^{n}$, and corresponding outputs, $\mathbf{y} = \{y_i\}_{i=1}^{n}$, the Gaussian Process Regression (GPR) model assumes a GP prior on the latent function values, i.e., $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, where $m(\mathbf{x})$ is the mean of $f(\mathbf{x})$ and $k(\mathbf{x}, \mathbf{x}')$ is the covariance between any two evaluations of $f$, i.e., $f(\mathbf{x})$ and $f(\mathbf{x}')$. Here we use a zero-mean function ($m(\mathbf{x}) = 0$), though other functions could be used as well. The GP prior states that any finite collection of the latent function evaluations is jointly Gaussian, i.e.,

$$f(\mathbf{x}_1), f(\mathbf{x}_2), \ldots, f(\mathbf{x}_n) \sim \mathcal{N}(\mathbf{0}, \mathbf{K}) \tag{4}$$

where the $(i,j)$-th entry of the covariance matrix, $\mathbf{K}$, is given by $k(\mathbf{x}_i, \mathbf{x}_j)$. The GPR model uses (3) and (4) to obtain the predictive distribution at a new test input, $\mathbf{x}_*$, as a Gaussian distribution with the following mean and variance:

$$\mu_* = \mathbf{k}_*^\top (\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} \tag{5}$$
$$\sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_* \tag{6}$$

where $\mathbf{k}_*$ is a vector with the $i$-th value given by $k(\mathbf{x}_*, \mathbf{x}_i)$.

The kernel function, $k(\cdot, \cdot)$, specifies the covariance between function values, $f(\mathbf{x}_i)$ and $f(\mathbf{x}_j)$, as a function of the corresponding inputs, $\mathbf{x}_i$ and $\mathbf{x}_j$. A popular choice is the squared exponential kernel, which has been used in this work:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\Big(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\ell^2}\Big) \tag{7}$$

where $\sigma_f^2$ is the signal variance and $\ell$ is the length scale. The quantities $\sigma_f^2$, $\ell$, and $\sigma_n^2$ (from (3)) are the hyper-parameters of the model and can be estimated by maximizing the marginal log-likelihood of the observed data ($\mathbf{X}$ and $\mathbf{y}$) under the GP prior assumption.

One can observe that the predictive mean $\mu_*$ in (5) can be written as an inner product, i.e.,

$$\mu_* = \mathbf{k}_*^\top \boldsymbol{\alpha} \tag{8}$$

where $\boldsymbol{\alpha} = (\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y}$. We will utilize this form in subsequent proofs.
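To make the prediction equations concrete, here is a small self-contained NumPy sketch of GPR prediction with the squared exponential kernel; the hyper-parameter values are placeholders rather than estimates obtained by marginal-likelihood maximization.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, ell=1.0):
    """Squared exponential kernel matrix between rows of A and rows of B (eq. 7)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * ell ** 2))

def gpr_predict(X, y, X_star, sigma_f=1.0, ell=1.0, sigma_n=0.1):
    """Predictive mean and variance of a zero-mean GP (eqs. 5, 6, 8)."""
    K = sq_exp_kernel(X, X, sigma_f, ell) + sigma_n ** 2 * np.eye(len(X))
    K_star = sq_exp_kernel(X_star, X, sigma_f, ell)         # rows are k_* per test point
    alpha = np.linalg.solve(K, y)                            # alpha = (K + sigma_n^2 I)^{-1} y
    mean = K_star @ alpha                                    # eq. (8): mu_* = k_*^T alpha
    v = np.linalg.solve(K, K_star.T)
    var = sigma_f ** 2 - np.einsum("ij,ji->i", K_star, v)    # eq. (6)
    return mean, var
```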

3 Methodology

The proposed GP-Isomap algorithm follows a two-phase strategy (similar to S-Isomap and S-Isomap++), where exact manifolds are learnt from an initial batch, and subsequently a computationally inexpensive mapping procedure processes the remainder of the stream. To handle multiple manifolds, the batch data is first clustered via manifold tangent clustering or other standard techniques. Exact Isomap is applied on each cluster. The resulting low-dimensional data for the clusters is then “stitched” together to obtain the low-dimensional representation of the input data. The difference from past methods is the mapping procedure, which uses GPR to obtain the predictions for the low-dimensional mapping (see (5)). At the same time, the associated predictive variance (see (6)) is used to detect changes in the underlying distribution.

The overall GP-Isomap algorithm is outlined in Algorithm 1 and takes a batch dataset and the streaming data as inputs, along with other parameters. The processing is split into two phases: a batch learning phase (Lines 1–15) and a streaming phase (Lines 16–32), which are described later in this section.

1:Batch dataset: , Streaming dataset: ; Parameters: , , , , ,
2:: low-dimensional representation for
3: Batch Phase
4: Find_Clusters(, )
5:
6:for  do
7:      Isomap()
8:end for
9:for  do
10:      Estimate()
11:end for
12: FN(
13: MDS()
14:for  do
15:     
16:      [ ]
17:     
18:end for
19: Streaming Phase
20:
21:for  do
22:     if   then
23:          Re-run Batch Phase with
24:     end if
25:     for  do
26:          )
27:     end for
28:     j
29:     if   then
30:         
31:         
32:     else
33:         
34:     end if
35:end for
36:return
Algorithm 1 GP-Isomap

3.1 Kernel Function

The proposed GP-Isomap algorithm uses a novel geodesic distance based kernel function, defined as:

$$k(\mathbf{y}_i, \mathbf{y}_j) = \sigma_f^2 \exp\Big(\frac{b_{ij}}{2\ell^2}\Big) \tag{9}$$

where $b_{ij}$ is the $(i,j)$-th entry of the normalized geodesic distance matrix $\mathbf{B}$, as discussed in Sec. 2.1, $\sigma_f^2$ is the signal variance (whose value we fix as 1 in this work) and $\ell$ is the length scale hyper-parameter. Thus the kernel matrix can be written as the matrix exponential:

$$\mathbf{K} = \sigma_f^2 \exp\Big(\frac{\mathbf{B}}{2\ell^2}\Big) \tag{10}$$

This kernel function plays a key role in using the GPR model for mapping streaming points onto the learnt manifold, by measuring similarity along the low-dimensional manifold instead of in the original high-dimensional space, as is typically done in GPR based solutions.

The matrix $\mathbf{B}$ is positive semi-definite³ (PSD), due to the double mean centering applied to the squared geodesic distance matrix $\tilde{\mathbf{G}}$. Consequently, we note that the kernel matrix, $\mathbf{K}$, is positive definite (refer to (11) below).

³ Actually $\mathbf{B}$ is not always guaranteed to be PSD. [25] use an additive constant to make it PSD. Equation (11), via exponentiation, introduces the identity matrix, which functions similarly to an additive constant.

Using Lemmas 1 and 2, the novel kernel we propose can be written as

$$\mathbf{K} = \sigma_f^2\Big[\mathbf{I} + \sum_{i}\big(e^{\lambda_i/2\ell^2} - 1\big)\mathbf{q}_i\mathbf{q}_i^\top\Big] \tag{11}$$

where $\lambda_i$ and $\mathbf{q}_i$ are the eigenvalue/eigenvector pairs of $\mathbf{B}$, as discussed in Sec. 2.1.
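A minimal NumPy sketch of this kernel construction, under our reading of (9)–(11), is shown below; it builds B by double mean centering the squared geodesic distances and then exponentiates its spectrum. The fixed signal variance of 1 follows the text; everything else is illustrative.

```python
import numpy as np

def geodesic_kernel(G2, ell=1.0, sigma_f=1.0):
    """Geodesic-distance based kernel matrix (sketch of eqs. 9-11).

    G2 : (n, n) squared geodesic distances within the batch
    """
    n = G2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * H @ G2 @ H                    # normalized geodesic (inner-product) matrix
    lam, Q = np.linalg.eigh(B)               # eigen-decomposition of symmetric B
    # K = sigma_f^2 * exp(B / (2 ell^2)), computed via the spectrum of B
    K = sigma_f ** 2 * (Q * np.exp(lam / (2.0 * ell ** 2))) @ Q.T
    return K
```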

3.2 Batch Learning

The batch learning phase consists of the following tasks:

3.2.1 Clustering.

The first step in the batch phase involves clustering the batch dataset into individual clusters that represent the manifolds. In case the batch contains a single manifold, the algorithm can correctly detect it. (Line 1)

3.2.2 Dimension Reduction.

Subsequently, full Isomap is executed on each of the individual clusters to get low-dimensional representations of the data points belonging to each individual cluster. (Lines 3–5)

3.2.3 Hyper-parameter Estimation.

The geodesic distance matrix for the points in the $i$-th manifold and the corresponding low-dimensional representation are fed to the GP model for that manifold to perform hyper-parameter estimation, which outputs the estimated hyper-parameters for each of the manifolds. (Lines 6–8)

3.2.4 Learning Mapping to Global Space.

The low-dimensional embeddings uncovered for the different manifolds can be of different dimensionalities. Consequently, a mapping to a unified global space is needed. To learn this mapping, a support set is formulated, which contains the pairs of nearest points and pairs of farthest points between each pair of manifolds. Subsequently, MDS is executed on this support set to uncover its low-dimensional representation. Individual scaling and translation factors, which map points from each of the individual manifolds to the global space, are then learnt by solving a least-squares problem over the support set, as sketched below. (Lines 9–15)
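The following is an illustrative sketch of fitting one manifold's scaling and translation by least squares, assuming, for simplicity, that the local and global coordinates of the support points have the same dimensionality; the function name and this simplification are ours, not the paper's.

```python
import numpy as np

def fit_scale_translation(X_local, Z_global):
    """Fit a per-manifold scaling s and translation t so that s * X_local + t ~ Z_global.

    X_local : (m, d) local low-dimensional coordinates of the support points
    Z_global: (m, d) their coordinates in the unified global space (from MDS)
    """
    x_mean, z_mean = X_local.mean(axis=0), Z_global.mean(axis=0)
    Xc, Zc = X_local - x_mean, Z_global - z_mean
    s = (Xc * Zc).sum() / (Xc ** 2).sum()   # least-squares scaling
    t = z_mean - s * x_mean                  # translation
    return s, t
```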

3.3 Stream Processing

In the streaming phase, each sample in the stream set is embedded using each of the GP models to evaluate the prediction along with its variance (Lines 22–24). The manifold with the smallest variance is chosen to embed the sample into, using the corresponding scaling and translation factors, provided the variance is within the allowed threshold (Lines 25–28); otherwise the sample is added to the unassigned set (Lines 29–31). When the size of the unassigned set exceeds a certain threshold, we add its points to the batch dataset and re-learn the base manifolds (Lines 18–20). The assimilation of the new points into the batch may be done more efficiently in an incremental manner.
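An illustrative Python sketch of this streaming step is given below, assuming per-manifold GP predictors (e.g., along the lines of the gpr_predict sketch above, but with the geodesic kernel) and the per-manifold scaling/translation factors from the batch phase; the helper names and thresholds are ours.

```python
import numpy as np

def process_stream_point(y_s, manifolds, var_threshold, unassigned):
    """Embed one streaming sample, or mark it unassigned if no manifold fits.

    manifolds: list of dicts, each with a 'predict' callable returning (mean, variance)
               and the scaling 's' and translation 't' for that manifold.
    """
    preds = [m["predict"](y_s) for m in manifolds]     # (mean, variance) per manifold
    j = int(np.argmin([var for _, var in preds]))       # smallest predictive variance
    mean_j, var_j = preds[j]
    if var_j < var_threshold:
        # map the local prediction into the unified global space
        return manifolds[j]["s"] * mean_j + manifolds[j]["t"]
    unassigned.append(y_s)                               # defer; may trigger re-learning
    return None
```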

3.4 Complexity

The runtime complexity of our proposed algorithm is dominated by the GP regression step as well as the Isomap execution step, both of which have $O(n^3)$ complexity, where $n$ is the size of the batch dataset. This is similar to the S-Isomap and S-Isomap++ algorithms, which also have a runtime complexity of $O(n^3)$. The cost of the stream processing step for each incoming point depends only on the batch size, since each sample of the stream set is processed separately, and the space complexity of GP-Isomap is likewise dominated by the batch. Thus, neither the space requirement nor the runtime complexity grows with the size of the stream, which makes the algorithm appealing for handling high-volume streams.

4 Theoretical Analysis

In this section, we first state the main result and subsequently prove it using results from lemmas stated later in Appendix 0.A.

Theorem 4.1.

The prediction of our proposed approach, GP-Isomap, is equivalent to the prediction of S-Isomap, i.e., the Procrustes error between the two predictions is 0.

Proof.

The prediction of GP-Isomap is given by (8). Using Lemma 6, we have that

(12)

The term $\mathbf{k}_*$ for GP-Isomap, using our novel kernel function, evaluates to

(13)

where the vector contains the squared geodesic distances of the new point $\mathbf{y}_{n+1}$ to the points in the batch dataset.

Considering the above equation element-wise, the $i$-th term of $\mathbf{k}_*$ is the exponential of the corresponding normalized geodesic distance term. Using a Taylor series expansion of the exponential, we have

(14)
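Under our reading of the kernel in (9), the expansion in (14) is, schematically, a first-order Taylor expansion of the exponential (a sketch in our notation, not necessarily the paper's):

$$\sigma_f^2 \exp\!\Big(\frac{b_{*,i}}{2\ell^2}\Big) \;\approx\; \sigma_f^2\Big(1 + \frac{b_{*,i}}{2\ell^2}\Big)$$

Each entry of $\mathbf{k}_*$ is thus an affine (scaled and translated) function of the corresponding normalized geodesic term, which is what allows the GP-Isomap prediction to be matched, up to scaling and translation, with the S-Isomap prediction below.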

The prediction by S-Isomap is given by (2) as follows:

(15)

where $\mathbf{f}$ is as defined by (2).

Rewriting (2), we have

(16)

where the constant term depends only on squared geodesic distance values within the batch dataset, and is therefore constant with respect to the new point $\mathbf{y}_{n+1}$, which is part of the stream dataset.

We now consider the 1st dimension of the predictions for GP-Isomap and S-Isomap and demonstrate their equivalence via the Procrustes error. The analysis for the remaining dimensions follows a similar line of reasoning.

Thus, for the 1st dimension, using (16), the S-Isomap prediction is

(17)

Similarly, using Lemma 6, (13) and (14), the 1st dimension of the GP-Isomap prediction is given by

(18)

We can observe that (18) is a scaled and translated version of (17). Similarly, for each of the remaining dimensions, the GP-Isomap prediction can be shown to be a scaled and translated version of the S-Isomap prediction. These individual scaling and translation factors can be represented together by a single collective scaling and translation. Consequently, the Procrustes error between the two predictions is 0 (refer to Sect. 2.1). ∎

5 Results and Analysis

In this section⁴ we demonstrate the ability of the predictive variance within GP-Isomap to identify changes in the underlying distribution of the data stream, on synthetically generated datasets as well as on a benchmark sensor dataset.

⁴ All synthetic datasets (refer to Fig. 1), figures and code are available here.

5.1 Results on Synthetic Data Sets

Swiss roll datasets are typically used for evaluating manifold learning algorithms. To evaluate our method on sudden concept-drift, we use the Euler Isometric Swiss Roll dataset [5], consisting of four Gaussian patches with points from each patch chosen at random, which are embedded into the high-dimensional space using a non-linear function. The points for each of the Gaussian modes are divided equally into training and test sets. To test incremental concept-drift, we use the single patch dataset, which consists of a single patch from the above, used as the training data, along with points from a uniform distribution for testing. Figures 1, 2 and 3 demonstrate our results on these datasets.

Figure 2: Using variance to detect concept-drift on the four patches dataset. The horizontal axis represents time and the vertical axis represents the variance of the stream. Initially, when the stream consists of samples generated from known modes, the variance is low; later, when samples from an unrecognized mode appear, the variance shoots up. We can also observe the three variance “bands” corresponding to the variance levels of the three known modes.

5.1.1 Gaussian patches on Isometric Swiss Roll

To evaluate our method on sudden concept-drift, we trained our GP-Isomap model using 3 out of the 4 training sets of the 4 patches dataset. Subsequently, we initially stream points randomly from the test sets of only these 3 classes, and later stream points from the test set of the fourth class, keeping track of the predictive variance all the while. Fig. 2 demonstrates the sudden increase (see red line) in the variance of the stream when the streaming points are from the fourth class, i.e., the unknown mode. Thus, GP-Isomap is able to detect concept-drift correctly. The bottom panel of Fig. 1 demonstrates the performance of S-Isomap++ on this dataset: it fails to map the streaming points of the unknown mode correctly, given that it had not encountered the unknown mode in its batch training phase.

To test our proposed approach for detecting incremental concept-drift, we train our model using the single patch dataset and subsequently observe how the variance of the stream behaves on the test streaming dataset. The top panel of Fig. 1 shows how the variance increases smoothly as the stream gradually drifts away from the Gaussian patch. This shows that GP-Isomap tracks incremental drift correctly. In Sect. 4, we proved the equivalence between the prediction of S-Isomap and that of GP-Isomap using our novel kernel. In Fig. 3, we show empirically via the Procrustes error (PE) that the prediction of S-Isomap indeed matches that of GP-Isomap, irrespective of the size of the batch used. The PE for GP-Isomap with the Euclidean distance based kernel remains high irrespective of the size of the batch, which clearly demonstrates the unsuitability of this kernel for learning mappings in the low-dimensional space.

Figure 3: Procrustes error (PE) between the ground truth and a) GP-Isomap (blue line) with the geodesic distance based kernel, b) S-Isomap (dashed blue line with dots), and c) GP-Isomap (green line) using the Euclidean distance based kernel, for different fractions of data used in the batch. The behavior of the PE for a) closely matches that for b). However, the PE for GP-Isomap using the Euclidean distance kernel remains high irrespective of the batch fraction, demonstrating its unsuitability for manifolds.

5.2 Results on Sensor Data Set

The Gas Sensor Array Drift (GSAD) dataset [18] is a benchmark dataset available to the research community for developing strategies to deal with concept drift; it consists of measurements from 16 chemical sensors used to discriminate between 6 gases (class labels) at various concentrations. We demonstrate the performance of our proposed method on this dataset.

The data was first mean normalized. Data points from the first five classes were divided into training and test sets. We train our model using the training data from four out of these five classes. While testing, we stream points randomly from the test sets of these four classes first and later stream points from the test set of the fifth class. Figure 4 demonstrates our results. Our model can clearly detect concept-drift due to the unknown fifth class by tracking the variance of the stream, using the running average (red line).
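As an illustration of this detection rule, a simple running-average monitor over the per-sample predictive variances might look like the following sketch; the window size and threshold factor are illustrative choices, not values from the paper.

```python
import numpy as np

def detect_drift(variances, window=50, factor=3.0):
    """Flag the first stream index where the running-average variance jumps.

    variances: sequence of predictive variances, one per streaming sample
    window   : size of the trailing window used for the running average
    factor   : how many times the baseline average counts as a jump
    """
    v = np.asarray(variances, dtype=float)
    baseline = v[:window].mean()            # variance level of the known modes
    for i in range(window, len(v)):
        running_avg = v[i - window:i].mean()
        if running_avg > factor * baseline:
            return i                        # likely concept-drift onset
    return None
```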

Figure 4: Using variance to identify concept-drift for the GSAD dataset. Similar to Fig. 2, the introduction of points from an unknown mode in the stream results in the variance increasing drastically, as demonstrated by the running mean (red line). The spread of variances for points from known modes is also smaller compared to the spread for points from the unknown mode.

6 Related Works

Processing data streams efficiently using standard approaches is challenging in general, given that streams require real-time processing and cannot be stored permanently. Any form of analysis, including concept-drift detection, requires adequate summarization that can deal with these inherent constraints and approximate the characteristics of the stream well. Sampling based strategies, including random sampling [19, 20] as well as decision-tree based approaches [21], have been used in this context. To identify concept-drift, maintaining statistical summaries on a streaming “window” is a typical strategy [22, 23, 24]. However, none of these are applicable in the setting of learning a latent representation of the data, e.g., manifolds, in the presence of changes in the stream distribution.

We discuss limitations of existing incremental and streaming solutions that have been developed in the context of manifold learning, specifically for the Isomap algorithm, in Section 2. Coupling Isomap with GP Regression (GPR) has been explored in the past [25, 26], though not in the context of streaming data. For instance, a Mercer kernel-based Isomap technique has been proposed [25]. Similarly, [26] presented an emulator pipeline using Isomap to determine a low-dimensional representation, whose output is fed to a GPR model. The intuition of using GPR for detecting concept-drift is novel, even though the Bayesian non-parametric approach of [27], primarily intended for anomaly detection, comes close to our work in a single-manifold setting. Their choice of a Euclidean distance based kernel (in the original high-dimensional space) for the covariance matrix can result in high Procrustes error, as shown in Fig. 3. Additionally, their approach does not scale, given that it does not use any approximation to process the new streaming points “cheaply”.

7 Conclusions

We have proposed a streaming Isomap algorithm (GP-Isomap) that can be used to learn non-linear low-dimensional representations of high-dimensional data arriving in a streaming fashion. We prove that using a GPR formulation to map incoming data instances onto an existing manifold is equivalent to using existing geometric strategies [5, 6]. Moreover, by utilizing a small batch for exact learning of the Isomap as well as for training the GPR model, the method scales linearly with the size of the stream, thereby ensuring its applicability to practical problems. Using the Bayesian inference of the GPR model allows us to estimate the variance associated with the mapping of the streaming instances. The variance is shown to be a strong indicator of changes in the underlying stream properties on a variety of data sets. By utilizing the variance, one can devise re-training strategies that include expanding the batch dataset. While we have focused on the Isomap algorithm in this paper, similar formulations can be applied to other NLSDR methods, such as LLE [2], and will be explored as future research.

References

  • [1] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  • [2] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  • [3] Vin D Silva and Joshua B Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In Advances in NIPS, pages 721–728, 2003.
  • [4] Yiming Wu and Kap Luk Chan. An extended isomap algorithm for learning multi-class manifold. In Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on, volume 6, pages 3429–3433. IEEE, 2004.
  • [5] Frank Schoeneman, Suchismit Mahapatra, Varun Chandola, Nils Napp, and Jaroslaw Zola. Error metrics for learning reliable manifolds from streaming data. In Proceedings of 2017 SDM, pages 750–758. SIAM, 2017.
  • [6] Suchismit Mahapatra and Varun Chandola. S-isomap++: Multi manifold learning from streaming data. In 2017 IEEE International Conference on Big Data. IEEE, 2017.
  • [7] Christopher KI Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in NIPS, pages 682–688, 2001.
  • [8] Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas L Roux, and Marie Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Advances in NIPS, pages 177–184, 2004.
  • [9] Martin HC Law and Anil K Jain. Incremental nonlinear dimensionality reduction by manifold learning. IEEE transactions on PAMI, 28(3):377–391, 2006.
  • [10] Kilian Q Weinberger, Benjamin Packer, and Lawrence K Saul. Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In AISTATS, 2005.
  • [11] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in NIPS, 2002, pages 585–591, 2002.
  • [12] Ian L Dryden and Kanti V Mardia. Statistical shape analysis, volume 4. Wiley, 1998.
  • [13] John Shawe-Taylor and Christopher Williams. The stability of kernel principal components analysis and its relation to the process eigenspectrum. In Advances in NIPS, pages 383–390, 2003.
  • [14] Jihun Ham, Daniel D Lee, Sebastian Mika, and Bernhard Schölkopf. A kernel view of the dimensionality reduction of manifolds. In Proceedings of the twenty-first ICML. ACM, 2004.
  • [15] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301):13–30, 1963.
  • [16] Marzia Polito and Pietro Perona. Grouping and dimensionality reduction by locally linear embedding. In Advances in NIPS, pages 1255–1262, 2002.
  • [17] Mingyu Fan, Hong Qiao, Bo Zhang, and Xiaoqin Zhang. Isometric multi-manifold learning for feature extraction. In IEEE 12th ICDM, pages 241–250. IEEE, 2012.
  • [18] Alexander Vergara, Shankar Vembu, Tuba Ayhan, Margaret A Ryan, Margie L Homer, and Ramón Huerta. Chemical gas sensor drift compensation using classifier ensembles. Sensors and Actuators B: Chemical, 166:320–329, 2012.
  • [19] Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37–57, 1985.
  • [20] Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. On random sampling over joins. In ACM SIGMOD Record, volume 28, pages 263–274. ACM, 1999.
  • [21] Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on KDD, pages 71–80. ACM, 2000.
  • [22] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
  • [23] Hosagrahar Visvesvaraya Jagadish, Nick Koudas, S Muthukrishnan, Viswanath Poosala, Kenneth C Sevcik, and Torsten Suel. Optimal histograms with quality guarantees. In VLDB, volume 98, 1998.
  • [24] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794–1813, 2002.
  • [25] Heeyoul Choi and Seungjin Choi. Kernel isomap. Electronics letters, 40(25):1612–1613, 2004.
  • [26] Wei Xing, Akeel A Shah, and Prasanth B Nair. Reduced dimensional Gaussian process emulators of parametrized partial differential equations based on Isomap. In Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, volume 471. The Royal Society, 2015.
  • [27] Oren Barkan, Jonathan Weill, and Amir Averbuch. Gaussian process regression for out-of-sample extension. In Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on, pages 1–6. IEEE, 2016.
  • [28] William H Press, Saul A Teukolsky, William T Vetterling, and Brian P Flannery. Numerical recipes in C, volume 2. Cambridge university press Cambridge, 1996.
  • [29] Anna V Little, Jason Lee, Yoon-Mo Jung, and Mauro Maggioni. Estimation of intrinsic dimensionality of samples from noisy low-dimensional manifolds in high dimensions with multiscale svd. In IEEE/SP 15th Workshop on SSP’09, pages 85–88. IEEE, 2009.

Appendix 0.A Appendix

Lemma 1.

The matrix exponential $e^{\mathbf{M}}$ for a rank-1, symmetric matrix $\mathbf{M}$ is given by

$$e^{\mathbf{M}} = \mathbf{I} + \big(e^{\lambda_1} - 1\big)\mathbf{q}_1\mathbf{q}_1^\top$$

where $\mathbf{q}_1$ is the first eigenvector of $\mathbf{M}$ such that $\|\mathbf{q}_1\| = 1$ and $\lambda_1$ is the corresponding eigenvalue.

Proof.

Given $\mathbf{M}$ is symmetric and rank one, $\mathbf{M}$ can be written as $\lambda_1\mathbf{q}_1\mathbf{q}_1^\top$, so that $\mathbf{M}^k = \lambda_1^k\,\mathbf{q}_1\mathbf{q}_1^\top$ for $k \ge 1$. Hence,

$$e^{\mathbf{M}} = \sum_{k=0}^{\infty}\frac{\mathbf{M}^k}{k!} = \mathbf{I} + \Big(\sum_{k=1}^{\infty}\frac{\lambda_1^k}{k!}\Big)\mathbf{q}_1\mathbf{q}_1^\top = \mathbf{I} + \big(e^{\lambda_1} - 1\big)\mathbf{q}_1\mathbf{q}_1^\top. \tag{19}$$

Lemma 2.

The matrix exponential $e^{\mathbf{M}}$ for a rank-$p$, symmetric matrix $\mathbf{M}$ is given by

$$e^{\mathbf{M}} = \mathbf{I} + \sum_{i=1}^{p}\big(e^{\lambda_i} - 1\big)\mathbf{q}_i\mathbf{q}_i^\top$$

where $\lambda_1, \ldots, \lambda_p$ are the $p$ largest eigenvalues of $\mathbf{M}$ and $\mathbf{q}_1, \ldots, \mathbf{q}_p$ are the corresponding eigenvectors such that $\mathbf{q}_i^\top\mathbf{q}_j = \delta_{ij}$.

Proof.

Let $\mathbf{M}$ be an $n \times n$ real matrix. The exponential is given by $e^{\mathbf{M}} = \sum_{k=0}^{\infty}\frac{\mathbf{M}^k}{k!}$, where $\mathbf{M}^0 = \mathbf{I}$ is the identity. A real, symmetric $\mathbf{M}$ has real eigenvalues and mutually orthogonal eigenvectors, i.e., $\mathbf{q}_i^\top\mathbf{q}_j = \delta_{ij}$. Given $\mathbf{M}$ has rank $p$, we have $\mathbf{M} = \sum_{i=1}^{p}\lambda_i\mathbf{q}_i\mathbf{q}_i^\top$, and hence $\mathbf{M}^k = \sum_{i=1}^{p}\lambda_i^k\,\mathbf{q}_i\mathbf{q}_i^\top$ for $k \ge 1$. Therefore,

$$e^{\mathbf{M}} = \mathbf{I} + \sum_{i=1}^{p}\Big(\sum_{k=1}^{\infty}\frac{\lambda_i^k}{k!}\Big)\mathbf{q}_i\mathbf{q}_i^\top = \mathbf{I} + \sum_{i=1}^{p}\big(e^{\lambda_i} - 1\big)\mathbf{q}_i\mathbf{q}_i^\top. \tag{20}$$
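As a quick numerical sanity check of this low-rank exponential identity (our own verification sketch, not part of the paper), one can compare the spectral construction against scipy's dense matrix exponential for a random low-rank symmetric matrix:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, p = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))   # p orthonormal vectors
lam = rng.standard_normal(p)                        # nonzero eigenvalues of the rank-p matrix
M = (Q * lam) @ Q.T                                 # symmetric, rank p

# identity from Lemmas 1-2: e^M = I + sum_i (e^{lam_i} - 1) q_i q_i^T
E = np.eye(n) + (Q * (np.exp(lam) - 1.0)) @ Q.T
assert np.allclose(E, expm(M))
```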

Lemma 3.

The inverse of the Gaussian kernel for rank and symmetric is given by

where is the first eigenvector of M i.e. , is the corresponding eigenvalue and and .

Proof.

Using (11) for , we have

(21)

Representing as and as and using as , as and as in the Sherman-Morrison identity [28], we have

(22)

Lemma 4.

The inverse of the Gaussian kernel for rank and symmetric is given by

where are the largest eigenvalues of and are the corresponding eigenvectors such that .

Proof.

Using the result of the previous lemma iteratively, we get the required result

(23)

where and . ∎

Lemma 5.

The solution for Gaussian Process regression system, for the scenario when rank and for symmetric is given by

Proof.

Assuming the intrinsic dimensionality of the low-dimensional manifold to be 1 implies that the inverse of the Gaussian kernel is as defined in (22), and the output vector is the corresponding one-dimensional embedding (refer to Sect. 2.1). Thus we have

(24)

Lemma 6.

The solution for Gaussian Process regression system, for the scenario when rank and for symmetric is given by

Proof.

Assuming the intrinsic dimensionality of the low-dimensional manifold to be $d$ implies that the inverse of the Gaussian kernel is as defined in (23), and the output is the corresponding $d$-dimensional embedding (refer to Sect. 2.1).

Each of the dimensions can be processed independently, similar to the previous lemma. For the $i$-th dimension, we have

(25)

Thus we get the result,

(26)