 # Manifold Fitting in Ambient Space

Modern data sets in many applications no longer comprise samples of real vectors in a real vector space but samples of much more complex structures, which may be represented as points in a space with a certain underlying geometric structure, namely a manifold. Manifold learning is an emerging field for learning this underlying structure, and its study can be split into two main branches: dimension reduction and manifold fitting. With the aim of combining statistics and geometry, we tackle the problem of manifold fitting in the ambient space. Inspired by the relation between the eigenvalues of the Laplace-Beltrami operator and the geometry of a manifold, we aim to find a small set of points that preserves the geometry of the underlying manifold. Based on this relationship, we extend the idea of subsampling to noisy data sets in high-dimensional space and utilize the Moving Least Squares (MLS) approach to approximate the underlying manifold. We analyze the two core steps of our proposed method theoretically and also provide error bounds for the MLS approach. Our simulation results and real data analysis demonstrate the superiority of our method in estimating the underlying manifold from noisy data.


## 1 Introduction

The digital age in which we live has resulted in huge amounts of high-dimensional data. Analysis of such data is extremely difficult and computationally costly due to the curse of dimensionality. Traditional statistics deals with observations which are essentially elements of a real vector space. Nowadays many problems of statistical interest in the sciences require the analysis of data consisting of more complex objects, taking values in spaces which are not naturally (Euclidean) vector spaces but still carry some geometric structure. For instance, in object recognition, an image is directly represented as a vector whose dimension equals its number of pixels.

For a vector space of such high dimension, the data that we have is obviously too sparse for efficient processing and analysis. Explicitly, one million data points in a 100-dimensional vector space is considered too small a data set for data analysis. However, under the assumption that the data points are close to a much lower-dimensional manifold, e.g. a 6-dimensional manifold, one million data points is, in theory, enough to extract the relevant information.

Manifold learning aims to handle the curse of dimensionality by assuming that the data points are situated near or on a manifold with dimension much lower than the dimension of the ambient space. To be precise, the high-dimensional data is assumed to lie on a space that is locally homeomorphic to a lower-dimensional Euclidean space (the manifold), while the form of the manifold is not necessarily given explicitly. In recent years, manifold learning has been implemented in many research topics such as pattern recognition and big data analytics. Manifold learning can be split into two different areas of research. The most common area of research is called dimension reduction, in which the data points are embedded into a lower-dimensional Euclidean space which can then be used for other analyses such as classification. The key assumption underlying dimension reduction is that the data has a local linear structure and that the constructed mapping preserves some notion of distance. The other area of research involves approximating the underlying manifold directly in the ambient space from a noisy data set. The goal is to project the noisy data set onto the underlying manifold, thereby denoising the data. This area of research differs from the former in that the projected points still lie in the original ambient space.

Dimension reduction can be split into linear and nonlinear methods depending on whether the underlying manifold is assumed to be linear or nonlinear. The most well-known linear dimension reduction technique is Principal Component Analysis (PCA). PCA assumes that the data points lie on a linear manifold and finds a projection of the original points onto a linear subspace by maximizing the explained variance within the subspace, or equivalently minimizing the variance of the residuals (the data points minus their projections onto the subspace). PCA works well when the data lies on a linear manifold. However, in many real-life data sets the underlying manifold is nonlinear, and PCA has been extended to study specific nonlinear structures: it has been adapted for nonlinear statistical shape analysis and redeveloped to study the structure of a torus. For abstract nonlinear manifolds, many other nonlinear dimension reduction methods have been developed, and the criterion that each method tries to optimize varies. Multidimensional Scaling, curvilinear component analysis and Isomap aim to preserve distances (either Euclidean or geodesic; local or global) between each pair of input data points. Other methods like Locally Linear Embedding (LLE) try to maintain angles between neighbouring points, and Maximum Variance Unfolding uses semi-definite programming to maximize the variance of non-neighbouring points.
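To make the linear case concrete, the projection PCA computes can be written in a few lines via the SVD of the centered data matrix; this is a minimal sketch and the function name is ours.

```python
import numpy as np

def pca_project(X, d):
    """Project n-dimensional points onto their top-d principal subspace.

    X: (N, n) data matrix, one point per row.
    Returns the (N, d) coordinates and the (n, d) orthonormal basis.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    # Right singular vectors of the centered data are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:d].T                         # top-d principal directions
    coords = Xc @ basis                      # lower-dimensional coordinates
    return coords, basis
```

When the data truly lies on a d-dimensional linear manifold, projecting onto the top-d subspace and mapping back reproduces the data exactly, which is the sense in which PCA minimizes the residual variance.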

It is of great statistical interest to estimate the manifold in the ambient space so that we can quantify the uncertainty of the learned manifold. In particular, for many real-life applications it is more convenient to visualize the data in the ambient space than in a lower-dimensional space. Note that all the aforementioned algorithms only aim to find a global embedding of the data into a space of dimension much smaller than the input space (mainly focusing on the lower-dimensional geometric properties of the manifold), so they are not necessarily able to output a manifold in the ambient space. To the best of our knowledge, less attention has been devoted in manifold learning to denoising or estimating the underlying manifold from the input data. One recent exception is the seminal work in which the authors developed an algorithm that provably outputs a smooth manifold from noisy data. They use an exhaustive search to define an approximate squared-distance function from the data, which is then used for the manifold learning. Follow-up work demonstrated two different methods of using the approximate squared-distance function to approximate the underlying manifold from noisy data, and a further variant eliminates the exhaustive search. The study of manifold learning has also been linked to the generalized Whitney problem, where the aim is to estimate a Riemannian manifold that approximates a metric space. Another technique uses a statistical approach relying upon a graph-based diffusion process. On the other hand, for noiseless manifold samples, some work has been done on reconstructing the manifold [1, 2, 8]. For specific types of data such as noisy images, Locally Linear Denoising (LLD) is a method developed to denoise the input data set by exploiting the structure of the underlying manifold.
Theoretically, a rate of convergence has been proven for approximating a manifold given a noisy data set, and the optimal rate of convergence depends only on the sample set and the intrinsic dimension of the manifold.

In our work, we focus on manifold fitting. Given a high-dimensional data set, we assume that the data lies near (presumably with noise) a much lower-dimensional smooth manifold. Furthermore, we assume that the intrinsic dimension of the manifold is known. The aim of our paper is to fit a manifold to the noisy data and map the data to the underlying manifold. Our method differs from dimension reduction in that the output of our algorithm is not a set of lower-dimensional points obtained by embedding the input data. Instead, the fitted manifold and the projected data points still lie in the original ambient space. Furthermore, our algorithm is able to handle data sets that lie in high-dimensional space; this will be further established in our numerical simulations when we apply our algorithm to image data. Our algorithm consists of two core parts: a subsampling step and a manifold fitting step.

In the subsampling approach, we utilize the spectrum of the Laplace-Beltrami operator. The Laplace-Beltrami operator is the extension of the Laplace operator in Euclidean space to the manifold. The eigenvalues of the Laplace-Beltrami operator can be used to identify a manifold up to isometry. When a discrete data set is given, the eigenvalues of the Laplace-Beltrami operator can be utilized to select a subset which still preserves the geometry of the underlying manifold. Our idea is that, by controlling an appropriate number of sample points, which is usually very small, we aim to find a subset of points that are essential for recovering the manifold. To achieve this, the key is to make use of the Laplace-Beltrami operator to fine-tune a subset of points such that the eigenvalues do not change by a large margin. To illustrate the idea, we compare two samplings of the unit sphere, one containing 642 points and the other 2562 points. The exact eigenvalues of the Laplace-Beltrami operator for the unit sphere are $\lambda_l = l(l+1)$ with multiplicity $2l+1$ for $l = 0, 1, 2, \ldots$. There are infinitely many eigenvalues when we consider the manifold in a continuous sense; however, when we consider a set of sampled points of the manifold, we are only able to approximate a number of eigenvalues equal to the number of sampled points. We approximate the eigenvalues for both sets using a spectral approximation algorithm from the literature. The normalized difference, calculated as $|\hat{\lambda}_k^{642} - \hat{\lambda}_k^{2562}|/\lambda_k$, where $\hat{\lambda}_k^{642}$ and $\hat{\lambda}_k^{2562}$ are the $k$-th smallest approximated eigenvalues for the sets containing 642 and 2562 points respectively and $\lambda_k$ is the exact eigenvalue, is plotted in Figure 1. Although the normalized difference in eigenvalues increases with $k$, it remains fairly small even for the largest computed eigenvalues. The increase in the difference is expected, as we are using fewer points (642 compared to 2562) to approximate a continuous manifold.
However, with fewer points we are still able to extract the structure of the underlying manifold at a lower computational cost. Figure 1: (a) Sampling of a unit sphere with 642 points. (b) Sampling of a unit sphere with 2562 points. (c) Comparison of the eigenvalues of the Laplace-Beltrami operator for the two samplings of the unit sphere.
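The comparison above can be mimicked numerically. The paper's eigenvalue-approximation algorithm comes from a cited reference we do not reproduce; as a stand-in, this sketch builds a normalized heat-kernel graph Laplacian (in the spirit of the heat-operator discussion in Section 3.3) on two random sphere samplings, smaller than the paper's 642/2562 for speed, and reads off their low eigenvalues. The function names and parameter values are our own choices.

```python
import numpy as np

def sphere_sample(n, seed=0):
    """Draw n points approximately uniformly on the unit sphere in R^3."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def lb_spectrum(points, t=0.05, k=20):
    """Smallest k eigenvalues of a normalized heat-kernel graph Laplacian,
    a discrete surrogate for the Laplace-Beltrami spectrum."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (4 * t))                  # Gaussian heat-kernel weights
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    A = dinv[:, None] * W * dinv[None, :]      # symmetric normalization
    L = (np.eye(len(points)) - A) / t          # discrete (I - H_t) / t
    return np.sort(np.linalg.eigvalsh(L))[:k]

dense = lb_spectrum(sphere_sample(642))
sparse = lb_spectrum(sphere_sample(200, seed=1))
```

Both spectra start at zero (the constant mode) and their low ends can then be compared, echoing the 642-versus-2562 comparison in Figure 1.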

Once the subsampled set is determined, the second part is to fit a manifold to the subsampled set. In this step, we apply a nonlinear MLS method to the subsampled set. MLS is a method for reconstructing a continuous function from point samples: it approximates the function locally by a polynomial of some chosen degree. Here, we extend the idea of MLS from function reconstruction to surface reconstruction by finding a polynomial surface that approximates the manifold locally at each point. By using the eigenvalues of the Laplace-Beltrami operator, the subsampled set still preserves the geometry of the underlying manifold, as can be seen in the example of the unit sphere. Therefore, by working with the subsampled set, we can reduce the complexity of the computations and still retrieve the geometry of the underlying manifold.

The rest of the paper is organised as follows. In Section 2, we give an overview of our method. In Section 3, we discuss the first main step of our method, the subsampling step. In this section, we also define the Laplace-Beltrami and heat operators on a manifold, discuss their relationship, and show how we can utilize the eigenspectrum of the Laplace-Beltrami operator for subsampling. The second main part, manifold fitting via MLS, is presented in Section 4. We define MLS and show how it can be used for surface approximation. In this section, we also present theoretical results for the accuracy of tangent space estimation with noisy data points using PCA, and for the manifold fitting approach. The theoretical results for the tangent space estimation extend earlier work which focuses on sample points that lie directly on the manifold. We then explain our algorithm in Section 5 and provide its pseudocode. In Section 6, we present three simulation results. The first two simulations serve as illustrations of our theoretical results for MLS. The last simulation works with face image data to show that our method is also able to handle noisy data in high-dimensional space. Finally, in Section 7, we provide a conclusion along with possible directions for future work.

## 2 Overview of the Algorithm

In this section, we give an overview of our algorithm and briefly describe how it works. Given a set of data points assumed to lie close to an underlying manifold of lower dimension, we aim to project the data points onto the underlying manifold in order to extract information about its structure. This can also be seen as denoising the input data, for which we use a nonlinear MLS approach, one of many possible denoising techniques. Furthermore, to reduce the complexity of the computations, a subsampling step is first applied to the input data to retrieve a smaller subset of points which still preserves the structure of the underlying manifold.

### 2.1 Subsampling

The subsampling step of the algorithm is iterative. We use a measure function, defined in Section 3, to compute the measure at each point in turn. Once the measure at a point is computed, we determine whether that point should be kept as a subsampled point: if the measure is large, the point is kept; otherwise it is discarded. The measure function is related to the perturbation of the eigenvalues of the Laplace-Beltrami operator. Suppose we have a set of accepted points and a new point not in the set. The measure of the new point, computed with respect to the set, reflects the structural relationship between them. A large measure means the new point would greatly affect the structure of the manifold sampled by the set, while a small measure indicates that the point is close to some of the existing points and would not change the structure of the manifold.

### 2.2 Manifold Fitting

In the manifold fitting step, we apply an extended MLS approach to fit a manifold to the subsampled data. The MLS approximation is a classical approach originally designed for approximating a smooth multivariate function that interpolates a set of data points. MLS approximation has previously been studied for manifold approximation, where it was applied to noisy manifold data to approximate the underlying manifold. That work used the entire data set to approximate the structure of the underlying manifold, which is very costly computationally when a huge amount of high-dimensional data is given. The difference in our approach is that we introduce a subsampling step to greatly reduce the complexity of the computations without compromising the structure of the manifold. Furthermore, to construct the local coordinates required for MLS, the earlier work used an iterative weighted PCA algorithm, which further increases the computational complexity. In this paper, we instead use plain PCA and provide an error bound for the tangent space estimated with PCA.

## 3 Subsampling

In this section, we formally introduce the Laplace-Beltrami operator and the heat operator and discuss their relationship in terms of their spectra. After that, we define the measure function in relation to the Laplace-Beltrami operator.

### 3.1 Laplace-Beltrami Operator

Let $M$ be a smooth manifold embedded in $\mathbb{R}^n$ with an associated metric $g$. The Laplace-Beltrami operator $\Delta_M$ defined on $M$ is the extension of the Laplace operator in $\mathbb{R}^n$. In local coordinates, we can write the Laplace-Beltrami operator as

$$\Delta_M f = \frac{1}{\sqrt{|\det(g)|}}\,\frac{\partial}{\partial x^i}\left(\sqrt{|\det(g)|}\; g^{ij}\,\frac{\partial f}{\partial x^j}\right).$$

The Laplace-Beltrami operator admits a spectral decomposition given by the eigenpairs $(\lambda_i, \phi_i)$ solving the equation

$$\Delta_M \phi_i(x) = -\lambda_i \phi_i(x), \qquad (1)$$

where $x$ is a point on $M$.

The eigenspectrum of the Laplace-Beltrami operator encompasses many of the important geometric and topological properties of a manifold. Note that the eigenfunctions defined by (1) have been used as a natural basis in shape analysis, as an analogue of the Fourier basis.

### 3.2 Heat Operator

The heat equation on $M$ is given by

$$\frac{\partial h(x,t)}{\partial t} = \Delta_M h(x,t), \qquad (2)$$

where $h(x,t)$ is the heat distribution at time $t$ at the point $x \in M$. The heat diffusion describes the flow of heat on the manifold $M$. Intuitively, we can think of $h(\cdot,t)$ as a temperature distribution on $M$.

###### Definition 3.1 (Heat Operator).

The heat operator $H_t$ maps an initial distribution $f$ to the heat distribution at time $t$, i.e. $H_t f(x) = h(x,t)$ with $h(\cdot,0) = f$.

The heat kernel $K(x,y,t)$ is associated to the heat operator such that

$$H_t f(x) = \int_M K(x,y,t)\, f(y)\, dy. \qquad (3)$$

If we consider $M = \mathbb{R}^m$, then we have an explicit form for the heat kernel,

$$K(x,y,t) = \frac{1}{(4\pi t)^{m/2}} \exp\left(-\frac{\|x-y\|^2}{4t}\right).$$
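The explicit Euclidean kernel can be transcribed directly; the short check below verifies numerically that, for fixed $x$, it integrates to one over the line, a defining property of a heat kernel. The function name is ours.

```python
import numpy as np

def heat_kernel_rm(x, y, t, m):
    """Explicit heat kernel on R^m: (4*pi*t)^(-m/2) * exp(-||x-y||^2 / (4t))."""
    r2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return (4 * np.pi * t) ** (-m / 2) * np.exp(-r2 / (4 * t))

# Sanity check: for fixed x = 0, K(0, ., t) has unit mass over R^1.
ys = np.arange(-10.0, 10.0, 0.01)
mass = sum(heat_kernel_rm(0.0, y, 0.5, 1) for y in ys) * 0.01
```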

### 3.3 Relationship between the Laplace-Beltrami and Heat Operator

The Laplace-Beltrami operator and the heat operator are related by

$$H_t = e^{-t\Delta_M} = \sum_{k=0}^{\infty} \frac{(-t\Delta_M)^k}{k!}.$$

Considering the limit as $t \to 0$,

$$\Delta_M = \lim_{t\to 0} \frac{I - H_t}{t}.$$

The eigenfunctions of $H_t$, and hence the eigenfunctions of the heat kernel, coincide with the eigenfunctions of $\Delta_M$. Let $\lambda_i$ be the $i$-th eigenvalue of $\Delta_M$; then the $i$-th eigenvalue of $H_t$ is $e^{-\lambda_i t}$. When an invariant metric is chosen, the Laplace-Beltrami operator and the corresponding heat operator admit invariant properties. Given a data set, it is not practical to compute the Laplace-Beltrami operator on a manifold directly; we instead work with the heat operator. The heat kernel can be computed from the eigenpairs of the Laplace-Beltrami operator as

$$K(x,y,t) = \sum_{i=0}^{\infty} e^{-\lambda_i t}\, \phi_i(x)\, \phi_i(y). \qquad (4)$$

In a continuous setting, the heat operator is given by (3). Given a manifold with a finite sample point set $\{x_1, \ldots, x_k\}$, we can approximate the heat operator by a matrix $H_t$ with entries $(H_t)_{ij} = K(x_i, x_j, t)$. Let $f_j = f(x_j)$. Then the discrete heat equation is given by

$$(H_t f)_i = \sum_{j=1}^{k} K(x_i, x_j, t)\, f_j. \qquad (5)$$

As $k \to \infty$, (5) converges to (3). In this paper we deal with finite sample sets, and thus we use this matrix to approximate the heat operator in our algorithm.
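The discrete heat operator in (5) is a direct matrix transcription; this sketch assumes the Euclidean kernel of Subsection 3.2 as a stand-in for $K$, and the names are ours.

```python
import numpy as np

def discrete_heat_operator(points, t, m):
    """k x k matrix with entries (H_t)_{ij} = K(x_i, x_j, t), Euclidean kernel."""
    X = np.asarray(points, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    return (4 * np.pi * t) ** (-m / 2) * np.exp(-d2 / (4 * t))

def apply_heat(points, f, t, m):
    """Discrete heat equation (5): (H_t f)_i = sum_j K(x_i, x_j, t) f_j."""
    return discrete_heat_operator(points, t, m) @ np.asarray(f, float)
```

The matrix is symmetric and its diagonal equals $K(x,x,t) = (4\pi t)^{-m/2}$, both of which follow from the kernel's form.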

### 3.4 Measure of a Point

Earlier work cleverly defined and used a measure function to determine how much a new point affects the eigenvalues of the heat kernel of a manifold.

###### Definition 3.2.

(Measure Function) The measure function $s$ is given by

$$s(x) = 1 - \frac{h_t^T H_t^{-1} h_t}{K(x,x,t)}, \qquad (6)$$

where $h_t$ is the vector with entries $(h_t)_i = K(x, x_i, t)$.

The proof can be found in Appendix B of the original work and we do not reproduce it here. Equation (6) shows that the measure of a point can be calculated using the heat kernel of the manifold. From Subsection 3.3, we see that the measure function can also be used to determine the perturbation of the eigenvalues of the Laplace-Beltrami operator. It has been shown that if the measure of a point $x$ is large, then $x$ contributes a large eigenvalue to the spectrum and changes the distribution of the eigenspectrum. For the algorithm, this means that a point with a large measure contains important information about the geometry of the underlying manifold, and therefore should be retained as a subsampled point.
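The behaviour just described, a large measure for structurally new points and a small one for redundant points, can be checked with a direct implementation of (6), using the Euclidean kernel of Subsection 3.2 as a stand-in for the true heat kernel (an assumption of this sketch):

```python
import numpy as np

def measure(x, samples, t, m):
    """Measure function (6): s(x) = 1 - h^T H^{-1} h / K(x, x, t),
    with (h)_i = K(x, x_i, t), using the Euclidean heat kernel."""
    X = np.asarray(samples, float)
    c = (4 * np.pi * t) ** (-m / 2)                        # K(x, x, t)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    H = c * np.exp(-d2 / (4 * t))                          # (H_t)_{ij}
    h = c * np.exp(-((X - np.asarray(x, float)) ** 2).sum(-1) / (4 * t))
    return 1.0 - h @ np.linalg.solve(H, h) / c
```

At a point already in the sample set the measure vanishes ($h$ is then a column of $H$), and far from all samples it approaches 1, matching the intuition above.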

However, for an abstract manifold with intrinsic dimension $m$, we do not know the exact form of the heat kernel, so a direct computation of the measure function is not possible since it depends on the heat kernel. Recall that the heat kernel is used to approximate the Laplace-Beltrami operator as $t \to 0$. There exists an asymptotic expansion for the heat kernel whose behaviour has been carefully studied; the reader may wish to consult the cited remarks to understand more about it. In particular, it has been shown that if $x_1$ and $x_2$ lie in a $\sqrt{t}$-neighbourhood of $x$, then as $t \to 0$,

$$K(x_1,x_2,t) = \frac{1}{(2\pi t)^{m/2}} \exp\left(-\frac{\|E_x^{-1}(x_1) - E_x^{-1}(x_2)\|^2}{2t}\right)(1 + o(1)), \qquad (7)$$

where $E_x^{-1}$ is the logarithm map, which takes a neighbourhood of $x$ in $M$ to the tangent space $T_x M$. We note here that the authors considered the heat equation to be

$$\frac{\partial h(x,t)}{\partial t} = \frac{1}{2}\Delta_M h(x,t),$$

while the heat equation on the manifold is given by (2), without the factor of $1/2$. In the case of (2), (7) becomes

$$K(x_1,x_2,t) = \frac{1}{(4\pi t)^{m/2}} \exp\left(-\frac{\|E_x^{-1}(x_1) - E_x^{-1}(x_2)\|^2}{4t}\right)(1 + o(1)). \qquad (8)$$

Inspired by (8), we define a local measure function which our algorithm uses to approximate (6).

###### Definition 3.3.

(Local Measure Function) Let $M$ be a smooth manifold with intrinsic dimension $m$, let $x \in M$, and let $N$ be the set of neighbouring points on $M$ within distance $r$ of $x$. Then the local measure at $x$ is given by

$$s_N(x) = 1 - \frac{\hat{h}^T \hat{H}^{-1} \hat{h}}{\hat{K}(x,x,t)},$$

where $\hat{K}$ is the heat kernel given in (8) for a fixed time $t$, and $\hat{h}$ and $\hat{H}$ are built from $\hat{K}$ using the points in $N$.

For our numerical calculations, we omit the $o(1)$ term and use

$$\hat{K}(x_1,x_2,t) = \frac{1}{(4\pi t)^{m/2}} \exp\left(-\frac{\|E_x^{-1}(x_1) - E_x^{-1}(x_2)\|^2}{4t}\right) \qquad (9)$$

for the approximations.

###### Remark 1.

Intuitively, this definition of the heat kernel approximation makes sense: when we consider a small patch of the manifold around the point $x$, and this patch is close to the tangent space of the manifold at $x$, we can expect the heat kernel of the manifold to be close to the heat kernel of Euclidean space. Consider the simple example of the 3-dimensional hyperbolic space $H^3_k$ with constant curvature $-k^2$; its heat kernel is given by

$$K_{H^3_k}(x,y,t) = \frac{1}{(4\pi t)^{3/2}}\,\frac{k\|x-y\|}{\sinh(k\|x-y\|)}\,\exp\left(-\frac{\|x-y\|^2}{4t} - k^2 t\right).$$

Here, it is obvious that as $k \to 0$, so that the space flattens, $K_{H^3_k}$ converges to the Euclidean heat kernel.
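The flat limit in Remark 1 is easy to verify numerically; this sketch transcribes the hyperbolic kernel above (geodesic distance $r$, curvature $-k^2$) and compares it with the Euclidean kernel as $k \to 0$. Function names are ours.

```python
import numpy as np

def heat_kernel_euclid(r, t):
    """Euclidean heat kernel on R^3 at distance r."""
    return (4 * np.pi * t) ** (-1.5) * np.exp(-r**2 / (4 * t))

def heat_kernel_h3(r, t, k):
    """Heat kernel of 3-d hyperbolic space with curvature -k^2 (Remark 1)."""
    return ((4 * np.pi * t) ** (-1.5)
            * (k * r / np.sinh(k * r))          # curvature correction factor
            * np.exp(-r**2 / (4 * t) - k**2 * t))
```

Since $kr/\sinh(kr) \to 1$ and $e^{-k^2 t} \to 1$ as $k \to 0$, the hyperbolic kernel approaches the Euclidean one from below.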

In Subsection 2.1, we gave an overview of the subsampling algorithm. The reader may want to read Subsection 5.1 to understand how the measure function is incorporated into the subsampling step. Here, we would like to discuss one important and interesting observation about the output of this subsampling approach.

###### Definition 3.4 (ϵ-separated set).

Let $(X, d)$ be a metric space. A subset $Q \subseteq X$ is said to be an $\epsilon$-separated set if and only if for every $p, q \in Q$ with $p \neq q$, we have $d(p, q) \geq \epsilon$.

###### Observation 1.

Given a dense point set $P$, the subsampling algorithm outputs a subset of points $Q \subseteq P$ which is an $\epsilon$-separated set with respect to the geodesic distance on the manifold, where $\epsilon$ depends on the threshold of the local measure function.

In the subsampling algorithm, we keep points whose measure exceeds a certain tolerance. Intuitively, given the subset of accepted points $Q$, we compute the measure of a new point $x$ based on $Q$ as follows. If $x$ is close to some point in $Q$, we expect its measure to be small in the sense that this point would not perturb the eigenvalues by much; thus we do not add $x$ to $Q$. On the other hand, if $x$ is far away from all points in $Q$, we expect it to cause a large perturbation of the eigenvalues of the Laplace-Beltrami operator, and we add $x$ to $Q$. Therefore, we arrive at a subset $Q$ which is an $\epsilon$-separated subset of the input set. An example is illustrated in Figure 2, which shows a random sampling of the unit sphere together with the output of the subsampling algorithm. We can see that the subsampled points are spaced apart compared to the original densely sampled set. Figure 2: (a) Random sampling of a sphere. (b) Subsampled set.

## 4 Manifold Fitting

In this section, we will discuss the manifold fitting step which involves MLS. We will define the MLS for function approximation and discuss how MLS is used in a local setting for surface approximation. We will also analyze and give some theoretical bounds for the manifold fitting.

### 4.1 MLS for Function Approximation

Let $\{x_i\}_{i=1}^{I}$ be a set of distinct data points in $\mathbb{R}^n$ and let $\{f(x_i)\}_{i=1}^{I}$ be the corresponding set of sampled function values for some function $f$. We want to approximate $f$ with a smooth function, and the MLS approach was developed for finding such an approximation, in which the approximant is locally a polynomial of degree $q$. The degree-$q$ polynomial approximation of $f$ at a point $x$ using MLS is defined by $\hat{f}_q(x)$, where

$$\hat{f}_q(x) = \operatorname*{argmin}_{\hat{f} \in \Pi^n_q} \sum_{i=1}^{I} \big(\hat{f}(x_i) - f(x_i)\big)^2\, w(\|x - x_i\|),$$

where $w$ is a decreasing non-negative weight function and $\Pi^n_q$ is the space of polynomials of degree $q$ in $n$ variables.
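A one-dimensional transcription of this definition solves the weighted least-squares problem in a basis centred at $x$ and reads off the value at $x$. The Gaussian weight and the function name are our choices.

```python
import numpy as np

def mls_eval(x, xs, fs, q=2, w=lambda r: np.exp(-r**2)):
    """Degree-q moving least squares approximation of f at the point x.

    Minimizes sum_i (p(x_i) - f(x_i))^2 * w(|x - x_i|) over polynomials p
    of degree q, then evaluates the minimizer at x."""
    xs, fs = np.asarray(xs, float), np.asarray(fs, float)
    A = np.vander(xs - x, q + 1, increasing=True)   # basis 1, (x_i - x), ...
    sw = np.sqrt(w(np.abs(xs - x)))                 # weighted LS via sqrt weights
    coef, *_ = np.linalg.lstsq(A * sw[:, None], fs * sw, rcond=None)
    return coef[0]                                   # constant term = value at x
```

Because the basis is centred at $x$, the fitted polynomial's constant term is its value at $x$; in particular MLS reproduces polynomials of degree at most $q$ exactly.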

### 4.2 MLS Local Projection

Let $P$ be a set of distinct data points in $\mathbb{R}^n$ and assume that $P$ lies close to a manifold $M$ of dimension $m$. Let $N \subseteq P$ be a local neighbourhood, illustrated in Figure 3 by the black points. We wish to find a smooth surface that approximates the points in $N$ and to map them onto this surface.

We apply PCA to $N$ and use the first $m$ principal components as a local coordinate system, shown in Figure 3. We map $N$ to the $m$-dimensional space using the local coordinates; the mapped points are illustrated in blue in Figure 3. We then utilize MLS for function approximation to obtain a polynomial which approximates the function from the local coordinates to the remaining components, as shown in Figure 3. Figure 3: Illustration of the MLS local projection for a 1-dimensional manifold.
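The local step can be sketched for a curve in $\mathbb{R}^n$ ($m = 1$): PCA supplies the local frame, an ordinary least-squares polynomial plays the role of the MLS fit (uniform weights, a simplification of ours), and the point is moved onto the fitted graph. All names are ours.

```python
import numpy as np

def mls_project(p, nbrs, q=2):
    """Project p onto a degree-q polynomial graph fitted over the first
    principal direction of its neighbourhood (sketch for a curve, m = 1)."""
    N = np.asarray(nbrs, float)
    mu = N.mean(axis=0)
    _, _, Vt = np.linalg.svd(N - mu, full_matrices=False)
    T, nrm = Vt[:1], Vt[1:]                  # tangent / normal directions
    s = (N - mu) @ T.T                       # local coordinates of neighbours
    A = np.vander(s[:, 0], q + 1, increasing=True)
    g = (N - mu) @ nrm.T                     # normal components to be fitted
    coef, *_ = np.linalg.lstsq(A, g, rcond=None)
    s0 = (np.asarray(p, float) - mu) @ T.T   # local coordinate of p itself
    a0 = np.vander(s0, q + 1, increasing=True)[0]
    return mu + s0 @ T + (a0 @ coef) @ nrm   # point on the fitted graph
```

For a point slightly off a parabola, the projection lands back on the curve, since the quadratic fit over the tangent coordinate is exact there.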

### 4.3 Error Bounds for Manifold Fitting

We now present the analysis for the manifold fitting step, in two subsections, 4.3.1 and 4.3.2, which discuss the error bounds for tangent space approximation using PCA and the error bounds for MLS respectively.

#### 4.3.1 Error Bounds for Tangent Space Approximation

This subsection extends an existing analysis of tangent space estimation from exact manifold samples to noisy data. We first set up the framework needed for the analysis and give two theorems for the tangent space approximated with PCA: Theorem 4.1 treats the case of infinitely many sample points, and Theorem 4.2 the case of a finite number of sample points.

Consider the set of points within distance $r$ of a reference point $P$ on the manifold $S$. Points lying directly on the manifold in this neighbourhood can be represented uniquely as

$$[\bar{x}, f_1(\bar{x}), \ldots, f_{n-m}(\bar{x})]^T.$$

Here $\bar{x}$ denotes the coordinates of the projection of the point onto the tangent space $T_PS$, with $P$ taken as the reference point, and the embedding is assumed to be sufficiently smooth. By Taylor expansion, we get

$$f_l(\bar{x}) = f_{q,l}(\bar{x}) + R_l(\bar{\xi}_l), \qquad l = 1, \ldots, n-m,$$

where $f_{q,l}$ is quadratic and $R_l$ is the Taylor remainder. We denote

$$C_s = \max_{l} C_{s,l}, \qquad l = 1, \ldots, n-m.$$

We define $K_{l,j}$ to be the principal curvatures of the hypersurface

$$S_l = \left\{[\bar{x}\ \ f_l(\bar{x})]^T : \bar{x} \in T_PS\right\} \subset \mathbb{R}^{m+1}.$$

The maximal principal curvature is defined as

$$K_{\max} := K_{l',j'} \quad \text{where} \quad (l', j') = \operatorname*{argmax}_{l,j} |K_{l,j}|.$$

Let us consider $K$ points,

$$\{P_i\}_{i=1}^{K} = \left\{\left[x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_m, f_1(\bar{x}_i), \ldots, f_{n-m}(\bar{x}_i)\right]^T\right\}_{i=1}^{K},$$

in this neighbourhood. Here $\bar{x}_i = [x^{(i)}_1, \ldots, x^{(i)}_m]^T$ represents the coordinates of the projection of $P_i$ on $T_PS$ for $i = 1, \ldots, K$. We can collect the points in a matrix as

$$X^{(K)} = \begin{bmatrix} x^{(1)}_1 & \ldots & x^{(K)}_1 \\ \vdots & & \vdots \\ x^{(1)}_m & \ldots & x^{(K)}_m \\ f_1(\bar{x}_1) & \ldots & f_1(\bar{x}_K) \\ \vdots & & \vdots \\ f_{n-m}(\bar{x}_1) & \ldots & f_{n-m}(\bar{x}_K) \end{bmatrix},$$

and consider the noise matrix

$$Y^{(K)} = \begin{bmatrix} e^{(1)}_1 & \ldots & e^{(K)}_1 \\ \vdots & & \vdots \\ e^{(1)}_n & \ldots & e^{(K)}_n \end{bmatrix}.$$

The noisy data to be considered is then $X^{(K)} + Y^{(K)}$. Assuming that the data has been centered, the covariance matrix is given by

$$M^{(K)} = \frac{1}{K}\left(X^{(K)} + Y^{(K)}\right)\left(X^{(K)} + Y^{(K)}\right)^T = \frac{1}{K}X^{(K)}\left(X^{(K)}\right)^T + E^{(K)} = M^{(K)}_q + \Delta^{(K)} + E^{(K)},$$

where $M^{(K)}_q$ and $\Delta^{(K)}$ come from the noiseless analysis and $E^{(K)}$ collects the noise terms.

###### Theorem 4.1.

Let

$$\alpha = \min\left\{\big(12(R_L + \beta_2)\big)^{-1/2},\ (12\beta_3)^{-1/3},\ (12\beta_4)^{-1/4}\right\},$$

where $\beta_2$, $\beta_3$ and $\beta_4$ are constants arising in the analysis, and assume that

$$(24\|E\|_F)^{1/2} < \alpha.$$

Let the projection of the sample points onto the tangent space $T_PS$ lie within the sampling width. If the sampling width satisfies

 (24∥E∥F)1/2

Then, as $K \to \infty$, we get

 |∠^TPS,TPS|

where

 σ∞=∥BT1∥F+v224v24−RLv4−2(∥B1∥F,bound+∥D1∥F,bound)

if .

###### Theorem 4.2.

For some fixed constants $s_1$, $s_2$ and $s_3$, let

$$\alpha = \min\left\{\big(12(s_2 R_L + \beta_2)\big)^{-1/2},\ (12\beta_3)^{-1/3},\ (12\beta_4)^{-1/4}\right\},$$

where $\beta_2$, $\beta_3$ and $\beta_4$ are constants arising in the analysis, and assume that

$$(24\|E^{(K)}\|_F)^{1/2} < \alpha.$$

Let the projection of the sample points onto the tangent space $T_PS$ lie within the sampling width derived in Lemma 3 of the cited work. If the sampling width satisfies

 (24∥E(K)∥F)1/2

Then

 |∠^TPS,TPS|

where

 σK=s3+∥B1∥F,bound+v224s1v23−s2RLv4−v224−2(∥B1∥F,bound+∥D1∥F,bound)

if .

###### Remark 2.

We see that when the curvature of the manifold increases, $K_{\max}$ increases, which causes $\alpha$ to decrease. Thus, to obtain an accurate approximation, we need a smaller sampling width.

#### 4.3.2 Error Bounds for MLS

In this subsection, we give an approximation order for MLS applied to noisy data from a manifold $M$. We provide the required definitions and the approximation order for MLS on noiseless manifold data before finally giving the approximation order for the noisy data.

###### Definition 4.1 (h-ρ-δ set).

Let $\Omega$ be a $d$-dimensional domain in $\mathbb{R}^n$, and consider a set $X = \{x_i\}_{i=1}^{I}$ of data points in $\Omega$. We say that $X$ is an $h$-$\rho$-$\delta$ set of fill distance $h$, density $\rho$, and separation $\delta$ if:

1. Fill distance: $h$ is the fill distance with respect to the domain $\Omega$,

$$h = \sup_{x \in \Omega} \min_{x_i \in X} \|x - x_i\|.$$

2. Density:

$$\#\{X \cap \hat{B}_{kh}(y)\} \leq \rho \cdot k^d, \qquad k \geq 1,\ y \in \mathbb{R}^n.$$

Here $\#Y$ denotes the number of elements in a given set $Y$, while $\hat{B}_r(y)$ is the closed ball of radius $r$ around $y$.

3. Separation: there exists $\delta > 0$ such that

$$\|x_i - x_j\| \geq h\delta, \qquad 1 \leq i < j \leq I.$$
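The three quantities above can be estimated for a concrete point set; in this sketch the supremum over the domain is approximated by a dense grid of probe points (our simplification), and the function names are ours.

```python
import numpy as np

def fill_distance(X, probes):
    """h = sup over the domain of the distance to the nearest sample,
    with the domain approximated by a dense set of probe points."""
    D = np.linalg.norm(probes[:, None, :] - X[None, :, :], axis=-1)
    return D.min(axis=1).max()

def separation(X):
    """Smallest pairwise distance, which must be at least h * delta."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return D[np.triu_indices(len(X), 1)].min()
```

For three equispaced samples on $[0,1]$, the fill distance is attained at the midpoints between samples and the separation equals the spacing.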
###### Theorem 4.3 (Approximation Order for Noiseless Samples, Theorem 4.15 ).

Let $X$ be an $h$-$\rho$-$\delta$ set sampled from a $d$-dimensional submanifold $M \subset \mathbb{R}^n$. Then for fixed $\rho$ and $\delta$, there exists a fixed $k > 0$, independent of $h$, such that the approximation given by MLS is well conditioned for weight functions with a finite support of size $kh$. In addition, the approximant $S_M$ yields the following error bound:

$$\|S_M - M\|_{\mathrm{Hausdorff}} < C \cdot h^{m+1},$$

where

$$\|S_M - M\|_{\mathrm{Hausdorff}} = \max\left\{\max_{s \in S_M} d(s, M),\ \max_{x \in M} d(x, S_M)\right\}.$$

Here $S_M$ is the degree-$m$ polynomial approximation of $M$ using MLS, and $d(x, M)$ is the Euclidean distance between a point $x$ and a manifold $M$.

We extend the results in Theorem 4.3 to noisy samples. Let $X = \{x_i\}$ be the set of noiseless samples of the manifold and $\bar{X} = \{x_i + e_i\}$ the noisy data set, where $e_i$ is the noise for the $i$-th sample point.

###### Lemma 4.1.

Assume that there exists a $d$-dimensional manifold $\bar{M}$ containing the noisy samples $\bar{X}$ and a diffeomorphism between $\bar{M}$ and $M$. Let $\bar{X}$ be an $\bar{h}$-$\bar{\rho}$-$\bar{\delta}$ set. Then for fixed $\bar{\rho}$ and $\bar{\delta}$, there exists $\bar{k} > 0$, independent of $\bar{h}$, such that the approximation given by MLS is well conditioned for weight functions with a finite support of size $\bar{k}\bar{h}$, and the approximant yields the following error bound:

$$\|S_{\bar{M}} - \bar{M}\|_{\mathrm{Hausdorff}} < \bar{C} \cdot \bar{h}^{m+1}.$$

We omit the proof of Lemma 4.1, as it is straightforward and follows directly from Theorem 4.3.

###### Theorem 4.4 (Approximation Order for Noisy Samples).

Assume that there exists a $d$-dimensional manifold $\bar{M}$ containing the noisy samples $\bar{X}$ and a diffeomorphism between $\bar{M}$ and $M$. Let $\bar{X}$ be an $\bar{h}$-$\bar{\rho}$-$\bar{\delta}$ set. Then for fixed $\bar{\rho}$ and $\bar{\delta}$, let $\bar{k} > 0$, independent of $\bar{h}$, be such that the approximation given by MLS is well conditioned for weight functions with a finite support of size $\bar{k}\bar{h}$. If, with high probability, the noise satisfies $\max_i \|e_i\| \leq C_e$, then the approximant yields the following error bound:

$$\|S_{\bar{M}} - M\|_{\mathrm{Hausdorff}} < \bar{C} \cdot \bar{h}^{m+1} + C_e.$$
###### Proof.

With high probability,

$$\|S_{\bar{M}} - M\|_{\mathrm{Hausdorff}} = \|S_{\bar{M}} - \bar{M} + \bar{M} - M\|_{\mathrm{Hausdorff}} \leq \|S_{\bar{M}} - \bar{M}\|_{\mathrm{Hausdorff}} + \|\bar{M} - M\|_{\mathrm{Hausdorff}} < \bar{C} \cdot \bar{h}^{m+1} + \max_i \|e_i\| \leq \bar{C} \cdot \bar{h}^{m+1} + C_e. \qquad \blacksquare$$

## 5 Algorithm

### 5.1 Subsampling

Given a set of points $P$ which lie close to or on a manifold of dimension $m$, we want a subset $Q \subseteq P$ containing the points of large measure. The points in $Q$ will be used as anchor points to fit a manifold to the input data. The aim of introducing the subsampling step is to reduce the computational complexity of the MLS compared to using the entire data set as anchor points.

We construct $Q$ iteratively. First we define an empty set $Q$, and we iteratively check and add points from $P$ until all points have been processed. We briefly describe the algorithm and give its pseudocode below. At each step $i$, we remove a random point $x$ and the points of $P$ that are within distance $r$ of $x$, and call this set $N_i$, shown in Figure 4. We then find the points that are within distance $r$ of the points in $N_i$. For each point in $N_i$, we calculate the local measure defined in Definition 3.3 using these surrounding points; if this value exceeds a threshold $\tau$, we add the point to $Q$ at the end of the $i$-th iteration. This is illustrated in Figure 4. We repeat this until $P$ is empty. Figure 4: Illustration of the subsampling algorithm at the $i$-th iteration for a 2-dimensional manifold in $\mathbb{R}^3$. The red circle represents $N_i$, the neighbourhood of radius $r$; the brown cross in the middle of the circle is the point $x$ and the black dots are the points in $N_i$.
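A stripped-down variant of this step (our simplification: a single sequential sweep with the Euclidean stand-in kernel, rather than the neighbourhood bookkeeping described above) already exhibits the $\epsilon$-separation of Observation 1:

```python
import numpy as np

def local_measure(x, accepted, t, m):
    """Local measure of Definition 3.3 with the truncated kernel (9),
    using Euclidean distances as a stand-in for the log-map coordinates."""
    A = np.asarray(accepted, float)
    c = (4 * np.pi * t) ** (-m / 2)
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    H = c * np.exp(-d2 / (4 * t))
    h = c * np.exp(-((A - np.asarray(x, float)) ** 2).sum(-1) / (4 * t))
    return 1.0 - h @ np.linalg.solve(H, h) / c

def subsample(P, t, tau, m):
    """Keep a point iff its measure w.r.t. the points kept so far exceeds tau."""
    P = np.asarray(P, float)
    kept = [0]                                    # seed with the first point
    for i in range(1, len(P)):
        if local_measure(P[i], P[kept], t, m) > tau:
            kept.append(i)
    return np.array(kept)
```

On a densely sampled segment, the kept points end up spaced roughly $\sqrt{2t\,\ln\frac{1}{1-\tau}}$ apart, since a point closer than that to an already kept point cannot clear the threshold.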

### 5.2 Manifold Fitting

The subsampling algorithm outputs the subset of points from the point set that contribute most to the local measure function. Once this subset has been determined, MLS is employed locally to approximate a smooth polynomial surface, and the points are projected onto this surface.
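A minimal sketch of a local MLS projection with a degree-one polynomial is given below. The neighbourhood size `k`, bandwidth `h`, the Gaussian weight function, and the intrinsic dimension `m` are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def mls_project(X, k=20, h=1.0, m=1):
    """Sketch of a local MLS projection with a degree-one polynomial.

    For each point we (i) take its k nearest neighbours, (ii) estimate
    a local tangent space of dimension m via weighted PCA, and
    (iii) fit the normal components as a weighted linear function of
    the tangent coordinates, evaluating the fit at the point itself.
    """
    n, D = X.shape
    out = np.empty_like(X)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nb = X[np.argsort(d)[:k]]
        # Gaussian weights centred at the query point
        w = np.exp(-(np.linalg.norm(nb - X[i], axis=1) / h) ** 2)
        mu = np.average(nb, axis=0, weights=w)
        C = (w[:, None] * (nb - mu)).T @ (nb - mu)
        # leading eigenvectors span the tangent space estimate
        _, V = np.linalg.eigh(C)
        T, N = V[:, -m:], V[:, :-m]          # tangent / normal bases
        t = (nb - mu) @ T                     # tangent coordinates
        g = (nb - mu) @ N                     # normal coordinates
        A = np.hstack([np.ones((k, 1)), t])   # degree-one design matrix
        coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A,
                                   np.sqrt(w)[:, None] * g, rcond=None)
        t0 = (X[i] - mu) @ T
        g0 = np.hstack([1.0, t0]) @ coef
        out[i] = mu + T @ t0 + N @ g0
    return out
```

Points that already lie on a locally linear piece of the manifold are left (nearly) fixed by this projection, while off-manifold noise is pulled toward the fitted local surface.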

## 6 Numerical Results

In this section, we present some numerical results for our method. We first test it on several manifolds lying in 3-dimensional space: a 1-dimensional helix, a six-folded curve, and a 2-dimensional Swiss roll. These examples illustrate Remark 2: the manifold approximation has larger errors at points of higher curvature than in regions of lower curvature. Lastly, we apply our method to noisy face images, where it is compared with various methods including local and global PCA, Isomap, and LLD.

### 6.1 1-Dimensional Manifold

For this simulation, we used helix and six-folded curve data sets to test our method. We first describe how the data are simulated and then discuss the results.
1-Dimensional Helix
We sampled 7500 points on a 1-dimensional helix using a uniform distribution for and added Gaussian noise to the sample points. We applied local PCA, Isomap and our method to the input data; the results are shown in the first row of Figure 5 below. In Figure 5, the blue crosses represent the noisy data and the orange points represent the results of our method. The results from our method are plotted together with local PCA, Isomap and LLD in Figure 5. In Figure 5, we compare our method in orange with the noiseless samples from the helix in blue.
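Noisy helix data of this kind can be generated as below. The specific parametrisation, parameter range, and noise scale are assumptions made for illustration, since the paper's exact formula is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7500
# Assumed helix parametrisation: the parameter t is drawn uniformly,
# then isotropic Gaussian noise is added to the clean samples.
t = rng.uniform(0.0, 4.0 * np.pi, size=n)
helix = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)
noisy = helix + rng.normal(scale=0.05, size=helix.shape)
```

The array `noisy` then plays the role of the input data to the subsampling and MLS steps.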
Six-folded data
We sampled 6284 points on a six-folded curve lying on a sphere. Gaussian noise was added to the sample points. Similarly, we compare our method to local PCA, Isomap and LLD. The simulation results for the six-folded data are shown in the second row of Figure 5. In Figure 5, the blue crosses represent the noisy data and the orange points represent our method. Similarly, the results from our method are plotted together with local PCA, Isomap and LLD in Figure 5. In Figure 5, we compare our results in orange with the noiseless samples from the six-fold in blue.

###### Remark 3.

Isomap is a well-known dimension reduction method . In Isomap, we construct an approximate geodesic distance matrix, , between pairs of points and use it to construct an estimated Gram matrix,

$$\hat G = -\tfrac{1}{2} J D^2 J,$$

where is a centering matrix. The estimated Gram matrix is then decomposed into its eigenpairs. As in PCA, we use the leading eigenvectors to project the input data into , where is the intrinsic dimension of the manifold. To visualize, we use the leading eigenvectors to map the data from the lower dimensional space back into the ambient space.

Figure 5: 1-dimensional manifold. The first and second rows correspond to the 1-dimensional helix and six-folded data respectively. (a) and (d) Our method (orange) and the input data (blue). (b) and (e) Our method (orange), local PCA (green), Isomap (yellow) and LLD (gray). (c) and (f) Our method (orange) and noiseless data (blue).
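The classical-MDS step of Isomap described in this remark can be sketched as follows, where `D` is the (approximate) geodesic distance matrix; the function name and interface are ours.

```python
import numpy as np

def isomap_embedding(D, d):
    """Classical-MDS step of Isomap: given an (approximate) geodesic
    distance matrix D, form the estimated Gram matrix
    G_hat = -1/2 * J D^2 J with centering matrix J = I - (1/n) 11^T,
    then embed into d dimensions using the d leading eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                   # estimated Gram matrix
    vals, vecs = np.linalg.eigh(G)                # ascending order
    vals, vecs = vals[::-1][:d], vecs[:, ::-1][:, :d]
    # scale eigenvectors by sqrt of (non-negative) eigenvalues
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

When `D` contains exact Euclidean distances of a d-dimensional configuration, this recovers that configuration up to a rigid motion, which is why mapping back to the ambient space loses the curvature information the remark describes.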

Comparing the results in Figure 5, our method is able to retrieve the geometry of the entire manifold. Local PCA is able to approximate the manifold locally with linear vectors and retrieve some geometric properties of the manifold. However, the variance of the noisy data can affect the results and cause local PCA to fail. In Figure 5 we see that local PCA fails at two regions of the helix and at various regions of the six-folded curve.

Isomap is a dimension reduction method, and using Isomap we can visualize the data in the lower dimensional space. However, when we map the data back into the ambient space, we are unable to retrieve the geometric properties of the manifold. Here, Isomap projects the points onto a 1-dimensional vector, which is expected since we only use the first leading eigenvector (the intrinsic dimension of the manifold being 1) for the projection. In this 1-dimensional example, LLD clearly fails to recover the geometry of the manifold. From Figure 5, we can see that the curve recovered by our method is close to the original curve, differing more at locations of high curvature.

###### Remark 4.

In our simulations, we did not compare our results with LLE, another dimension reduction technique. LLE lets the user visualize data lying in a high dimensional ambient space in a much lower dimensional space. LLE differs from Isomap in that the information of the original coordinates is not embedded into the lower dimensional representation. Thus, unlike with Isomap, with LLE we are unable to map the lower dimensional representation back into the high dimensional ambient space.

### 6.2 Swiss Roll

We sampled 5000 points on a Swiss roll using Gaussian distributions for and . Gaussian noise was added to the sample points. Note that the intrinsic dimension of the manifold is 2. We applied our method to the data, and the results are shown in Figure 6. In Figure 6, the blue asterisks represent the noisy data and the orange points represent the projected data. The results from our method and LLD are projected onto the - plane by removing the component and are plotted in Figure 6. The same projection is applied to our method and the result is shown in Figure 6.

Figure 6: Noisy Swiss roll data. (a) 3D scatter plot of the noisy (blue) and the projected data (orange). (b) Projected 2D plot of our method (orange) and LLD (gray). (c) Comparison between the results from our method (orange) and noiseless data (blue).
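A sketch of generating such data is given below. The Swiss-roll parametrisation, the Gaussian parameters, and the noise scale are assumptions standing in for the paper's unspecified values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
# Assumed Swiss-roll parametrisation with Gaussian-distributed
# parameters (angle t, height h), plus added Gaussian noise.
t = np.abs(rng.normal(loc=3 * np.pi / 2, scale=np.pi / 2, size=n))
h = rng.normal(scale=2.0, size=n)
roll = np.stack([t * np.cos(t), h, t * np.sin(t)], axis=1)
noisy = roll + rng.normal(scale=0.1, size=roll.shape)
```

Projecting the fitted points onto the plane spanned by the first and third coordinates (dropping the height component) gives the 2D view used for comparison in Figure 6.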

We see that our method is also able to denoise the input data, and the results show that we can retrieve the geometric properties of the manifold. Interestingly, for this 2-dimensional simulation, LLD is able to denoise the input data and retrieve the geometric properties of the manifold, in contrast to the 1-dimensional case where LLD was unable to learn the manifold. In Figure 6, we can see that, from the projected view, the shape we obtain is close to the original Swiss roll.

###### Remark 5.

For this simulation, we do not show the results of local PCA and Isomap. For Isomap, the input data would simply be projected onto a 2-dimensional plane. For local PCA, the behaviour would be similar to the 1-dimensional example: it captures the geometry of the manifold in some local regions but cannot retrieve the geometric properties of the entire manifold.

### 6.3 Image Data I

The image data are taken from the video “Sub 4” from . The video database consists of 60 different videos; each video records the face of a different person following the set of instructions given in Table 1 of . The video database can be found at: https://sites.google.com/site/nirdatabase/download. We chose to work with “Sub 4” because it offers more facial features with which to test our method.

The video consists of 16708 frames; each frame is extracted and converted from RGB to grayscale. Furthermore, we crop each frame to include only the face and a small area around it and resize the image to pixels. The working data set is then , where are matrices of dimension . To compare the effectiveness of our method with other methods, we add Gaussian noise with different variances ( and ) to the data set and run our method at the four levels of noise.

Given the original data and the noisy data , we are able to calculate the Signal to Noise ratio (SNR) using

$$\mathrm{SNR}(X_i) = \frac{\|X_i\|_F}{\|X_i^{ori} - X_i\|_F}, \tag{10}$$

where denotes the Frobenius norm of a matrix. Figure 7 shows the SNR calculated by (10). We see that an image corrupted with higher noise variance has a lower SNR than an image corrupted with lower noise variance.

Figure 7: The SNR of the image data with noise levels 1 (blue), 2 (orange), 3 (yellow) and 4 (purple).
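Equation (10) can be computed directly from the original and noisy frames; the function below is a minimal sketch, with the noisy frame in the numerator as the formula is written.

```python
import numpy as np

def snr(X_ori, X_noisy):
    """SNR of a noisy frame as in Eq. (10): the Frobenius norm of the
    (noisy) frame divided by the Frobenius norm of the added noise."""
    return np.linalg.norm(X_noisy, 'fro') / np.linalg.norm(X_ori - X_noisy, 'fro')
```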

To highlight the comparison with other methods, we selected 5 different head postures to illustrate the results. The full results are available at https://www.stat.nus.edu.sg/~stayz/research.html. The original images for the 5 different head postures are shown in Figure 8.

We present the results in Figures 10, 10, 12 and 12 with the figures corresponding to the four levels of noise. Within each figure, each of the rows corresponds to noisy image, Isomap, global PCA, local PCA, LLD and our method, respectively.

For each noise level, we extracted 10 leading eigenvectors for PCA. We use 10 principal components because the PCA results do not differ much even when more than 10 leading eigenvectors are used, and the results worsen when more than 20 principal components are chosen. For a fair comparison, we also use 10 leading eigenvectors for all the algorithms. To be consistent in our analysis, we chose 150 nearest neighbours for the tangent space approximation in MLS and used functions of degree one (linear) in the MLS step. For every noise level, our method was iterated 5 times.
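The global-PCA baseline with a fixed number of leading eigenvectors can be sketched as below, with each frame flattened to a row of the data matrix. This is an illustrative implementation of the baseline, not the paper's code.

```python
import numpy as np

def pca_denoise(X, k=10):
    """Global-PCA baseline sketch: keep the k leading principal
    components of the (row-wise flattened) frames and reconstruct.
    k = 10 matches the number of leading eigenvectors used here."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    # rank-k reconstruction added back onto the mean frame
    return mu + U[:, :k] * s[:k] @ Vt[:k]
```

If the centred data have rank at most k, the reconstruction is exact; otherwise the discarded components act as the "denoised" residual.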

We can see that our results are by far the most visually pleasing when compared to Isomap and global or local PCA. We also observe that from the results of Isomap and global PCA, we are unable to retrieve the main features of the face even for the data with high SNR. On the other hand, local PCA and LLD are able to denoise the images such that the facial features can be retrieved for the data with high SNR. For data with low SNR, we are still able to retrieve some of the features of the face from local PCA and LLD, but the images are noisier than the results from our method.

For the data with higher SNR, our method still does a decent job of denoising the images, and we are able to retrieve many of the prominent features of the face. This is expected: as Figure 7 shows, the per-frame SNR at noise levels 1 and 2 is much higher than at noise level 4. With a higher level of noise relative to the signal (lower SNR), it naturally becomes more difficult to denoise the data. This difficulty is further evident in the results at noise level 4. For the higher noise levels, we are unable to retrieve many of the features of the face; however, our method still denoises the data to a minimal extent, so that we can retrieve the information of which direction the face is facing.

From this numerical result, we can see that our method works well even with data in a high dimensional ambient space while using only a degree-one polynomial in the MLS step. To achieve better results, a higher-degree polynomial can be used in the MLS approximation; the trade-off, however, is a higher computational cost.

### 6.4 Image Data II

In this example, we work with the same data set as in Subsection 6.3 but add a different type of noise to only 15 of the frames. We choose the same 5 frames as in Figure 8, and for each of the 5 frames we also choose the 2 nearest frames in terms of Euclidean distance. For each of the 15 frames, we add Gaussian noise in a narrow horizontal band across the middle of the image. Our method is then compared with LLD and the results are shown in Figure 13.

Figure 13: The first, second and third rows correspond to the noisy images, the results from LLD and our method respectively.
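A sketch of corrupting a grayscale frame in this way is shown below; the band width and noise scale are illustrative choices, not the paper's.

```python
import numpy as np

def add_band_noise(img, band_frac=0.2, sigma=25.0, seed=0):
    """Add Gaussian noise only in a narrow horizontal band across the
    middle of a grayscale image, as in the Section 6.4 experiment.
    band_frac and sigma are illustrative parameters."""
    rng = np.random.default_rng(seed)
    out = img.astype(float).copy()
    H = img.shape[0]
    half = max(1, int(band_frac * H) // 2)
    lo, hi = H // 2 - half, H // 2 + half
    out[lo:hi] += rng.normal(scale=sigma, size=out[lo:hi].shape)
    return out
```

Only the rows inside the band are altered; the rest of the frame stays noiseless, which is exactly the regime the experiment probes.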

We wish to observe whether, when only a handful of frames are corrupted such that most of each image remains noiseless, the manifold learning algorithms can denoise the images much better or will instead corrupt them further. Since the majority of the images are noiseless, we use 5 principal components and 200 nearest neighbours for LLD and the tangent space approximation. From the results, we can see that both our method and LLD are able to almost fully denoise the images, with our method performing better. However, there is also some blurring of the facial features, as can be seen in the 4th and 5th images for both our method and LLD. This is due to the smaller number of principal components used.

## 7 Conclusion

In this paper, we investigate the problem of manifold fitting. Our focus differs from the usual problem, which aims to map the input points into a lower dimensional space so as to extract the lower dimensional structure. Instead, we propose to fit a manifold in the ambient space to a set of data points, possibly with noise. A key feature of our approach is to utilize the eigenspectrum of the Laplace-Beltrami operator, known as the Shape-DNA, which helps reduce the computational complexity by finding a smaller set of points to operate on while preserving the geometric structure. This simple yet efficient method allows us to approximate a manifold from the input data without any constraints on the intrinsic dimension of the underlying manifold. Furthermore, the output of our method lies in the original ambient space, which is more practical for many real life applications since we can visualize or compare the output in the original space. We also provide error bounds for estimating the tangent space using PCA and for the surface approximation using MLS. Our numerical results are encouraging. In the simulations for the 1-dimensional manifolds, the approximated manifold from our method is close to the original manifold. For the face images, our proposed algorithm yields visually pleasing denoising results for input with high SNR, and visually acceptable results compared to PCA or local PCA for input with lower SNR.

We view the subsampling step proposed in this paper as a general step. This subsampling step can also be adapted by other well established manifold learning methods to reduce the complexity of their algorithm. As a whole, our method may also be adapted into other dimension reduction techniques to first denoise the input data so that the lower dimensional structure can be extracted more accurately.

We observe that LLD does not work well for the 1-dimensional simulation but seems to perform better for the image data with lower SNR. This could be due to the global alignment step that integrates, for each point, the information from all the local neighbourhoods to which the point belongs. A similar global alignment step was also studied in , which proposes an algorithm for dimension reduction. It would thus be interesting to see whether our manifold fitting results can be further improved by including the same global alignment step, or whether the global alignment step can be modified to work with our method. We leave these improvements to future work.

## Acknowledgments

We are grateful for the financial support from the MOE Tier 1 funding (R-155-000-196-114) and Tier 2 funding (R-155-000-184-112) at the National University of Singapore.

## Appendix 1: Proof of Theorem 4.1

###### Proof.

We consider the case when ; then , where . By Lemma 2 of ,

$$M_q = \begin{bmatrix} \frac{v^2}{3} I_{m\times m} & 0 \\ 0 & D \end{bmatrix}$$

and

$$\begin{aligned}
\|\Delta\|_F &< 2\|B_1\|_F + \|D_1\|_F \\
&< 2 C_s d^{3/2} \sqrt{d(n-d)}\, v^4 + (n-d)\left(C_s^2 d^3 v^6 + C_s d^{5/2} v^5 |K_{\max}|\right) \\
&= 2\|B_1\|_{F,\mathrm{bound}} + \|D_1\|_{F,\mathrm{bound}} \\
&:= \|\Delta\|_{F,\mathrm{bound}}.
\end{aligned}$$

If , then the eigenvectors corresponding to the eigenvalue span . As , we analyze how the term affects the tangent space . From Weyl’s inequality,

$$\lambda_i(M) \in \left[\lambda_i(M_q) - \|\Delta\|_{F,\mathrm{bound}} - \|E\|_F,\; \lambda_i(M_q) + \|\Delta\|_{F,\mathrm{bound}} + \|E\|_F\right], \quad i = 1, \dots, n.$$

Recall that for , and are eigenvalues of . To analyze the perturbations of , we require the eigenvalues to be separated, i.e. the spectral radius of , , must be less than .

$$\begin{aligned}
\frac{v^2}{3} - \|\Delta\|_{F,\mathrm{bound}} - \|E\|_F &> \rho(D) + \|\Delta\|_{F,\mathrm{bound}} + \|E\|_F \\
\frac{v^2}{3} - \rho(D) &> 2\left(\|\Delta\|_{F,\mathrm{bound}} + \|E\|_F\right). \tag{11}
\end{aligned}$$

In , it has been shown that where

$$L = \frac{m(5m+4)|K_{\max}|^2}{180} \quad \text{and} \quad R = \begin{cases} 1 & \text{if } D \text{ is diagonal} \\ (n-