LOCA: LOcal Conformal Autoencoder for standardized data coordinates

by   Erez Peterfreund, et al.
Yale University

We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in R^d that is isometric to the latent variables of the manifold. These data coordinates, being invariant under smooth changes of variables, enable matching between different instrumental observations of the same phenomenon. Our embedding is obtained using a LOcal Conformal Autoencoder (LOCA), an algorithm that constructs an embedding to rectify deformations by using a local z-scoring procedure while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA on various model settings and observe that it exhibits promising interpolation and extrapolation capabilities. Finally, we apply LOCA to single-site Wi-Fi localization data, and to 3-dimensional curved surface estimation based on a 2-dimensional projection.



page 10

page 12

page 14

page 23


Generative Model without Prior Distribution Matching

Variational Autoencoder (VAE) and its variations are classic generative ...

A regression approach for explaining manifold embedding coordinates

Manifold embedding algorithms map high dimensional data, down to coordin...

Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

We introduce the Locally Linear Latent Variable Model (LL-LVM), a probab...

Uniform Interpolation Constrained Geodesic Learning on Data Manifold

In this paper, we propose a method to learn a minimizing geodesic within...

Learning low-dimensional feature dynamics using deep convolutional recurrent autoencoders

Model reduction of high-dimensional dynamical systems alleviates computa...

Environmental Economics and Uncertainty: Review and a Machine Learning Outlook

Economic assessment in environmental science concerns the measurement or...

3D Morphology Prediction of Progressive Spinal Deformities from Probabilistic Modeling of Discriminant Manifolds

We introduce a novel approach for predicting the progression of adolesce...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Reliable, standardized tools for analyzing complex measurements are crucial for science in the data era. Experimental data often consist of multivariate observations of a physical object that can be represented as an unknown Riemannian manifold. A key challenge in data analysis involves converting the observations into a meaningful and, hopefully, intrinsic parametrization of this manifold. For example, in astrophysics, one is interested in a representation that is coherent with the material composition of stars based on measurable high dimensional spectroscopic data [1, 6]. This type of challenge has typically been studied under the broader umbrella of dimensionality reduction and manifold learning, where numerous algorithmic solutions have been proposed [9, 15, 36, 5, 8, 24, 3, 19, 20, 22]

. These methods rely on statistical or geometrical assumptions and aim to reduce the dimension while preserving different affinities of the observed high dimensional data.

In this paper, we focus on data obtained from several observation modalities measuring a complex system. These observations are assumed to lie on a path-connected manifold which is parameterized by a small number of latent variables. We assume that the measurements are obtained via an unknown nonlinear measurement function observing the inaccessible manifold. The task is then to invert the unknown measurement function, so as to find a representation that provides a standardized parametrization of the manifold. In general, this form of blind inverse problem may not be feasible. Fortunately, in many cases, one can exploit a localized measurement strategy, suggested in [29], to extract an embedding into internally standardized (z-scored) latent variables.

Toward a useful formulation of our problem we note that, in numerous real-world scenarios, it is possible to capture data using a localized “burst” sampling strategy [23, 32, 16, 38, 37, 27, 37, 30, 28, 33]. To motivate this type of “burst” sampling, we describe a toy experiment (see Fig. 1). Consider the task of recovering the geometry of a curved -dimensional homogeneous surface in three dimensions using a laser beam, which heats the surface locally at several positions. Here, a “burst” is realized through the brief local isotropic propagation of heat around each laser impact location (each “data point”), which can be visualized as a local ellipse by a thermal camera. Now, the task is to recover the curved geometry of the surface in three dimensions using the collection of observed local -dimensional ellipses.

More generally, our strategy is realized by measuring such brief “bursts”, which are modeled as local isotropic perturbations added to each state in the inaccessible latent manifold. The “bursts” provide information on the local variability in a neighborhood of each data point. Thus, they can be used to estimate the Jacobian (modulo an orthogonal transformation, as we will discuss) of the unknown measurement function. The authors in [29]

use such “bursts” and suggest a scheme to recover a representation that is invariant to the unknown transformation. Specifically, they use a local Mahalanobis metric, combined with eigenvectors of an Anisotropic Laplacian, in order to extract a desired embedding.

Solutions such as [29] and extensions such as [34, 30, 10, 14, 25] can be used. There remain, however, several challenges: (i) they require a dense sampling of the deformed manifold. (ii) they deform the representation due to inherent boundary effects, and (iii) they do not extend easily to unseen samples. To overcome these limitations, we introduce the concept of a LOcal Conformal Autoencoder

(LOCA), a deep-learning based algorithm specifically suited to “burst” measurements. LOCA is realized using an encoder-decoder pair, where the encoder attempts to find a mapping such that each “burst” is locally whitened (z-scored). By forcing reconstruction, the decoder ensures that geometric information has not been lost. We have found LOCA to be scalable, as well as easy to implement and parallelize using the existing deep learning open-source codebase. We provide empirical evidence that the LOCA embedding is approximately isometric to the latent manifold and extrapolates reliably to unseen samples. We discuss a scheme to automatically tune the minimal embedding dimension of LOCA, and demonstrate its precision in two real data problems.

The contributions in this paper are as follows. (i) We show that the localized sampling strategy (our “bursts” at a given scale) generates a consistent Riemannian structure; under certain conditions, it allows inverting the unknown measurement function (modulo a shift and orthogonal transformation). (ii) We present a two-step optimization scheme for learning a parametric mapping, which is approximately an isometry of the latent manifold. (iii) We demonstrate the isometric properties of the extracted encoder on several examples. (iv) We verify empirically that the extracted neural network has good interpolation and extrapolation properties.

Figure 1: A motivating example for LOCA- learning the stereographic shape of a surface. We consider a laser beam used to locally heat the surface at several positions. A thermal camera measures the brief isotropic propagation of heat around each location. By scanning a thermal image, we can identify the neighborhoods of each position, which we define as our “bursts”. LOCA uses these “bursts” to invert the projection and recover a scaled version of the curved surface.

2 Problem Settings

The burst measurement strategy.

Consider first, for simplicity, the case where the latent domain for our system of interest is a path-connected domain in a Euclidean space . We call the latent space. Observations of the system consist of samples captured by a measurement device given as a non-linear function , where is the ambient, or “measurement” space. Even if is invertible, it is generally not feasible to identify without access to . Here, we assume that (a) is smooth and injective; and that (b) multiple, slightly perturbed versions of the physical system point in give rise to multiple (slightly perturbed) measurements in . In this notation, by exploiting a specific type of local perturbation, we develop a method to recover a standardized version from (up to an approximately isometric transformation which, for Euclidean spaces would be a rigid transformation).

Consider data points (“burst” centers), denoted in the latent space. Assume that all these points lie on a path-connected, -dimensional sub-domain of ; we will later discuss the restriction to Riemannian manifolds with a smaller dimension than the full space. Importantly, we do not have direct access to these states. The states are pushed forward to the ambient space via the unknown deformation which defines . We do not only observe the states ; we rather assume a specific perturbed sampling strategy. For each ,

, we assume that the perturbed observed state is given as the random variable


where are i.i.d distributed by . Our sampling strategy relies on measuring perturbed realizations of , which we denote as . We assume that , or alternatively, that is sufficiently small such that the differential of practically does not change in a ball of radius around any point. Such sufficiently small allows us to capture the local neighborhoods of the states at this measurement scale on the latent manifold. Note that alternative isotropic distributions, which satisfy this condition, could be used to model .

Let us explore the implications of this localized sampling strategy for learning a representation that is consistent with . Specifically, our goal is to construct an embedding that maps the observations , so that the image of is isometric to when is known. In our Euclidean setting, such an isometric embedding should satisfy


We note that if is not known, we will relax (2) by allowing a global scaling of the embedding. This means that we are only looking for a representation that preserves the pairwise Euclidean distances between the latent samples, rather than obtaining their actual values. More specifically, a that satisfies (2) is not unique, and is defined up to an isometric transformation of the data. We refer to representations which satisfy Eq. 2 up to errors smaller than as “isometries”.

3 Related work

The problem of finding an isometric embedding was also studied in [21]. The paper proposes an algorithm to embed a manifold with dimension into a space of dimension . The method in [21] is built upon [22] and uses a discrete version of the Laplace-Beltrami operator, as in [8], to estimate metric of the desired embedding. To force the embedding to be isometric to the observed samples , the authors propose a loss term to quantify the deviation between the push-forward metric from the observed space, , to the embedding space compared with the restricted Euclidean metric. The embedding is refined by applying gradient descent to the proposed loss. The approach successively approximates a Nash embedding with respect to the observed space, , for which it is required that the manifold be densely sampled at all scales.

In this work, we use “bursts” to learn an embedding that corrects the deformation and isometrically represent the inaccessible manifold . The idea of using “bursts”, or data neighborhoods, to learn a standardized reparametrization of data was first suggested in [29]. The authors assume the data is obtained via some unknown nonlinear transformation of latent independent variables. Locally, the distortion caused by the transformation is corrected by inverting a Jacobian that is estimated from the covariances of the observed “bursts”. This allows the authors to define a local Mahalanobis metric (which is affine invariant). Then, this metric is used to construct an anisotropic ”intrinsic Laplacian” operator. Finally, the eigenvectors of this Laplacian provide the independent components and are used as a “canonical” embedding. This framework was extended in several studies such as [10, 32, 34, 25].

The work of these authors can be improved in three directions. First, they require inverting a covariance matrix in the ambient space. Second, they suffer from deformation on boundaries. Third, they typically do not provide an embedding function that can be naturally extended over the entire data domain and beyond; instead, they provide a specific mapping for the existing training samples. This last direction means that to embed test data, methods such as [7, 23] could be employed. The mapping approximations based on these methods are limited and cannot extend further than a small neighborhood around each training point. Furthermore, even though the provided embedding is unique, it is not isometric to the latent variables. We present a method that alleviates these shortcomings, and empirically demonstrate that it extracts a canonical representation that is isometric to the latent variables.

Perhaps the most related work to this study was recently presented by [27]. The authors consider “bursts” to develop a method for finding an embedding that is isometric to the latent variables. They build upon Isomap [35], and use two neural networks to refine the Isomap based embedding. The first neural network is used in order to obtain a continuous model for estimating the covariance . The covariances are used for calculating local Mahalanobis based distances, which are fed into Isomap to obtain an initial embedding. Next, they train an additional neural network to correct the Isomap embedding so that the Euclidean distances will approximate the local Mahalanobis distances. In this paper, we take a different and, we believe, more systematic/general approach by presenting a simple encoder-decoder pair (see Fig. 2) that is directly applicable to samples in the observed high dimensional space. Specifically, our approach provides a parametric mapping that allows us to extend the embedding to new unseen samples naturally. Furthermore, we learn the inverse mapping, which could be used to generate new samples by interpolating in the latent space.

Figure 2: An illustration of the LOcal Conformal Autoencoder (LOCA). “E” stands for the encoder , and “D” for the decoder . The autoencoder receives a set of points along with corresponding “neighborhoods”; each neighborhood is depicted as a dark oval point cloud (see the top row in the figure). On the bottom row, we zoom in onto a single “anchor” point (green) along with its corresponding neighborhood (bounded by a blue ellipsoid). The encoder attempts to “whiten” each neighborhood in the embedding space, while the decoder tries to reconstruct the input.

4 Deriving an Alternative Isometry Objective

Without access to samples from the objective described in (2) does not provide any information for extracting . Here we reformulate this objective by utilizing the special stochastic sampling scheme presented in Section 2 and relate it to the differential equation for the embedding described in Lemma 1. We start by plugging the unknown measurement function into (2); then, we can approximate its left hand side using a first order Taylor expansion

Hence, by neglecting higher order terms, we can define the following objective


which allows us to evaluate the isometric property of .

Now we want to relate the Jacobian in Eq. 3 to measurable properties of the observations . Specifically, we can rely on the following Lemma to approximate the derivatives of the unknown function at each point . The Lemma is proved in the Supporting Information (A):

Lemma 1.

Let be a function, where and . Let and . Define a random variable . If the function satisfies and is injective, there exist a such that the covariance of the transformed random variable is related to the Jacobian of at via


By setting , this Lemma provides a relation between the Jacobian of and the covariance in the embedding space. Specifically, this translates to a system of differential equations for the Jacobian of an isometric (Nash) embedding


When , we can tie the approximation of objective (2) with (4) by


Thus we can evaluate the embedding function at each point without gaining access to the latent states of the system.

Input: Observed clouds ,i=1,…,N.
Output: and - the weights of the encoder and decoder neural network.

1:  for  do
2:     Compute the whitening loss
3:     Update
4:     Compute the reconstruction loss
5:     Update and
6:  end for
Algorithm 1 LOCA: LOcal Conformal Autoencoder

5 LOcal Conformal Autoencoder

We now introduce the Local COnformal Autoencoder (LOCA), with training Algorithm 1. Our method is based on optimizing two loss terms; the first is defined based on (5) using what we refer as a “whitening” loss


where is an embedding function, and is the empirical covariance over a set of realizations , where are realizations of the random variable .

Figure 3: Evaluating the isometric quality of the embedding (the setting is detailed in 6.1). The points in the latent space of the system of interest LABEL: are LABEL: pushed forward to the measurements by applying the nonlinear transformation (described in (8)). The color in both figures corresponds to the values of of the data. We observe “bursts” around each sample (based on the burst model described in Section 2). To illustrate this “burst” sampling scheme, we overlay the points with additional green samples generated by bursts at different positions. LABEL: Euclidean distances between pairs of points in the latent space plotted versus the corresponding Euclidean distance in the embedding space. The corresponding distances for DM and for A-DM are also shown in color, scaled with a factor which minimizes the stress (defined in (9)).

As is invertible on its domain, an embedding function that approximates should be invertible as well. The invertibility of means that there exists an inverse mapping , such that for any . This additional objective helps remove undesired ambiguities (which may occur for insufficient sampling). By imposing an invertibility property on , we effectively regularize the solution of away from noninvertible functions. To impose invertibility, we define our second loss term, referred to as ”reconstruction” loss:


We suggest finding an isometric embedding based on an autoencoder, where will be defined as the encoder and as the decoder. We construct solutions to (6) and (7) with a neural network ansatz and consisting of layers each, such that

where and . Here, , and , are the weights and biases at layer of the encoder and decoder, respectively. The functions

are nonlinear activations applied individually to each input coordinate. As the activation function can have a limited image, we recommend removing the non linear activation for


We propose to find and

by alternating between a stochastic gradient descent on (

6) and (7). It is important to note that the main objective that we are trying to optimize is based on (6), therefore (7) can be viewed as a regularization term. A pseudo-code of this procedure appears in Algorithm 1. To prevent over-fitting, we propose an early stopping procedure [17, 4] by evaluating the loss terms on a validation set. In section 6, we demonstrate different properties of the proposed LOCA algorithm using various geometric example manifolds.

Note that functions that perfectly satisfy our objectives are not unique, i.e. for any solution we can define an equivalent solution that will attain the same loss. Specifically, we can define it by for any , where and .

To summarize: (i) We collect distorted neighborhoods of a fixed size () around data points of the system of interest. (ii) We embed/encode the data in a low dimensional Euclidean space so that these neighborhoods are standardized or z-scored. (iii) The embedding is decoded back to the original measurements, to regularize the encoder. In Section 6, we demonstrate that (iv) The encoder is invariant to the measurement modality (up to errors of , and modulo an orthogonal transformation and shift). (v) The parametric form of the embedding enables reliable interpolation and extrapolation.

Figure 4: Evaluating out-of-sample performance of the encoder, detailed in Section 6.2. LABEL: The latent space of interest ; our training region is bounded here by the black and green frames. The interpolation region lies within the green frame, while the extrapolation regime lies outside the black frame. LABEL: The observed space with corresponding regions of interest. LABEL: The calibrated embedding (using an orthogonal transformation and a shift) with corresponding regions of interest. The color in these figures corresponds to the values of . Here, we calibrated the embedding merely for visualisation purposes. LABEL: The Euclidean distances between pairs of points in the latent space versus the corresponding Euclidean distance in the embedding space.

6 Properties of LOCA

In this section, we evaluate the properties of the proposed embedding by generating various synthetic datasets. We compare the extracted embedding provided by LOCA (described in Section 4) with the embeddings of alternative methods such as Diffusion Maps (DM) [8] and Anisotropic Diffusion Maps (A-DM) [29] denoted as and respectively. Note that the Diffusion Maps algorithm does not use the “burst” data, while the Anisotropic Diffusion Maps uses it in order to construct a local Mahalanobis metric. We present the details of the exact implementation for each of the methods in the supplementary information (see Section B).

6.1 LOCA creates an isometric embedding

We first evaluate the isometric quality of the proposed embedding with respect to the true inaccessible structure of . Here, we follow the setting in [29], where the intrinsic latent coordinates are independent, specifically distributed by . Based on this distribution, we sample anchor points along with “bursts” . Each “burst” consists of points sampled independently from , where .

We now define the non linear transformation,

(as in [29]) to the ambient space by


for any . In Fig. 3, we present measurements from with the corresponding measurements of , where . To illustrate the local deformation caused by , we overlay the samples with clouds around different positions (see green dots). Next, we apply LOCA (described in Algorithm 1) to compute an embedding that satisfies (6) and (7). We evaluate the isometric quality of LOCA by comparing the pairwise Euclidean distances in the embedding space to the Euclidean distances in the latent space . For comparison, we apply DM and A-DM (that also uses the “bursts”) to and plot the pairwise Euclidean distances in the embedding vs. the corresponding Euclidean distances in the latent space. Here, we evaluate isometry up to a scaling, as DM and A-DM use eigenvectors (that are typically normalized). The scaling is optimized to minimize the stress defined by


where is some embedding function from and . Specifically the stress values for LOCA, and the scaled versions of DM and A-DM are and , respectively. As evident from the stress values and from Fig. 3, LOCA provides an embedding that is isometric to (up to an orthogonal transformation and shift).

6.2 The encoder is observed to extend reliably to unseen samples

In the next experiment, we evaluate the out-of-sample extension capabilities of LOCA. The experiment is based on the same nonlinear transformation described in Section 6.1. We sample points from a partial region of the latent representation , specifically described by . In Fig. 4, we present the framed sampling regions along with the corresponding observed framed regions in (see black and green frames 4(a) and 4(b)). To generate the “bursts” we follow the setting presented in Section 6.1 and refer to them as our training set. The test set is defined by an additional samples generated as in Section 6.1 from in pushed forward by .

In Fig. 4(c), we quantify the interpolation and extrapolation capabilities of LOCA by presenting the extracted embedding along with the corrected frame. To further evaluate the quality of this embedding we compare the pairwise distances in 4(d) (as described in Section 6.1). This comparison (presented in Fig. 4(d)) supports the benefits of using LOCA for extending the representation to unseen samples. To stress it even more, the actual stress values of LOCA in the interpolation region, on the frame and in the extrapolation region are all approximately .

Figure 5: Evaluating the out-of-sample reconstruction capabilities of LOCA. Here, we attempt to generate new point in the ambient space by performing linear interpolation in the embedding space. A description of the linear interpolation appears in 6.3. LABEL: The inaccessible latent space, the points surrounding the green frame are the training samples. Interpolation is performed horizontally and vertically between points on the green frame, see for example the colored lines. LABEL: The pushed forward data from LABEL: based on the non linear function described in (8). LABEL: Recovered calibrated embedding . The training samples in the frame and green boarder are embedded using LOCA. Within the embedded green boarder we perform an additional linear interpolation, using the same corresponding pairs as were used in the latent space. For example, see the horizontal and vertical colored lines. LABEL: The pushed forward data from LABEL: by the decoder () learned by LOCA. This experiment demonstrates that LOCA learns a decoding function that is consistent with the unknown transformation, even in a regime that is not covered by training samples.

6.3 The decoder is observed to extend reliably to unseen samples

In this experiment we evaluate the out-of-sample capabilities of LOCA’s decoder. While in 6.2 we trained LOCA and evaluated the quality of the encoder on unseen data, here we focus on the performance of the decoder. Specifically, we apply the decoder to unseen samples from the embedding space. Each unseen sample in the embedding space is created using linear interpolation. We now provide the exact details of this evaluation.

For this interpolation experiment, we use the same LOCA model trained in 6.2 on the framed data. We further generate points in the interior boundary of the frame, represented by the green dots in Fig. 5. Next, we perform linear interpolation between horizontal and vertical pairs, see for example the colored lines in Fig. 5(a). The data is then pushed forward using the nonlinear transformation described in (8), as shown in Fig. 5(b). We embed the training samples along with the green frame using LOCA; a calibrated version of the embedding space is shown in Fig. 5(c) (up to a shift and an orthogonal transformation). Then, we perform an additional interpolation in the embedding space using the same corresponding pairs as were used in the latent space (see Fig. 5(a)). Finally, we apply the decoder to the embedding of the training samples and to the newly interpolated samples, these are presented in Fig. 5(d). As evident in this figure, the reconstructed points faithfully capture the mushroom shaped manifold. The mean squared error between the push-forward interpolated points and the decoded interpolated points is

, with a standard deviation of

. This experiment demonstrates that LOCA may be also used as a generative model, by reconstructing new points generated using interpolation in the embedding space.

Figure 6: The stereographic projection experiment (see description in Section 6.4). LABEL:: A schematic illustration of the stereographic projection generating the data. LABEL:: The original latent representation of the “bursts” employed. LABEL:: The 2-dimensional observations of the “bursts” created using the stereographic projection. The plot contains only the training data, meaning points that satisfy , leaving a “hole” at the south pole. LABEL:: The 3-dimensional embedding of these training data, with the missing lower cap (). The color represents the value of of each point as defined in (10). The colors used in LABEL:- LABEL: correspond to the spherical angle defined in (10). LABEL: The Euclidean distances between pairs of points in the original, 3-dimensional latent space versus the corresponding Euclidean distance in the embedding space. Here we compare distances based on the training region (frame) as well as the unseen test region where (interpolation).

6.4 LOCA on a curved manifold

Here, we examine a more challenging configuration, generalizing our original Euclidean problem setting. The latent space is now taken to be a dimensional manifold that resides in , where and is the minimal dimension required to embed the manifold in a Euclidean space isometrically. Interestingly, we consider an observation process such that the observation dimension, , is smaller than . To clarify, this means that the measurement process can involve projections to a lower dimension.

We consider a manifold that covers three quarters of a -dimensional unit sphere in , where the training points admit the following form


The manifold is embedded in but has an intrinsic dimension of . This requires us to revisit our definition of “bursts”, discussed in (1). Specifically, we assume that the bursts are confined to the manifold. Here, we approximate this constraint in the form of random variables obtained using a local isotropic Gaussian with a two dimensional covariance , defined on the tangent plane to the point.

We consider states of the system which are generated on a uniform grid using the “Fibonacci Sphere” sampling scheme [31] for points with . We define each “burst” using points sampled from our two-dimensional isotropic Gaussian defined by the tangent plane around with . Now, in order to create the observed samples we apply the stereographic projection to by projecting each point from onto a two dimensional space defined by:


The transformation can be thought of as a projection onto the plane ; an illustration of the stereographic projection appears in Fig. 6(a). The training “bursts” in the latent space and the observed space appear in Figs. 6(b) and 6(c), respectively.

We apply DM, Anisotropic DM, and LOCA to embed the data in a -dimensional space. The difference between the pairwise Euclidean distances (see description in Section 6.1) in each embedding space and the original Euclidean distances along with the extracted embeddings are described in the Supplementary Information. The stress values for LOCA, and the scaled DM and A-DM on the training data are and , respectively. In order to examine the interpolation capabilities of LOCA, we generate points using the ”Fibonacci Sphere” that satisfies . Using the trained model of LOCA we embed these data and get that the stress value is . Fig. 6 demonstrates that LOCA can well approximate an isometry even if the dimension of the observations is lower than the minimal embedding dimension needed for the isometry, i.e. .

7 Applications

7.1 Flattening a curved surface

Our first application is motivated by [11], in which the authors propose a method for estimating the -dimensional deformation of a -dimensional object. They focus on the task of autonomous robotic manipulation of deformable objects. Their method uses a stream of images from a RGB-D camera and aligns them to a reference shape to estimate the deformation.

We explore the applicability of LOCA for the task of estimating a deformation based on a -dimensional projection of an object, without using any depth feature. We print a black square-shaped grid of points of interest; at each location, we generate a “burst” with samples drawn from a Gaussian with . We manually squeeze the printed square and photograph the deformed object from above. The image of the original squared object along with a -dimensional snapshot of the deformed object appears in Fig. 7. This experiment complements our motivating example presented in Fig. 1.

To define the anchor points along with corresponding “bursts”, we first identify the locations of all points by applying a simple threshold filter to the image. Then, we identify the “bursts” by applying the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [26]. In Fig. 7 we present the identified groups of points (black). Note that some “bursts” are lost in this process, as there is nearly no gap between them in the deformed shape. Here, the parameter for the whitening loss (6

) is estimated using the median of the first eigenvalue of the “bursts” covariances. We apply LOCA and extract the embedding

. In Fig. 7, we present a calibrated version (scaled rigid transformation) of the embedding, , overlaid on the latent representation. The transformation is found by minimizing the mean squared error between the underlying representation and the extracted embedding of the corners of the square. This experiment demonstrates that LOCA corrects the unknown deformation based on the estimated “bursts”.

Figure 7: A LOCA embedding can flatten a deformed object using estimated “bursts”. LABEL: The input training samples used by LOCA (black) and the reconstructed points (red). LABEL: The calibrated embedding (using an orthogonal transformation and a shift) of the deformed object using LOCA (red) and the underlying representation of points (black). As in the synthetic examples, we use calibration only for visualization purposes. Here, LOCA manages to correct the deformation of the local “bursts”, and thus learns a function that approximately uncovers the latent structure of the object.

7.2 Application to Wi-Fi localization

Here, we evaluate LOCA for the task of geographical localization based on mobile devices. The localization problem involves estimating the geographic location of a receiver device based on signals sent from multiple Wi-Fi transmitters. This problem has been addressed by modeling the strength of the signal in time and space, or based on fingerprints of the signal learned in a supervised setting [12, 2]. We address the problem in an unsupervised fashion by applying the proposed LOCA algorithm without employing any physical model.

The experiment is performed by simulating the signal strength of Wi-Fi transmitters at multiple locations across a model of a room, where each transmitter uses a separate channel. The room is modeled based on a simplified floor plan of the fourth floor of MIT’s Ray and Maria Stata Center. We refer to the two-dimensional representation of the room as ; a schematic of the floor plan with pixels appears in Fig. 8 (black line). The Wi-Fi transmitters are randomly located across the floor plan; we denote each of these locations by , for any . Next, we sample , using anchor points distributed uniformly over

and define the amplitude of each measured Wi-Fi signal using a radial basis function (RBF). The RBF decay is monotonic in the distance between the transmitter and the measurement location, so that the amplitude at point

of the signal of transmitter is , where pixels. Here the “bursts” will be defined by a circle of receivers equally spaced at a radius of pixels around each anchor point : these 6 receivers model a circular sensor array as the measurement device.

Next, we apply LOCA and embed the observed vectors of multi-channel amplitudes into a

-dimensional space. To demonstrate the performance of LOCA we calibrate the LOCA embedding to the ground truth floor plan using a shift and scaled orthogonal transformation, as done in 7.1 but using all the training data. In Fig. 8 we present the scaled, calibrated two-dimensional embedding with the locations of the transmitters and anchor points.

Figure 8: Application of LOCA to Wi-Fi localization. We use a floor plan model based on the fourth floor of MIT’s Ray and Maria Stata Center. The edges of the ground truth model appear in black. We simulate Wi-Fi access points (transmitters), which are presented as black crosses. We use locations depicted as red dots with corresponding “burst samples” around them (modeling a circular antenna array). To demonstrate that LOCA’s embedding is coherent with the latent representation, we calibrate the embedding to the true floor plan, see blue dots and green line.

8 Discussion

We propose a method that extracts canonical data coordinates from scientific measurements. These approximate an embedding that is isometric to the latent manifold. We are assuming a specific, broadly applicable stochastic sampling strategy, and our proposed method corrects for unknown measurement device deformations. Our method constructs a representation that ”whitens” (namely, changes to multivariate z-scores) groups of local neighborhoods, which we call “bursts”. We impose additional constraints to patch together the locally whitened neighborhoods, ensuring a smooth global structure. Finally, the method is implemented using a neural network architecture, namely an encoder-decoder pair, which we name LOcal Conformal Autoencoder (LOCA).

The method can be summarized as follows. (i) We collect distorted neighborhoods of a fixed size of data samples.(ii) We embed/encode the data in the lowest dimensional Euclidean space so that these neighborhoods are standardized or z-scored.(iii) The data is decoded from embedding space to original measurements, enabling interpolation and extrapolation. (iv) LOCA is invariant to the measurement modality (approximately to second order, and modulo a rigid transformation). (v) Under scaling consistency for samples drawn from a Riemannian manifold, the encoder can approximate an isometric embedding of the manifold.

From an implementation perspective, our method is simpler than existing manifold learning methods, which typically require an eigen-decomposition of an -by- matrix ( being the number of samples). Indeed, existing implementations of deep neural networks enable a single developer to produce fast, reliable, GPU-based implementations of LOCA.

We provided solid empirical evidence that, if the deformation is invertible, then LOCA extracts an embedding that is isometric to the latent variables. Moreover, LOCA exhibits intriguing, indeed promising interpolation and extrapolation capabilities. To motivate the benefits of using LOCA, we used two potential applications. First, we apply a -dimensional deformation to a printed object and demonstrate that LOCA manages to invert the deformation without using any assumptions on the object’s structure. Finally, using Wi-Fi generated signals from multiple locations, we show that the LOCA embedding can be quantititatively correlated with the true locations of the received signals.

Our method relies on the “bursts” measurement model. As shown in Lemma 1, the covariances of the bursts can be used to estimate the Jacobian of the unknown measurement function. Alternatively, we can replace this estimation with any other type of measurement strategy informative enough to estimate the local Jacobian of the measurement function.


This work was partially supported by the DARPA PAI program (Agreement No. HR00111890032, Dr. T. Senator). E.P. has been partially supported by the Blavatnik Interdisciplinary Research Center (ICRC), the Federmann Research Center (Hebrew University) and Israeli Science Foundation research grant no. 1523/16.


  • [1] S. Alam, F. D. Albareti, C. A. Prieto, F. Anders, S. F. Anderson, T. Anderton, B. H. Andrews, E. Armengaud, É. Aubourg, S. Bailey, et al. (2015) The eleventh and twelfth data releases of the sloan digital sky survey: final data from sdss-iii. The Astrophysical Journal Supplement Series 219 (1), pp. 12. Cited by: §1.
  • [2] N. Alikhani, S. Amirinanloo, V. Moghtadaiee, and S. A. Ghorashi (2017) Fast fingerprinting based indoor localization by wi-fi signals. In

    2017 7th International Conference on Computer and Knowledge Engineering (ICCKE)

    pp. 241–246. Cited by: §7.2.
  • [3] P. Baldi (2012)

    Autoencoders, unsupervised learning, and deep architectures


    Proceedings of ICML workshop on unsupervised and transfer learning

    pp. 37–49. Cited by: §1.
  • [4] R. Basri, D. Jacobs, Y. Kasten, and S. Kritchman (2019) The convergence rate of neural networks for learned functions of different frequencies. arXiv preprint arXiv:1906.00425. Cited by: §5.
  • [5] M. Belkin and P. Niyogi (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15 (6), pp. 1373–1396. Cited by: §1.
  • [6] V. F. Calderon and A. A. Berlind (2019)

    Prediction of galaxy halo masses in sdss dr7 via a machine learning approach

    Monthly Notices of the Royal Astronomical Society 490 (2), pp. 2367–2379. Cited by: §1.
  • [7] R. R. Coifman and S. Lafon (2006) Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis 21 (1), pp. 31–52. Cited by: §3.
  • [8] R. R. Coifman and S. Lafon (2006) Diffusion maps. Applied and computational harmonic analysis 21 (1), pp. 5–30. Cited by: §B.1, Appendix B, §1, §3, §6.
  • [9] H. Deutsch (2002) Principal component analysis. In Derivatives and Internal Models, pp. 539–547. Cited by: §1.
  • [10] C. J. Dsilva, R. Talmon, N. Rabin, R. R. Coifman, and I. G. Kevrekidis (2013) Nonlinear intrinsic variables and state reconstruction in multiscale simulations. The Journal of chemical physics 139 (18), pp. 11B608_1. Cited by: §1, §3.
  • [11] T. Han, X. Zhao, P. Sun, and J. Pan (2018) Robust shape estimation for 3d deformable object manipulation. arXiv preprint arXiv:1809.09802. Cited by: §7.1.
  • [12] A. Jaffe and M. Wax (2014) Single-site localization via maximum discrimination multipath fingerprinting. IEEE Transactions on Signal Processing 62 (7), pp. 1718–1728. Cited by: §7.2.
  • [13] Y. Keller, R. R. Coifman, S. Lafon, and S. W. Zucker (2009) Audio-visual group recognition using diffusion maps. IEEE Transactions on Signal Processing 58 (1), pp. 403–413. Cited by: §B.1.
  • [14] F. P. Kemeth, S. W. Haugland, F. Dietrich, T. Bertalan, K. Höhlein, Q. Li, E. M. Bollt, R. Talmon, K. Krischer, and I. G. Kevrekidis (2018) An emergent space for distributed data with hidden internal order through manifold learning. IEEE Access 6, pp. 77402–77413. Cited by: §1.
  • [15] J. B. Kruskal (1964) Nonmetric multidimensional scaling: a numerical method. Psychometrika 29 (2), pp. 115–129. Cited by: §1.
  • [16] D. Kushnir, A. Haddad, and R. R. Coifman (2012) Anisotropic diffusion on sub-manifolds with application to earth structure classification. Applied and Computational Harmonic Analysis 32 (2), pp. 280–294. Cited by: §1.
  • [17] M. Li, M. Soltanolkotabi, and S. Oymak (2019) Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks. arXiv preprint arXiv:1903.11680. Cited by: §5.
  • [18] O. Lindenbaum, M. Salhov, A. Yeredor, and A. Averbuch (2017) Kernel scaling for manifold learning and classification. arXiv preprint arXiv:1707.01093. Cited by: §B.1.
  • [19] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §1.
  • [20] L. McInnes, J. Healy, N. Saul, and L. Großberger (2018) UMAP: uniform manifold approximation and projection. The Journal of Open Source Software 3 (29), pp. 861. Cited by: §1.
  • [21] J. McQueen, M. Meila, and D. Joncas (2016) Nearly isometric embedding by relaxation. In Advances in Neural Information Processing Systems, pp. 2631–2639. Cited by: §3.
  • [22] D. Perraul-Joncas and M. Meilâ (2013) Non-linear dimensionality reduction: riemannian metric estimation and the problem of geometric discovery. arXiv preprint arXiv:1305.7255. Cited by: §1, §3.
  • [23] N. Rabin and R. R. Coifman (2012) Heterogeneous datasets representation and learning using diffusion maps and laplacian pyramids. In Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 189–199. Cited by: §1, §3.
  • [24] S. T. Roweis and L. K. Sau (200) Nonlinear dimensionality reduction by local linear embedding. Science 290.5500, pp. 2323–2326. Cited by: §1.
  • [25] M. Salhov, O. Lindenbaum, Y. Aizenbud, A. Silberschatz, Y. Shkolnisky, and A. Averbuch (2019) Multi-view kernel consensus for data analysis. Applied and Computational Harmonic Analysis. Cited by: §1, §3.
  • [26] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu (2017) DBSCAN revisited, revisited: why and how you should (still) use dbscan. ACM Transactions on Database Systems (TODS) 42 (3), pp. 19. Cited by: §7.1.
  • [27] A. Schwartz and R. Talmon (2019) Intrinsic isometric manifold learning with application to localization. SIAM Journal on Imaging Sciences 12 (3), pp. 1347–1391. Cited by: §1, §3.
  • [28] S. A. Shevchik, B. Meylan, G. Violakis, and K. Wasmer (2019) 3D reconstruction of cracks propagation in mechanical workpieces analyzing non-stationary acoustic mixtures. Mechanical Systems and Signal Processing 119, pp. 55–64. Cited by: §1.
  • [29] A. Singer and R. R. Coifman (2008-09)

    Non-linear independent component analysis with diffusion maps

    Applied and Computational Harmonic Analysis 25 (2), pp. 226–239. External Links: ISSN 1063-5203 Cited by: §B.2, Appendix B, §1, §1, §1, §3, §6.1, §6.1, §6.
  • [30] A. Singer, R. Erban, I. G. Kevrekidis, and R. R. Coifman (2009) Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proceedings of the National Academy of Sciences 106 (38), pp. 16090–16095. Cited by: §B.1, §1, §1.
  • [31] R. Swinbank and R. James Purser (2006) Fibonacci grids: a novel approach to global modelling. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography 132 (619), pp. 1769–1793. Cited by: §6.4.
  • [32] R. Talmon, I. Cohen, and S. Gannot (2011) Supervised source localization using diffusion kernels. In 2011 IEEE workshop on applications of signal processing to audio and acoustics (WASPAA), pp. 245–248. Cited by: §1, §3.
  • [33] R. Talmon and R. R. Coifman (2012) Differential stochastic sensing: intrinsic modeling of random time series with applications to nonlinear tracking. Proc. Nat. Acad. Sci. Cited by: §1.
  • [34] R. Talmon and R. R. Coifman (2013) Empirical intrinsic geometry for nonlinear modeling and time series filtering. Proceedings of the National Academy of Sciences 110 (31), pp. 12535–12540. Cited by: §1, §3.
  • [35] J. B. Tenenbaum, V. De Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. science 290 (5500), pp. 2319–2323. Cited by: §3.
  • [36] J. B. Tenenbaum, V. de Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), pp. 2319–2323. External Links: Document, ISSN 1095-9203 Cited by: §1.
  • [37] X. Wang, H. Mao, H. Hu, and Z. Zhang (2013) Crack localization in hydraulic turbine blades based on kernel independent component analysis and wavelet neural network. International Journal of Computational Intelligence Systems 6 (6), pp. 1116–1124. Cited by: §1.
  • [38] H. Wu, R. Talmon, and Y. Lo (2014) Assess sleep stage by modern signal processing techniques. IEEE Transactions on Biomedical Engineering 62 (4), pp. 1159–1168. Cited by: §1.
  • [39] L. Zelnik-Manor and P. Perona (2005)

    Self-tuning spectral clustering

    In Advances in neural information processing systems, pp. 1601–1608. Cited by: §B.1.

Appendix A Proof of Lemma

Proof of lemma 1.

For a sufficiently small and any such that we can express by

here we use the smoothness of . Hence, we can express the covariance of the random variable by


Appendix B Methods Description

Here we provide a description and implementation details for Diffusion maps (DM) [8] and Anisotropic Diffusion maps (A-DM) [29].

b.1 Diffusion maps

Diffusion maps (DM) [8]

is a kernel based method for non linear dimensionality reduction. The method relies on a stochastic matrix built using a kernel

. The stochastic matrix can be viewed as a fictitious random walk on the graph of the data. The reduced representation is obtained via an eigendecomposition of the stochastic matrix. The DM construction is summarized in the following steps:

  1. Define a kernel function , such that with elements , where is symmetric, positive semi-definite and non-negative. Here, we focus on the common Radial Basis kernel defined for any and as


    where is a kernel bandwidth (see more details below).

  2. Row normalize


    where the diagonal matrix is defined as .

    can be interpreted as the matrix of transition probabilities of a Markov chain on

    , such that (where is an integer power) describes the implied probability of transition from point to point in steps.

  3. Define the embedding for the dataset by


where and are the -th eigenvalue and right eigenvector of the matrix .

It is important to properly tune the kernel scale/bandwidth , which determines the scale of connectivity of the kernel . Several studies have suggested methods for optimizing in DM (e.g. [30, 13, 39, 18]). Here, we use the max-min approach initially suggested in [13] where the scale is set to


The max-min aims for a scale that ensures that all points are connected to at least one other point.

b.2 Anisotropic Diffusion maps

The Anisotropic Diffusion maps (A-DM) proposed in [29] effectively replace the Euclidean distance in (12) by a (joint) local Mahalanobis distance. This local Mahalanobis distance between observed anchor points and is computed using the observed “short bursts” by


where is the generalized inverse of the sample covariance of the -th observation burst. This joint local Mahalanobis distance is used to compute the Anisotropic diffusion kernel


the scaling parameter is again optimized using (15) but now based on the Mahalanobis metric. Next, we follow steps (2) and (3) in DM (see Section B.1) and compute the A-DM embedding using the right eigenvectors of the normalized kernel.

Appendix C Numerical Experiment Setting.

Here we describe the implementation details of each numerical experiment undertaken. The architectures we used are based on fully connected layers, with an activation function after each layer, except for the last two layers.

  • First experiment details (3)

    • Neural net architecture-

      Encoder- Neurons in each layer: [50,50,2,2], Activation function: tanh.

      Decoder- Neurons in each layer: [50,50,2,2], Activation function: tanh.

    • Data- amount clouds: , cloud size: , noise standard noise: .

    • Neural net training- batch size: 200 clouds, amount training clouds: 1800, amount validation clouds: 200.

  • Second and third experiment details (4,5)

    • Neural net architecture-
      Encoder- Neurons in each layer: [50,50,2,2], Activation function: tanh.
      Decoder- Neurons in each layer: [50,50,2,2], Activation function: tanh.

    • Data- amount clouds: , cloud size: , noise standard noise: .

    • Neural net training- batch size: 200 clouds, amount training clouds: 1800, amount validation clouds: 200.

  • Forth experiment details (6)

    • Neural net architecture-
      Encoder- Neurons in each layer: [100,100,3,3], Activation function: tanh.

      Decoder- Neurons in each layer: [100,100,2,2], Activation function: leaky relu.

    • Data- amount clouds: , cloud size: , noise standard noise: .

      The points on the sphere were generated using the ”Fibonacci Sphere” mechanism with points. The points that did not satisfy constraints were left out.

    • Neural net training- batch size: clouds, amount training clouds: , amount validation clouds: .

    • Generation of a cloud around a given point on the unit sphere- Let be its polar representation. We sample a cloud of points in the polar space using . Next, we find some orthogonal transformation that maps to in the -dimensional Cartesian space, and apply it to each sampled point.

  • Flattening a curved surface experiment details(7.1)

    • Neural net architecture-
      Encoder- Neurons in each layer: [200,200,2,2], Activation function: tanh.
      Decoder- Neurons in each layer: [200,200,2,2], Activation function: leaky relu.

    • Data- amount clouds: , cloud size: .

    • Neural net training- batch size: clouds, amount training clouds: , amount validation clouds:

  • Wi-Fi localization experiment details (7.2)

    • Neural net architecture-
      Encoder- Neurons in each layer: [200,200,3,3], Activation function: tanh.
      Decoder- Neurons in each layer: [200,200,2,2], Activation function: leaky relu.

    • Data- amount clouds: , cloud size: .

    • Neural net training- batch size: clouds, amount training clouds: , amount validation clouds: .

    • Generation of a cloud around a given point - sample the circle every starting at .

The LOCA model was trained using an ADAM optimizer, that minimized at each epoch one of the two loss (

6),(7). It was trained using of the given clouds, while the rest was defined as a validation set. The early stopping mechanism was implemented by evaluating the sum of the two losses every epochs. It saves the minimal value and the weights of the model that achieved this loss. The neural net terminates its training when the minimal loss was not changed in the last epochs and loads the saved weights. By following this description we trained LOCA with the sequence of learning rates , then we fine tuned it with learning , and finished by training it with the learning rate .

Figure 9: Evaluating the isometric property of the proposed embedding (as described in Section 6.1). LABEL:- The latent representation of the data. LABEL:- The scaled calibrated embedding of DM. LABEL: The scaled calibrated embedding of A-DM. LABEL: The calibrated embedding of LOCA. The color in both figures correspond to the values of of the data.
Figure 10: Evaluating three dimensional embedding of the stereographic projection (described in (10)). We compare the latent representation of the data LABEL: with its embedding based of DM LABEL:, A-DM LABEL: and LOCA LABEL:. The colors used in LABEL:,LABEL: and LABEL: is defined in (10)
Figure 11: A demonstration of the minimal embedding dimension estimation (), based on the procedure described in Section 6.4. 11(a): Loss values on the validation set for embedding dimensions in the range . 11(b): Plot of suggested quantity for estimating the minimal embedding dimension. This plot suggests that coordinates are sufficient to represent the data using LOCA.