Euclidean embedding is an important tool in data visualization and dimension reduction: using similarity measures, the goal is to embed data points in a Euclidean space such that similar points are placed close to each other while dissimilar points are widely spaced. Vector features are widely utilized in machine learning, e.g. for word embedding algorithms in natural language processing2], so the general paradigm of embedding vector data into a Euclidean space for analysis is widely used in different fields. In the family of multidimensional scaling (MDS) methods, similarities between points are explicitly represented as distances, and the target of MDS is to embed points into so that .
In general, there are two classes of applications of MDS: analysis and reconstruction. In the former, we embed data points in arbitrary domains into Euclidean space for visualization or further analysis using properties of Euclidean space ; in the latter, the data points inherently lie in Euclidean space, and we try to reconstruct their relative positions based on their pairwise distances. In this paper, we focus on the latter setting.
Reconstruction of point sets based on pairwise distances has wide applications in various fields. In many scenarios, it is difficult to directly determine coordinates of the points in a global coordinate system; the most straightforward information to acquire is pairwise distances between them, using multiple sensors in different positions. This is done in multi-point localization , and more specifically indoor multi-point localization , and GPS. The existing literature on wireless sensor networks has some practical results concerning this problem , and there is also some theoretical analysis from the perspectives of geometry and graph theory , but precise studies are lacking on the stability of reconstruction in the presence of noise.
In this paper, we propose a new criterion, noise-stability, for assessing MDS algorithms acting on (full or partial) noisy observations. It allows us to determine their ability to recover the correct global structure in the presence of independent Gaussian noise. We prove the noise-stability of the cMDS algorithm on complete graphs in generic cases, and give an algorithm to construct sparse adjacency graphs, called noise-stable rigid graphs, together with a direct and noise-stable MDS algorithm to reconstruct underlying structures of point clouds having a fixed dimensionality and bounded diameter from sparse noisy distance information, with edges and total edge length. However, consideration of degenerate cases still requires further work.
2 Related Work
2.1 Structural rigidity and min-cost rigid graphs
The theory of rigidity  comes from physics, as the property of a structure that does not deform under an applied force. In structural rigidity theory, structures are usually considered to be a collection of straight bars (line segments) connected by flexible hinges. A structure is rigid if there is no continuous flex of the shape that preserves the lengths of all segments.
Global rigidity of a graph of pairwise distance data is a necessary condition for an MDS algorithm to be able to reconstruct the positions of underlying points, as shown in ; that paper also gives an iterative algorithm to compute relative point positions from a globally rigid graph and discusses its applications in network localization.
In rigidity theory, a distance graph, also called a bar-joint framework, is a collection of points in Euclidean space that are connected by edges with fixed lengths. A distance graph is rigid in -dimensional space if and only if it has only a finite number of configurations in that satisfy the distance constraints, up to Euclidean congruence; it is also called locally rigid because the configuration is locally unique. It is globally rigid if the configuration is unique.
The Maxwell-Laman Theorem states that a 2D framework is rigid if and only if it contains a spanning Laman graph, which has edges and in which , every -vertex induced subgraph contains at most edges . In higher-dimensions, rigidity cannot be characterized in such a simple way, and indeed explicit characterization of 3D rigidity is still an open problem.
In higher dimensions, though, inductive characterizations of rigidity and global rigidity do exist using so-called 1-extensions, with which the approximate min-cost globally rigid spanning graph can be computed . However, the objective of minimizing total edge lengths makes edges highly local and the graph diameter large, rendering the reconstruction of global structure subject to cumulative errors.
2.2 MDS algorithms
The family of MDS algorithms provide different ways to embed data points into Euclidean spaces from observations of pairwise distances which can be full or partial, exact or noisy, clean or corrupted. In general, MDS for full observations can be tackled with tools of matrix analysis, and MDS for partial observations are related to sparse recovery and compressive sensing. Corrupted data are in general more problematic than missing data, and MDS with corrupted data needs techniques of robust, and usually, non-convex optimization.
The primitive cMDS algorithm  provides a way to accurately compute point cloud structures given accurate and full observations. The RobustMDS algorithm utilizes the method of convex relaxation to recover the correct structure in the presence of random corruption , while the SparseMDS approach can perform reconstruction with only partial distance observations, via matrix completion techniques [13, 14]. A further algorithm  can perform reconstruction from observations that are both sparse and corrupted.
To recover structure from an distance matrix in dimensions, the state-of-the-art SparseMDS method needs random samples while RobustMDS can tolerate at most a fraction of corruption. In general, RobustMDS is more demanding than SparseMDS because a missing value can be filled with an arbitrary value to give a corrupted value. However, although experiments in  empirically show its stability in the presence of noise, existing work has not theoretically discussed its stability in the presence of random noise in all distances.
3 Noise-Stability for MDS Algorithms and Distance Graphs
3.1 General definition
In the evaluation of performance of MDS algorithms, existing work mainly considers errors in pairwise distances between points, relative to the ground truth. However, precision in pairwise distances does not directly imply precision in global structures (e.g. see ), so we need a more trustworthy approach. Because we are considering the difference between two corresponding point clouds modulo a Euclidean congruence, we define the structural loss of a reconstructed point set with respect to the ground truth (represented as corresponding collections of column vectors) as:
in which is the -dimensional Euclidean group, is the number of points and denotes the Frobenius norm.
For a given , its Gramm matrix is
. We denote the eigenvalues ofin descending order by . Since is positive semidefinite and , we have and , .
We now define
, which represents the distribution variance ofin the
-th principal direction (corresponding to eigenvectors of). We further define as the scale-parameter of , and
In generic cases, we assume , . Then we may define the point cloud family to be
Intuitively, is a family of point clouds that have uniformly bounded domains and the same set of ‘spread widths’ in corresponding principal directions, but can have different numbers of points. We may now define the noise-stability of an MDS algorithm as follows.
Definition 1 (Noise-stability of an MDS algorithm).
For any fixed , a certain noise distribution and such that ,
an MDS algorithm is noise-stable if the output satisfies
This means that as , there exists a rigid transformation for which the root-mean-square error of corresponding points does not exceed
with high probability.
We may also define the noise-stability of distance graphs, which is a distinct concept from noise-stability of MDS algorithms:
Definition 2 (Noise-stability of distance graphs).
Let be a graph with Euclidean distances assigned to edges which is consistent with underlying point positions. If a noise-stable MDS algorithm exists which can reconstruct the point positions, we call the graph noise-stable.
Intuitively, the noise-stability of a distance graph is the property that the distance data has sufficient information to imply the spatial structure of the points, in the presence of a small amount of noise.
3.2 Independent Gaussian noise model
In this paper, we consider the independent Gaussian noise model, i.e. each pairwise distance is perturbed by an independent additive Gaussian error that follows the distribution , and we denote . We assume we perform MDS with input , in which is the set of observations. The undirected adjancency graph is denoted .
4 Noise-Stability of cMDS on Complete Graphs
In this section, we prove that for full distance observations (i.e. when is the complete graph ), the cMDS algorithm is noise-stable. This property is fundamental to the construction of noise-stable graphs in later sections of the paper.
We firstly define asymptotically negligible functions:
We define to be negligible if for any polynomial , . This is denoted by .
Theorem 1 (Noise-stability of complete graphs).
For , point cloud family and independent Gaussian noise model with variance , , we can compute a noise-stable reconstruction using the cMDS algorithm, relative to the ground truth :
where is any positive number less than 1/2; the notation means .
To prove Theorem 1, we follow the pipeline of cMDS and analyze its stability at each step. The steps in cMDS are:
Construct the squared distance matrix such that ;
Compute the Gramm matrix , in which ;
Compute the eigendecomposition such that ;
Let the output .
Essentially, the output of cMDS from the ground-truth is congruent to .
We assume a general case in which the underlying point cloud is properly aligned so that .
4.1 Estimation on noise 2-norm
Let be the underlying squared distance matrix (SDM) while be the SDM recovered by the cMDS. Then we have a concentration bound
(a bound that a random variable lies in at high probability) on the 2-norm of the noise:
Let , in which ,
be arbitrary (small) positive numbers, and . Then,
For and random variable distributed as , we have that
This implies that
There exists a constant such that
Note that , so , and . Bernstein’s inequality  states that
Lemma 3 (Bernstein).
Let be independent random variables with expectation 0. If , then ,
For and fixed -th column, because , , and ,
In (9), because , if we let , then
From the Gershgorin circle theorem :
Lemma 4 (Gershgorin circle theorem).
Let be a complex matrix with entries . For , let , and be a closed disk centered at with radius . Then, every eigenvalue of lies in at least one disk .
4.2 Error estimates for eigenvalues and eigenvectors
For symmetric matrices and , we define . Let , and in which . Let denote the Gramm matrix reconstructed from noisy observations, i.e. , and be the eigenvalues and eigenvectors of . Denote , , to be perturbations of the eigenvalues and eigenvectors. Then from the cMDS algorithm,
If , then
Because , from the sub-multiplicativity of the matrix 2-norm,
Combining this with gives:
Because , , , , we deduce:
This proves Theorem 1.
In the algorithms above, we performed cMDS on instead of . The reason is that in the cMDS algorithm entries of represent the squared distance, so even if has expectation 0, the expectation of is . Therefore, we subtract from to cancel the bias, which reduces the error.
If the 2-norm of the perturbation to is small compared to the spectral gap of , then its influence on is small.
This is because
For small (and even smaller ), as long as , it does not significantly affect the precision of . However, when is large, it will become the main part of the error in , because the effect of independent Gaussian noise diminishes with .
4.3.2 Other noise models
4.3.3 Small spectral gaps
In Eq. (1), is defined by the infimum of inconsistency between corresponding points among all alignments ; therefore, for any , we can compute the inconsistency with respect to as an upper bound of . Intuitively, choice of closer to the optimal one leads to a tighter bound.
In the derivation of Eq. (5), we used a heurestic to align corresponding principal axes (eigenvectors of ) based on the order of eigenvalues. Therefore, assuming the spectral gap of is large compared to the perturbation due to noise, the algorithm can correctly align axes with bounded error by numerical stability of the eigendecomposition. However, if eigenvalues of multiple eigenvectors have similar values (small
), then the heuristic may not find the correct alignment, resulting in failure of Theorem1.
However, even though Eq. (5) does not give a theoretical bound for it, numerical experiments show that when there are close or duplicated eigenvalues in , the reconstructed shape is still correct, even though in different orientations for different perturbations (as shown in Figure 1). This may imply that the noise-stability of cMDS is not dependent on the spectral gap of . However, a more subtle way is needed for finding (or optimizing) a near-optimal in these cases.
There exists an upper bound on for cMDS which does not depend on .
5 Construction of Noise-Stable Sparse Graphs
In Section 4, we proved the noise-stability of complete graphs. However, in applications in various fields such as wireless sensor network localization and structure design, we are also concerned about cost minimization: finding a minimal (weighted) set of edges that still satisfy certain properties.
In this section, we propose an algorithm to generate sparse graphs with noise-stability and optimized costs.
5.1 Preliminaries and motivation
5.1.1 Minimum spanning graphs
The simplest kind of minimum spanning graph is the spanning tree, which is a minimal set of edges that connects all vertices. More generally, the edges may be weighted and we desire to find a subset of edges that connects all vertices with least cost; classical minimum spanning tree algorithms solve this problem.
Structural rigidity is a stronger condition on bar-joint frameworks than connectivity: if a framework is rigid, it must be connected, otherwise different connected components can move independently from one another. Furthermore, from Definition 1 we immediately see that noise-stability implies global rigidity. In fact, global rigidity is a special case of noise-stability with . This hierarchy is shown on the right.
5.1.2 Minimum globally rigid graphs
The well-known Maxwell-Laman theorem  provides edge-counting conditions for graph rigidity:
Lemma 7 (Maxwell-Laman).
A necessary condition for -dimensional rigidity of () is that has a spanning subgraph that satisfies two conditions:
, let be the induced subgraph of on , then ,
This condition is sufficient for , but not for .
Connelly has further proved that:
Lemma 8 (Connelly).
If graph has generic global rigidity then
, is rigid. (There is redundant rigidity.)
(In this paper we assume all points are always in generic configurations, so we do not differentiate between ‘global rigidity’ and ‘generic global rigidity’). Therefore, in the generic configuration, -dimensional globally rigid graphs with vertices have at least edges.
As in MST and TSP problems, we are also interested in minimization of total edge lengths, to deal with cost optimization problems in practical applications.
Approximate min-cost global-rigid spanning graphs are discussed in , which proposes a constant-factor approximate algorithm for every fixed , whose approximation ratio is 2 and for and , respectively.
However, the graph computed in  is emphnot equipped with a localization/reconstruction algorithm, i.e. while the structure satisfying the distance constraints is unique, we cannot actually compute it efficiently. Existing matrix completion algorithms have high computational costs, and their noise-stability is not guaranteed: if every point is only connected to nearby points, accumulated errors may make the global structural error significantly larger than the local errors.
Intuitively, the min-cost condition penalizes long edges that connect distant points, but in absence of such edges, the shortest path between distant points must contain a large number of edges, whose errors will accumulate. Therefore, we consider to relax the min-cost condition, and construct a globally rigid graph with a few long edges to guarantee noise-stability, while still trying to minimize the total cost as much as possible, while still having a straightforward algorithm to reconstruct the structure from the edge lengths.
5.2 Sketch of our anchor-point scheme
The noise-stability of complete graphs guarantees that we can ‘reliably fix’ the structure of points with edges.
Assume the dimension is fixed. Since a globally rigid graph on points must have at least edges, as long as , fully connecting ‘anchor points’ makes a small overhead on the number of edges.
From this intuition, we may sketch our algorithm:
Select points distributed evenly 111It can be realized even if only distance data is given. The details will be discussed in the following part. from , denoted as ;
Connect points in pairwise with global edges , and each point in to nearby points in with local edges .
Reconstruction proceeds in two straightforward steps:
Perform cMDS on and all global edges to reconstruct ;
Compute the coordinates of points in based on the reconstructed and distance information from local edges.
In the following part, we
Use the -net tools to compute an even sample from .
Analyze the total edge length of .
Prove the global rigidity and 2D noise-stability of the graph ;
5.3 Choosing anchor points
In this algorithm, we connect each non-anchor point with nearby anchor points. To optimize the total length of local edges, we expect that has more points and they distribute evenly in , so that for each point we can find enough anchor points close to it. However, global edges are costy and their cost scale quadratically with , so the number of anchor points is another parameter to optimize. We firstly assume to be fixed and to be a constant only depending on , then look into the strategy for choice of and the optimal . (inspired by )
Farthest sampling. We are motivated to sample evenly among the points, but we only have pairwise distances as input. It can be achieved by the farthest sampling algorithm:
In this section we always assume to be constant and .
Let , be the total edge length of , then we use the farthest sampling algorithm  to compute . Let , then
To prove Theorem 3, as the total length of global edges is , now we consider the lengths of local edges. Notice that the number of local edges is , we only need to prove that all lengths of local edges are upper bounded to .
 has shown that is an -net of , with following definitions:
Point set is -sparse, if and only if , .
Point set is an -cover of , if and only if s.t. .
An -sparse -cover of is called an -net of .
For point set , define its -neighborhood as .
The bound (25) is implied by two lemmas:
From the definition of -net, we have
If is an -net of (denoted as ), then , and is a disjoint union of isolated balls.
From Proposition 1, for the -dimensional space, denote be the volume of -dimensional balls, and as the volume of . For , because ,
By the definition of , for fixed and , , so , therefore
Here let , . Because , we have proven that
If and is path-connected, then , ,
We compute the Voronoi division of w.r.t. points in , and denote the face containing as . Then,
We construct another graph s.t. and , and from (30) we deduce .
For the fixed , assume , then . Here we define be the set of points in s.t. the shortest path in between and contains at most edges, then . As the length of every edge in is at most ,
Since is path-connected, if , let , , and choose arbitrary , , then there exists a generic path that connects and . Let , then because are closed, . Therefore, s.t. , so , , .
Based on , by induction we get
5.4 Proof of global rigidity
Lemma 11 (Connelly).
, the complete graph has -dimensional global rigidity.
Lemma 12 (Connelly).
If is globally rigid, then after one of following operations, the resulting graph is globally rigid:
(Edge addition) Connect two non-adjacent vertices;
(1-extension) Choose edge , create a new vertex , choose , delete and connect each vertex in to .
Therefore, our algorithm can be regarded as: initialize with a globally rigid graph , then in each round we introduce a non-anchor point, connect it with anchor points and delete an edge in (1-extension), and re-add the deleted edge (edge addition), until all points are introduced. Because , the resulting graph has global rigidity.
5.5 2D noise-stability on non-marginal points
In section 5.3 we have only proven the existence of at least nearby anchor points close to any point, but not specified which points to connect. Intuitively, choosing nearest points minimizes the cost, but on the other hand, the configuration of connected anchor points affects the stability of localization in the presence of noise. As an example, if all connected anchors are nearly co-linear, the localization will be unstable. (As shown in Figure 3, small perturbations of lead to relatively large perturbation of the position of .)
To avoid degenerate situations, we expect the connected anchors distribute “evenly” around the point, which requires the point to lie “in the interior of” the point cloud. Rigorously, we define
For , and , we define to be interior of (w.r.t. ) if and only if is a -cover of .
Then we have
For , if is interior of w.r.t , then
and , s.t.
Since is a -cover of , for an arbitrary point , s.t. . Because is an -cover of , s.t. , thus , i.e. is an -cover of .
Then is an -cover of . Because is -sparse, , at most one point in lies in .
For any , assume the intersection points of and are , then are on the perpendicular bisector of segment . Denote the arc , then its central angle is
Because , , . As is an -cover of ,
Therefore, . This proves (33).
Let be the closest point in to , then , , .
Let be the line perpendicular to and contains , which intersects on , which do not belong to . Because
s.t. . It has been proven that , so let , , and we have proven (34).
Then we denote the midpoint of the major arc between and , and there exists s.t. , so . Therefore, for the reflection of w.r.t. line , . So we can infer the position of based on positions of and . Generally, we draw two circles centered at and with radii respectively, and choose the closer intersection point closer to , denoted as .
For the constant ,
In which is the Jacobian matrix.
We only prove and do not exceed .
We consider first. Fix , According to cosine theorem, on condition of or , from either scalar in we can uniquely determine the other, and under the constraint that and are on the same side of line , the position of can be uniquely determined.
Regard as a function of , we have
From the inverse function theorem,
Since for fixed , can only move on the circle ,
Then we consider . Fix and perturb by at most , from the triangle inequality, the perturbation of does not exceed . If now we regard it as a perturbation of , and we need to perturb to make not modified, from 44, the perturbation of does not exceed , deducing