1 Introduction
We consider the fundamental problem in Distance Geometry (DG):
Distance Geometry Problem (DGP). Given a positive integer $K$ and a simple undirected graph $G=(V,E)$ with an edge weight function $d:E\to\mathbb{R}_{\ge 0}$, establish whether there exists a realization $x:V\to\mathbb{R}^K$ of the vertices such that Eq. (1) below is satisfied:
$$\forall \{u,v\}\in E \quad \|x_u - x_v\| = d_{uv}, \qquad (1)$$
where $x_v\in\mathbb{R}^K$ for each $v\in V$ and $d_{uv}$ is the weight on the edge $\{u,v\}\in E$.
Although the DGP is given above in the canonical decision form, we consider the corresponding search problem, where one has to actually find the realization . The DGP is also known as the graph realization problem in geometric rigidity [25, 6, 17]. It belongs to a more general class of metric completion and embedding problems [7, 21, 51].
In its most general form, the DGP might be parametrized over any norm [11]. In practice, the $\ell_2$ norm is the most usual choice [40], and will also be employed in this paper. The DGP with the $\ell_2$ norm is sometimes called the Euclidean DGP (EDGP). For the EDGP, Eq. (1) is often reformulated to:
$$\forall \{u,v\}\in E \quad \|x_u - x_v\|_2^2 = d_{uv}^2, \qquad (2)$$
which is a system of quadratic polynomial equations with no linear terms [35, §2.4].
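As a quick numerical illustration, the following sketch evaluates the residuals of Eq. (2) on a hypothetical instance: a 4-cycle plus one diagonal, realized as a unit square in $K=2$. All instance data are illustrative.

```python
import math

# Hypothetical instance: a 4-cycle with a diagonal, realized as a unit square.
x = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
d = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 1): 1.0, (1, 3): math.sqrt(2)}

def residuals(x, d):
    """Left-hand side minus right-hand side of Eq. (2) for every edge."""
    return {e: sum((x[e[0]][k] - x[e[1]][k]) ** 2 for k in range(2)) - d[e] ** 2
            for e in d}

print(residuals(x, d))  # all residuals are (numerically) zero
```

A realization satisfies Eq. (2) exactly when every residual vanishes; perturbing any coordinate above makes the corresponding residuals nonzero.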
The EDGP is motivated by many scientific and technological applications. The clock synchronization problem, for example, aims at establishing the absolute time of a set of clocks when only the time difference between subsets of clocks can be exchanged [53]. The sensor network localization problem aims at finding the positions of moving wireless sensors on a 2D manifold given an estimation of some of the pairwise Euclidean distances [17, 2, 15]. The Molecular DGP (MDGP) aims at finding the positions of atoms in a protein, given some of the pairwise Euclidean distances [27, 29, 40, 35, 8, 46]. The position of autonomous underwater vehicles cannot be determined via GPS (since the GPS signal does not reach under water), but must rely on distances estimated using sonars: a DGP can then be solved in order to localize the fleet [3]. Applications of the DGP to data science are described in [32]; see [31] for an application to natural language processing. In general, the DGP is an inverse problem which occurs every time one can measure some of the pairwise distances in a set of entities and needs to establish their positions.
The DGP is weakly NP-hard even when restricted to simple cycle graphs (by reduction from Partition) and strongly NP-hard even when restricted to integer edge weights in general graphs (by reduction from 3-SAT) [50]. It is in NP if $K=1$, but it is not known whether it is in NP if $K>1$ for general graphs [4], which is an interesting open question [36].
There are many approaches to solving the DGP. Generally speaking, application-specific solution algorithms exploit some of the graph structure, whenever it is induced by the application. For example, a condition often required when reconstructing the positions of sensor networks is that the realization should be unique (as one would not know how to choose between multiple realizations), a condition called global rigidity [10]. This condition can, at least generically, be ensured by a specific rigidity structure of the unweighted input graph. For protein structures, on the other hand, which are found in nature in several isomers, one is sometimes interested in finding all (incongruent) realizations of the given protein graph [28, 48, 37]. Since such graphs are rigid, one can devise an algorithm (called Branch-and-Prune) which, following a given vertex order, branches on reflections of the position of the next vertex, which is computed using trilateration [38, 35]. In the absence of any information on the graph structure, however, one can resort to Mathematical Programming (MP) formulations and corresponding solvers [41, 12, 14].
The MP formulation which is most often used reformulates Eq. (2) to the minimization of the sum of squared error terms:
$$\min_x \sum_{\{u,v\}\in E} \left(\|x_u - x_v\|_2^2 - d_{uv}^2\right)^2. \qquad (3)$$
This formulation describes an unconstrained polynomial minimization problem. The polynomial in question has degree 4, is always nonnegative, and is generally nonconvex and multimodal. The decision variables are represented by a rectangular $|V|\times K$ matrix $x$ such that $x_{vk}$ is the $k$-th component of the vector $x_v\in\mathbb{R}^K$, which gives the position in $\mathbb{R}^K$ of vertex $v$. Each solution having global minimum value equal to zero is a realization of the given graph. Solutions with small objective function value represent approximate realizations. Because of the nonconvexity of the formulation and the hardness of the problem, Eq. (3) is not usually solved to guaranteed optimality (e.g. using a spatial Branch-and-Bound approach [5]); rather, heuristic approaches, such as Multi-Start (MS) [34, 26], Variable Neighbourhood Search (VNS) [39], or relaxation-based heuristics [14, 43] may be used. As far as we know, all existing MP formulations for the EDGP are based on the incidence of edges and vertices. In this paper we discuss a new MP formulation for the EDGP based on the incidence of cycles and edges instead, a relaxation based on Eulerian cycles, and a computational comparison with Eq. (3).
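A minimal sketch of the MS approach applied to Eq. (3), assuming a toy triangle instance with unit edge weights in $K=2$; plain gradient descent (with step clipping) stands in for the local optimization step, which in practice would be any local NLP solver.

```python
import random

# Toy EDGP instance in K = 2: a triangle with unit edge weights (illustrative).
V = [0, 1, 2]
E = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
K = 2

def stress(x):
    # Objective of Eq. (3): sum of squared errors on squared distances.
    return sum((sum((x[u][k] - x[v][k]) ** 2 for k in range(K)) - d * d) ** 2
               for (u, v), d in E.items())

def gradient(x):
    g = [[0.0] * K for _ in V]
    for (u, v), d in E.items():
        err = sum((x[u][k] - x[v][k]) ** 2 for k in range(K)) - d * d
        for k in range(K):
            g[u][k] += 4.0 * err * (x[u][k] - x[v][k])
            g[v][k] -= 4.0 * err * (x[u][k] - x[v][k])
    return g

def multistart(starts=8, iters=20000, lr=0.003, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        x = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in V]
        for _ in range(iters):
            g = gradient(x)
            # Clip each step to keep divergent starts bounded.
            x = [[x[v][k] - max(-0.1, min(0.1, lr * g[v][k]))
                  for k in range(K)] for v in V]
        s = stress(x)
        if best is None or s < best:
            best = s
    return best

print(multistart())  # best objective over all starts; near zero on success
```

An objective value near zero indicates that at least one random start descended to a realization; because Eq. (3) is multimodal, individual starts may stall at nonglobal local minima, which is exactly why several starts are used.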
2 Some existing MP formulations
In this short section we give a minimal list of typical variants of Eq. (3) in order to motivate the claim that the cycle-based formulation of the DGP discussed in this paper is new. Of course, only a complete enumeration of DGP formulations in the literature could substantiate this claim. But even this short list shows that the typical modelling approach for the DGP is direct: namely, decision variables encode the realization of each vertex as a vector in $\mathbb{R}^K$. Many more formulations of the DGP and its variants, all corresponding to this criterion, are given in [26, 41, 12].
The closest variant of Eq. (3) simply adds a constraint ensuring that the centroid of all of the points in the realization is at the origin. This removes the degrees of freedom given by translations:
$$\min_x \sum_{\{u,v\}\in E} \left(\|x_u - x_v\|_2^2 - d_{uv}^2\right)^2 \quad \text{s.t.} \quad \sum_{v\in V} x_v = 0. \qquad (4)$$
This formulation describes a linearly constrained polynomial minimization problem. Like Eq. (3), the polynomial in Eq. (4) has degree 4, is always nonnegative, and is generally nonconvex and multimodal.
Another small variant of Eq. (4) is achieved by adding range bounds to the realization variables $x$; generally valid (but slack) bound values can be set to $-\sum_{e\in E} d_e \le x_{vk} \le \sum_{e\in E} d_e$. This corresponds to the worst case of a single path being arranged in a straight line with unknown orientation.
Another possible formulation, derived again from Eq. (3), is obtained by replacing the squared errors with absolute value errors (whose positive and negative parts are encoded by additional variables $s^+_{uv}, s^-_{uv} \ge 0$). This yields the following formulation:
$$\min_{x,s} \sum_{\{u,v\}\in E} (s^+_{uv} + s^-_{uv}) \quad \text{s.t.} \quad \forall \{u,v\}\in E \quad \|x_u - x_v\|_2^2 = d_{uv}^2 + s^+_{uv} - s^-_{uv}. \qquad (5)$$
Note that, again, each solution with zero optimal objective value encodes a realization of the given graph. Thus, global optima are preserved by this reformulation, while local optima may differ.
Yet another reformulation derived from replacing squared errors with absolute values consists in observing that the “plus” and “minus” parts of each absolute value term correspond to a convex and a concave function. This yields a formulation called push-and-pull, since the objective pulls adjacent vertices apart, while the constraints push them back together:
$$\max_x \sum_{\{u,v\}\in E} \|x_u - x_v\|_2^2 \quad \text{s.t.} \quad \forall \{u,v\}\in E \quad \|x_u - x_v\|_2^2 \le d_{uv}^2. \qquad (6)$$
Eq. (6) is a Quadratically Constrained Quadratic Program with concave objective and convex constraints. It was used within a Multiplicative Weights Update algorithm for the DGP in [12, 47], as well as a basis for Semidefinite Programming and Diagonally Dominant Programming relaxations [14, 43]. It can be shown that all constraints are active at global optima, which therefore correspond to realizations of the given graph [47].
3 A new formulation based on cycles
In this section we propose a new formulation for the EDGP, based on the fact that the quantities $x_{uk} - x_{vk}$ sum up to zero over all (oriented) edges of any cycle in the given graph, for each dimensional index $k\le K$. This idea was used in [50] for proving weak NP-hardness of the DGP on cycle graphs. For a subgraph $H$ of a graph $G$, we use $V(H)$ and $E(H)$ to denote the vertex and edge set of $H$ explicitly; given a set $F$ of edges we use $V(F)$ to denote the set of incident vertices. Let $n = |V|$ and $m = |E|$. For a mapping $x: V\to\mathbb{R}^K$ we denote by $x[U]$ the restriction of $x$ to a subset $U\subseteq V$.
3.1 Lemma
Given an integer $K > 0$, a simple undirected weighted graph $G=(V,E,d)$ and a mapping $x:V\to\mathbb{R}^K$, then for each cycle $C$ in $G$, each orientation of the edges in $C$ given by a closed trail in the cycle, and each $k\le K$ we have:
$$\sum_{(u,v)\in C} (x_{uk} - x_{vk}) = 0. \qquad (7)$$
Proof.
We renumber the vertices in $V(C)$ as $1,\dots,\ell$ following the walk order in $C$. Then Eq. (7) can be explicitly written as:
$$(x_{1k} - x_{2k}) + (x_{2k} - x_{3k}) + \cdots + (x_{\ell-1,k} - x_{\ell k}) + (x_{\ell k} - x_{1k}) = 0,$$
where the sum telescopes to zero, as claimed. ∎
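The telescoping argument can be checked numerically; the instance below (a 5-cycle with random positions in $K=3$) is illustrative.

```python
import random

# Numerical check of Lemma 3.1 on a hypothetical 5-cycle in K = 3:
# the oriented coordinate differences telescope to zero around the cycle.
rng = random.Random(42)
K, n = 3, 5
x = [[rng.uniform(-10, 10) for _ in range(K)] for _ in range(n)]
cycle = [(i, (i + 1) % n) for i in range(n)]  # closed trail 0->1->...->4->0

for k in range(K):
    s = sum(x[u][k] - x[v][k] for (u, v) in cycle)
    assert abs(s) < 1e-9
print("Eq. (7) holds on all", K, "coordinates")
```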
We introduce new decision variables $y^k_{uv}$ replacing the terms $x_{uk} - x_{vk}$ for each $\{u,v\}\in E$ and $k\le K$. Eq. (2) then becomes:
$$\forall \{u,v\}\in E \quad \sum_{k\le K} (y^k_{uv})^2 = d_{uv}^2. \qquad (8)$$
We remark that for the DGP with other norms this constraint changes. For the $\ell_1$ or $\ell_\infty$ norms, for example, we would have:
$$\forall \{u,v\}\in E \quad \sum_{k\le K} |y^k_{uv}| = d_{uv} \qquad\text{or}\qquad \forall \{u,v\}\in E \quad \max_{k\le K} |y^k_{uv}| = d_{uv}. \qquad (9)$$
Next, we adjoin the constraints on cycles:
$$\forall k\le K,\ \forall \text{ cycle } C \text{ in } G \quad \sum_{(u,v)\in C} y^k_{uv} = 0. \qquad (10)$$
We also note that the feasible value of a variable $y^k_{uv}$ is the (oriented) length of the segment representing the edge $\{u,v\}$ projected on the $k$-th coordinate. We can therefore infer bounds for $y$ as follows:
$$\forall k\le K,\ \{u,v\}\in E \quad -d_{uv} \le y^k_{uv} \le d_{uv}. \qquad (11)$$
Although Eq. (11) is not necessary to solve the cycle formulation, it may improve the performance of spatial Branch-and-Bound (sBB) algorithms [54, 5] and of various “matheuristics” [42]; it also allows an exact linearization of variable products, should a $y$ variable occur in a product with a binary variable in some DGP variant.
3.2 Theorem
The EDGP instance is YES if and only if Eq. (8) and Eq. (10) admit a feasible solution $y$.
The proof argues, by recursion on a graph decomposition of $G$, that a certain linear system related to the cycles of $G$ (see Eq. (12) below) has a solution if and only if the given DGP instance is YES. We shall construct the proof by steps. The first step defines the graph decomposition.
Given a graph $G=(V,E)$ and a subset $U\subseteq V$, the subgraph induced by $U$ is the graph $G[U] = (U, \{\{u,v\}\in E \,:\, u,v\in U\})$. With a slight abuse of notation we denote the vertices of a graph $H$ by $V(H)$ and its edges by $E(H)$. We let $c(G)$ be the number of connected components of $G$. A vertex $v$ of $G$ with the property that $c(G[V\smallsetminus\{v\}]) > c(G)$ is called a cut vertex. A graph $G$ is biconnected if there is a simple cycle in $G$ incident to any pair of distinct vertices of $G$.
3.3 Lemma
A graph $G$ is biconnected if and only if it is connected and has no cut vertices.
Proof.
Suppose $G$ is biconnected but has a cut vertex $v$: then the removal of $v$ from $G$ yields at least two separate connected components $G_1, G_2$. Let $u_1\in V(G_1)$ and $u_2\in V(G_2)$. Since $G$ is biconnected, there is a simple cycle in $G$ incident to $u_1, u_2$, consisting of two vertex-disjoint simple paths $P_1, P_2$ from $u_1$ to $u_2$. Since the removal of $v$ can break at most one of these paths (by vertex disjointness), the other path shows that $u_1, u_2$ are not disconnected, against the assumption. So $G$ cannot have cut vertices. Conversely, if $G$ is connected and has a cut vertex $v$, let $u_1, u_2$ be two vertices separated by the removal of $v$: then any pair of paths from $u_1$ to $u_2$ must necessarily pass through $v$, which means that they are not vertex disjoint, which implies that there is no simple cycle through $u_1$ and $u_2$ in $G$, which in turn implies that $G$ is not biconnected. ∎
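Lemma 3.3 suggests a brute-force test: a cut vertex is one whose removal increases the number of connected components. The following sketch applies this test to a hypothetical graph made of two triangles glued at a vertex; all names and data are illustrative.

```python
def components(vertices, edges):
    """Number of connected components via a simple union-find."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

def cut_vertices(vertices, edges):
    """Vertices whose removal increases the component count (Lemma 3.3)."""
    base = components(vertices, edges)
    cuts = []
    for w in vertices:
        rest = [v for v in vertices if v != w]
        kept = [(u, v) for (u, v) in edges if w not in (u, v)]
        if components(rest, kept) > base:
            cuts.append(w)
    return cuts

# Two triangles glued at vertex 2: vertex 2 is the unique cut vertex,
# so the graph is connected but not biconnected.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
print(cut_vertices(V, E))  # [2]
```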
We now define a graph decomposition based on removal of a single cut vertex.
3.4 Definition
A decomposition of a graph $G$ is a set $\mathcal{D} = \{G_1,\dots,G_p\}$ (with $p\ge 1$) of subgraphs of $G$ such that:
(a) $G_i$ is either biconnected or a tree, for all $i\le p$;
(b) $\bigcup_{i\le p} G_i = G$;
(c) for any $i < j \le p$, the intersection $V(G_i)\cap V(G_j)$ either has zero cardinality or it consists of a single cut vertex of $G$.
A decomposition of $G$ is nontrivial if $p\ge 2$. A graph is decomposable if it has a nontrivial decomposition.
3.5 Lemma
A connected graph is decomposable if and only if it has a cut vertex.
Proof.
3.6 Corollary
No simple graph consisting of a single cycle is decomposable.
Proof.
Since a cycle is biconnected, by Lemma 3.3 it cannot have a cut vertex; hence its only possible decomposition is trivial. ∎
3.7 Corollary
Let $G$ be decomposable, with decomposition $\mathcal{D} = \{G_1,\dots,G_p\}$, and let $C$ be a cycle in $G$. Then there is an index $i\le p$ s.t. $C$ is a subgraph of $G_i$.
Proof.
Suppose there were two subgraphs $G_i, G_j$ in $\mathcal{D}$ both incident to the edges of $C$. Then there is a nontrivial path in $C$, with at least two edges, joining a vertex in $G_i$ to a vertex in $G_j$. Therefore there must be a cut vertex of $G$ on this path, which implies that there is a cut vertex in $C$, making $C$ decomposable, which is impossible by Cor. 3.6. ∎
3.8 Corollary
No biconnected graph is decomposable.
Proof.
By Lemma 3.3, if $G$ is biconnected it cannot have a cut vertex; therefore any decomposition must necessarily be trivial. ∎
3.9 Proposition
Any simple graph $G$ has a decomposition consisting of biconnected subgraphs and tree subgraphs.
Proof.
We prove this result by induction on the number $q$ of biconnected subgraphs in a decomposition $\mathcal{D} = \{G_1,\dots,G_p\}$ of $G$, for some $p\ge 1$. We first deal with the base case, where $q = 0$. We claim that $G$ must be a tree: supposing $G$ has a cycle $C$, by Cor. 3.6 and part (c) of Defn. 3.4, $C$ must be a subgraph of one of the $G_i$, which would then be biconnected. But then $q > 0$, against the assumption. Therefore, the trivial decomposition $\{G\}$ is a valid decomposition of $G$. We now tackle the induction step. Consider the largest biconnected subgraph $B$ of $G$: then $G\smallsetminus B$ has one fewer biconnected component than $G$, so, by induction, $G\smallsetminus B$ has a decomposition $\mathcal{D}' = \{G_1,\dots,G_{p'}\}$ for some $p'\ge 1$. We prove that $\mathcal{D} = \mathcal{D}'\cup\{B\}$ is a valid decomposition of $G$. Condition (a) is verified since $\mathcal{D}'$ is a valid decomposition by induction, and $B$ is biconnected; condition (b) is verified since the union of the graphs in $\mathcal{D}$ is $G$ by construction; for condition (c), suppose there is $i\le p'$ s.t. $|V(G_i)\cap V(B)| \ge 2$: this means there are two distinct vertices $u, v$ in both $G_i$ and $B$. Since $G_i$ is connected, there must be a path from $u$ to $v$ in $G_i$, hence $B\cup G_i$ is a biconnected graph larger than $B$. But $B$ was assumed to be largest, so this is not possible, and (c) holds, which concludes the proof. ∎
The second step proves the easier (⇒) direction of Thm. 3.2.
3.10 Proposition
If the EDGP instance is YES, then Eq. (8) and Eq. (10) admit a feasible solution.
Proof.
Assume that the EDGP instance is YES. Then $G$ has a realization $x$ in $\mathbb{R}^K$. We define $y^k_{uv} = x_{uk} - x_{vk}$ for all $\{u,v\}\in E$ and $k\le K$. Since $x$ is a realization of $G$, by definition it satisfies Eq. (2), and, by substitution, $y$ satisfies Eq. (8). Moreover, any realization of $G$ satisfies Eq. (7) over each cycle by Lemma 3.1. Hence, by replacement, $y$ also satisfies Eq. (10). ∎
In the third step, we lay the groundwork towards the more difficult (⇐) direction of Thm. 3.2. We proceed by contradiction: we assume that the EDGP instance is NO, and suppose that Eq. (8) and (10) have a nonempty feasible set $Y$. For every $y\in Y$ we consider the linear systems
$$\forall \{u,v\}\in E \quad x_{uk} - x_{vk} = y^k_{uv} \qquad (12)$$
for each $k\le K$, each with $n$ variables and $m$ equations. We square both sides, then sum over $k\le K$, to obtain
$$\forall \{u,v\}\in E \quad \|x_u - x_v\|_2^2 = \sum_{k\le K} (y^k_{uv})^2. \qquad (13)$$
By Eq. (8) we have
$$\forall \{u,v\}\in E \quad \|x_u - x_v\|_2^2 = d_{uv}^2, \qquad (14)$$
whence follows Eq. (2), contradicting the assumption that the EDGP instance is NO. So we need only show that there is a solution to Eq. (12) for any given $y\in Y$. To this effect, we shall exploit the decomposition of $G$ into biconnected graphs and trees derived in Prop. 3.9. First, though, we have to show that Eq. (12) has a solution if $G$ is one of the “base cases” of the decomposition, namely trees and biconnected graphs.
3.11 Lemma
If $G$ is a tree, then Eq. (12) has a solution for each $k\le K$ and any $y$.
Proof.
Let $A$ be the matrix of each system in Eq. (12), for $k\le K$; we aim at proving that $A$ and the augmented matrix $(A\,|\,y^k)$ have the same rank, where $y^k = (y^k_{uv} \,:\, \{u,v\}\in E)$, and that this rank is full. We proceed by induction on the size of the tree. The base case, where $G$ consists of a single edge $\{u,v\}$, yields $A = (1\ \ {-1})$ with rank 1 for each $k\le K$. By inspection, $(A\,|\,y^k)$ also has rank 1 for any $y^k$. Now consider a tree $T'$ with one fewer edge (say, $\{u,v\}$ with leaf $v$) than the tree $T$, such that $T = T'\cup\{\{u,v\}\}$. Let the corresponding system in Eq. (12) satisfy $\mathrm{rk}(A') = \mathrm{rk}(A'\,|\,y^k) = |E(T')|$ for all $k\le K$. Then the shape of $A$ is:
$$A = \begin{pmatrix} A' & 0 \\ a^\top & -1 \end{pmatrix},$$
where $a^\top$ is the row of the new edge $\{u,v\}$, whose only nonzero entry lies in the column of $u$. This shows that $\mathrm{rk}(A) = \mathrm{rk}(A') + 1$, that this rank is full, and hence also that $\mathrm{rk}(A\,|\,y^k) = \mathrm{rk}(A)$, as claimed. ∎
3.12 Lemma
If $G$ is biconnected and $y$ satisfies Eq. (8) and (10), then Eq. (12) has a solution for each $k\le K$.
Proof.
We proceed by induction on the simple cycles of $G$. For the base case, we consider $G$ to be a graph consisting of a single cycle, with corresponding $y$ satisfying Eq. (8) and (10). Since $G$ is a cycle, it has the same number of vertices and edges, say $\ell$. This implies that, for any fixed $k\le K$, Eq. (12) is a square linear system $A x^k = y^k$ (where $x^k = (x_{vk} \,:\, v\in V)$) with an $\ell\times\ell$ matrix:
$$A = \begin{pmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & \ddots & \ddots \\ -1 & & & 1 \end{pmatrix}. \qquad (15)$$
By Eq. (7) and by inspection of Eq. (15) it is clear that $\mathrm{rk}(A) = \ell - 1$: then Eq. (10) ensures that $\mathrm{rk}(A\,|\,y^k) = \ell - 1$ as well, and therefore that Eq. (12) has a solution.
We now tackle the induction step. The incidence vectors in $\{0,1\}^m$ of the cycles of any graph form a vector space of dimension $m - n + 1$ over the finite field $\mathbb{F}_2$ [52]. We consider a fundamental cycle basis $\mathcal{B}$ of $G$ (see Sect. 4). We assume that (a) $H$ is a union of fundamental cycles in $\mathcal{B}$, for which Eq. (12) has a solution $\bar{x}$ by the induction hypothesis, and (b) that $C$ is another fundamental cycle in $\mathcal{B}$, with a solution $\tilde{x}$ of Eq. (12) which exists by the base case. We aim at proving that Eq. (12) has a solution for $H\cup C$. Since $G$ is biconnected, the induction can proceed by ear decomposition [45], which means that $H$ is also biconnected, and that $C$ is such that $C\smallsetminus H$ is a nonempty path in $G$.
By Eq. (10) applied to $C$, we have
$$\sum_{(u,v)\in C\cap H} y^k_{uv} + \sum_{(u,v)\in C\smallsetminus H} y^k_{uv} = 0. \qquad (16)$$
Since $\bar{x}$ satisfies Eq. (12) over $H$ by the induction hypothesis,
$$\forall (u,v)\in C\cap H \quad \bar{x}_{uk} - \bar{x}_{vk} = y^k_{uv}. \qquad (17)$$
We replace Eq. (17) in Eq. (16), obtaining
$$\sum_{(u,v)\in C\cap H} (\bar{x}_{uk} - \bar{x}_{vk}) + \sum_{(u,v)\in C\smallsetminus H} y^k_{uv} = 0. \qquad (18)$$
Moreover, $\tilde{x}$ also satisfies Eq. (12) over $C$, hence we can replace the right hand side of Eq. (18) with the corresponding terms in $\tilde{x}$ to get:
$$\sum_{(u,v)\in C\cap H} (\bar{x}_{uk} - \bar{x}_{vk}) + \sum_{(u,v)\in C\smallsetminus H} (\tilde{x}_{uk} - \tilde{x}_{vk}) = 0. \qquad (19)$$
We now fix $k\le K$, and aim at modifying $\tilde{x}$ so that: (a) $\tilde{x}$ matches $\bar{x}$ on $C\cap H$, and (b) the modified $\tilde{x}$ is still a solution of Eq. (12) on $C$. We set $\tilde{x}_{vk}$ to $\bar{x}_{vk}$ for each $v\in V(C\cap H)$, and consider the resulting linear system Eq. (12) over $C$, with the matrix as in Eq. (15), where we assume without loss of generality that the vertices of $C$ are numbered $1,\dots,\ell$ along the cycle and that $V(C\cap H) = \{1,\dots,h\}$:
$$\tilde{x}_{1k} - \tilde{x}_{2k} = y^k_{12}, \quad \dots, \quad \tilde{x}_{\ell-1,k} - \tilde{x}_{\ell k} = y^k_{\ell-1,\ell}, \quad \tilde{x}_{\ell k} - \tilde{x}_{1k} = y^k_{\ell 1}. \qquad (20)$$
The equations from the first to the $(h-1)$-st in Eq. (20) are satisfied by the induction hypothesis, since they only depend on $\bar{x}$, so we can remove them from the system and treat $\tilde{x}_{1k},\dots,\tilde{x}_{hk}$ as constants. We are left with:
$$\tilde{x}_{hk} - \tilde{x}_{h+1,k} = y^k_{h,h+1}, \quad \dots, \quad \tilde{x}_{\ell k} - \tilde{x}_{1k} = y^k_{\ell 1}, \qquad (21)$$
a system of $\ell - h + 1$ equations in the $\ell - h$ unknowns $\tilde{x}_{h+1,k},\dots,\tilde{x}_{\ell k}$. Summing up the left hand sides of Eq. (21), each unknown appears once with coefficient $1$ and once with coefficient $-1$, so the rows of the matrix $B$ of the $k$-th linear system Eq. (21) sum to the zero row, and $B$ has rank at most $\ell - h$. On the other hand, eliminating the first or last row makes it clear by inspection that the remaining rows are linearly independent; therefore the rank of $B$ is exactly $\ell - h$. Summing up the components of the right hand side vector of Eq. (21) (with the constants $\bar{x}_{hk}$ and $\bar{x}_{1k}$ moved to the right hand side), we obtain:
$$\sum_{(u,v)\in C\smallsetminus H} y^k_{uv} + \bar{x}_{1k} - \bar{x}_{hk}.$$
We remark that
$$\sum_{(u,v)\in C\cap H} y^k_{uv} = \bar{x}_{1k} - \bar{x}_{hk}$$
since $\bar{x}$ satisfies Eq. (12) by the induction hypothesis. Therefore the above sum equals
$$\sum_{(u,v)\in C\smallsetminus H} y^k_{uv} + \sum_{(u,v)\in C\cap H} y^k_{uv},$$
whence it is zero by Eq. (16). This implies that $\mathrm{rk}(B) = \mathrm{rk}(B\,|\,b)$, where $b$ is the right hand side vector of Eq. (21). Therefore, Eq. (21) has a solution, which yields the modified $\tilde{x}$ with properties (a) and (b) given above. This concludes the induction step and the proof. ∎
We can finally give the proof of Thm. 3.2.
Proof of Thm. 3.2. The (⇒) part follows by Prop. 3.10. For the (⇐) part, we exploit a decomposition of $G$ into trees and biconnected subgraphs, derive solutions to Eq. (12) for each subgraph, and show that the solutions can be easily combined to yield a solution to Eq. (12) for the whole graph $G$.
We assume without loss of generality that $G$ is connected (otherwise each connected component can be treated separately), and consider a decomposition $\mathcal{D} = \{G_1,\dots,G_p\}$ of $G$. By Lemmata 3.11 and 3.12, there exist solutions $x^{(1)},\dots,x^{(p)}$ to Eq. (12) applied to $G_1,\dots,G_p$ respectively. Consider the graph $\mathcal{T}$ with one vertex for each subgraph in $\mathcal{D}$, and an edge between $G_i$ and $G_j$ whenever they share a cut vertex of $G$.
By Cor. 3.7, $\mathcal{T}$ is a tree: otherwise, a cycle in $\mathcal{T}$ would be a contraction of a cycle in $G$ not included in a single $G_i$, against Cor. 3.7. This allows us to reorder $\mathcal{D}$ so that, for each $i > 1$, there is a unique $j < i$ such that $V(G_i)\cap V(G_j)\ne\varnothing$.
We remark that, for each $i\le p$, $x^{(i)}$ is a realization of $G_i$ in $\mathbb{R}^K$ by Eq. (12)-(14). More precisely, $x^{(i)}$ is a $|V(G_i)|\times K$ matrix such that its row $x^{(i)}_v$ is the position of vertex $v\in V(G_i)$. Note that the realizations $x^{(i)}$ can be modified by translations without changing the values of $y$ (by inspection of Eq. (12)).
We now construct a solution $x$ of Eq. (12) for $G$ by induction on $\mathcal{D}$, ordered as described above. For the base case $i = 1$, we fix $x^{(1)}$ in any way (e.g. by taking the centroid of the rows of $x^{(1)}$ to be the origin), and initialize the rows of $x$ indexed by $V(G_1)$ with those of $x^{(1)}$. For any $i > 1$, we identify the unique predecessor $G_j$ of $G_i$ in the order on $\mathcal{D}$. The induction hypothesis ensures the existence of a solution $x$ of Eq. (12) applied to the union of $G_1,\dots,G_{i-1}$. Consider the cut vertex $v\in V(G_i)\cap V(G_j)$ guaranteed by the definition of the order on $\mathcal{D}$, and let $x_v$ be its position. Then the translation $x^{(i)} \mapsto x^{(i)} - x^{(i)}_v + x_v$ yields another valid solution of Eq. (12) applied to $G_i$ by translation invariance, and this solution places $v$ at $x_v$. Therefore, using the rows of $x^{(i)}$, $x$ can be extended to a solution of Eq. (12) applied to the union of $G_1,\dots,G_{i-1}$ and $G_i$, as claimed. ∎
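The translation-gluing step used in the induction above can be sketched as follows, on a hypothetical decomposition into two triangles sharing the cut vertex 2; all coordinates and names are illustrative.

```python
# Sketch of the gluing step in the proof of Thm. 3.2: realizations of two
# subgraphs sharing a cut vertex are combined by translating the second one
# so that the positions of the cut vertex coincide (K = 2).
K = 2

def translate(real, t):
    return {v: tuple(p[k] + t[k] for k in range(K)) for v, p in real.items()}

# G1 = triangle on {0, 1, 2}, G2 = triangle on {2, 3, 4}; cut vertex: 2.
x1 = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.5, 0.9)}
x2 = {2: (0.0, 0.0), 3: (1.0, 0.0), 4: (0.5, 0.9)}  # realized independently

t = tuple(x1[2][k] - x2[2][k] for k in range(K))     # align at the cut vertex
x = dict(x1)
x.update(translate(x2, t))

assert x[2] == x1[2] and len(x) == 5
print(x)
```

Because each edge of the graph lies entirely inside one subgraph, translating a whole subgraph changes no coordinate difference, so the combined map still satisfies Eq. (12).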
Thm. 3.2 can also be interpreted as a polynomial reduction of the EDGP to the problem of finding a solution of Eq. (8) and (10).
3.13 Corollary
Finding a solution of Eq. (8) and (10) is NP-hard.
Proof.
By reduction from EDGP using Thm. 3.2. ∎
A remarkable consequence of Thm. 3.2 is that it allows a decomposition of the computation of the realization into two stages: first, solve Eq. (8)-(10) to find a feasible $y$; then solve
$$\forall k\le K,\ \{u,v\}\in E \quad x_{uk} - x_{vk} = y^k_{uv} \qquad (22)$$
to find a realization $x$. We note that Eq. (22) is just a restatement of Eq. (12) universally quantified over $k\le K$.
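The second stage can be sketched as follows: since Eq. (22) fixes every coordinate difference, pinning one vertex at the origin and propagating along a spanning tree recovers $x$, and feasibility of $y$ makes the remaining (non-tree) equations hold automatically. The instance (a square with one diagonal) and all names are illustrative.

```python
# Second stage of the decomposition: solve Eq. (22) for x given a feasible y.
V = [0, 1, 2, 3]
K = 2
# y[(u, v)][k] plays the role of x_uk - x_vk; values taken from a unit square.
y = {(0, 1): (-1.0, 0.0), (1, 2): (0.0, -1.0),
     (2, 3): (1.0, 0.0), (3, 0): (0.0, 1.0), (0, 2): (-1.0, -1.0)}

def solve_stage_two(V, y, K):
    adj = {v: [] for v in V}
    for (u, v), val in y.items():
        adj[u].append((v, tuple(-c for c in val)))  # x_v = x_u - y_uv
        adj[v].append((u, val))                     # x_u = x_v + y_uv
    x = {V[0]: tuple(0.0 for _ in range(K))}        # pin one vertex: Eq. (22)
    stack = [V[0]]                                  # is translation-invariant
    while stack:
        u = stack.pop()
        for v, off in adj[u]:
            if v not in x:
                x[v] = tuple(x[u][k] + off[k] for k in range(K))
                stack.append(v)
    return x

x = solve_stage_two(V, y, K)
# Every equation of Eq. (22) is satisfied, including the non-tree ones.
assert all(abs(x[u][k] - x[v][k] - y[(u, v)][k]) < 1e-9
           for (u, v) in y for k in range(K))
print(x)
```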
3.14 Corollary
For any $y$ satisfying Eq. (8) and (10), the linear system Eq. (22) is feasible, and the two-stage approach above correctly solves the EDGP.
Proof.
Feasibility of Eq. (22) follows from the proof of Thm. 3.2. The first stage is hard by Cor. 3.13, while the second stage is tractable, since solving linear systems can be done in polynomial time. ∎
3.15 Remark
Note that Eq. (22) has $mK$ equations, but its rank may be lower, since there are only $nK$ variables: in particular, Eq. (22) may be an overdetermined linear system. The feasibility of this system is guaranteed by Cor. 3.14; in particular, the steps of the proof of Thm. 3.2 imply that Eq. (22) loses rank w.r.t. $mK$ according to the incidence of the edges in the cycles of $G$. In other words, any solution to Eq. (10) provides a right hand side to Eq. (22) that makes the system feasible.
The issue with Thm. 3.2 is that it relies on the exponentially large family of constraints in Eq. (10). While this is sometimes addressed by algorithmic techniques such as row generation, we shall see in the following that it suffices to consider a polynomial set of cycles (which, moreover, can be found in polynomial time) in the quantifier of Eq. (10).
4 The cycle vector space and its bases
We recall that the incidence vectors of cycles (seen as vectors in a space of $m$ dimensions indexed by the edges) form a vector space over a field $\mathbb{F}$, which means that every cycle can be expressed as a weighted sum of the cycles in a basis. In this interpretation, a cycle in $G$ is simply a subgraph of $G$ where each vertex has even degree: we denote their set by $\mathcal{C}$. This means that Eq. (10) is actually quantified over a subset of $\mathcal{C}$, namely the simple connected cycles. Every cycle basis has cardinality $m - n + c$, where $c$ is the number of connected components of $G$. If $G$ is connected, cycle bases have cardinality $m - n + 1$ [52].
Our interest in introducing cycle bases is that we would like to quantify Eq. (10) polynomially rather than exponentially in the size of $G$. Our goal is to replace “$C$ is any simple connected cycle in $G$” by “$C$ is a cycle in a cycle basis of $G$”. In order to show that this limited quantification is enough to imply every constraint in Eq. (10), we have to show that, for each simple connected cycle $C$, the corresponding constraint in Eq. (10) can be obtained as a weighted sum of the constraints corresponding to the basis elements.
Another feature of Eq. (10) to keep in mind is that edges are implicitly given a direction: for each cycle, the term for the undirected edge $\{u,v\}$ in Eq. (10) is $y^k_{uv}$. Note that while $\{u,v\}$ is exactly the same vertex set as $\{v,u\}$, the sign of the corresponding term depends on whether the direction $(u,v)$ or $(v,u)$ is used. We deal with this issue by arbitrarily directing the edges in $E$ to obtain a set $A$ of arcs, and considering directed cycles in the directed graph $(V,A)$. In this interpretation, the incidence vector $c$ of a directed cycle of $(V,A)$ is a vector in $\{-1,0,1\}^m$ satisfying [24, §2, p. 201]:
$$\forall v\in V \quad \sum_{a\in\delta^+(v)} c_a = \sum_{a\in\delta^-(v)} c_a, \qquad (23)$$
where $\delta^+(v)$ and $\delta^-(v)$ denote the sets of arcs leaving and entering $v$.
A directed circuit of $(V,A)$ is obtained by applying the edge directions from $A$ to a connected subgraph $D$ of $G$ where each vertex has degree exactly 2 (note that a directed circuit need not be strongly connected, although its undirected version is connected). Its incidence vector is defined as follows:
$$c_a = \begin{cases} \phantom{-}1 & \text{if the circuit walks over $a$ in its natural direction} \\ -1 & \text{if the circuit walks over $a$ in the reverse direction} \\ \phantom{-}0 & \text{if $a$ does not belong to the circuit,} \end{cases}$$
for each $a\in A(D)$, where we have used $A(D)$ to mean the arcs in the subgraph $D$. In other words, whenever we walk over an arc in its natural direction we let the corresponding component of $c$ be $1$; if we walk over it in the reverse direction we assign a $-1$, and otherwise a zero.
4.1 Constraints over cycle bases
The properties of undirected and directed cycle bases have been investigated in a sequence of papers by many authors, culminating with [24]. We now prove that it suffices to quantify Eq. (10) over a directed cycle basis.
4.1 Proposition
Let $\mathcal{B} = \{C_1,\dots,C_q\}$, with $q = m - n + 1$, be a directed cycle basis of $G$ over $\mathbb{Q}$. Then Eq. (10) holds if and only if:
$$\forall k\le K,\ \forall j\le q \quad \sum_{(u,v)\in C_j} y^k_{uv} = 0. \qquad (24)$$
Proof.
Necessity (⇒) follows because Eq. (10) is quantified over all cycles: in particular, it holds for any cycle in the basis. Moreover, the signs of all terms in the sums of Eq. (24) are consistent, by definition, with the arbitrary edge directions chosen for $A$.
Next, we claim sufficiency (⇐). Let $C$ be a simple cycle, and $\vec{C}$ be its directed version with the directions inherited from $A$. Since $\mathcal{B}$ is a cycle basis, we know that there is a coefficient vector $\gamma\in\mathbb{Q}^q$ such that:
$$c^{\vec{C}} = \sum_{j\le q} \gamma_j\, c^{C_j}, \qquad (25)$$
where $c^{\vec{C}}$ and the $c^{C_j}$ are the incidence vectors of $\vec{C}$ and of the basis cycles. We now consider the expression:
$$\sum_{j\le q} \gamma_j \sum_{(u,v)\in C_j} y^k_{uv}. \qquad (26)$$
On the one hand, by Eq. (25), Eq. (26) is identically equal to $\sum_{a\in A} c^{\vec{C}}_a\, y^k_a$ for each $k\le K$; on the other hand, each inner sum in Eq. (26) is equal to zero by Eq. (24). This implies $\sum_{a\in A} c^{\vec{C}}_a\, y^k_a = 0$ for each $k\le K$. Since $C$ is simple and connected, $\vec{C}$ is a directed circuit. Now it suffices to replace each term $c^{\vec{C}}_{(u,v)}\, y^k_{uv}$ having $c^{\vec{C}}_{(u,v)} = -1$ with $y^k_{vu}$ to obtain
$$\sum_{(u,v)\in C} y^k_{uv} = 0,$$
where the edges on $C$ are indexed in such a way as to ensure they appear in order of consecutive adjacency. ∎
Obviously, if the cycles in $\mathcal{B}$ have minimum (or just small) cardinality, Eq. (24) will be sparsest (or just sparse), which is often a desirable property of linear constraints occurring in MP formulations. Hence we should attempt to find short cycle bases $\mathcal{B}$.
4.2 How to find directed cycle bases
We require directed cycle bases over $\mathbb{Q}$. By [24, Thm. 2.4], each undirected cycle basis gives rise to a directed cycle basis (so it suffices to find a cycle basis of $G$ and then direct the cycles using the directions in $A$). Horton’s algorithm [22] and its variants [19, 44] find a minimum-cost cycle basis in polynomial time. The most efficient deterministic variant is [44], and the most efficient randomized variant has the complexity of matrix multiplication. Existing approximation algorithms have marginally better complexity.
It is not clear, however, that the provably sparsest constraint system will make the DGP actually easier to solve. We therefore consider a much simpler algorithm: starting from a spanning tree, we pick the circuits that each chord (i.e., non-tree) edge defines with the rest of the tree. This algorithm [49] yields a fundamental cycle basis (FCB). Finding the minimum FCB is known to be NP-hard [13], but heuristics based on spanning trees prove to be very easy to implement and work reasonably well [13] (optionally, their cost can be improved by an edge-swapping phase [1, 30]).
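A minimal sketch of this spanning-tree heuristic: grow a BFS tree, then close one cycle per chord edge through the lowest common ancestor of its endpoints. The instance (the complete graph $K_4$) and all names are illustrative.

```python
from collections import deque

def fundamental_cycle_basis(V, E):
    adj = {v: [] for v in V}
    for u, v in E:
        adj[u].append(v)
        adj[v].append(u)
    # BFS spanning tree rooted at the first vertex.
    parent = {V[0]: None}
    queue = deque([V[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}
    basis = []
    for u, v in E:
        if frozenset((u, v)) in tree:
            continue  # tree edge: defines no chord cycle
        # Walk both endpoints up to the root; splice the paths at the LCA.
        pu = [u]
        while parent[pu[-1]] is not None:
            pu.append(parent[pu[-1]])
        pv = [v]
        while parent[pv[-1]] is not None:
            pv.append(parent[pv[-1]])
        common = set(pu) & set(pv)
        lca = next(w for w in pu if w in common)
        cu = [w for w in pu if w not in common]  # strict path on u's side
        cv = [w for w in pv if w not in common]
        basis.append(cu + [lca] + cv[::-1])      # cycle as a vertex list
    return basis

# Complete graph K4: the basis has m - n + 1 = 6 - 4 + 1 = 3 cycles.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(fundamental_cycle_basis(V, E))
```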
5 The Eulerian cycle relaxation
In this section we construct a relaxation of Eq. (27) that decreases the number of constraints in Eq. (24), which occurs as the last line in Eq. (27), from $(m-n+1)K$ to $K$.
We let $G' = (V, E')$ be the multigraph obtained from $G$ by adding sufficiently many parallel edges to $E$, so that the degree of each vertex in $G'$ is even. This can always be done [16], which implies that $G'$ is Eulerian, i.e. it has a cycle incident with every edge in $E'$ exactly once. We let $\mathcal{E}$ be a Eulerian cycle in $G'$, and let $\vec{\mathcal{E}}$ be either of the two orientations of $\mathcal{E}$ obtained by walking over the cycle. We let $\vec{G} = (V, \vec{E})$ be the digraph induced by the Eulerian circuit $\vec{\mathcal{E}}$. For each $\{u,v\}\in E'$ let $\mu_{uv}$ be the number of parallel edges between $u$ and $v$ in $G'$.
We note that $\vec{G}$ might have parallel and antiparallel arcs. Consider the family of arc subsets of $\vec{E}$ joining the same pair of vertices. We replace each arc $(u,v)$ with $\mu_{uv} > 1$ beyond the first by an oriented two-arc path through a new added vertex. Call $\vec{H}$ the digraph obtained from $\vec{G}$ with this replacement. We remark that $\vec{H}$ is simple (it has no parallel or antiparallel arcs) by construction. Moreover, $\vec{H}$ is a Eulerian digraph: take the Eulerian circuit $\vec{\mathcal{E}}$ in $\vec{G}$, and, every time it traverses a parallel or antiparallel arc beyond the first, let it traverse the oriented path replacement instead: this is clearly a Eulerian circuit in $\vec{H}$, which we call $\vec{\mathcal{E}}_H$.
Next we consider the simple graph $H$ obtained by replacing each arc in $\vec{H}$ with an edge. Let $W$ be the set of added vertices, and $F$ be the subset of edges of $H$ obtained by losing the orientation of the arcs in the union of all the arcs from the path replacements. We note that, by construction,
$$|E(H)| = |E'| + |W|. \qquad (28)$$
Let $\vec{F}$ be the orientation of $F$ in $\vec{H}$ w.r.t. $\vec{\mathcal{E}}_H$; let $\mathcal{E}_H$ be the simple Eulerian cycle in $H$ corresponding to $\vec{\mathcal{E}}_H$.
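The first construction step of this section (making all degrees even by adding parallel edges) can be sketched as follows; pairing odd-degree vertices greedily along BFS paths is one simple way to do it, chosen here for illustration rather than being the construction of [16].

```python
from collections import deque

def evenize(V, E):
    """Duplicate edges along paths between odd-degree vertices so that every
    degree becomes even; the resulting multigraph is then Eulerian."""
    mult = {frozenset(e): 1 for e in E}      # edge -> multiplicity
    adj = {v: [] for v in V}
    for u, v in E:
        adj[u].append(v)
        adj[v].append(u)
    odd = [v for v in V if len(adj[v]) % 2 == 1]
    # Duplicating a path flips only the parity of its endpoints, so pairing
    # the odd vertices once (no recomputation needed) suffices.
    while odd:
        s = odd.pop()
        parent = {s: None}                   # BFS to the nearest odd vertex
        queue = deque([s])
        t = None
        while queue and t is None:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    if w in odd:
                        t = w
                        break
                    queue.append(w)
        odd.remove(t)
        while parent[t] is not None:         # duplicate the s-t path edges
            mult[frozenset((t, parent[t]))] += 1
            t = parent[t]
    return mult

# A path graph: its endpoints 0 and 3 have odd degree.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]
mult = evenize(V, E)
print(mult)  # every edge doubled, all degrees now even
```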
We can now prove the main result of this section.