A new algorithm for the $^K$DMDGP subclass of Distance Geometry Problems

09/11/2020 ∙ by Douglas S. Goncalves, et al. ∙ University of Campinas UFC informa 0

The fundamental inverse problem in distance geometry is the one of finding positions from inter-point distances. The Discretizable Molecular Distance Geometry Problem (DMDGP) is a subclass of the Distance Geometry Problem (DGP) whose search space can be discretized and represented by a binary tree, which can be explored by a Branch-and-Prune (BP) algorithm. It turns out that this combinatorial search space possesses many interesting symmetry properties that were studied in the last decade. In this paper, we present a new algorithm for this subclass of the DGP, which exploits DMDGP symmetries more effectively than its predecessors. Computational results show that the speedup, with respect to the classic BP algorithm, is considerable for sparse DMDGP instances related to protein conformation.

1 Introduction

Given a simple, undirected, weighted graph $G = (V, E, d)$, with weight function $d : E \to (0, \infty)$ and an integer $K > 0$, the Distance Geometry Problem (DGP) consists in finding a realization $x : V \to \mathbb{R}^K$ such that

$$\|x_i - x_j\| = d_{ij}, \quad \forall \{i, j\} \in E, \qquad (1)$$

where $\|\cdot\|$ denotes the Euclidean norm, $x_i := x(i)$ and $d_{ij} := d(\{i, j\})$. Each equation in (1) is called a distance constraint. We say that a realization $x$ satisfies $d_{ij}$ if the corresponding distance constraint is verified. A realization satisfying all distance constraints in (1) is called a valid realization. We shall call a pair $(G, K)$ a DGP instance.
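As a small illustration of what "valid realization" means in practice, the sketch below checks each distance constraint of (1) up to a tolerance. The dictionary layout, the function names and the tolerance value are assumptions made for this example only; they are not taken from the paper.

```python
import math

def violated_constraints(x, d, tol=1e-8):
    """Return the edges (i, j) whose distance constraint fails.

    x: dict mapping each vertex to a tuple of K coordinates (hypothetical layout).
    d: dict mapping an edge (i, j) to its prescribed distance d_ij.
    """
    return [(i, j) for (i, j), dij in d.items()
            if abs(math.dist(x[i], x[j]) - dij) > tol]

def is_valid_realization(x, d, tol=1e-8):
    """A realization is valid when no distance constraint is violated."""
    return not violated_constraints(x, d, tol)

# Three vertices in the plane (K = 2) with all pairwise distances prescribed.
d = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): math.sqrt(2.0)}
x_good = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0)}
x_bad = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}
print(is_valid_realization(x_good, d))
print(violated_constraints(x_bad, d))
```

In the second realization only the constraint on edge (1, 3) fails, since the other two distances are still equal to 1.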

There are many applications of Distance Geometry, mainly related to $K \in \{1, 2, 3\}$ [3, 4, 27]. An application to Data Science can be found in [16], and a very recent survey on this subject is [15]. An important class of the DGP arises in the context of 3D protein structure calculations ($K = 3$), with distance information provided by Nuclear Magnetic Resonance (NMR) experiments [6, 24, 31].

Existence and uniqueness of DGP solutions, among other theoretical aspects of the problem, are discussed in [19]. Henceforth, we will consider that the DGP admits a solution.

Assumption 1.1.

The solution set of (1) is non-empty.

The DGP is naturally cast as a search in continuous space. Depending on the graph structure, however, combinatorial search algorithms can be defined, notably via the identification of appropriate vertex orders [5, 11, 14]. Although the DGP is NP-hard [30], these combinatorial approaches made it possible to show that it is Fixed Parameter Tractable (FPT) on certain graph structures, such as those arising in protein conformation [20].

The aforementioned vertex orders define a DGP subclass, called the Discretizable Molecular Distance Geometry Problem (DMDGP) [12, 13], formally given as follows.

Definition 1.1.

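A DGP instance $(G, K)$ is a $^K$DMDGP if there is a vertex order $v_1, \dots, v_n \in V$, such that

  1. $G[\{v_1, \dots, v_K\}]$ is a clique;

  2. For every $i > K$,

    (a) $v_i$ is adjacent to $v_{i-K}, \dots, v_{i-1}$,

    (b) $\mathrm{CM}(\{v_{i-K}, \dots, v_{i-1}\}) \neq 0$.

In the above definition, $G[\cdot]$ denotes the induced subgraph and $\mathrm{CM}(\cdot)$ is the Cayley-Menger determinant of the given vertex set [19, Sec. 2]. Its squared value is proportional to the $(K-1)$-volume of a realization for $v_{i-K}, \dots, v_{i-1}$. Condition 2(b) means that the points $x_{i-K}, \dots, x_{i-1}$ span an affine subspace of dimension $K-1$.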

Although Definition 1.1 applies to any dimension $K$, therefore covering applications other than molecular conformation, where $K = 3$, the term “molecular” is commonly kept in the related literature [12, 19, 5], regardless of the dimension, to stress the property that the adjacent predecessors of $v_i$ are contiguous (the term “contiguous $K$-lateration order” is used in [5] to mean $^K$DMDGP), a desirable property when ordering atoms of a protein [12, 11].

When the dimension is clear from the context, we shall simply use DMDGP rather than $^K$DMDGP. Moreover, without loss of generality, whenever we denote an edge by $\{i, j\}$, we will assume that $i < j$, i.e., $i$ precedes $j$ in the vertex order of Definition 1.1.

Properties 1 and 2(a) of Definition 1.1 say that $G$ is composed of a chain of contiguous $(K+1)$-cliques. Moreover, properties 1 and 2 allow us to turn the search space into a binary tree, in the following way.

After fixing the positions for the first $K$ vertices, for each new vertex $i$, with $i > K$, property 2(a) ensures that the possible positions for vertex $i$ lie in the intersection of $K$ spheres centered at $x_{i-K}, \dots, x_{i-1}$ with radii $d_{i-K,i}, \dots, d_{i-1,i}$, respectively. Property 2(b) guarantees that there are at most two points, let us say $\{x_i^+, x_i^-\}$, in such intersection [23]. This intersection of spheres can be computed in many different ways that we will not cover in this paper but are well studied in the literature [1, 23].

Remark 1.1.

The above process is known in the literature as $K$-lateration [19].
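To make the sphere intersection concrete, here is a minimal sketch for the case $K = 3$ only, using the textbook trilateration construction (a local orthonormal frame built from the three centers). It is one of the many possible ways mentioned above, not necessarily the one used by the authors; the function name `trilaterate` and the helper names are hypothetical, and the three centers are assumed to be non-collinear.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def _axpy(p, t, q):
    """Return p + t * q."""
    return tuple(a + t * b for a, b in zip(p, q))

def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def _unit(p):
    n = math.sqrt(_dot(p, p))
    return tuple(a / n for a in p)

def trilaterate(a, b, c, ra, rb, rc):
    """Return the two points at distances ra, rb, rc from centers a, b, c in R^3."""
    ex = _unit(_sub(b, a))                    # first axis: from a towards b
    i = _dot(ex, _sub(c, a))
    ey = _unit(_axpy(_sub(c, a), -i, ex))     # second axis: in the plane of a, b, c
    ez = _cross(ex, ey)                       # third axis: normal to that plane
    dab = math.dist(a, b)
    j = _dot(ey, _sub(c, a))
    px = (ra * ra - rb * rb + dab * dab) / (2.0 * dab)
    py = (ra * ra - rc * rc + i * i + j * j) / (2.0 * j) - (i / j) * px
    pz = math.sqrt(max(ra * ra - px * px - py * py, 0.0))  # clamp rounding noise
    base = _axpy(_axpy(a, px, ex), py, ey)
    return _axpy(base, pz, ez), _axpy(base, -pz, ez)

# Recover a known point (and its mirror image) from its distances to three centers.
a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
target = (0.3, 0.4, 0.5)
p_plus, p_minus = trilaterate(a, b, c, math.dist(a, target),
                              math.dist(b, target), math.dist(c, target))
print(p_plus, p_minus)
```

The two returned points are mirror images of each other with respect to the plane through the three centers, which is the geometric reason why the search tree is binary.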

Thus, following the vertex order, after fixing the first $K$ vertices, each new vertex has at most 2 possible positions, which of course depend on the positions of its immediate adjacent predecessors, leading to a binary tree of possible positions, where each path, from the root to a leaf node, corresponds to a possible realization for the graph $G$.

However, not all of these possible realizations (paths on the tree) are valid, because $E$ may contain other edges $\{i, j\}$, with $|i - j| > K$, associated with distance constraints that are not satisfied by such realizations. The edges given in Definition 1.1 are called discretization edges and the others, which may or may not be available, are called pruning edges.

Henceforth, let us partition $E = E_D \cup E_P$, where $E_D$ is the set of discretization edges and $E_P$ the set of pruning edges. Clearly, we can also partition the equations in (1) into discretization edge constraints and pruning edge constraints. We remark that, according to Definition 1.1, $\{i, j\} \in E_D$ if $|i - j| \le K$, and therefore $\{i, j\} \in E_P$ if $|i - j| > K$.
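The split between discretization and pruning edges can be sketched in a few lines. The example assumes that vertices are labelled $1, \dots, n$ according to the DMDGP order, so that an edge is a discretization edge exactly when its endpoints are at most $K$ positions apart; the function name is hypothetical.

```python
def partition_edges(edges, K):
    """Split edges into (discretization, pruning) lists.

    Vertices are assumed to be integers numbered in the DMDGP vertex order.
    """
    discretization, pruning = [], []
    for e in edges:
        i, j = min(e), max(e)
        if j - i <= K:
            discretization.append((i, j))
        else:
            pruning.append((i, j))
    return discretization, pruning

# K = 3, six vertices: every pair at most 3 positions apart, plus two long edges.
n, K = 6, 3
edges = [(i, j) for i in range(1, n + 1) for j in range(i + 1, min(i + K, n) + 1)]
edges += [(1, 5), (1, 6)]
E_D, E_P = partition_edges(edges, K)
print(len(E_D), E_P)
```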

The Branch-and-Prune (BP) algorithm [18] explores the DMDGP binary tree in a depth-first manner and validates possible positions for vertices as soon as a pruning edge appears. Pseudocode is given in Algorithm 1.

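In Algorithm 1, the phrase “$x_i$ is feasible” means that the equations

$$\|x_i - x_j\| = d_{ji}, \quad \text{for every pruning edge } \{j, i\} \text{ with } j < i,$$

are satisfied up to a certain tolerance. In Step 5 positions $x_i^+$ and $x_i^-$ are computed via $K$-lateration. See [19, 1, 23] for details.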

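1:  BP($i$, $x_1, \dots, x_{i-1}$) # ($i > K$)
2:  if ($i > n$) then
3:      return $x$
4:  else
5:      Find solutions $x_i^+$ and $x_i^-$ for the system: $\|x_i - x_u\| = d_{ui}$, $u = i-K, \dots, i-1$
6:      if $x_i^+$ is feasible then
7:          Set $x_i = x_i^+$ and call BP($i+1$, $x_1, \dots, x_i$). # 1st candidate position
8:      end if
9:      if $x_i^-$ is feasible then
10:          Set $x_i = x_i^-$ and call BP($i+1$, $x_1, \dots, x_i$). # 2nd candidate position
11:      end if
12:  end if
Algorithm 1 BP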
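The depth-first scheme of Algorithm 1 can be sketched in plain Python for $K = 3$. This is an illustrative reading of BP, not the authors' implementation: vertices are numbered from 0, the first three positions are assumed to be given, the candidate positions come from a textbook trilateration routine, and the search stops at the first valid realization. All function names and the tolerance are assumptions.

```python
import math

def _sub(p, q): return tuple(a - b for a, b in zip(p, q))
def _dot(p, q): return sum(a * b for a, b in zip(p, q))
def _axpy(p, t, q): return tuple(a + t * b for a, b in zip(p, q))  # p + t*q
def _unit(p):
    n = math.sqrt(_dot(p, p))
    return tuple(a / n for a in p)
def _cross(p, q):
    return (p[1]*q[2] - p[2]*q[1], p[2]*q[0] - p[0]*q[2], p[0]*q[1] - p[1]*q[0])

def trilaterate(a, b, c, ra, rb, rc):
    """Two points at distances ra, rb, rc from non-collinear centers a, b, c."""
    ex = _unit(_sub(b, a))
    i = _dot(ex, _sub(c, a))
    ey = _unit(_axpy(_sub(c, a), -i, ex))
    ez = _cross(ex, ey)
    dab, j = math.dist(a, b), _dot(ey, _sub(c, a))
    px = (ra*ra - rb*rb + dab*dab) / (2.0 * dab)
    py = (ra*ra - rc*rc + i*i + j*j) / (2.0 * j) - (i / j) * px
    pz = math.sqrt(max(ra*ra - px*px - py*py, 0.0))
    base = _axpy(_axpy(a, px, ex), py, ey)
    return _axpy(base, pz, ez), _axpy(base, -pz, ez)

def branch_and_prune(n, d, first3, tol=1e-6):
    """Return the first valid realization found by depth-first search, or None.

    d maps (u, v) with u < v (0-based, DMDGP order) to the distance d_uv.
    """
    x = list(first3) + [None] * (n - 3)

    def feasible(i):
        # Pruning edges ending at i are those with u < i - 3.
        return all(abs(math.dist(x[u], x[i]) - duv) <= tol
                   for (u, v), duv in d.items() if v == i and u < i - 3)

    def bp(i):
        if i == n:
            return True
        candidates = trilaterate(x[i-3], x[i-2], x[i-1],
                                 d[(i-3, i)], d[(i-2, i)], d[(i-1, i)])
        for cand in candidates:           # 1st, then 2nd candidate position
            x[i] = cand
            if feasible(i) and bp(i + 1):
                return True
        x[i] = None                       # both branches pruned: backtrack
        return False

    return x if bp(3) else None

# Instance built from known points: discretization edges plus two pruning edges.
pts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0.5, 1.2, 0.9),
       (1.4, 1.9, 1.1), (0.6, 2.5, 0.4), (1.5, 2.8, 1.3)]
d = {(i, j): math.dist(pts[i], pts[j])
     for i in range(7) for j in range(i + 1, min(i + 4, 7))}
d[(1, 5)] = math.dist(pts[1], pts[5])
d[(0, 6)] = math.dist(pts[0], pts[6])
sol = branch_and_prune(7, d, pts[:3])
```

The returned realization satisfies every distance in `d`, but it may be a reflection of `pts` rather than `pts` itself, in line with the symmetry discussion that follows.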

Computational experiments in [12] showed that BP outperforms methods based on global continuation [25] and semidefinite programming [10] on instances of the DMDGP subclass, suggesting BP as the method of choice for this subclass of DGPs.

In addition to the discretization of the DGP search space, the DMDGP order also implies symmetry properties of such discrete space [21, 17, 29]. From the computational point of view, one of the most important of such properties, in the context of this paper, is that all DMDGP solutions can be determined from just one solution. This property is related to the DMDGP symmetry vertices, which can be identified a priori, based on the input graph (see next section). Once a first solution is found, the others can be obtained by partial reflections of the first, based on symmetry hyperplanes associated to these vertices.

Previous works [26, 20] exploited symmetry to reconstruct all valid realizations from the first one found and to prove that the BP algorithm is fixed-parameter tractable. Others [9, 8] considered decomposition-based variants of BP which leverage DMDGP symmetry information.

In this work, we exploit DMDGP symmetry in order to find the first valid realization more quickly. We handle the DMDGP as a sequence of nested subproblems, each one defined by a pruning edge $\{i, j\}$. For each subproblem, we can exploit any realization (valid or not) for building the symmetry hyperplanes (which will define partial reflections). Once we have them, we apply compositions of such partial reflections only to $x_j$ to find its correct position. Only after finding the correct combination of partial reflections do we use it to obtain the positions of other vertices. After a subproblem is solved, the set of valid partial reflections is reduced and a single symmetry hyperplane is enough to handle positions in the next subproblem.

In terms of the system of nonlinear equations (1), we solve a subset of equations and then gradually include new equations to this subset: the new equations are solved subject to the original equations in the subset. This process is repeated until all equations in (1) are satisfied.

These ideas lead to a new algorithm which deals with pruning edges, one-by-one, and takes advantage of a valid realization for already solved subproblems. Computational results illustrate the advantage of the new algorithm, compared with the classic BP.

This paper is organized as follows. Section 2 briefly reviews the main results about DMDGP symmetries and Section 3 explains how they can be used to solve a sequence of nested subproblems. The new algorithm, its correctness and implementation details are presented in Section 4, and comparisons with the classic BP in protein-like instances are given in Section 5. Concluding remarks are given in Section 6.

2 DMDGP symmetries

Before discussing the new algorithm, we shall present a theoretical background on DMDGP symmetries and recall some results from [20, 22, 19].

Given a realization $x$ satisfying (1), it is clear that there are uncountably many others, which satisfy the same set of distances, and which can be obtained by translations, rotations or reflections of $x$ (because these transformations preserve all pairwise distances). Since the assumptions of Definition 1.1 ensure that the first $K$ vertices form a clique, a valid realization for these vertices can be found by matrix decomposition methods [7] or a sequence of sphere intersections [1], for example. Once the positions of these first $K$ vertices are fixed, the degrees of freedom of translations and rotations are removed.

From here on, we say that two realizations are incongruent (modulo translations and rotations) if they are not translations, rotations or total reflections of each other. For technical reasons, we only allow the total reflections through the hyperplane defined by the positions of the first $K$ vertices in the vertex order (so two realizations, one of which is a reflection of the other through this hyperplane, will both be considered members of any set of “incongruent realizations”).

Definition 2.1.

Let $\hat{X}$ be the set of all incongruent realizations satisfying distance constraints associated to discretization edges in $E_D$, i.e., $x$ such that $\|x_i - x_j\| = d_{ij}$ for all $\{i, j\} \in E_D$. A realization $x \in \hat{X}$ is called a possible realization.

As discussed in Section 1, each $x \in \hat{X}$ corresponds to a path from the root to a leaf node in the binary tree of a DMDGP instance. Notice that $|\hat{X}| = 2^{n-K}$.

Definition 2.2.

A realization $x \in \hat{X}$ is said to be valid if $x$ is a solution of (1). Let $X \subset \hat{X}$ denote the set of all incongruent valid realizations of a DMDGP instance.

Figure 1: The leftmost path/realization is represented by a straight line whereas the rightmost by a dashed line. All 4 possible positions for the fourth vertex (denoted by and ) can be generated by and its induced reflections and . Illustration for .

The computational experiments in [12] suggested that $|X|$ is always a power of 2. A conjecture was formulated and quickly disproved using some instances constructed by hand, until the conjecture was shown to be true with probability one in [22].

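Given $x \in \hat{X}$, for $i > K$, let $R_x^i(y)$ be the reflection of $y \in \mathbb{R}^K$ through the hyperplane defined by $x_{i-K}, \dots, x_{i-1}$, with normal $p_i$:

$$R_x^i(y) = (I - 2 p_i p_i^T)(y - x_{i-1}) + x_{i-1},$$

assuming $\|p_i\| = 1$. Let us also define, for all $i > K$ and $x \in \hat{X}$, partial reflection operators:

$$g_i(x) = (x_1, \dots, x_{i-1}, R_x^i(x_i), \dots, R_x^i(x_n)). \qquad (2)$$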
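A partial reflection is easy to experiment with numerically. The sketch below, restricted to $K = 3$, reflects a point through the plane spanned by three given points and applies that reflection to the tail of a realization, leaving the head untouched. Indices are 0-based and the function names are hypothetical; the three points defining the plane are assumed to be non-collinear.

```python
import math

def reflect(y, a, b, c):
    """Reflect y through the plane containing a, b, c (K = 3)."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    p = [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
    norm = math.sqrt(sum(t * t for t in p))
    p = [t / norm for t in p]                      # unit normal of the plane
    s = 2.0 * sum(p[k] * (y[k] - c[k]) for k in range(3))
    return tuple(y[k] - s * p[k] for k in range(3))

def partial_reflection(x, i):
    """Keep x[0..i-1]; reflect x[i..] through the plane of x[i-3], x[i-2], x[i-1]."""
    a, b, c = x[i - 3], x[i - 2], x[i - 1]
    return list(x[:i]) + [reflect(y, a, b, c) for y in x[i:]]

pts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0.5, 1.2, 0.9),
       (1.4, 1.9, 1.1), (0.6, 2.5, 0.4), (1.5, 2.8, 1.3)]
g = partial_reflection(pts, 4)

# Distances between vertices at most 3 positions apart are unchanged,
# while a longer-range distance such as the one between vertices 0 and 6 generally is not.
for i in range(7):
    for j in range(i + 1, min(i + 4, 7)):
        assert abs(math.dist(pts[i], pts[j]) - math.dist(g[i], g[j])) < 1e-9
print(math.dist(pts[0], pts[6]), math.dist(g[0], g[6]))
```

This is the numerical counterpart of the statement that partial reflections map possible realizations to possible realizations while potentially changing distances associated with pruning edges.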
Remark 2.1.

Some direct but useful properties of reflections and partial reflections are in order:

  1. A reflection preserves the distance from to any point in the hyperplane defined by .

  2. The pairwise distances for are the same as those for . As a consequence of this, and the fact that , for , all pairwise distances for from are preserved in .

  3. Partial reflections preserve distances related to discretization edges , so that , for every .

  4. All realizations in can be generated from a single by the composition of partial reflection operators [19, Sec. 3.3.8].

Let us now recall one of the main results about DMDGP symmetries.

Theorem 2.1 (Theorem 3.2 in [19]).

With probability 1, for all , there is a set of real positive values such that for each , we have . Furthermore, for all such that and , for , if and only if .

In Theorem 2.1, “with probability 1” means that the set of DMDGP instances for which the statements do not hold has Lebesgue measure zero in the set of all DMDGP instances [22].

The first part of Theorem 2.1 says that, for , the possible realizations yield a set of distinct values for . Let be the subset of possible realizations that agree with in the first positions. Given a possible realization , each of these distinct values is associated to a pair of possible positions for from realizations in (see Figure 1 where possible values for and are represented by the radii of gray and, respectively, black arcs centered at ).

Since , if the distance is available, it must be a pruning distance. In view of Assumption 1.1, then for some . Let such a define the set . Now, from the second part of Theorem 2.1, we have that among the possible realizations , only those such that are feasible with respect to . If is the last vertex in the order, then only two realizations in are feasible.

For every DMDGP solution, there is another one symmetric with respect to the hyperplane defined by the positions of the first $K$ vertices. Moreover, as a consequence of Theorem 2.1, the number of solutions doubles for every other symmetry vertex belonging to the following set [22]:

$$S := \{ v \in \{K+1, \dots, n\} \;:\; \nexists \{u, w\} \in E \text{ with } u + K < v \le w \}. \qquad (3)$$

The vertex $K+1$ is always in $S$, because the first $K$ vertices define a symmetry hyperplane. The other symmetry hyperplanes are given by the positions of $x_{i-K}, \dots, x_{i-1}$, if $i \in S$, for $i > K+1$. As mentioned in Section 1, $S$ can be computed before solving a DMDGP instance, which implies that the number of solutions is known a priori, and given by $2^{|S|}$, with probability one.

Theorem 2.2 (Theorem 3.4 in [20]).

Let $(G, K)$ be a feasible $^K$DMDGP and $S$ its set of symmetry vertices. Then, with probability 1, $|X| = 2^{|S|}$.

The valid realizations are incongruent modulo translations and rotations, meaning that they differ one from another only by partial reflections (or a total reflection through the first symmetry hyperplane, as explained above).

It is important to notice from (3) that the addition of new pruning edges in $E$ may reduce the number of elements (symmetry vertices) in $S$.
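The effect of pruning edges on the symmetry set can be checked with a short script. It assumes the usual characterization from the DMDGP symmetry literature [22]: a vertex $v > K$ is a symmetry vertex when no edge $\{u, w\}$ satisfies $u + K < v \le w$. Vertices are numbered $1, \dots, n$ in the DMDGP order, and the function name is hypothetical.

```python
def symmetry_vertices(n, K, edges):
    """Return the symmetry vertices, assuming the characterization stated above."""
    S = []
    for v in range(K + 1, n + 1):
        covered = any(min(e) + K < v <= max(e) for e in edges)
        if not covered:
            S.append(v)
    return S

n, K = 6, 3
disc = [(i, j) for i in range(1, n + 1) for j in range(i + 1, min(i + K, n) + 1)]

S_none = symmetry_vertices(n, K, disc)               # no pruning edges
S_one = symmetry_vertices(n, K, disc + [(2, 6)])     # one pruning edge
S_full = symmetry_vertices(n, K, disc + [(1, 6)])    # edge between first and last vertex
print(S_none, S_one, S_full)
print(2 ** len(S_none), 2 ** len(S_one), 2 ** len(S_full))
```

Without pruning edges every vertex after the first $K$ is a symmetry vertex, so the count $2^{n-K}$ of possible realizations is recovered; each added pruning edge can only remove vertices from the set.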

A direct consequence of Theorem 2.2 is the following corollary.

Corollary 2.1.

Let $(G, K)$ be a feasible DMDGP instance where $G = (V, E, d)$. If $\{1, n\} \in E$, then $(G, K)$ has only two incongruent solutions which are reflections of each other through the symmetry hyperplane defined by the position of the first $K$ vertices.

Proof.

If $\{1, n\} \in E$, then $S = \{K+1\}$, which implies that the number of solutions is $2^{|S|} = 2$. If one of these solutions is $x$, then the other is $g_{K+1}(x)$, the reflection of $x$ through the hyperplane defined by $x_1, \dots, x_K$. ∎

A result that will be useful ahead is given in Proposition 2.1 and illustrated in Figure 1.

Proposition 2.1 (Lemma 4.2 in [20]).

Let , and be the normals to the hyperplanes defining and . If is not in the hyperplanes containing the origin and normal to , then .

Proposition 2.1 tells us that compositions of partial reflections that depend on more than one realization (e.g  and ) can be described in terms of reflections based on a single realization. For example, for , we have

where the last equality follows from Proposition 2.1.

Therefore, for a DMDGP, given , problem (1) can be cast as finding a binary vector , such that

(4)

satisfies , for all . Here, and , where . In Section 3 we shall explain how to efficiently perform the search of this binary vector taking into account DMDGP symmetry information.

To close this section, let us describe how to generate other valid realization from a given one . Let be a valid realization for . The vertices in the set determine which components of the binary vector from (4) are allowed to change in order to obtain another valid realization for . In other words, the search space for the new is reduced to

(5)
Lemma 2.1.

Let and be a valid realization for . For every , .

Proof.

Since from Eq. (4) involves only partial reflections, in view of Property 3 in Remark 2.1, , i.e .

It remains to show that does not violate distance constraints associated to pruning edges . Since the reflections are applied to positions such that , edges with are not affected. Thus, assume that . We have that , and from (4) and (2), positions are updated by reflections , for such that . Since either , i.e  is in the hyperplane associated to , or , i.e.  comes after this hyperplane, in view of Remark 2.1, Property 2, these reflections are such that . ∎

3 Nested DMDGP subproblems

Given a DMDGP instance, properties 1 and 2 of Definition 1.1 give rise to a rich symmetric structure for the corresponding DGP problem, as discussed in Section 2.

On the one hand, the absence of pruning edges turns the DMDGP into a trivial problem, because any path from the root to a leaf node of the search tree corresponds to a valid realization, i.e., $X = \hat{X}$, and all other solutions can be built by partial reflections. On the other hand, one of the most challenging DMDGP instances to solve with BP is the one where the only pruning edge is $\{1, n\}$. In that case, feasibility can only be verified at a leaf node, and for a standard depth-first search (DFS), it may represent a costly backtracking process until the first valid realization is found.

Differently, given , the present proposal is to iteratively handle the pruning edge constraints following a given order on .

As mentioned in Section 2 (after Theorem 2.2), each pruning edge may reduce the set of valid partial reflection operations that can be applied to realizations of the vertices . Thus, by keeping track of valid partial reflections (or equivalently their corresponding symmetry vertices), it is possible to consistently modify a given realization satisfying a subset of distance constraints to also satisfy a new pruning edge constraint. This process is repeated until all distance constraints are satisfied.

For this, we enumerate edges in as , with , and use to mean that edge precedes in this order. We define the set of pruning edges preceding edge by

(6)

Then, we define a sequence of subproblems spanned by following the above pruning edge order.

Definition 3.1.

Let be a feasible with . Let , where , and is the restriction of to . We say that is a subproblem of spanned by pruning edge .

It is clear that is itself a DMDGP problem. Let us denote by the solution set of .

Proposition 3.1.

Let and such that and are feasible DMDGPs. If and , then .

Let and be DMDGP subproblems spanned by edges and , respectively, such that . In view of Proposition 3.1, we have .

Moreover, in this sequence of DMDGP subproblems, each time a new pruning edge is included, e.g , the set of symmetry vertices (see Eq. (3)) for may be reduced. This motivates us to define the set of necessary symmetry vertices for subproblem as:

(7)

Let be the current realization which is valid for and let . The vertices in the set determine which components of the binary vector from Eq. (4) are allowed to change in order to obtain a valid realization for . In other words, the search space for the new is reduced to

(8)
Lemma 3.1.

Let and . Let be a valid realization for . For every , .

Proof.

The proof is similar to the one of Lemma 2.1 and therefore is left in the Appendix. ∎

Remark 3.1.

From Proposition 3.1 and Lemma 3.1, if , then, given , to obtain it suffices to find such that .

Furthermore, in the following we show that there is a unique satisfying such condition. For this, let us recall a simple fact that follows from Definition 1.1.

Proposition 3.2.

If is a DMDGP instance, so is , for .

Thus, given a DMDGP instance , any subgraph induced by at least consecutive (w.r.t. the vertex order) vertices of is a DMDGP itself. Proposition 3.2 implies that each defines a DMDGP instance based on the subgraph .

Proposition 3.3.

Any DMDGP instance spanned by has only two solutions.

Proof.

It follows from Proposition 3.2 and Corollary 2.1. ∎

Proposition 3.3 says that each DMDGP instance spanned by a pruning edge has only two solutions, which are reflections of each other through the hyperplane defined by . These two solutions correspond to a particular configuration of the components . The only difference between the two is the first component . Since and the components of with or are kept fixed, we conclude that is unique.

4 New algorithm

Henceforth, we assume that subproblems spanned by pruning edges are solved following a given order in and that a realization is given.

4.1 The conceptual algorithm

First, we present a conceptual algorithm (Algorithm 2) which summarizes the ideas discussed in the previous sections.

1:  
2:  Set ,
3:  for  do
4:      
5:      if  then
6:          Find
7:          Update and
8:      end if
9:  end for
10:  return  a valid realization
Algorithm 2 SBBU

When solving subproblem , if then this subproblem has already been solved implicitly, according to the following proposition.

Proposition 4.1.

Let be a valid realization for , for all . If , then is valid for .

Proof.

If , then for every , such that . Suppose that is such that . (a) By Theorem 2.1 and Assumption 1.1, with some , for , such that . (b) Since , it follows that . Thus, by Theorem 2.1, . But (a) and (b) together contradict Assumption 1.1. Hence and the assertion follows from Lemma 3.1. ∎

Otherwise, for , in Step 6 we perform an exhaustive search to find such that . In Step 7, we update the current realization to according to Eq. (4).

Theorem 4.1.

Let be a feasible DMDGP instance. Considering exact arithmetic, Algorithm 2 finds .

Proof.

Since , due to Assumption 1.1 and Lemma 3.1, Step 6 is well-defined. From Remark 3.1 and Step 6, it follows that , for every . Thus, since for the last pruning edge , we have , i.e , after this last subproblem is solved, . ∎

4.2 A practical algorithm

In this section, based on a particular pruning edge order, we introduce a practical version of Algorithm 2 which:

  1. does not require an initial realization ;

  2. avoids the computation and storage of unnecessary reflectors ;

  3. may result in fewer operations in the update step (Step 7) of Algorithm 2;

  4. allows us to discuss a concrete implementation for the sets .

For this, instead of working with a full realization , which is updated through the binary vector by Eq. (4), and computing and storing reflectors based on , the idea is to grow a partial realization , where , and compute the necessary reflectors on the fly based on the current partial realization and . This way, for each subproblem , we do not compute full valid realizations but valid partial realizations , with .

Assumption 4.1.

Pruning edges $\{i, j\}$, with $i < j$, are sorted in increasing order of $j$, followed by a decreasing order of $i$.

Under this order, we can re-write the set of pruning edges preceding as

(9)
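The pruning edge order can be illustrated with a sort key. The sketch reads Assumption 4.1 as "increasing larger endpoint $j$, ties broken by decreasing smaller endpoint $i$", which is consistent with Remark 4.1 (no preceding edge reaches beyond $j$); this reading, and the function names, are assumptions of the example.

```python
def order_pruning_edges(pruning_edges):
    """Sort edges (i, j), i < j, by increasing j and then by decreasing i."""
    normalized = [(min(e), max(e)) for e in pruning_edges]
    return sorted(normalized, key=lambda e: (e[1], -e[0]))

def preceding_edges(ordered, edge):
    """Edges that come strictly before `edge` in the given order."""
    return ordered[:ordered.index(edge)]

E_P = [(1, 6), (2, 6), (1, 5), (3, 7)]
ordered = order_pruning_edges(E_P)
print(ordered)
print(preceding_edges(ordered, (1, 6)))
```

With this order, subproblems ending at the same vertex are visited from the shortest span to the longest, and no edge processed earlier involves a vertex beyond the current $j$.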
Definition 4.1.

We say that , with , is a valid partial realization for , if satisfies all distance constraints associated to edges in .

Remark 4.1.

Recall that and thanks to Assumption 4.1, there is no , with . This allows us to extend a valid partial realization for to a valid full realization for , i.e , by simply growing to using discretization distances (see Subsection 4.2.1), because no distance constraint is affected by this operation.

Assumption 4.1, along with (9), will be used in the results that follow. Using these concepts we will show in the next subsections that when dealing with subproblem :

  1. given a partial realization satisfying discretization distances and distances corresponding to pruning edges in , it can be extended to keeping feasibility of such distance constraints and new discretization constraints;

  2. it is possible to apply partial reflections to this extended partial realization in order to fulfill without violating the distance constraints considered so far.

4.2.1 Initialization of candidate positions (Growth)

In Section 4.2.2 we shall explain how to find a valid partial realization for DMDGP subproblems by composing reflections through symmetry hyperplanes and applying them to positions . This procedure assumes that candidate positions for are available when we start to solve . In this section, we describe how to initialize these positions.

From now on, we assume that initialization of candidate positions must follow the vertex order from to , meaning that if is the current subproblem, and is the last initialized position, such that , then we initialize positions from to , whereas remain unchanged. In other words, the candidate positions are grown from the current partial realization using only distance constraints associated to discretization edges. Moreover, each position is initialized only once, although it can be modified later (see Section 4.2.2) in order to satisfy a distance constraint corresponding to a pruning edge . This is formalized in Proposition 4.2.

Proposition 4.2.

Assume edges in are ordered as . Then, before solving , positions can be initialized such that

(10)

and

(11)
Proof.

We prove this by induction on the edge order. In the base case we consider spanning the first subproblem to be solved. The positions , for are initialized right away. From Definition 1.1, can be localized uniquely (up to rotations and translations) by different methods [7, 1]. Hence, . Then, by -lateration (see Remark. 1.1), there are at most two positions for for each . Notice that any partial realization is enough to build partial reflections. The correct alternative will be chosen later by the appropriate partial reflection composition which satisfies constraints defined by pruning edges (Section 4.2.2 gives more details). Thus, without loss of generality, let be the partial realization obtained by choosing , for every . Since , this partial realization satisfies (10) and (11) for .

The induction hypothesis is that (10) and (11) hold for pruning edges , i.e  is a valid partial realization for all subproblems spanned by these edges, where

In the inductive step, let us prove that (10) and (11) also hold for pruning edge spanning subproblem .

Since subproblems spanned by edges in are solved, positions are already initialized and satisfy (11), and (10) with .

If , then there is nothing left to do. Thus, suppose . Then, positions can be initialized by -lateration (see Remark 1.1 and Algorithm 3) based on discretization distances such that (10) holds. ∎

Remark 4.2.

The proof of Proposition 4.2 describes a procedure for initialization of before solving . It is important to notice that such initialization is done sequentially and depends on previously computed positions which are not recomputed in this step. Thus, after the initialization, the current partial realization continues to be valid for all already solved subproblems.

1:  
2:  if  then
3:      return   and .
4:  end if
5:  if  then
6:      Initialize as a solution of
7:      Set
8:  end if
9:  for  do
10:      Find solutions