Recognizing and realizing cactus metrics

08/05/2019 ∙ by Momoko Hayamizu, et al. ∙ University of East Anglia ∙ Delft University of Technology

The problem of realizing finite metric spaces in terms of weighted graphs has many applications. For example, the mathematical and computational properties of metrics that can be realized by trees have been well-studied and such research has laid the foundation of the reconstruction of phylogenetic trees from evolutionary distances. However, as trees may be too restrictive to accurately represent real-world data or phenomena, it is important to understand the relationship between more general graphs and distances. In this paper, we introduce a new type of metric called a cactus metric, that is, a metric that can be realized by a cactus graph. We show that, just as with tree metrics, a cactus metric has a unique optimal realization. In addition, we describe an algorithm that can recognize whether or not a metric is a cactus metric and, if so, compute its optimal realization in O(n^3) time, where n is the number of points in the space.

1 Introduction

The metric realization problem, which is the problem of representing a finite metric space by a weighted graph, has many applications, most notably in the reconstruction of evolutionary trees. Although any finite metric space can be realized by a weighted complete graph, there can be different graphs that induce the same metric. In [7], Hakimi and Yau first considered “optimal” realizations of finite metric spaces, which are realizations of least total weight. Although every finite metric space has an optimal realization [5, 11], the problem of finding an optimal realization is NP-hard in general [1, 16] and the optimal solution is not necessarily unique [1, 5].

A well-known special case of optimal realizations is provided by tree metrics, namely, those metrics that can be realized by some edge-weighted tree. For any tree metric $d$ on a finite set $X$, its optimal realization is an $X$-tree (i.e., a tree in which some vertices are labeled by $X$) and is uniquely determined [7]. In addition, there exist optimal polynomial-time algorithms for computing the tree realization from a tree metric [3, 4]. However, not much is known about the properties of optimal realizations of metrics induced by graphs that are more general than trees. Developing our understanding in this direction could be useful, as trees can sometimes be too restrictive for realizing metrics arising in real-world applications [10].

In this paper, we generalize the concept of a tree metric by introducing a new type of metric called a “cactus metric” (a concept first introduced in [8]), which can be realized by an edge-weighted “$X$-cactus”, where a cactus is a connected graph in which each edge belongs to at most one cycle. An example of an $X$-cactus is presented in Figure 1. Note that cacti have some nice properties in common with trees. For instance, every cactus is planar and the number of vertices in an $X$-cactus is $O(|X|)$, as with $X$-trees, which means that cactus metrics are easy to visualize. In particular, they provide a special case of an open problem in discrete geometry from Matoušek [12]. Besides these observations, in this paper we prove that, just as with tree metrics, any cactus metric has a unique optimal realization. We also describe a polynomial time algorithm for deciding whether or not an arbitrary metric is a cactus metric, which also computes its optimal realization if it is.

Figure 1: An example of an $X$-cactus with a label-set $X$, where the weight of each edge is proportional to its length. The vertices labeled by an element of $X$ are shown in black. The white circles are vertices that are not in $X$.

2 Preliminaries

A metric on a set $S$ is defined to be a function $d : S \times S \to \mathbb{R}_{\ge 0}$ with the property that $d(x, y)$ equals zero if and only if the two elements $x, y$ in $S$ are identical, $d$ is symmetric, and $d$ satisfies the triangle inequality.
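As a concrete illustration of this definition, the following minimal Python sketch checks the three axioms for a distance matrix given as a list of lists. The function name `is_metric` and the tolerance parameter `tol` are illustrative choices made here and do not come from the paper.

```python
from itertools import product


def is_metric(d, tol=1e-12):
    """Return True if the square matrix d satisfies the metric axioms."""
    n = len(d)
    for i, j in product(range(n), repeat=2):
        # Distances are non-negative and vanish exactly on the diagonal.
        if d[i][j] < 0 or (d[i][j] <= tol) != (i == j):
            return False
        # Symmetry.
        if abs(d[i][j] - d[j][i]) > tol:
            return False
    # Triangle inequality for every ordered triple.
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > d[i][j] + d[j][k] + tol:
            return False
    return True


# Shortest-path distances on a 4-cycle with unit edge weights.
square = [[0, 1, 2, 1],
          [1, 0, 1, 2],
          [2, 1, 0, 1],
          [1, 2, 1, 0]]
print(is_metric(square))
```

The check runs in cubic time in the number of points, which is acceptable for a sanity test on small inputs.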

All graphs considered here are finite, connected, simple, undirected graphs in which the edges have positive weights. For any graph $G$, $V(G)$ and $E(G)$ represent the vertex-set and edge-set of $G$, respectively. For any vertex $v$ of a graph $G$, the number of edges of $G$ that have $v$ as an endvertex is denoted by $\deg_G(v)$. For any graph $G$ and any subset $U$ of $V(G)$, we let $d_G|_U$ denote the metric on $U$ induced by taking shortest paths in $G$ between elements in $U$.

Throughout this paper, we use the symbol $X$ to represent a finite set with $|X| = n \ge 2$, which is sometimes called a label-set. For any metric $d$ on $X$, a realization of $(X, d)$ is a graph $G$ such that $X$ is a subset of $V(G)$ and $d(x, y) = d_G(x, y)$ holds for each $x, y \in X$, where we shall always assume that each vertex $v$ of $G$ with $\deg_G(v) \le 2$ has a label in $X$ [11]. A realization $G$ of $(X, d)$ is minimal if the removal of an arbitrary edge of $G$ yields a graph that does not realize $(X, d)$. It is optimal if the sum of its edge weights is minimum over all possible realizations (note that optimal realizations are minimal but the converse does not hold). Any finite metric space has at least one optimal realization [11, Theorem 2.2].
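To make the notion of a realization concrete, here is a small Python sketch that computes the shortest-path metric of a weighted graph with the Floyd-Warshall algorithm and compares it with a given metric on the labeled vertices. The names `induced_metric` and `realizes`, and the dictionary-based input format, are assumptions made for this illustration. The sketch checks only the distance condition; it does not check the convention that every vertex of degree at most 2 is labeled.

```python
import math


def induced_metric(vertices, weighted_edges):
    """All-pairs shortest-path distances of an undirected weighted graph."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in vertices}
            for u in vertices}
    for u, v, w in weighted_edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    # Floyd-Warshall relaxation over every intermediate vertex k.
    for k in vertices:
        for u in vertices:
            for v in vertices:
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist


def realizes(vertices, weighted_edges, d, tol=1e-9):
    """True if graph distances agree with d on every pair of labels in d."""
    dist = induced_metric(vertices, weighted_edges)
    return all(abs(dist[x][y] - d[x][y]) <= tol for x in d for y in d[x])


# A star with unlabeled centre "c" and labeled leaves "a", "b", "e".
star_vertices = ["a", "b", "e", "c"]
star_edges = [("a", "c", 1.0), ("b", "c", 2.0), ("e", "c", 3.0)]
d = {"a": {"a": 0, "b": 3, "e": 4},
     "b": {"a": 3, "b": 0, "e": 5},
     "e": {"a": 4, "b": 5, "e": 0}}
print(realizes(star_vertices, star_edges, d))
```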

We now state a theorem concerning optimal realizations which will be useful in our proofs. For a graph $G$, each maximal biconnected subgraph of $G$ is called a block of $G$ and each vertex of $G$ shared by two or more blocks of $G$ is called a cutvertex of $G$. Notice that if a graph consists of a single block, then it has no cutvertex.

Theorem 1 ([11, Theorem 5.9])

Let $G$ be a minimal realization of a finite metric space $(X, d)$, let $G_1, \dots, G_k$ be the blocks of $G$, let $M_i$ be the union of the vertices of $X$ in $G_i$ together with the cutvertices of $G$ in $G_i$, and let $d_i$ be the metric induced by $G$ on $M_i$. Then, if every $G_i$ is an optimal realization of $(M_i, d_i)$, then $G$ is also optimal. If every $G_i$, besides being optimal, is also unique, then $G$ is optimal and unique too.

We now turn to two special classes of metrics, that is, tree metrics and cyclelike metrics. A metric $d$ on $X$ is called a tree metric if there exists an $X$-tree that realizes $(X, d)$, where an $X$-tree is a tree $T$ with the property that each vertex $v$ of $T$ with $\deg_T(v) \le 2$ is contained in $X$ [13].

Theorem 2 ([7])

If $d$ is a tree metric on a finite set $X$, then there exists an $X$-tree that is a unique optimal realization of $(X, d)$.

Given a metric $d$ on $X$ with $|X| \ge 3$, we say that $d$ is cyclelike if there is a minimal realization for $(X, d)$ that is a cycle. This type of metric was discussed in, e.g., [2, 11, 14]. The following result will also be useful.

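Theorem 3 ([11, Theorem 4.4])

Suppose $d$ is a cyclelike metric on a finite set $X$ and a cycle $C$ is a minimal realization of $(X, d)$ with $X = \{x_1, \dots, x_n\}$, $V(C) = X$, and $E(C) = \{\{x_i, x_{i+1}\} : 1 \le i \le n\}$, where the indices are taken modulo $n$. Then, $C$ is an optimal realization of $(X, d)$ if and only if

$$d(x_{i-1}, x_i) + d(x_i, x_{i+1}) = d(x_{i-1}, x_{i+1})$$

holds for all $i \in \{1, \dots, n\}$. In this case, $C$ is the unique optimal realization of $(X, d)$.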

3 The uniqueness of optimal realizations of cactus metrics

As mentioned above, a cactus is a connected graph in which each edge belongs to at most one cycle. We define an $X$-cactus to be a cactus $G$ with the property that each vertex $v$ of $G$ with $\deg_G(v) \le 2$ is contained in $X$ (see Figure 1). Note that the maximum number of cycles in an $X$-cactus is $|X| - 2$ (which can be proved by induction on $|X|$). In addition, we say that a metric $d$ on a finite set $X$ is a cactus metric if there exists an edge-weighted $X$-cactus that realizes $(X, d)$.

Given an edge-weighted cycle $C$ that is a realization of its corresponding metric $d$, we call a vertex $x_i$ of $C$ slack if $d(x_{i-1}, x_i) + d(x_i, x_{i+1}) > d(x_{i-1}, x_{i+1})$. The following lemma is a direct consequence of Theorem 3.

Lemma 4

Under the premise of Theorem 3, $C$ is an optimal realization of $(X, d)$ if and only if $C$ has no slack vertex.
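The slack condition is easy to test numerically. The sketch below assumes that a vertex $x_i$ of a weighted cycle is slack when the two cycle edges at $x_i$ are together longer than the distance between its two neighbours, so that the shortest path between the neighbours runs around the other side of the cycle. The function names and the list-of-weights input format are illustrative assumptions.

```python
def cycle_metric(weights):
    """Shortest-path metric of a cycle; weights[i] is the weight of the
    edge joining vertex i and vertex (i + 1) % n."""
    n = len(weights)
    total = sum(weights)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        acc = 0.0
        for step in range(1, n):
            acc += weights[(i + step - 1) % n]
            j = (i + step) % n
            # Either walk forward from i to j, or go the other way round.
            d[i][j] = min(acc, total - acc)
    return d


def slack_vertices(weights, tol=1e-9):
    """Indices i with d(i-1, i) + d(i, i+1) > d(i-1, i+1)."""
    n = len(weights)
    d = cycle_metric(weights)
    return [i for i in range(n)
            if d[(i - 1) % n][i] + d[i][(i + 1) % n]
            > d[(i - 1) % n][(i + 1) % n] + tol]


print(slack_vertices([1, 1, 1, 1]))   # unit 4-cycle
print(slack_vertices([3, 3, 1, 1]))   # one heavy corner
print(slack_vertices([3, 4, 3, 1]))   # two adjacent heavy corners
```

For these three inputs the sketch reports no slack vertex, the single slack vertex 1, and the two adjacent slack vertices 1 and 2, respectively.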

We now use the lemma to prove the following generalization of Theorem 2, using the concept of “compactification” [7, 14, 15].

Theorem 5

If $d$ is a cactus metric on a finite set $X$, then there exists an $X$-cactus that is a unique optimal realization of $(X, d)$.

Proof: Let $G$ be an $X$-cactus that is a minimal realization of $(X, d)$. Without loss of generality, we assume that each cycle of $G$ has at least four vertices (since we can always replace a 3-cycle with a tree in such a way that the obtained graph is a realization). If there is no cycle in $G$ containing a slack vertex, then the assertion immediately follows from Theorems 1, 3 and Lemma 4.

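So, assume that there is a cycle $C$ in $G$ that has consecutive edges $\{x_{i-1}, x_i\}$ and $\{x_i, x_{i+1}\}$ with $d(x_{i-1}, x_i) + d(x_i, x_{i+1}) > d(x_{i-1}, x_{i+1})$. As we will now explain, we apply a “compactification” operation to the slack vertex $x_i$ (see also Figure 2). For notational convenience, let $y_1 = x_{i-1}$, $y_2 = x_i$ and $y_3 = x_{i+1}$. Compactification of $x_i$ refers to converting $G$ into the graph $G'$ with $V(G') = V(G) \cup \{v\}$ and $E(G') = (E(G) \setminus \{\{y_1, y_2\}, \{y_2, y_3\}\}) \cup \{\{v, y_j\} : j \in \{1, 2, 3\}\}$, where for each $j \in \{1, 2, 3\}$, the edge $\{v, y_j\}$ has weight $\frac{1}{2}\bigl(d(y_j, y_k) + d(y_j, y_l) - d(y_k, y_l)\bigr)$ with $\{j, k, l\} = \{1, 2, 3\}$. As can be easily verified, $G'$ is an $X$-cactus that is a minimal realization of $(X, d)$ with a strictly smaller number of slack vertices than $G$. Thus, as $G$ is finite, by applying the same operation repeatedly and suppressing all unlabeled vertices of degree two (if any arise), we will eventually obtain an $X$-cactus that realizes $(X, d)$ without a slack vertex, which must be the unique optimal realization of $(X, d)$. ∎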

Figure 2: An illustration of the compactification described in the proof of Theorem 5, where each slack vertex is highlighted by a square. Compactification of the slack vertex in the left graph yields the graph in the middle panel, which still contains a slack vertex. Applying the same operation to that vertex yields the graph on the right, which has no slack vertex.
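The following Python sketch illustrates one compactification step on a small example. It assumes the standard triangle-to-star weights: for a slack vertex y with cycle neighbours x and z, a new vertex is joined to x, y and z with weights (d(x,y) + d(x,z) - d(y,z))/2, (d(x,y) + d(y,z) - d(x,z))/2 and (d(y,z) + d(x,z) - d(x,y))/2. The function names, the `frozenset` edge keys and the example graph are choices made for this illustration and are not taken from the paper.

```python
import math


def shortest_paths(vertices, edges):
    """Floyd-Warshall; edges maps frozenset({u, v}) to a positive weight."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in vertices}
            for u in vertices}
    for e, w in edges.items():
        u, v = tuple(e)
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in vertices:
        for u in vertices:
            for v in vertices:
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist


def compactify(vertices, edges, x, y, z, new_vertex):
    """Replace the edges {x, y} and {y, z} at the slack vertex y by a star
    centred at new_vertex (assumed triangle-to-star weights)."""
    d = shortest_paths(vertices, edges)
    dxy, dyz, dxz = d[x][y], d[y][z], d[x][z]
    removed = (frozenset((x, y)), frozenset((y, z)))
    new_edges = {e: w for e, w in edges.items() if e not in removed}
    new_edges[frozenset((new_vertex, x))] = (dxy + dxz - dyz) / 2
    new_edges[frozenset((new_vertex, y))] = (dxy + dyz - dxz) / 2
    new_edges[frozenset((new_vertex, z))] = (dyz + dxz - dxy) / 2
    return list(vertices) + [new_vertex], new_edges


# A 4-cycle a-b-c-e-a in which b is slack: 3 + 3 > d(a, c) = 2.
labels = ["a", "b", "c", "e"]
cycle = {frozenset(("a", "b")): 3.0, frozenset(("b", "c")): 3.0,
         frozenset(("c", "e")): 1.0, frozenset(("e", "a")): 1.0}
new_vertices, new_edges = compactify(labels, cycle, "a", "b", "c", "v")
before = shortest_paths(labels, cycle)
after = shortest_paths(new_vertices, new_edges)
print(sum(cycle.values()), sum(new_edges.values()))
```

In this example the distances between the four labeled vertices are unchanged, while the total edge weight drops from 8 to 6; the resulting graph is a unit 4-cycle on a, v, c, e with a pendant edge of weight 2 from v to b.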

It is interesting to see that for cactus metrics, we do not need to perform too many “compactifications” for each cycle in the above proof in light of the following observation.

Proposition 5

If the premise of Theorem 3 holds, then $C$ has at most two slack vertices. In the case when there exist precisely two slack vertices, they are adjacent in $C$.

Proof: Let $C$ be as in Theorem 3. Suppose $C$ has at least two slack vertices and assume that $x_i$ is a slack vertex, in other words, that $d(x_{i-1}, x_i) + d(x_i, x_{i+1}) > d(x_{i-1}, x_{i+1})$ holds. As the path in $C$ from $x_{i-1}$ to $x_{i+1}$ that does not contain $x_i$ is the shortest path between $x_{i-1}$ and $x_{i+1}$, it follows that any $x_j$ with $j \notin \{i-1, i, i+1\}$ is not slack. Now, suppose $x_{i+1}$ is a slack vertex. Then using a similar argument by considering the shortest path between $x_i$ and $x_{i+2}$, it follows that $x_{i-1}$ is not slack. So the only slack vertices are $x_i$ and $x_{i+1}$. The same argument applies to the case when $x_{i-1}$ is a slack vertex. ∎

4 A polynomial time algorithm for finding the optimal cactus realization

In this section we describe an algorithm which, for a metric $d$ on $X$, produces the unique optimal realization for $(X, d)$ that is an $X$-cactus, or a message that there is no such realization, in $O(n^3)$ time. This should be compared to tree metrics, for which the same process can be carried out in $O(n^2)$ time [3, 4].

We begin by considering cyclelike metrics. Note that the characterization given in Theorem 3 for when a realization of a cyclelike metric is optimal is not sufficient to characterize cyclelike metrics, as pointed out in [14]. Even so we have the following result (which is related to Theorem 4.1 in [2]):

Lemma 6

Given a metric $d$ on $X$, we can determine if there is an edge-weighted cycle $C$ that is an optimal realization of $(X, d)$ and, if so, compute $C$ in $O(n^2)$ time.

Proof: We describe an algorithm that takes an arbitrary metric $d$ on $X$ as input, which in case $(X, d)$ has an optimal realization that is a cycle computes this cycle, and stops if this is not the case:

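1) Start by finding a pair of distinct elements $x_1, x_2$ in $X$ such that $d(x_1, x_2) \le d(x, y)$ holds for any distinct $x, y \in X$, and then set $V := \{x_1, x_2\}$ and $E := \{\{x_1, x_2\}\}$.

2) For each $i \in \{2, \dots, n-1\}$, find all vertices $x \in X \setminus V$ with $d(x_{i-1}, x) = d(x_{i-1}, x_i) + d(x_i, x)$. Among these vertices, we let $x_{i+1}$ be the unique vertex that minimizes $d(x_i, x)$. If such a vertex does not exist, or if such a vertex does exist but it is not unique, then stop; else set $V := V \cup \{x_{i+1}\}$ and $E := E \cup \{\{x_i, x_{i+1}\}\}$.

3) Set $V(C) := V$ and $E(C) := E \cup \{\{x_n, x_1\}\}$.

4) Check if the cycle $C$ defined by $V(C)$ and $E(C)$ together with the weight $d(x, y)$ of each edge $\{x, y\}$ is a minimal realization of $(X, d)$. If not then stop, else output the weighted cycle $C$.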

If this algorithm returns a cycle $C$ that realizes $(X, d)$, then $C$ satisfies the equation in Theorem 3 and so $C$ is the optimal realization of $(X, d)$. Conversely, if there is a cycle $C$ that is an optimal realization of $(X, d)$, then $C$ is unique. In this case, the above algorithm correctly constructs $C$ as follows. The algorithm initializes by finding two vertices of $C$ that are closest together. Since an optimal realization that is a cycle is minimal, it must be the case that these two vertices are connected by an edge. In Step 2, the algorithm iteratively extends the existing path by seeking the neighbour of $x_i$, which is one of the endvertices of the path. Observe that the two conditions in Step 2 uniquely determine this neighbour: the first condition ensures that a shortest path between $x_{i-1}$ and $x_{i+1}$ contains $x_i$; the second condition correctly identifies the neighbour of $x_i$ by making sure that the distance between it and $x_i$ is shortest. In Step 3, we join the two endvertices of the path by an edge to form the cycle $C$. Note that in this step, we run the risk of making a realization of $(X, d)$ that is a path into a realization of $(X, d)$ that is a cycle that is not minimal. Due to this, and also to ensure we have the correct solution, we check that the cycle is a minimal realization of $(X, d)$ in Step 4.

To give the running time of the algorithm, observe that Step 1 takes $O(n^2)$ time as we search for a minimum element from a set of size $O(n^2)$. In Step 2, we iterate over a ‘for loop’ at most $n$ times. Within the ‘for loop’ we iterate over at most $n$ elements to find the vertices that satisfy the first condition. Then, we iterate over those vertices to find a minimum element from at most $n$ elements. Hence, each iteration of the ‘for loop’ takes $O(n)$ time; it follows then that Step 2 takes $O(n^2)$ time. Step 3 takes constant time, as we simply add a weighted edge to the graph. Since one can obtain the metric induced by a cycle in at most $O(n^2)$ time, Step 4 can be performed in at most $O(n^2)$ time. As each step of the algorithm can be done in $O(n^2)$ time, the whole algorithm requires $O(n^2)$ time. ∎
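Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the metric is given as an n-by-n matrix, the candidate test in Step 2 is taken to be that the current endpoint lies on a shortest path between the previous path vertex and the candidate, and minimality in Step 4 is tested by requiring every cycle edge to be strictly shorter than half the total cycle weight (otherwise deleting that edge leaves a path with the same distances). As an extra safeguard added here, the sketch also checks the equality of Theorem 3 at every vertex, including the two positions created when the path is closed. The name `optimal_cycle` is hypothetical.

```python
def optimal_cycle(d, tol=1e-9):
    """Return (order, weights) of a cycle built by Steps 1-4, else None."""
    n = len(d)
    if n < 3:
        return None  # a cycle needs at least three vertices
    # Step 1: a closest pair of distinct points starts the path.
    a, b = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: d[p[0]][p[1]])
    path, used = [a, b], {a, b}
    # Step 2: extend the path from its current endpoint.
    while len(path) < n:
        prev, cur = path[-2], path[-1]
        cands = [x for x in range(n) if x not in used
                 and abs(d[prev][x] - (d[prev][cur] + d[cur][x])) <= tol]
        if not cands:
            return None
        best = min(d[cur][x] for x in cands)
        nearest = [x for x in cands if d[cur][x] <= best + tol]
        if len(nearest) != 1:
            return None
        path.append(nearest[0])
        used.add(nearest[0])
    # Step 3: close the path; each edge is weighted by the distance.
    weights = [d[path[i]][path[(i + 1) % n]] for i in range(n)]
    total = sum(weights)
    # Step 4a: the weighted cycle must induce exactly the metric d.
    for i in range(n):
        acc = 0.0
        for step in range(1, n):
            acc += weights[(i + step - 1) % n]
            j = (i + step) % n
            if abs(min(acc, total - acc) - d[path[i]][path[j]]) > tol:
                return None
    # Step 4b: minimality, tested as "no edge reaches half the total".
    if any(w >= total / 2 - tol for w in weights):
        return None
    # Extra safeguard (not a step in the text): Theorem 3 at every vertex.
    for i in range(n):
        p, c, q = path[i - 1], path[i], path[(i + 1) % n]
        if abs(d[p][c] + d[c][q] - d[p][q]) > tol:
            return None
    return path, weights


unit_square = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(optimal_cycle(unit_square))
```

Using a set for the vertices already on the path keeps each extension step linear, so the sketch stays within the quadratic bound discussed above.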

Theorem 7

Given a metric $d$ on $X$, we can determine if $d$ is a cactus metric and if so construct its optimal realization in $O(n^3)$ time.

Proof: In [9, Algorithm 2], Hertz and Varone give a polynomial time algorithm for decomposing an arbitrary metric space $(X, d)$ into finite metric spaces $(M_i, d_i)$, $1 \le i \le m$, such that any optimal realization of $(M_i, d_i)$ must consist of a single block, and such that an optimal realization for $(X, d)$ can be constructed by piecing together the optimal realizations for the $(M_i, d_i)$. They also observe [9, p. 174] that this decomposition can be computed in $O(n^3)$ time using results in [6] (see also [6, p. 160]). In addition, by the arguments in [6, Lemma 3.1], it follows that $\sum_{i} |M_i|$ is $O(n)$.

Assume that we have decomposed $(X, d)$ into $(M_i, d_i)$, $1 \le i \le m$, by using the aforementioned preprocessing algorithm. In case $|M_i| = 2$, its optimal realization is obviously a tree. Recalling the argument in the proof of Theorem 5, we know that $|M_i| \ne 3$ holds for each $i$. For each $M_i$ with $|M_i| \ge 4$, by using the algorithm in Lemma 6, we can check if $(M_i, d_i)$ has an optimal realization that is a cycle or not, and if so construct the cycle in $O(|M_i|^2)$ time (and hence $O(n^2)$ time suffices). If there is some $i$ such that $|M_i| \ge 4$ and $(M_i, d_i)$ does not have an optimal realization that is a cycle, then $d$ is not a cactus metric, else $d$ is a cactus metric, and we can construct the cactus by piecing together the optimal realizations for the $(M_i, d_i)$. Using the aforementioned fact that $\sum_{i} |M_i|$ is $O(n)$, we conclude that the overall time complexity is $O(n^3)$. ∎

5 Discussion and future work

It may be worth investigating whether there is a more direct and efficient algorithm than the one given in Theorem 7 for recognizing and/or realizing cactus metrics that uses structural properties of cactus graphs. More generally, we could investigate optimal realizations for metrics that can be realized by graphs $G$ in which every block $B$ satisfies $|E(B)| - |V(B)| + 1 \le k$, and such that every vertex in $G$ with degree at most 2 is contained in $X$. Here, we note that in case $k = 0$, $G$ is an $X$-tree, and in case $k = 1$, $G$ is an $X$-cactus. However, even in case $k = 2$, there may be infinitely many optimal realizations (e.g. the metric given in [1, Fig. 15]). So it might be interesting to first understand which of these metrics have a unique optimal realization, whether such metrics can be recognized in polynomial time, and whether there exists a polynomial time algorithm for computing some optimal realization.

Acknowledgment: Hayamizu is supported by JST PRESTO Grant Number JPMJPR16EB. Huber, Moulton and Murakami thank the Netherlands Organization for Scientific Research (NWO), including Vidi grant 639.072.602. Huber and Moulton also thank the Research Institute for Mathematical Sciences, Kyoto University, The Institute of Statistical Mathematics, Tokyo, and the London Mathematical Society for their support.

References

  • (1) I. Althöfer, On optimal realizations of finite metric spaces by graphs, Discrete and Computational Geometry 3(1) (1988) 103-122.
  • (2) A. Baldisserri, E. Rubei, Distance matrices of some positive-weighted graphs, Australasian Journal of Combinatorics 70(2) (2018) 185-201.
  • (3) V. Batagelj, T. Pisanski, J.M. Simões-Pereira, An algorithm for tree-realizability of distance matrices, International Journal of Computer Mathematics 34(3-4) (1990) 171-176.
  • (4) J.C. Culberson, P. Rudnicki, A fast algorithm for constructing trees from distance matrices, Information Processing Letters 30(4) (1989) 215-220.
  • (5) A. Dress, Trees, tight extensions of metric spaces, and the cohomological dimension of certain groups: a note on combinatorial properties of metric spaces, Advances in Mathematics 53(3) (1984) 321-402.
  • (6) A. Dress, K.T. Huber, J. Koolen, V. Moulton, A. Spillner, An algorithm for computing cutpoints in finite metric spaces, Journal of Classification 27(2) (2010) 158-172.
  • (7) S.L. Hakimi, S.S. Yau, Distance matrix of a graph and its realizability, Quarterly of Applied Mathematics 22(4) (1965) 305-317.
  • (8) M. Hayamizu, X-cactus trees and cactus tree metrics, The 21st New Zealand Phylogenomics Meeting (Waiheke 2017), 12-17 February 2017. https://cdn.auckland.ac.nz/assets/compevol/events/documents/Waiheke2017Programme.pdf
  • (9) A. Hertz, S. Varone, The metric cutpoint partition problem, Journal of Classification 25(2) (2008) 159-175.
  • (10) D.H. Huson, R. Rupp, C. Scornavacca, Phylogenetic networks: concepts, algorithms and applications, Cambridge University Press, 2010.
  • (11) W. Imrich, J.M.S. Simões-Pereira, C.M. Zamfirescu, On optimal embeddings of metrics in graphs, Journal of Combinatorial Theory, Series B 36(1) (1984) 1-15.
  • (12) J. Matoušek, 2.7 How large graph?, Open problems on embeddings of finite metric spaces, Workshop on discrete metric spaces and their algorithmic applications, 2002, available at http://kam.mff.cuni.cz/~matousek/metrop.ps.
  • (13) C. Semple, M. Steel, Phylogenetics, Oxford University Press, 2003.
  • (14) J.M.S. Simões-Pereira, A note on distance matrices with unicyclic graph realizations, Discrete Mathematics 65(3) (1987) 277-287.
  • (15) J.M.S. Simões-Pereira, C.M. Zamfirescu, Submatrices of non-tree-realizable distance matrices, Linear Algebra and its Applications 44 (1982) 1-17.
  • (16) P. Winkler, The complexity of metric realization, SIAM Journal on Discrete Mathematics 1(4) (1988) 552-559.