 # On the Longest Spanning Tree with Neighborhoods

We study a maximization problem for geometric network design. Given a set of n compact neighborhoods in $\mathbb{R}^d$, select a point in each neighborhood, so that the longest spanning tree on these points (as vertices) has maximum length. Here we give an approximation algorithm with ratio 0.511, which represents the first, albeit small, improvement beyond 1/2. While we suspect that the problem is NP-hard already in the plane, this issue remains open.

## 1 Introduction

In the Euclidean Maximum Spanning Tree Problem (EMST), given a set of $n$ points in the Euclidean space $\mathbb{R}^d$, $d \ge 2$, one seeks a tree that connects these points (as vertices) and has maximum length. The problem is easily solvable in polynomial time by Prim’s algorithm or by Kruskal’s algorithm; algorithms that take advantage of the geometry are also available. In the Longest Spanning Tree with Neighborhoods (Max-St-N), each point is replaced by a point-set, called region or neighborhood, and the tree must connect representative points, one chosen from each region (duplicate representatives are allowed), so that the tree has maximum length. The tree edges are straight line segments connecting pairs of points in distinct regions; for obvious reasons we refer to these edges as bichromatic. As one would expect, the difficulty lies in choosing the representative points; once these points are selected, the problem reduces to the graph setting and is thus easily solvable.
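Once a representative has been fixed in every region, all edges between representatives are bichromatic, and what remains is the classical maximum spanning tree problem on the complete Euclidean graph. The following is a minimal sketch, assuming the representatives are given as coordinate tuples (one per region); it uses the plain $O(n^2)$ version of Prim's algorithm with "max" in place of "min" rather than any geometry-aware method, and the function name `max_spanning_tree` is hypothetical:

```python
import math

def max_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph, maximizing edge length.

    points: list of coordinate tuples, one representative per region.
    Returns (total_length, edges), where edges are (parent, child) index pairs.
    """
    n = len(points)
    if n < 2:
        return 0.0, []
    in_tree = [False] * n
    best = [-1.0] * n      # longest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0          # start the tree at vertex 0
    total, edges = 0.0, []
    for _ in range(n):
        # pick the non-tree vertex with the longest connection to the tree
        u = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            total += best[u]
            edges.append((parent[u], u))
        # update connections of the remaining vertices through u
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d > best[v]:
                    best[v], parent[v] = d, u
    return total, edges
```

For the collinear points $(0,0)$, $(1,0)$, $(3,0)$ the sketch picks the edges of lengths $3$ and $2$, for a total of $5$.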

The input $\mathcal{N} = \{N_1, \ldots, N_n\}$ consists of $n$ (possibly disconnected) neighborhoods. For simplicity, it is assumed that each neighborhood is a union of polyhedral regions; the total vertex complexity of the input is $N$. However, it will be apparent from the context that our methods extend to a broader class of regions: those approximable by unions of polyhedral regions within a prescribed accuracy (for instance, unions of balls of arbitrary radii).

### Examples.

Let , where , , , , and is a unit equilateral triangle. Selecting vertices , at , respectively yields a spanning tree in the form of a star centered at of length ; it is obviously a longest spanning tree of the neighborhoods in . It is worth noting that a greedy algorithm does not necessarily find an optimal tree. Let , where , , , is a unit equilateral triangle and is the midpoint of ; see Figure 1 (left).

Figure 1: Left: an example on which the greedy algorithm is suboptimal. Right: an example of a long (still suboptimal) spanning tree with 10 regions $\mathcal{N}=\{A, S\cup S, E\cup E\cup E, T\cup T, O\cup O, F, N\cup N, R, G, I\}$ (some regions are disconnected); the blue segments form a spanning tree on $\mathcal{N}$ and the green dots are the chosen representative points.

A (natural) greedy algorithm chooses two points attaining a maximum inter-point distance with points in distinct regions, and then repeatedly chooses a point in each new region as far as possible from some selected point. Here the selection , , yields a spanning tree in the form of a star centered at of length ; on the other hand, selecting vertices , at , respectively, yields a spanning tree in the form of a star centered at of length . Another example appears in Figure 1 (right).

We start by providing a factor $1/2$ approximation to Max-St-N. We then offer two refinement steps achieving a better ratio. The last refinement step proves Theorem 1.

###### Theorem 1.

Given a set $\mathcal{N}$ of $n$ neighborhoods in $\mathbb{R}^d$ (with total vertex complexity $N$), a ratio $0.511$ approximation for the maximum spanning tree for the regions in $\mathcal{N}$ can be computed in polynomial time.

Although our improvement in the approximation ratio for spanning trees is very small, it shows that the “barrier” of $1/2$ can be broken. On the other hand, we show that every algorithm that always includes a bichromatic diameter pair in the solution (as the vertices of the corresponding regions) is bound to have an approximation ratio at most (via Figure 4 in Section 3).

### Definitions and notations.

A geometric graph $G$ is a graph whose vertices (a finite set) are points in the plane and whose edges are straight line segments. The length of $G$, denoted $\operatorname{len}(G)$, is the sum of the Euclidean lengths of all edges in $G$.

For a neighborhood $X \in \mathcal{N}$, let $V(X)$ denote its set of vertices. Let $V = \bigcup_{X \in \mathcal{N}} V(X)$ denote the union of the vertices of all neighborhoods in $\mathcal{N}$; put $N = |V|$.

Given a set $\mathcal{N}$ of neighborhoods, we define the following parameters. A monochromatic diameter pair is a pair of points in the same region attaining a maximum distance. A bichromatic diameter pair is a pair of points from two regions attaining a maximum distance, i.e., $p \in N_i$, $q \in N_j$, where $i, j \in \{1, \ldots, n\}$, $i \neq j$, and $|pq|$ is maximum. For $N_i \in \mathcal{N}$ and $p \in N_i$, let $d_{\max}(p)$ denote the maximum distance between $p$ and any point of a neighborhood $N_j$, $j \neq i$. It is well known and easy to prove that both a monochromatic diameter pair and a bichromatic diameter pair are attained by pairs of vertices in the input instance. An optimal (longest) spanning tree with neighborhoods is denoted by $T_{\rm OPT}$; it is a geometric graph whose vertices are the representative points of the regions.

### Preliminaries and related work.

Computing a minimum or a maximum Euclidean spanning tree of a point set is a classical problem in the geometric setting [13, 14]. A broad collection of problems in geometric network design, including the classical Euclidean Traveling Salesman Problem (ETSP), can be found in the surveys [9, 11, 12]. While past research has primarily focused on minimization problems, the maximization variants usually require different techniques, so they are interesting in their own right and pose many unmet challenges; e.g., see the section devoted to longest subgraph problems in the survey of Bern and Eppstein [5]. The results obtained in this area in recent years are rather sparse; the few articles [4, 8, 10] form a representative sample.

Spanning trees for systems of neighborhoods have also been studied. For instance, given a set of (possibly disconnected) compact neighborhoods in $\mathbb{R}^d$, select a point in each neighborhood so that the minimum spanning tree on these points has minimum length [7, 18], or maximum length [7], respectively. In the cycle version first studied by Arkin and Hassin [3], called TSP with neighborhoods (TSPN), given a set of neighborhoods in $\mathbb{R}^d$, one must find a shortest closed curve (tour) intersecting each region.

## 2 Approximation algorithms

Let $P = \{p_1, \ldots, p_n\}$, where $p_i \in N_i$. Given a point $p \in P$, the star centered at $p$, denoted $S_p$, is the spanning tree on $P$ whose edges connect $p$ to the other points. Similarly, given two points $p, q \in P$, a $(p,q)$-star centered at $p, q$, denoted $S_{pq}$, is a spanning tree on $P$ made from segment $pq$ and other edges connecting $p$ or $q$ to the other points.

Using a technique developed in  (in fact a simplification of an earlier approach used in ), we first obtain a simple approximation algorithm with ratio $1/2$.

### Algorithm A1.

Compute a bichromatic diameter pair $a, b$ of the point set $V$, pick an arbitrary point (vertex) from each of the other $n-2$ neighborhoods, and output the longer of the two stars centered at one of the endpoints of the diameter.
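A brute-force sketch of Algorithm A1 follows, assuming each region is given by its finite list of vertices (as coordinate tuples), which suffices because diameter pairs are attained at vertices. The helper names `bichromatic_diameter` and `algorithm_a1` are hypothetical, the diameter is found by the naive $O(N^2)$ scan rather than by the faster methods cited later, and the "arbitrary" vertex of each remaining region is simply its first vertex:

```python
import math
from itertools import combinations

def bichromatic_diameter(regions):
    """Naive O(N^2) scan over vertex pairs from distinct regions.

    Returns (distance, (i, p, j, q)) with p in regions[i] and q in regions[j].
    """
    best = (-1.0, None)
    for i, j in combinations(range(len(regions)), 2):
        for p in regions[i]:
            for q in regions[j]:
                d = math.dist(p, q)
                if d > best[0]:
                    best = (d, (i, p, j, q))
    return best

def algorithm_a1(regions):
    """Sketch of Algorithm A1: returns (length, center) of the longer star."""
    _, (i, a, j, b) = bichromatic_diameter(regions)
    # an arbitrary vertex (here: the first one) of every other region
    others = [reg[0] for k, reg in enumerate(regions) if k not in (i, j)]
    len_sa = math.dist(a, b) + sum(math.dist(a, p) for p in others)
    len_sb = math.dist(a, b) + sum(math.dist(b, p) for p in others)
    return (len_sa, a) if len_sa >= len_sb else (len_sb, b)
```

Scaled to an arbitrary diameter $|ab|$, Lemma 1 below says that the returned length is at least $n\,|ab|/2$.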

### Analysis.

Let $a, b$ be a bichromatic diameter pair, and assume without loss of generality that $ab$ is a horizontal unit segment, where $a = (-1/2, 0)$ and $b = (1/2, 0)$. We may assume that $a \in N_1$ and $b \in N_2$; refer to Fig. 2.

Figure 2: A bichromatic diameter pair $a, b$ and the disk $\omega$.

The ratio $1/2$ (or $n/(2(n-1))$, which is slightly better) follows from the next lemma in conjunction with the obvious upper bound

$$\operatorname{len}(T_{\rm OPT}) \le n-1. \tag{1}$$

The latter is implied by the fact that each edge of $T_{\rm OPT}$ is bichromatic, and hence has length at most $|ab| = 1$.

###### Lemma 1.

Let $S_a$ and $S_b$ be the stars centered at the points $a$ and $b$, respectively. Then $\operatorname{len}(S_a) + \operatorname{len}(S_b) \ge n$.

###### Proof.

Assume that $p_i \in N_i$, for $i = 3, \ldots, n$. For each $i = 3, \ldots, n$, the triangle inequality for the triple $a, b, p_i$ gives

$$|ap_i| + |bp_i| \ge |ab| = 1.$$

By summing up we have

$$\operatorname{len}(S_a) + \operatorname{len}(S_b) = \sum_{i=3}^{n} \bigl(|ap_i| + |bp_i|\bigr) + 2|ab| \ge (n-2) + 2 = n.$$ ∎

We next refine this algorithm to achieve an approximation ratio of $0.511$. The technique uses two parameters $x$ and $y$, introduced below. The smallest value of the ratio over the entire range of admissible $x$ and $y$ is determined and taken as the approximation ratio of Algorithm A2. For simplicity, we present the algorithm for the plane, i.e., $d = 2$; its extension to higher dimensions is straightforward and is briefly discussed at the end.

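Let $o$ be the midpoint of $ab$, and let $\omega$ be the disk centered at $o$, of minimum radius, say $x$, containing at least $\lfloor n/2 \rfloor$ of the neighborhoods in $\mathcal{N}$; in particular, we can consider $\lfloor n/2 \rfloor$ neighborhoods as contained in $\omega$ and $\lceil n/2 \rceil$ neighborhoods as having points on the boundary or in the exterior of $\omega$. We can assume that $x \le 0.2$. Indeed, if $x > 0.2$, the result easily follows: for each of the $\lceil n/2 \rceil - 2$ regions other than $N_1, N_2$ that are not contained in $\omega$, one of the connections from its vertex farthest from $o$ to $a$ or $b$ has length at least $\sqrt{1/4 + x^2}$, and for every other region one of the connections to $a$ or $b$ has length at least $1/2$. Let $T$ be the spanning tree consisting of all such longer connections together with $ab$. Then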

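$$\begin{aligned}
\operatorname{len}(T) &\ge 1 + \frac{1}{2}\left\lfloor \frac{n}{2} \right\rfloor + \left(\left\lceil \frac{n}{2} \right\rceil - 2\right)\sqrt{\frac{1}{4} + x^2} \\
&\ge \frac{1 + \sqrt{1+4x^2}}{4}\,(n-1) + 1 - \frac{3\sqrt{1+4x^2}}{4} \\
&\ge \frac{5 + \sqrt{29}}{20}\,(n-1) + 1 - \frac{3\sqrt{29}}{20} \ge \frac{5 + \sqrt{29}}{20}\,(n-1).
\end{aligned}$$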

By (1), the approximation ratio is then at least $(5 + \sqrt{29})/20 = 0.519\ldots$.
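The arithmetic behind the case $x > 0.2$ can be spot-checked numerically. The sketch below (the name `star_bound` is hypothetical) evaluates the right-hand side of the first line of the display and records the smallest value of $\operatorname{len}(T)/(n-1)$ over a finite grid of $n$ and $x \in [0.2, 0.5]$; it is a sanity check on a sample of parameters, not a proof:

```python
import math

RATIO = (5 + math.sqrt(29)) / 20   # the ratio claimed above, about 0.519

def star_bound(n, x):
    """1 + (1/2) * floor(n/2) + (ceil(n/2) - 2) * sqrt(1/4 + x^2)."""
    return 1 + 0.5 * (n // 2) + (math.ceil(n / 2) - 2) * math.sqrt(0.25 + x * x)

# smallest value of star_bound(n, x) / (n - 1) over n = 3..499 and x = 0.20, 0.21, ..., 0.50
worst = min(star_bound(n, 0.2 + 0.01 * k) / (n - 1)
            for n in range(3, 500) for k in range(31))
print(worst, RATIO)
```

The sampled minimum stays above `RATIO` and approaches it as $n$ grows, matching the additive constant $1 - 3\sqrt{29}/20 > 0$ dropped in the last step.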

Let the monochromatic diameter of $\mathcal{N}$ be $D$, for some $D \ge 0$; the next lemma shows that $D \le 2$, and so the monochromatic diameter of $\mathcal{N}$ is $1 + y$, for some $y \in [-1, 1]$.

###### Lemma 2.

For every $N_i \in \mathcal{N}$, the diameter of $N_i$ is at most $2$.

###### Proof.

Let $s, t$ be a diameter pair of $N_i$. Let $p$ be an arbitrary point of an arbitrary neighborhood $N_j$, $j \neq i$. By the triangle inequality, we have $|st| \le |sp| + |pt| \le 1 + 1 = 2$, as required. ∎

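If $y \ge 0.2$, let $s, t$ be a corresponding diameter pair; choose a point in every other region and connect it to $s$ and $t$. Since $|sp| + |tp| \ge |st| = 1 + y$ for every chosen point $p$, the longer of the two stars centered at $s$ and $t$ has length at least $(n-1)(1+y)/2 \ge 0.6\,(n-1)$; by (1), this candidate spanning tree thereby offers an approximation ratio of at least $0.6$. We will subsequently assume that $y \le 0.2$.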

As shown above, a constant approximation ratio better than $1/2$ can be obtained if $x$ or $y$ is sufficiently large. In the complementary case (both $x$ and $y$ are small), an upper bound of $c\,(n-1)$ on the length of $T_{\rm OPT}$, for some $c$ depending only on $x$ and $y$, can be derived. We continue with the technical details.

### Algorithm A2.

The algorithm computes one or two candidate solutions. The first candidate solution for the spanning tree is only relevant for the range $0 \le y \le 0.2$ (if $y < 0$, its length could be smaller than $(n-1)/2$). Assume that one of the regions, say $N_k$, achieves a monochromatic diameter pair: $s, t \in N_k$ with $|st| = 1 + y$. Choose an arbitrary point in every other region and connect it to $s$ and $t$. Let $T_1$ be the longer of the two stars centered at $s$ and $t$. As such,

$$\operatorname{len}(T_1) \ge (n-1)\,\frac{1+y}{2}. \tag{2}$$
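A sketch of the first candidate $T_1$ follows, under the same assumptions as before (regions given as lists of vertex tuples; the helper name `candidate_t1` is hypothetical). It scans every region for a monochromatic diameter pair $s, t$, takes the first vertex of each other region as its arbitrary representative, and keeps the longer of the two stars, so its length satisfies (2) by the triangle inequality:

```python
import math
from itertools import combinations

def candidate_t1(regions):
    """Hypothetical helper: longer star centered at a monochromatic diameter pair.

    regions: list of lists of vertex tuples.
    Returns (length, (s, t)), or None if every region is a single point.
    """
    best = (-1.0, None, None, None)
    for k, reg in enumerate(regions):
        for s, t in combinations(reg, 2):
            d = math.dist(s, t)
            if d > best[0]:
                best = (d, k, s, t)
    _, k, s, t = best
    if k is None:          # no region has two vertices
        return None
    # arbitrary representative (the first vertex) of every other region
    others = [reg[0] for j, reg in enumerate(regions) if j != k]
    len_s = sum(math.dist(s, p) for p in others)
    len_t = sum(math.dist(t, p) for p in others)
    return max(len_s, len_t), (s, t)
```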

The second candidate solution for the spanning tree connects each of the regions contained in $\omega$ with either $a$ or $b$ at a cost of at least $1/2$ (based on the fact that $|ap| + |bp| \ge |ab| = 1$ for every point $p$). For each region $N_i$, $i = 3, \ldots, n$, select the vertex of $N_i$ that is farthest from $o$ and connect it with $a$ or $b$, whichever yields the longer connection. As such, if $N_i$ is not contained in $\omega$, the connection length is at least $\sqrt{1/4 + x^2}$. Finally, add the unit segment $ab$. Then,

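$$\operatorname{len}(T_2) \ge 1 + \left\lfloor \frac{n}{2} \right\rfloor \frac{1}{2} + \left(\left\lceil \frac{n}{2} \right\rceil - 2\right)\sqrt{\frac{1}{4} + x^2}. \tag{3}$$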

The above expression can be simplified as follows. If $n$ is even, (3) yields

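$$\begin{aligned}
\operatorname{len}(T_2) &\ge 1 + \frac{n}{4} + \left(\frac{n}{4} - 1\right)\sqrt{1+4x^2} \\
&= \frac{n-1}{4}\left(1 + \sqrt{1+4x^2}\right) + \left(\frac{5}{4} - \frac{3}{4}\sqrt{1+4x^2}\right) \\
&\ge \frac{n-1}{4}\left(1 + \sqrt{1+4x^2}\right).
\end{aligned}$$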

If $n$ is odd, (3) yields

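$$\begin{aligned}
\operatorname{len}(T_2) &\ge 1 + \frac{n-1}{4} + \left(\frac{n+1}{4} - 1\right)\sqrt{1+4x^2} \\
&= \frac{n-1}{4}\left(1 + \sqrt{1+4x^2}\right) + \left(1 - \frac{2}{4}\sqrt{1+4x^2}\right) \\
&\ge \frac{n-1}{4}\left(1 + \sqrt{1+4x^2}\right).
\end{aligned}$$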

Consequently, for every $n$ we have

$$\operatorname{len}(T_2) \ge \frac{n-1}{4}\left(1 + \sqrt{1+4x^2}\right). \tag{4}$$
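A sketch of the second candidate $T_2$, together with the radius $x$ of $\omega$, under the same assumptions (planar regions given as vertex lists; $a, b$ a bichromatic diameter pair supplied by the caller). Since each region is a union of polyhedra, it lies in a disk centered at $o$ exactly when all its vertices do, so the smallest such disk has radius equal to the largest vertex distance from $o$, and $x$ is the $\lfloor n/2 \rfloor$-th smallest of these radii. The names `candidate_t2`, `bound3` and `bound4` are hypothetical; the last two evaluate the right-hand sides of (3) and (4), so the parity case analysis can be spot-checked:

```python
import math

def candidate_t2(regions, ia, ib, a, b):
    """Hypothetical helper: second candidate T2 of Algorithm A2 in the plane.

    regions: list of vertex lists; a in regions[ia] and b in regions[ib] form a
    bichromatic diameter pair. Returns (len(T2), x), x = radius of omega.
    """
    n = len(regions)
    o = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)          # midpoint of ab
    # smallest disk centered at o containing a region: largest vertex distance
    radii = sorted(max(math.dist(o, v) for v in reg) for reg in regions)
    x = radii[n // 2 - 1]                               # contains >= floor(n/2) regions
    length = math.dist(a, b)                            # the segment ab
    for k, reg in enumerate(regions):
        if k in (ia, ib):
            continue
        far = max(reg, key=lambda v: math.dist(o, v))   # vertex farthest from o
        length += max(math.dist(far, a), math.dist(far, b))
    return length, x

def bound3(n, x):
    """Right-hand side of (3)."""
    return 1 + (n // 2) / 2 + (math.ceil(n / 2) - 2) * math.sqrt(0.25 + x * x)

def bound4(n, x):
    """Right-hand side of (4)."""
    return (n - 1) / 4 * (1 + math.sqrt(1 + 4 * x * x))

regions = [[(-0.5, 0.0)], [(0.5, 0.0)], [(0.0, 0.05)], [(0.02, 0.0)],
           [(0.0, 0.3), (0.0, 0.0)]]
length, x = candidate_t2(regions, 0, 1, (-0.5, 0.0), (0.5, 0.0))
print(length, x, bound3(len(regions), x), bound4(len(regions), x))
```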

### Upper bound on $\operatorname{len}(T_{\rm OPT})$.

Let $\Omega$ be the disk of radius $R(y)$ centered at $o$, where

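$$R(y) = \begin{cases} \dfrac{\sqrt{3}}{2} & \text{if } y \le 0, \\[6pt] \dfrac{\sqrt{3}}{2} + \dfrac{2}{\sqrt{3}}\,y & \text{if } y \ge 0. \end{cases}$$
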
###### Lemma 3.

The vertex set $V$ is contained in $\Omega$.

###### Proof.

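Assume for contradiction that there exists a point $p \in V$ at distance larger than $R(y)$ from $o$. By symmetry, we may assume that $|pa| \ge |pb|$ and that $p$ lies in the closed halfplane above the line containing $ab$. Since $|pa|^2 + |pb|^2 = 2|po|^2 + |ab|^2/2 = 2|po|^2 + 1/2$, we have $|pa|^2 \ge |po|^2 + 1/4$.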

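First consider the case $y \le 0$; it follows that $|pa|^2 \ge |po|^2 + 1/4 > 3/4 + 1/4 = 1$, i.e., $|pa| > 1$. If $p \in N_1$, then $|pa| > 1 \ge 1 + y$, which contradicts the definition of $y$; otherwise $p$ and $a$ are points in different neighborhoods at distance larger than $1$, in contradiction with the original assumption on the bichromatic diameter of $\mathcal{N}$.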

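Next consider the case $y \ge 0$; it follows that $|pa|^2 \ge |po|^2 + 1/4 > \left(\frac{\sqrt{3}}{2} + \frac{2y}{\sqrt{3}}\right)^2 + \frac{1}{4} = 1 + 2y + \frac{4}{3}y^2 \ge (1+y)^2$, i.e., $|pa| > 1 + y$. If $p \in N_1$, then $|pa| > 1 + y$, which contradicts the definition of $y$; otherwise $p$ and $a$ are points in different neighborhoods at distance larger than $1 + y \ge 1$, in contradiction with the original assumption on the bichromatic diameter of $\mathcal{N}$.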

In either case (for any $y$) we have reached a contradiction, and this concludes the proof. ∎
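The proof rests on one geometric fact: a point $p$ with $|pa| \ge |pb|$ and $|po| > R(y)$ satisfies $|pa| > \max(1, 1+y)$. Below is a randomized spot-check of that fact with $a = (-1/2, 0)$, $b = (1/2, 0)$, $o$ at the origin and $y$ drawn from $[-1, 1]$; the function names are hypothetical, and the check illustrates, but does not replace, the proof:

```python
import math
import random

SQ3 = math.sqrt(3)

def R(y):
    """Radius of the disk Omega centered at o, as defined above."""
    return SQ3 / 2 if y <= 0 else SQ3 / 2 + 2 * y / SQ3

def key_step_holds(trials=20_000, seed=0):
    """Sample p with |po| > R(y) and |pa| >= |pb|; check |pa| > max(1, 1 + y)."""
    rng = random.Random(seed)
    a = (-0.5, 0.0)
    for _ in range(trials):
        y = rng.uniform(-1.0, 1.0)
        r = R(y) * (1 + rng.uniform(1e-6, 1.0))        # strictly outside Omega
        theta = rng.uniform(-math.pi / 2, math.pi / 2)  # x >= 0, hence |pa| >= |pb|
        p = (r * math.cos(theta), r * math.sin(theta))
        if not math.dist(p, a) > max(1.0, 1.0 + y):
            return False
    return True
```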

###### Lemma 4.

Let $\mathcal{N}$ be a set of neighborhoods and $T_{\rm OPT}$ be an optimal spanning tree assumed to connect points (vertices) $p_i \in N_i$, for $i = 1, \ldots, n$. For every $j \in \{1, \ldots, n\}$, we have

$$\operatorname{len}(T_{\rm OPT}) \le \sum_{i \neq j} d_{\max}(p_i).$$

###### Proof.

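Consider $T_{\rm OPT}$ rooted at $p_j$. Let $\mathrm{par}(p_i)$ denote the parent of a (non-root) vertex $p_i$. Uniquely assign each edge $p_i\,\mathrm{par}(p_i)$ of $T_{\rm OPT}$ to vertex $p_i$. The inequality $|p_i\,\mathrm{par}(p_i)| \le d_{\max}(p_i)$ holds for each edge of the tree, since $\mathrm{par}(p_i)$ lies in a different region. By adding up the above inequalities, the lemma follows. ∎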
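The argument never uses optimality; it only uses that each tree edge joins points of distinct regions, so the same bound holds for every spanning tree on representatives $p_i \in N_i$. The sketch below checks this reading on random data: it compares a maximum spanning tree on a random choice of representatives with $\sum_{i \ne j} d_{\max}(p_i)$ for every root $j$. Regions are random vertex lists, and all names are hypothetical:

```python
import math
import random

def max_st_length(pts):
    """Length of a maximum spanning tree of pts (O(n^2) Prim's algorithm)."""
    n = len(pts)
    best = [-1.0] * n      # longest known connection of each vertex to the tree
    best[0] = 0.0
    used = [False] * n
    total = 0.0
    for _ in range(n):
        u = max((v for v in range(n) if not used[v]), key=lambda v: best[v])
        used[u] = True
        total += best[u]   # adds 0 for the root
        for v in range(n):
            if not used[v]:
                best[v] = max(best[v], math.dist(pts[u], pts[v]))
    return total

def d_max(i, p, regions):
    """Largest distance from p (a point of regions[i]) to a vertex of another region."""
    return max(math.dist(p, q)
               for j, reg in enumerate(regions) if j != i for q in reg)

rng = random.Random(7)
regions = [[(rng.random(), rng.random()) for _ in range(3)] for _ in range(10)]
reps = [reg[rng.randrange(3)] for reg in regions]              # one point per region
tree_len = max_st_length(reps)
dm = [d_max(i, p, regions) for i, p in enumerate(reps)]
lemma4_bounds = [sum(dm) - dm[j] for j in range(len(reps))]   # one bound per root p_j
print(tree_len, min(lemma4_bounds))
```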

###### Lemma 5.

If $N_i$ is contained in $\omega$, and $p \in N_i$, then $d_{\max}(p) \le x + R(y)$.

###### Proof.

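By definition, $d_{\max}(p) = |pq|$ for some point $q$ of a neighborhood $N_j$, $j \neq i$. By Lemma 3, the vertex set $V$ is contained in $\Omega$; equivalently, all neighborhoods in $\mathcal{N}$ are contained in $\Omega$. By the triangle inequality, $d_{\max}(p) = |pq| \le |po| + |oq| \le x + R(y)$, as claimed. ∎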

###### Lemma 6.

The following holds:

$$\operatorname{len}(T_{\rm OPT}) \le (n-1) \cdot \min\left(1, \frac{1 + x + R(y)}{2}\right). \tag{5}$$

###### Proof.

Let $T_{\rm OPT}$ be a longest spanning tree of $\mathcal{N}$, with vertices $p_i \in N_i$, for $i = 1, \ldots, n$. View $T_{\rm OPT}$ as rooted at $p_1$; recall that $a \in N_1$. By Lemma 4,

$$\operatorname{len}(T_{\rm OPT}) \le \sum_{i=2}^{n} d_{\max}(p_i).$$

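If $N_i$ is not contained in $\omega$, then $d_{\max}(p_i) \le 1$; otherwise, by Lemma 5, $d_{\max}(p_i) \le x + R(y)$ (and still $d_{\max}(p_i) \le 1$). By the setting of $x$ in the definition of $\omega$, we have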

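$$\operatorname{len}(T_{\rm OPT}) \le \left(\left\lceil \frac{n}{2} \right\rceil - 1\right) \cdot 1 + \left\lfloor \frac{n}{2} \right\rfloor \cdot \min\bigl(1, x + R(y)\bigr).$$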

If $n$ is even, the above inequality yields

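$$\begin{aligned}
\operatorname{len}(T_{\rm OPT}) &\le \left(\frac{n}{2} - 1\right) + \frac{n}{2}\min\bigl(1, x + R(y)\bigr) \\
&= \frac{n-1}{2}\Bigl(1 + \min\bigl(1, x + R(y)\bigr)\Bigr) + \frac{\min\bigl(1, x + R(y)\bigr) - 1}{2} \\
&\le \frac{n-1}{2}\bigl(1 + x + R(y)\bigr),
\end{aligned}$$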

while if $n$ is odd, it yields

$$\operatorname{len}(T_{\rm OPT}) \le \frac{n-1}{2} + \frac{n-1}{2}\bigl(x + R(y)\bigr) = \frac{n-1}{2}\bigl(1 + x + R(y)\bigr).$$

Therefore $\operatorname{len}(T_{\rm OPT}) \le \frac{n-1}{2}\bigl(1 + x + R(y)\bigr)$ in both cases. The lemma then follows by adjoining the trivial upper bound in equation (1). ∎

## 3 Analysis of Algorithm A2

We start with a preliminary argument for ratio $0.506$ that comes with a simpler proof. We then give a sharper analysis for ratio $0.511$.

### A first bound on the approximation ratio of A2.

First consider the case $y \le 0$. Then $R(y) = \sqrt{3}/2$, so the ratio of A2 is at least

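$$\min_{\substack{0 \le x \le 0.2 \\ y \le 0}} \frac{\operatorname{len}(T_2)}{\operatorname{len}(T_{\rm OPT})} \ge \min_{0 \le x \le 0.2} \frac{1 + \sqrt{1+4x^2}}{\min\left(4,\ 2 + \sqrt{3} + 2x\right)}.$$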

A standard analysis shows that this ratio achieves its minimum when $x = 1 - \sqrt{3}/2$.

When $0 \le y \le 0.2$, the ratio of A2 is at least

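$$\min_{0 \le x, y \le 0.2} \max\left(\frac{\operatorname{len}(T_1)}{\operatorname{len}(T_{\rm OPT})}, \frac{\operatorname{len}(T_2)}{\operatorname{len}(T_{\rm OPT})}\right).$$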

The inequalities (2), (4), (5) imply that this ratio is at least

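$$\frac{\max\left(1 + y,\ \bigl(1 + \sqrt{1+4x^2}\bigr)/2\right)}{\min\bigl(2,\ 1 + x + R(y)\bigr)} = \frac{\max\left(1 + y,\ \bigl(1 + \sqrt{1+4x^2}\bigr)/2\right)}{\min\left(2,\ 1 + \frac{\sqrt{3}}{2} + x + \frac{2}{\sqrt{3}}y\right)}.$$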

Since the analysis is similar to that for deriving the refined bound we give next, we state without providing details that this piecewise function reaches its minimum value when and . This provides a preliminary ratio 0.506 in Theorem 1.
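The two case bounds of this subsection can also be minimized numerically. The sketch below (hypothetical function names; a grid search, not the analytic argument) evaluates the lower bound for $y \le 0$ from the first display of this subsection and the piecewise function above for $0 \le x, y \le 0.2$:

```python
import math

SQ3 = math.sqrt(3)

def bound_y_nonpos(x):
    """(1 + sqrt(1 + 4x^2)) / min(4, 2 + sqrt(3) + 2x): case y <= 0, only T2."""
    return (1 + math.sqrt(1 + 4 * x * x)) / min(4.0, 2 + SQ3 + 2 * x)

def bound_y_nonneg(x, y):
    """max(1 + y, (1 + sqrt(1 + 4x^2)) / 2) / min(2, 1 + sqrt(3)/2 + x + 2y/sqrt(3))."""
    t2 = (1 + math.sqrt(1 + 4 * x * x)) / 2
    return max(1 + y, t2) / min(2.0, 1 + SQ3 / 2 + x + 2 * y / SQ3)

grid = [0.2 * k / 400 for k in range(401)]          # 0, 0.0005, ..., 0.2
m1 = min(bound_y_nonpos(x) for x in grid)
m2 = min(bound_y_nonneg(x, y) for x in grid for y in grid)
print(m1, m2)
```

On this grid the first minimum lies near $x = 1 - \sqrt{3}/2$ and both minima stay above $0.506$, consistent with the preliminary ratio.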

### A refined bound.

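Let $m = \lfloor n/2 \rfloor$. Assume for convenience that the regions are relabeled so that $N_3, \ldots, N_{m+2}$ are contained in $\omega$ and $N_{m+3}, \ldots, N_n$ are not contained in the interior of $\omega$. Recall that $p_1, \ldots, p_n$ are the representative points in an optimal solution $T_{\rm OPT}$. Let $x_i = |op_i|$, for $i = 3, \ldots, m+2$; as such, $0 \le x_i \le x$. Let the average of $x_3, \ldots, x_{m+2}$ be $\lambda x$, where $0 \le \lambda \le 1$, i.e., $\sum_{i=3}^{m+2} x_i = m\lambda x$.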

Observe that $d_{\max}(p_i) \le x_i + R(y)$, for $i = 3, \ldots, m+2$. Consequently, the upper bound in (5) can be improved to

$$\operatorname{len}(T_{\rm OPT}) \le \frac{n-1}{2}\bigl(1 + \lambda x + R(y)\bigr). \tag{6}$$

We next obtain an improved lower bound on $\operatorname{len}(T_2)$. Recall that Algorithm A2 selects the vertex of $N_i$ that is farthest from $o$ for every $i = 3, \ldots, n$, and connects it with $a$ or $b$, whichever yields the longer connection. In particular, the length of this connection is at least $\sqrt{1/4 + x_i^2}$ for $i = 3, \ldots, m+2$. Since the function $t \mapsto \sqrt{1+4t^2}$ is convex, Jensen’s inequality yields:

$$\sum_{i=3}^{m+2} \sqrt{1 + 4x_i^2} \ge m\sqrt{1 + 4\lambda^2 x^2},$$

hence we obtain the following sharpening of the lower bound in (4):

$$\operatorname{len}(T_2) \ge \frac{n-1}{4}\left(\sqrt{1 + 4\lambda^2 x^2} + \sqrt{1 + 4x^2}\right). \tag{7}$$

In order to handle (6) and (7), we make the key substitution $z = \lambda x$ and simplify the lower bound in (7). Recall that $0 \le \lambda \le 1$, and so $0 \le z \le x \le 0.2$ and $\sqrt{1+4x^2} \ge \sqrt{1+4z^2}$. We now deduce from (6) and (7) that

$$\operatorname{len}(T_{\rm OPT}) \le \frac{n-1}{2}\bigl(1 + z + R(y)\bigr), \tag{8}$$

and

$$\operatorname{len}(T_2) \ge \frac{n-1}{2}\sqrt{1 + 4z^2}. \tag{9}$$

To analyze the approximation ratio we distinguish two cases:

Case 1: $y \le 0$. Then $R(y) = \sqrt{3}/2$, so the ratio of A2 is at least

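$$\min_{0 \le z \le 0.2} \frac{\operatorname{len}(T_2)}{\operatorname{len}(T_{\rm OPT})} \ge \min_{0 \le z \le 0.2} \frac{2\sqrt{1 + 4z^2}}{\min\left(4,\ 2 + 2z + \sqrt{3}\right)}.$$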

When $1 - \sqrt{3}/2 \le z \le 0.2$, we have $2 + 2z + \sqrt{3} \ge 4$. Then

$$\frac{\sqrt{1 + 4z^2}}{2} \ge \frac{\sqrt{8 - 4\sqrt{3}}}{2} = \sqrt{2 - \sqrt{3}} = 0.517\ldots$$

When $0 \le z \le 1 - \sqrt{3}/2$, or equivalently $2 + 2z + \sqrt{3} \le 4$, let

$$f(z) = \frac{2\sqrt{1 + 4z^2}}{2 + \sqrt{3} + 2z}.$$

Then

$$f'(z) = \frac{8(2 + \sqrt{3})z - 4}{\sqrt{1 + 4z^2}\,\bigl(2 + \sqrt{3} + 2z\bigr)^2}.$$

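Since $8(2 + \sqrt{3})z - 4 \le 8(2 + \sqrt{3})(1 - \sqrt{3}/2) - 4 = 0$ for $z \le 1 - \sqrt{3}/2$, the function $f$ is non-increasing on $[0, 1 - \sqrt{3}/2]$ and so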

$$f(z) \ge f\bigl(1 - \sqrt{3}/2\bigr) = \sqrt{2 - \sqrt{3}} = 0.517\ldots$$

This concludes the proof for the first case.
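The closed form $\sqrt{2-\sqrt{3}}$ and the location $z = 1 - \sqrt{3}/2$ of the minimum in Case 1 can be cross-checked numerically; the sketch below (hypothetical name `case1_ratio`) evaluates the Case 1 lower bound on a fine grid over $[0, 0.2]$:

```python
import math

SQ3 = math.sqrt(3)

def case1_ratio(z):
    """2 * sqrt(1 + 4z^2) / min(4, 2 + 2z + sqrt(3)), the Case 1 lower bound."""
    return 2 * math.sqrt(1 + 4 * z * z) / min(4.0, 2 + 2 * z + SQ3)

z_star = 1 - SQ3 / 2                  # where 2 + 2z + sqrt(3) = 4
closed_form = math.sqrt(2 - SQ3)      # about 0.5176
grid_min = min(case1_ratio(0.2 * k / 2000) for k in range(2001))
print(case1_ratio(z_star), closed_form, grid_min)
```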

Case 2: $0 \le y \le 0.2$. Then the ratio of A2 is at least

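$$\min_{0 \le y, z \le 0.2} \max\left(\frac{\operatorname{len}(T_1)}{\operatorname{len}(T_{\rm OPT})}, \frac{\operatorname{len}(T_2)}{\operatorname{len}(T_{\rm OPT})}\right).$$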

For $0 \le y, z \le 0.2$, let

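$$g(z, y) = \frac{\max\bigl(1 + y,\ \sqrt{1 + 4z^2}\bigr)}{\min\bigl(2,\ 1 + z + R(y)\bigr)} = \frac{\max\bigl(1 + y,\ \sqrt{1 + 4z^2}\bigr)}{\min\bigl(2,\ 1 + \sqrt{3}/2 + z + 2y/\sqrt{3}\bigr)}.$$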

The inequalities (2), (8), (9) imply that the ratio of A2 is at least

$$\min_{0 \le y, z \le 0.2} g(z, y).$$

The curve $1 + y = \sqrt{1 + 4z^2}$ and the line $1 + \sqrt{3}/2 + z + 2y/\sqrt{3} = 2$ split the feasible region $[0, 0.2]^2$ into four subregions; see Figure 3.

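The curve intersects the line at the point $(z_0, y_0)$, where $y_0 = (8\sqrt{3} - 9 - 2\sqrt[4]{27})/13 \approx 0.0229$ and $z_0 = \sqrt{(1 + y_0)^2 - 1}/2 \approx 0.1076$. Set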

$$\rho := \frac{1 + y_0}{2} = \frac{4\sqrt{3} + 2 - \sqrt[4]{27}}{13} = 0.5114\ldots \tag{10}$$
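One way to see where (10) comes from: writing $u = 1 + y$, the curve gives $z = \sqrt{u^2 - 1}/2$; substituting this into the line $z + 2y/\sqrt{3} = 1 - \sqrt{3}/2$ and squaring yields $13u^2 - 8(2\sqrt{3} + 1)u + 16 + 4\sqrt{3} = 0$, whose smaller root is $u = 2\rho$ (the larger root violates the sign condition introduced by squaring). The sketch below evaluates (10) numerically and confirms that the resulting point lies on both the curve and the line:

```python
import math

SQ3 = math.sqrt(3)
rho = (4 * SQ3 + 2 - 27 ** 0.25) / 13      # equation (10)
y0 = 2 * rho - 1                           # rho = (1 + y0) / 2
z0 = math.sqrt((1 + y0) ** 2 - 1) / 2      # (z0, y0) lies on 1 + y = sqrt(1 + 4z^2)

# residual of the line 1 + sqrt(3)/2 + z + 2y/sqrt(3) = 2 at (z0, y0)
line_residual = 1 + SQ3 / 2 + z0 + 2 * y0 / SQ3 - 2
# residual of the quadratic 13u^2 - 8(2 sqrt(3) + 1)u + 16 + 4 sqrt(3) = 0 at u = 2 rho
u = 2 * rho
quad_residual = 13 * u * u - 8 * (2 * SQ3 + 1) * u + 16 + 4 * SQ3
print(rho, z0, y0, line_residual, quad_residual)
```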

In region I, $g(z, y) = (1 + y)/2$. It reaches its minimum value when $y$ is minimized, i.e., at $(z_0, y_0)$.

In region II, $g(z, y) = \dfrac{1 + y}{1 + \sqrt{3}/2 + z + 2y/\sqrt{3}}$. Its partial derivative with respect to $y$ is positive, i.e.,

$$\frac{\partial g}{\partial y} = \frac{1 - \sqrt{3}/6 + z}{\bigl(1 + \sqrt{3}/2 + z + 2y/\sqrt{3}\bigr)^2} > 0,$$

so $g$ reaches its minimum value on the curve $1 + y = \sqrt{1 + 4z^2}$. On this curve, let

$$G(z) = g\bigl(z, y(z)\bigr) = \frac{\sqrt{1 + 4z^2}}{1 - \sqrt{3}/6 + z + 2\sqrt{1 + 4z^2}/\sqrt{3}}.$$

Its derivative is

$$G'(z) = \frac{(4 - 2\sqrt{3}/3)z - 1}{\sqrt{1 + 4z^2}\,\bigl(1 - \sqrt{3}/6 + z + 2\sqrt{1 + 4z^2}/\sqrt{3}\bigr)^2}.$$

Note that the numerator of $G'(z)$ is negative, i.e., $(4 - 2\sqrt{3}/3)z - 1 < 0$ for $0 \le z \le 0.2$; thus $G'(z) < 0$. So the minimum value is $\rho$, and it is achieved when $z$ is maximized, i.e., at $(z_0, y_0)$.

In region IV, $g(z, y) = \sqrt{1 + 4z^2}/2$, which increases monotonically with respect to $z$. So the minimum value is again $\rho$, and it is achieved when $z$ is minimized, i.e., at $(z_0, y_0)$.

In region III,

$$g(z, y) = \frac{\sqrt{1 + 4z^2}}{1 + \sqrt{3}/2 + z + 2y/\sqrt{3}}.$$

Its partial derivative is negative, i.e.,

$$\frac{\partial g}{\partial y} = -\frac{2\sqrt{1 + 4z^2}}{\sqrt{3}\,\bigl(1 + \sqrt{3}/2 + z + 2y/\sqrt{3}\bigr)^2} < 0,$$

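so $g$ reaches its minimum value on the arc of the curve $1 + y = \sqrt{1 + 4z^2}$ between $(0, 0)$ and $(z_0, y_0)$, or on the segment of the line $1 + \sqrt{3}/2 + z + 2y/\sqrt{3} = 2$ between $(z_0, y_0)$ and $(1 - \sqrt{3}/2, 0)$, where $(1 - \sqrt{3}/2, 0)$ is the intersection point of the line and the $z$-axis. Since these two curves are shared with regions II and IV, respectively, the previous analyses show that $g$ reaches its minimum value $\rho$ at the point $(z_0, y_0)$.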

In summary, we showed that

$$\min_{0 \le y, z \le 0.2} g(z, y) \ge \rho = 0.511\ldots,$$

establishing the approximation ratio in Theorem 1.
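As a brute-force cross-check of this conclusion, the sketch below evaluates $g$ on a grid over $[0, 0.2]^2$ (hypothetical names; a grid search can only confirm the analytic minimum up to the grid resolution):

```python
import math

SQ3 = math.sqrt(3)
RHO = (4 * SQ3 + 2 - 27 ** 0.25) / 13     # equation (10)

def g(z, y):
    """max(1 + y, sqrt(1 + 4z^2)) / min(2, 1 + sqrt(3)/2 + z + 2y/sqrt(3))."""
    num = max(1 + y, math.sqrt(1 + 4 * z * z))
    den = min(2.0, 1 + SQ3 / 2 + z + 2 * y / SQ3)
    return num / den

grid = [0.2 * k / 400 for k in range(401)]
grid_min = min(g(z, y) for z in grid for y in grid)
print(grid_min, RHO)
```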

### Remark.

The algorithm can be adapted to work in $\mathbb{R}^d$ for any $d \ge 3$. In the analysis, the disk $\omega$ becomes the ball of radius $x$ with the same defining property; the disk $\Omega$ becomes the ball of radius $R(y)$. All arguments and relevant bounds still hold since they rely on the triangle inequality; the verification is left to the reader. Consequently, the approximation guarantee remains the same.

### An almost tight example.

Let be an isosceles triangle with , , for a small ; e.g., set . Let , where , , and are points at distance from , below and whose projections onto are close to the midpoint of (see Figure 4).

The spanning tree constructed by A2 is of length close to , while the longest spanning tree has length at least ; as such, the approximation ratio of A2 approaches for large . Note that this is a tight example for the case , for which the ratio of A2 is at least ; and an almost tight example in general, since the overall approximation ratio of A2 is . Moreover, the example shows that every algorithm that always includes a bichromatic diameter pair in the solution (as the vertices of the corresponding regions) is bound to have an approximation ratio at most .

### Time complexity of Algorithm A2.

It is straightforward to implement the algorithm to run in quadratic time, $O(N^2)$, for any fixed $d$. All interpoint distances can be easily computed in $O(N^2)$ time. Similarly, the vertex farthest from $o$ in each region (over all regions) can be computed in $O(N)$ time overall. Subquadratic algorithms for computing the diameter and farthest bichromatic pairs in higher dimensions can be found in [1, 6, 15, 16, 17]; see also the two survey articles [9, 11].

## 4 Conclusion

We gave two approximation algorithms for Max-St-N: a very simple one with ratio $1/2$ and another simple one (with a slightly more elaborate analysis but equally simple principles) with ratio $0.511$. The following variants represent extensions of the Euclidean maximum TSP to the neighborhood setting.

In the Euclidean Maximum Traveling Salesman Problem, given a set of $n$ points in the Euclidean space $\mathbb{R}^d$, $d \ge 2$, one seeks a cycle (a.k.a. tour) that visits these points (as vertices) and has maximum length; see . In the Maximum Traveling Salesman Problem with Neighborhoods (Max-Tsp-N), each point is replaced by a point-set, called region or neighborhood, and the cycle must connect representative points, one chosen from each region (duplicate representatives are allowed), so that the cycle has maximum length. Since the original variant with points is NP-hard when $d \ge 3$ (as shown in ), the variant with neighborhoods is also NP-hard for $d \ge 3$. The complexity of the original problem in the plane is unsettled, although the problem is believed to be NP-hard . In the path variant, one seeks a path of maximum length.

The following problems are proposed for future study:

1. What is the computational complexity of Max-St-N?

2. What approximations can be obtained for the cycle or path variants of Max-Tsp-N?

## References

1. P. K. Agarwal, J. Matoušek and S. Suri, Farthest neighbors, maximum spanning trees and related problems in higher dimensions, Comput. Geom. 1 (1992), 189–201.
2. N. Alon, S. Rajagopalan and S. Suri, Long non-crossing configurations in the plane, Fundamenta Informaticae 22 (1995), 385–394.
3. E. M. Arkin and R. Hassin, Approximation algorithms for the geometric covering salesman problem, Discrete Appl. Math. 55 (1994), 197–218.
4. A. I. Barvinok, S. Fekete, D. S. Johnson, A. Tamir, G. J. Woeginger and R. Woodroofe, The geometric maximum traveling salesman problem, Journal of the ACM 50(5) (2003), 641–664.
5. M. Bern and D. Eppstein, Approximation algorithms for geometric problems, in Approximation Algorithms for NP-hard Problems (D. S. Hochbaum, ed.), PWS Publishing Company, Boston, MA, 1997, pp. 296–345.
6. B. K. Bhattacharya and G. T. Toussaint, Efficient algorithms for computing the maximum distance between two finite planar sets, Journal of Algorithms 4 (1983), 121–136.
7. R. Dorrigiv, R. Fraser, M. He, S. Kamali, A. Kawamura, A. López-Ortiz and D. Seco, On minimum- and maximum-weight minimum spanning trees with neighborhoods, Theory Comput. Syst. 56(1) (2015), 220–250.
8. A. Dumitrescu and Cs. D. Tóth, Long non-crossing configurations in the plane, Discrete and Computational Geometry 44(4) (2010), 727–752.
9. D. Eppstein, Spanning trees and spanners, in Handbook of Computational Geometry (J.-R. Sack and J. Urrutia, eds.), Elsevier Science, Amsterdam, 2000, pp. 425–461.
10. S. Fekete, Simplicity and hardness of the maximum traveling salesman problem under geometric distances, Proc. 10th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM/SIAM, 1999, pp. 337–345.
11. J. S. B. Mitchell, Geometric shortest paths and network optimization, in Handbook of Computational Geometry (J.-R. Sack and J. Urrutia, eds.), Elsevier Science, Amsterdam, 2000, pp. 633–701.
12. J. S. B. Mitchell, Shortest paths and networks, in Handbook of Computational Geometry (J. E. Goodman and J. O’Rourke, eds.), 2nd edition, Chapman & Hall/CRC, 2004, pp. 607–641.
13. C. Monma, M. Paterson, S. Suri and F. Yao, Computing Euclidean maximum spanning trees, Algorithmica 5 (1990), 407–419.
14. F. P. Preparata and M. I. Shamos, Computational Geometry, Springer-Verlag, New York, 1985.
15. E. A. Ramos, An optimal deterministic algorithm for computing the diameter of a three-dimensional point set, Discrete and Computational Geometry 26(2) (2001), 233–244.
16. J. M. Robert, Maximum distance between two sets of points in $E^d$, Pattern Recognition Letters 14 (1993), 733–735.
17. G. T. Toussaint and M. A. McAlear, A simple algorithm for finding the maximum distance between two finite planar sets, Pattern Recognition Letters 1 (1982), 21–24.
18. Y. Yang, M. Lin, J. Xu and Y. Xie, Minimum spanning trees with neighborhoods, in Proc. 3rd Int. Conf. on Algorithmic Aspects in Information and Management (AAIM), vol. 4508 of LNCS, Springer, 2007, pp. 306–316.