# $O(\log^2 k/\log\log k)$-Approximation Algorithm for Directed Steiner Tree: A Tight Quasi-Polynomial-Time Algorithm

In the Directed Steiner Tree (DST) problem we are given an $n$-vertex directed edge-weighted graph, a root $r$, and a collection of $k$ terminal nodes. Our goal is to find a minimum-cost arborescence that contains a directed path from $r$ to every terminal. We present an $O(\log^2 k/\log\log k)$-approximation algorithm for DST that runs in quasi-polynomial time. By adjusting the parameters in the hardness result of Halperin and Krauthgamer, we show a matching lower bound of $\Omega(\log^2 k/\log\log k)$ for the class of quasi-polynomial-time algorithms. This is the first improvement on the DST problem since the classical quasi-polynomial-time $O(\log^3 k)$-approximation algorithm by Charikar et al. (Their paper erroneously claims an $O(\log^2 k)$ approximation due to a mistake in prior work.) Our approach is based on two main ingredients. First, we derive an approximation-preserving reduction to the Label-Consistent Subtree (LCST) problem. The LCST instance has quasi-polynomial size and logarithmic height. We remark that, in contrast, Zelikovsky's height-reduction theorem, used in all prior work on DST, achieves a reduction to a tree instance of the related Group Steiner Tree (GST) problem of similar height, but loses a logarithmic factor in the approximation ratio. Our second ingredient is an LP-rounding algorithm to approximately solve LCST instances, which is inspired by the framework developed by Rothvoß. We consider a Sherali-Adams lifting of a proper LP relaxation of LCST. Our rounding algorithm proceeds level by level from the root to the leaves, rounding and conditioning each time on a proper subset of label variables. A small enough (namely, polylogarithmic) number of Sherali-Adams lifting levels is sufficient to condition up to the leaves.

## 1 Introduction

In the Directed Steiner Tree (DST) problem, we are given an $n$-vertex digraph $G = (V, E)$ with a cost $c_e$ on each edge $e \in E$, a root vertex $r \in V$, and a set of terminals $K \subseteq V$ with $|K| = k$. The goal is to find a minimum-cost out-arborescence rooted at $r$ that contains a directed path from $r$ to $t$ for every terminal $t \in K$. W.l.o.g. we assume that edge costs satisfy the triangle inequality.

The DST problem is a fundamental problem in the area of network design that is known for its bizarre behaviors. While constant-approximation algorithms have been known for its undirected counterpart (see, e.g., [3, 29, 31]), the best known polynomial-time approximation algorithm for this problem could achieve only an approximation ratio in time for any , due to the classical work of Charikar et al. [5]. Even allowing this algorithm to run in quasi-polynomial time, the best approximation ratio remains $O(\log^3 k)$ [5]. (Footnote 1: The original paper claims an $O(\log^2 k)$-approximation algorithm; however, their result was based on the initial statement of Zelikovsky’s height-reduction theorem in [32], which was later found to contain a subtle flaw and was restated by Helvig, Robin and Zelikovsky [19].) Since then, there have been efforts to improve either the running time or the approximation guarantee for this problem, e.g., using the primal-dual method [33], the Sum-of-Squares (a.k.a. Lasserre) hierarchy [30], and the Sherali-Adams and Lovász-Schrijver hierarchies [12]. Despite all these efforts, there has been no significant improvement over the course of the last two decades for either polynomial-time or quasi-polynomial-time algorithms. In fact, it is known from the work of Halperin and Krauthgamer [17] that unless , it is not possible to achieve an approximation ratio , for any constant , and such a lower bound applies to both polynomial-time and quasi-polynomial-time algorithms. This means that there is a huge gap between the upper bound of and the lower bound of for polynomial-time algorithms. All efforts have failed to obtain even an -approximation algorithm that runs in polynomial time.

For the class of quasi-polynomial-time algorithms, the approximation ratio of $O(\log^3 k)$ is arguably disappointing. This is because its closely related special case, namely, the Group Steiner Tree (GST) problem, is known to admit a quasi-polynomial-time -approximation algorithm on general graphs due to the work of Chekuri and Pal [6]. A natural question is whether such an approximation ratio could be achieved in quasi-polynomial time for DST as well. Nevertheless, achieving this improvement with the known techniques seems to be impossible. Indeed, all previous algorithms for DST [5, 30, 12] rely on Zelikovsky’s well-known height-reduction theorem [32, 19]. These algorithms (implicitly) reduce DST to GST on trees, which loses a logarithmic factor in the approximation ratio. Furthermore, the -hardness of Halperin and Krauthgamer [17] carries over to GST on trees. We remark that algorithms for many related problems (see, e.g., [10, 15]) rely on the same height-reduction theorem.

### 1.1 Our Results and Techniques

The purpose of this work is to close the gap between the lower and upper bounds on the approximability of DST in quasi-polynomial time. Our main result is as follows.

###### Theorem 1.1.

There is a randomized $O(\log^2 k/\log\log k)$-approximation algorithm for DST with running time .

By analyzing the proofs in [17], we also show that this bound is asymptotically tight under stronger assumptions; see Appendix C for further discussion.

###### Theorem 1.2.

There is no quasi-polynomial-time algorithm for DST that achieves an approximation ratio unless or the Projection Game Conjecture is false.

Our upper bound is based on two main ingredients. The first one is a quasi-polynomial-time approximation-preserving reduction to a novel Label-Consistent Subtree (LCST) problem. Roughly speaking, in LCST we are given a rooted tree plus node labels of two types, global and local. A feasible solution consists of a subtree that satisfies proper constraints on the labels. Intuitively, local labels are used to guarantee that a feasible solution induces an arborescence rooted at in the original problem, while global labels are used to enforce that all the terminals are included in such arborescence. In our reduction the tree has size and height , with global labels. For a comparison, Zelikovsky’s height-reduction theorem [32], used in all prior work on DST, reduces (implicitly) the latter problem to a GST instance over a tree of height . However, this reduction alone loses a factor in the approximation (while our reduction is approximation-preserving).

Our second ingredient is a quasi-polynomial-time -approximate LP-rounding algorithm for LCST instances arising from the previous reduction. Here we exploit the LP-hierarchy framework developed by Rothvoß [30] (and later simplified by Friggstad et al. [12]). We define a proper LP relaxation for the problem, and solve an -level Sherali-Adams lifting of this LP for a parameter . We then round the resulting fractional solution level by level from the root to the leaves. At each level we maintain a small set of labels that must be provided by the subtree. By randomly rounding label-based variables and conditioning, we push the set of labels all the way down to the leaves, guaranteeing that the output tree is always label-consistent. Thanks to the limited height of the tree and to the small number of labels along root-to-leaf paths, a polylogarithmic number of lifting levels is sufficient to perform the mentioned conditioning up to the leaves. As in [30], the probability that each global label appears in the tree we directly construct is only . We need to repeat the process times in order to make sure all labels are included with high probability, leading to the claimed approximation ratio. Our result gives one more application of using LP/SDP hierarchies to obtain improved approximation algorithms, in addition to a few other ones (see, e.g., [2, 8, 9, 25, 14]).

We believe that our basic strategy of combining a label-based reduction with a round-and-condition rounding strategy as mentioned above might find applications to other problems, and it might therefore be of independent interest.

### 1.2 Comparison to Previous Work

Our algorithm is inspired by two results: the recursive greedy algorithm of Chekuri and Pal for GST [6], and the hierarchy-based LP-rounding technique of Rothvoß [30].

As mentioned, the algorithm of Chekuri and Pal is the first one that yields an approximation ratio of for GST, which is a special case of DST, in quasi-polynomial time. This is almost tight for the class of quasi-polynomial-time algorithms. Their algorithm exploits the fact that any optimal solution can be shortcut into a path of length , while paying only a factor of 2 (such a path exists in the metric closure of the input graph). This simple observation allows them to derive a recursive greedy algorithm. In more detail, they try to identify a vertex that separates the optimal path into two equal-size subpaths by iterating over all the vertices; then they recursively (and approximately) solve the two subproblems and pick the best approximate sub-solution greedily. Their analysis, however, requires the fact that both recursive calls end at the same depth (because the lengths of the two subpaths differ by at most one).

We imitate the recursive greedy algorithm by recursively splitting the optimal solution via balanced tree separators. The same approach as in [6], unfortunately, does not quite work out for us since subproblem sizes may differ by a multiplicative factor. This process, somehow, gives us a decision tree that contains a branch-decomposition of every solution, which is sufficient to devise an approximation algorithm. Note, however, that not every subtree of this decision tree can be transformed into a connected graph, and thus, it is not guaranteed that we can find a feasible DST solution from this decision tree. We introduce node-labels and label-consistent constraints specifically to solve this issue.

The label-consistency requirement could not be handled simply by applying DST algorithms as a blackbox. This comes to the second component that is inspired by the framework developed by Rothvoß [30]. While the framework was originally developed for the Sum-of-Squares hierarchy, it was shown by Friggstad et al. [12] that it also applies to Sherali-Adams, which is a weaker hierarchy. We apply the framework of Rothvoß to our Sherali-Adams lifted-LP but taking the label-consistency requirement into account.

### 1.3 Related Work

We already mentioned some of the main results about DST and GST. For GST there is a polynomial-time algorithm by Garg et al. [13] that achieves an approximation factor of , where is the number of groups. Their algorithm first maps the input instance into a tree instance by invoking the Probabilistic Metric-Tree Embeddings [1, 11], thus losing a factor in the approximation ratio. They then apply an elegant LP-based randomized rounding algorithm to the instance on a tree. A well-known open problem is whether it is possible to avoid the factor in the approximation ratio. This was later achieved by Chekuri and Pal [6], however their algorithm runs in quasi-polynomial-time.

Some works were devoted to the survivable network variants of DST and GST, namely -DST and -GST, respectively. Here one requires to have edge-disjoint directed (resp., undirected) paths from the root to each terminal (resp., group). Cheriyan et al. [7] showed that -DST admits no -approximation algorithm, for any , unless . Laekhanukit [23] showed that the problem admits no -approximation for any constant , unless . Nevertheless, the negative results do not rule out the possibility of achieving reasonable approximation factors for small values of . In particular, Grandoni and Laekhanukit [15] (exploiting some ideas in [24]) recently devised a poly-logarithmic approximation algorithm for -DST that runs in quasi-polynomial time.

Concerning -GST, Gupta et al. [16] presented a -approximation algorithm for -GST. The same problem admits an -approximation algorithm, where is the largest cardinality of a group [21]. Chalermsook et al. [4] presented an LP-rounding bicriteria approximation algorithm for -GST that returns a subgraph with cost times the optimum while guaranteeing a connectivity of at least . They also showed that -GST is hard to approximate to within a factor of , for some fixed constant , and if is large enough, then the problem is at least as hard as the Label-Cover problem, meaning that -GST admits no -approximation algorithm, for any constant , unless .

## 2 Preliminaries

Given a graph , we denote by and the vertex and edge set of , respectively. Throughout this paper, we treat a rooted tree as an out-arborescence; that is, edges are directed towards the leaves. Given a rooted tree , we use to denote its root. For any rooted tree and , we shall use to denote the sub-tree of containing and all descendants of . For a directed edge , we use and to denote the head and tail of . Generally, we will use the term vertex to mean a vertex of a DST instance, and we will use the term node to mean a vertex in an instance of the Label-Consistent Subtree problem, defined below:

#### Label-Consistent Subtree (LCST).

The new problem we introduce is the Label-Consistent Subtree (LCST) problem. The input consists of a rooted tree of size and height , a node cost vector , and a set of labels, among which there are global labels . The other labels are called local labels. Each node has two label sets: a set of demand labels, and a set of service labels.

We say that a subtree of with is label-consistent if for every vertex and , there is a descendant of in such that . The goal of the LCST problem is to find a label-consistent subtree of of minimum cost that contains all global labels, i.e., for every , there is a with .
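To make the definition concrete, the following Python sketch checks label-consistency and global-label coverage for a candidate subtree. The names `children`, `demand`, `service` and `selected` are illustrative and do not come from the paper. The sketch also assumes that a node counts as its own descendant, so a node may serve its own demand labels.

```python
def provided_labels(children, service, selected, u):
    """Service labels offered by u or by selected descendants of u."""
    labels = set(service.get(u, ()))
    for c in children.get(u, ()):
        if c in selected:
            labels |= provided_labels(children, service, selected, c)
    return labels


def is_label_consistent(children, demand, service, selected, root):
    """Check that `selected` is a rooted subtree meeting every demand label."""
    if root not in selected:
        return False
    parent = {c: u for u, cs in children.items() for c in cs}
    # Every selected non-root node must hang off a selected parent.
    if any(v != root and parent.get(v) not in selected for v in selected):
        return False
    # Each demand label of a selected node must be served in its subtree.
    return all(
        set(demand.get(u, ())) <= provided_labels(children, service, selected, u)
        for u in selected
    )


def covers_global_labels(children, service, selected, root, global_labels):
    """Check that every global label is served somewhere in the subtree."""
    return set(global_labels) <= provided_labels(children, service, selected, root)


# Toy instance (hypothetical): r -> a -> c and r -> b.
children = {"r": ["a", "b"], "a": ["c"]}
demand = {"r": {"x"}}
service = {"c": {"x", "g"}, "b": {"h"}}
```

In this toy instance, selecting `{"r", "a", "c"}` is label-consistent and covers the global label `"g"`, while selecting `{"r", "b"}` leaves the demand label `"x"` of the root unserved.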

In Section 4, we give an -time -approximation algorithm for the LCST problem, where . Thus, we require to be small in order to derive a quasi-polynomial-time algorithm; fortunately, this is the case for the instance reduced from DST.

One may generalize LCST to general graphs, obtaining, say, a Label-Consistent Steiner Subgraph (LCSS) problem.

#### Balanced Tree Partition.

A main tool in our reduction is the following standard balanced-tree-partition lemma (with proof given in Appendix A for completeness).

###### Lemma 2.1 (Balanced-Tree-Partition).

For any , for any -vertex tree rooted at a vertex , there exists a vertex such that can be decomposed into two trees and rooted at and , respectively, in such a way that , and and . In other words, and are sub-trees that form a balanced partition of (the edges of) .
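The following Python sketch illustrates one standard way to obtain such a partition. It is not the proof from Appendix A, and the bound it enforces (each part has at most $2n/3 + 1$ vertices) is an assumption about the usual form of this lemma, since the exact constants are not legible in the statement above. The function names and the `children` dictionary representation are hypothetical. The second tree consists of the separator vertex together with some of its child subtrees; the first tree is everything else and still contains the separator.

```python
def subtree_sizes(children, root):
    """Number of vertices in the subtree below each vertex (iterative DFS)."""
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, ()))
    size = {}
    for u in reversed(order):  # children are processed before their parent
        size[u] = 1 + sum(size[c] for c in children.get(u, ()))
    return size


def balanced_partition(children, root):
    """Return (v, moved, |V(T1)|, |V(T2)|) for a tree with n >= 3 vertices.

    T2 is v plus the subtrees of the children listed in `moved`;
    T1 is the rest of the tree and keeps v as a (possibly non-root) vertex.
    """
    size = subtree_sizes(children, root)
    n = size[root]
    if n < 3:
        raise ValueError("need at least 3 vertices")
    # Walk down while some non-leaf child still holds at least n/3 vertices.
    v = root
    while True:
        heavy = [c for c in children.get(v, ())
                 if 3 * size[c] >= n and size[c] >= 2]
        if not heavy:
            break
        v = heavy[0]
    # Greedily move child subtrees of v into T2 until it is large enough.
    moved, s = [], 0
    for c in children[v]:
        moved.append(c)
        s += size[c]
        if 3 * s >= n - 3:
            break
    return v, moved, n - s, s + 1
```

For a directed path on 7 vertices, for example, the sketch picks a separator in the middle of the path and returns parts with 5 and 3 vertices, both within the assumed bound $2 \cdot 7/3 + 1$.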

In this section, we give some basic facts about the Sherali-Adams hierarchy that we will need. Assume we have a linear program polytope defined by . We assume that are part of the linear constraints. The set of integral feasible solutions is defined as . It is convenient to think of each as an event, and in a solution , indicates whether the event happens or not.

The idea of Sherali-Adams hierarchy is to strengthen the original LP by adding more variables and constraints. Of course, each should still be a feasible solution to the strengthened LP (when extended to a vector in the higher-dimensional space). For some , the -th round of Sherali-Adams lift of the linear program has variables , for every . For every solution , is supposed to indicate whether all the events in happen or not in the solution ; that is, . Thus each can be naturally extended to a 0/1-vector in the higher-dimensional space defined by all the variables.

To derive the set of constraints, let us focus on the $j$-th constraint in the original linear program. Consider two subsets $S$ and $T$ such that . Then the following constraint is valid for ; i.e., for all , the constraint is satisfied:

$$\prod_{i \in S} x_i \prod_{i \in T} (1 - x_i) \left( \sum_{i=1}^{n} a_{j,i} x_i - b_j \right) \le 0.$$

To linearize the above constraint, we expand the left side of the inequality and replace each monomial with the corresponding variable. We then obtain the following:

$$\sum_{T' \subseteq T} (-1)^{|T'|} \left( \sum_{i=1}^{n} a_{j,i}\, x_{S \cup T' \cup \{i\}} - b_j\, x_{S \cup T'} \right) \le 0. \tag{1}$$
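As a sanity check on the linearization, the Python sketch below builds the coefficients of constraint (1) for a single row and compares it with the product form at every 0/1 point, where the lifted variable for a set is simply the product of its coordinates. The function names are hypothetical, indices are 0-based rather than 1-based, and the variable for the empty set is taken to be 1.

```python
from itertools import combinations, product


def lifted_constraint(a, b, S, T):
    """Coefficients of constraint (1), keyed by the index set of each lifted variable."""
    coeff = {}
    T = sorted(T)
    for size in range(len(T) + 1):
        for T_sub in combinations(T, size):
            sign = (-1) ** size
            base = frozenset(S) | frozenset(T_sub)
            for i, a_i in enumerate(a):
                key = base | {i}
                coeff[key] = coeff.get(key, 0) + sign * a_i
            coeff[base] = coeff.get(base, 0) - sign * b
    return coeff


def evaluate_lifted(coeff, x):
    """Evaluate at an integral point, where x_A is the product of x_i over i in A."""
    return sum(c for A, c in coeff.items() if all(x[i] for i in A))


def evaluate_product(a, b, S, T, x):
    """Evaluate the unexpanded product form of the same constraint."""
    value = sum(a_i * x_i for a_i, x_i in zip(a, x)) - b
    for i in S:
        value *= x[i]
    for i in T:
        value *= 1 - x[i]
    return value
```

The two evaluations agree on all of $\{0,1\}^n$ because $x_i^2 = x_i$ at integral points, which is exactly the identity used when monomials are replaced by lifted variables.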

The -th round of Sherali-Adams lift contains the above constraint for all such that , and the trivial constraint that . For a polytope and an integer , we use to denote the polytope obtained by the -th round Sherali-Adams lift of . For every , we identify the variable in the original LP and in a lifted LP.

Let for some linear program on variables and . Let be an event such that ; then we can define a solution obtained from by “conditioning” on the event . For every , is defined as . We shall show that will be in (Property 2).

It is useful to consider the ideal case where corresponds to a convex combination of integral solutions in . Then we can view as a distribution over . Conditioning on the event over the solution corresponds to conditioning on over the distribution . With this view, it is not hard to imagine that the statements in the following claim (which we prove in the appendix) should hold:
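The ideal case can be illustrated with a small Python sketch. It assumes the standard Sherali-Adams conditioning rule $x'_S = x_{S \cup \{i\}} / x_i$, which is the usual definition but is not legible in the text above, and it represents a distribution as a list of (probability, set of events that happen) pairs. All names are hypothetical.

```python
from fractions import Fraction


def moment(dist, S):
    """x_S = Pr[all events in S happen] under dist = [(prob, events), ...]."""
    return sum(p for p, events in dist if set(S) <= events)


def condition_value(dist, S, i):
    """Conditioned variable x'_S = x_{S ∪ {i}} / x_i (requires x_i > 0)."""
    x_i = moment(dist, {i})
    if x_i == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return moment(dist, set(S) | {i}) / x_i


def condition_distribution(dist, i):
    """The same operation carried out directly on the distribution."""
    x_i = moment(dist, {i})
    return [(p / x_i, events) for p, events in dist if i in events]


# Hypothetical distribution over three integral solutions.
dist = [
    (Fraction(1, 2), {1, 2}),
    (Fraction(1, 4), {2, 3}),
    (Fraction(1, 4), {3}),
]
```

Conditioning on event 2 gives `condition_value(dist, {1}, 2) == Fraction(2, 3)`, which matches the probability of event 1 under `condition_distribution(dist, 2)`, and the conditioned value of event 2 itself becomes 1.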

###### Claim 2.2.

For some with , the following statements hold:

2. for every .

3. If for some , then for every .

4. If every has , then .

Letting be obtained from by conditioning on some event , the following holds:

2. .

3. .

4. If for some , then .

Keep in mind that the three properties 1, 1 and 3 will be used over and over again, often without referring to them. 1 says that conditioning on will fix to 1. 3 says that once a variable is fixed to or , then it can not be changed by conditioning operations.

## 3 Reducing Directed Steiner Tree to Label-Consistent Subtree

In this section, we present a reduction from DST to LCST. In Section 3.1, we define a decomposition tree, which corresponds to a recursive partitioning of a Steiner tree of . We show that the DST problem is equivalent to finding a small cost decomposition tree. Due to the balanced-partition lemma (Lemma 2.1), we can guarantee that decomposition trees have depth , a crucial property needed to obtain a quasi-polynomial-time algorithm. Then in Section 3.2 we show that the task of finding a small cost decomposition tree can be reduced to an LCST instance on a tree of depth . Roughly speaking, for a decomposition tree to be valid, we require that the separator vertex appears in both parts of a partition: as a root in one part and possibly a non-root in the other. This can be captured by the label-consistency requirement.

We shall use to denote a Steiner tree in the original graph , and to denote vertices in . We use to denote a decomposition tree, and to denote nodes of a decomposition tree. will be used for the input tree of the LCST instance. We use for a sub-tree of and for nodes in . The convention extends to variants of these notations as well.

### 3.1 Decomposition Trees

We now define decomposition trees. Recall that in the DST problem, we are given a graph , a root , and a set of terminals.

###### Definition 3.1.

A decomposition tree is a rooted tree where each node is associated with a vertex and each leaf-node is associated with an edge . Moreover, the following conditions are satisfied:

1. .

2. For every leaf of , we have .

3. For every non-leaf of and every child of with the following holds. There is a child of with such that for some leaf . In particular, this implies that has at least one child with .

The cost of a decomposition tree is defined as .

We say a vertex is involved in a sub-tree of a decomposition tree if either or there is a leaf of such that . So the second sentence in Property 3 can be changed to the following: There is a child of with such that is involved in .

We show that the DST problem can be reduced to the problem of finding a small-cost decomposition tree of depth . This is done in two directions.

#### From Directed Steiner Tree to Decomposition Tree.

We first show that the optimum directed Steiner tree of connecting to all terminals in gives a good decomposition tree of cost at most that of , which we denote by . Since we assumed costs of edges in satisfy the triangle inequality, we can assume every vertex has at least two children in . This implies . The decomposition tree can be constructed by applying Lemma 2.1 on recursively until we obtain trees consisting of single edges. Formally, we set , where is defined in Algorithm 1. Notice that this algorithm is used only in the analysis and is not a part of our algorithm for DST.

###### Claim 3.2.

is a full binary decomposition tree of height and cost that involves all vertices in . Moreover, for every , there is exactly one leaf of with .

#### From Decomposition Tree to Directed Steiner Tree.

Now we show the other direction of the reduction. The lemma we shall prove is the following:

###### Lemma 3.3.

Given a decomposition tree that involves all terminals in , we can efficiently construct a directed Steiner tree in connecting to all terminals in with cost at most .

Thus, our goal is to find a decomposition tree of small cost involving all terminals in . To do so, we construct an instance of the LCST problem.

### 3.2 Construction of LCST Instance

Let be the term in Claim 3.2 that upper bounds the height of . In the reduction, we shall “collapse” every levels of a decomposition tree into one level; this is used to obtain the improvement of in the approximation ratio. It motivates the definition of a twig, which corresponds to a full binary tree of depth at most that can appear as a part of a decomposition tree:

###### Definition 3.4.

A twig is a rooted full binary tree of depth at most , where

• each is associated with a , such that for every internal node in , at least one child of has , and

• each leaf of may or may not be associated with a value ; if is defined then .

With the twigs defined, our LCST instance is constructed by calling , where is defined in Algorithm 2. See Figure 2 for illustration of one recursion of .

###### Remark 3.5.

The and values of nodes in are irrelevant for the LCST instance. They will, however, help us in mapping the decomposition tree to its corresponding solution to LCST.

Notice that there are two types of nodes in : (1) -nodes are those created in Step 1 and (2) -nodes are those created in Step 4. We always use (, resp.) and its variants to denote -nodes (-nodes resp.).

We give some intuition behind the construction of . We can partition the edges of a decomposition tree into an -depth tree of twigs. For each in the tree, we apply the following operation. First, we replace with a node with . Second, we insert a virtual parent of with between this and its actual parent. Then it is fairly straightforward to see that we can find a copy of this resulting tree in . Thus, we reduced the problem of finding (and thus ) to the problem of finding a subtree of . The label-consistency requirements shall guarantee that will correspond to a valid . In particular, the demand label for a node created in Step 1 guarantees that if is selected then we shall select at least one child of . The demand labels created in Step 11 for a node guarantee that if is selected, then all its children must be selected, while the demand labels created in Step 15 guarantee Property 3 of . The set of global labels is exactly . In Step 8, we add a global label to if contains a leaf with .

A simple observation we can make is the following:

###### Claim 3.6.

is a rooted tree with vertices and height , where .

Also, it is easy to see that a node will have exactly one demand label, while a node can have up to demand labels. So, we have .

We then show that the problem of finding a decomposition tree can be reduced to that of finding a label-consistent subtree of . Again, this is done in two directions.

#### From Decomposition Tree to Label-Consistent Subtree

To show that there is a good label-consistent subtree of , we need to construct a tree of twigs from . This is done as follows. For every , and every internal node in of depth , we create a twig rooted at containing all descendants of at depth . Let be the set of twigs created. A rooted tree over can be naturally defined: a twig is a parent of if and only if is a leaf in . So, has depth at most .

 can be found naturally by calling (with being empty initially), where is defined in Algorithm 3, and the trees are as defined in Algorithm 2. The recursive procedure takes two parameters: a node in and a twig . It is guaranteed that : The root call satisfies this condition since ; in Step 4, we also have . The tree can be constructed as has depth at most . Again, this algorithm is used only in the analysis and is not a part of our algorithm for DST. We prove the following lemma in the appendix.

###### Lemma 3.7.

is a label-consistent sub-tree of with cost exactly . Moreover, all global labels in are supplied by .

#### From Label-Consistent Subtree to Decomposition Tree.

The following lemma gives the other direction, and its proof will be deferred to the appendix.

###### Lemma 3.8.

Given any feasible solution to the LCST instance , in time we can construct a decomposition tree with . Moreover, if a global label is supplied by , then involves .

#### Wrapping up.

We prove the following theorem in the next section. Recall that and are respectively the size and height of the input tree to the LCST instance, and is the number of global labels.

###### Theorem 3.9.

There is an -time -approximation algorithm for the Label-Consistent Subtree problem where .

With this theorem at hand, we can now finish our $O(\log^2 k/\log\log k)$-approximation for DST that runs in quasi-polynomial time. Given a DST instance, we shall construct the LCST instance of size and height as in Algorithm 2. Notice that for the LCST instance, we have . By Claim 3.2 and Lemma 3.7, there is a solution to the LCST instance of cost at most . Applying Theorem 3.9, we can obtain a feasible solution of cost at most in time (as ). Applying Lemma 3.8 and Lemma 3.3, we can obtain a directed Steiner tree in of cost at most connecting to all terminals in . This gives an $O(\log^2 k/\log\log k)$-approximation for DST in running time , finishing the proof of Theorem 1.1.

## 4 Approximation Algorithm for Label-Consistent Subtree

The goal of this section is to prove Theorem 3.9. Since we are no longer dealing with the original DST problem, we use for trees and for nodes in this section.

### 4.1 Redefining the LCST Problem

We shall first simplify the input instance w.l.o.g in the following ways that will make our presentation much cleaner. Indeed, some properties are already satisfied by the LCST instance reduced from the DST problem; however we want to make Theorem 3.9 as general as possible and thus we do not make these assumptions in the theorem statement.

1. We can assume for every two distinct nodes and , and are disjoint. If some local label appears in for different nodes , we can make copies of and let each copy be contained in for exactly one . We can replace the appearance of in some with the copies.

2. We can assume the demand labels are only at the internal nodes. Suppose a leaf has . If , then can be removed from ; otherwise can never be selected thus can be removed from .

3. We can assume that the service labels are only at the leaves and each leaf contains exactly one service label. A leaf without a service label can be removed. For a non-leaf with , we can attach leaves of cost to and distribute the service labels to the newly added leaves. Similarly, if a leaf has , we can attach new leaves to .

Notice that the above operations do not change the set of global labels and .

With the above operations and simplifications, we can redefine the LCST instance. Let and respectively be the sets of leaves and internal nodes of . For every node , let be the set of children of . For every , let be the set of descendants of that are leaves.

For every , let be the unique label in . From now on we shall not use the notation anymore. Thus, a rooted subtree of with is label-consistent if, for every and , there is a node with .

The goal of the problem is to find a minimum-cost label-consistent subtree of that provides all the global labels, i.e., that satisfies for all there exists a with . Recall that we are given a node-cost vector . The cost of a sub-tree of , denoted as , is defined as .

We consider the change in the size and height of after we apply the above operations. Abusing notation slightly, we shall use and to denote the size and height of the old (i.e., the before we apply the operations), and and to denote the size and height of the new (i.e., the after we apply the operations). Notice that we only added leaves to . Thus, we have . The number of internal nodes in the new is at most . A leaf is relevant only when it is providing a label that is in for some ancestor of . If a node has many leaf children with the same service label, we only need to keep the one with the smallest cost. Since each has and the height of the old is , we can assume that the number of leaves in the new is at most . So .

Let be the optimum tree for the given instance. Let be the cost of the , i.e., . (Footnote 2: We remark that it is easy to check whether a valid solution exists or not: an is useless if for some there is no with . We repeatedly remove useless nodes and their descendants until no such nodes exist. There is a valid solution iff the remaining provides all labels in . So we can assume the instance has a valid solution.) As every local label appears only once in , we can assume that for every , there is at most one node with : if there are multiple such nodes , we can keep one without violating the label-consistency condition or the requirement that all global labels are provided. Thus additionally we can assume satisfies the following conditions:

1. For every , there is exactly one node such that .

2. For every , there is at most one node such that .

The main theorem we shall prove is the following:

###### Theorem 4.2.

There is an -time algorithm that outputs a random label-consistent tree such that, , and for every , we have .

With Theorem 4.2, we can finish the proof of Theorem 3.9.

###### Proof of Theorem 3.9.

We run the algorithm stated in Theorem 4.2 $O(h \log k)$ times and let $T'$ be the union of all the trees produced. It is easy to see that $T'$ is always label-consistent. The expected cost of $T'$ is

$$\mathbb{E}[\mathrm{cost}(T')] \le O(h \log k) \cdot \mathrm{opt}.$$

If the term is sufficiently large, by the union bound, we can obtain

$$\Pr\left[\forall \ell \in K,\ \exists v \in V^{\mathrm{leaf}} \cap V(T'),\ a_v = \ell\right] \ge 1/2. \tag{2}$$

We repeatedly run the above procedure until happens and output the tree satisfying the property. Let be this tree. Then we have due to (2). In expectation, we only need to run the procedure twice.

Thus, we obtain an -approximation algorithm for LCST. The running time of the algorithm is . Recall that and are the height and size of before we applied the operations; thus the theorem follows. ∎
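The repetition argument in the proof can be checked numerically with the short Python sketch below. Here `p` is a placeholder for the per-run lower bound on the probability that a fixed global label appears, as guaranteed by Theorem 4.2; its exact value is not reproduced here. The function names are hypothetical. The sketch uses the inequality $(1-p)^N \le e^{-pN}$ and a union bound over the $k$ global labels.

```python
import math


def repetitions_needed(p, k):
    """A number N of independent runs with k * (1 - p)**N <= 1/2.

    Uses (1 - p)**N <= exp(-p * N), so N = ceil(ln(2k) / p) suffices.
    """
    return math.ceil(math.log(2 * k) / p)


def failure_bound(p, k, N):
    """Union bound on the probability that some global label is missed in all N runs."""
    return k * (1 - p) ** N
```

If `p` is on the order of $1/h$, then `repetitions_needed(p, k)` is $O(h \log k)$, consistent with the expected-cost bound displayed in the proof.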

Thus, our goal is to prove Theorem 4.2. Our algorithm is very similar to that of [30] for GST on trees. We solve the lifted LP relaxation for the LCST problem and then round the fractional solution via a recursive procedure. In the procedure, we focus on some sub-tree , and we are given a set of labels that must appear in , where is our output tree. We are also given a lifted LP solution ; we can restrict on the tree . The set of labels appears in fully according to . Then, for every , we randomly choose a child of that is responsible for this and then apply some conditioning operations on . We recursively call the procedure for the children of . This way, we can guarantee that the tree we output is always label-consistent. Finally, we show that each global label appears in with large probability, using a technique very similar to that of [30].

### 4.2 Basic LP Relaxation

The remaining part of the section is dedicated to the proof of Theorem 4.2. We formulate an LP relaxation that aims at finding the , where the variables of the LP are indexed by . We view every element in also as an event. Supposedly, an event happens if and only if , and an event happens if and only if and has a node with label (such a node is unique if it exists by Properties 1 and 2). For every , is supposed to indicate whether event happens or not. Then the following linear constraints are valid:

(3) (4) (5) (6)
(7) (8) (9)

Constraint (3) holds since is a rooted sub-tree of with , (4) holds by the definition of the events, (5) follows from the fact that is label-consistent, and (6) holds trivially. Constraint (7) follows from Properties 1 and 2, (8) holds trivially, and (9) follows from Property 1.

Let be the polytope containing all vectors satisfying constraints (3) to (9). The following simple observation can be made:

###### Claim 4.3.

For every , , and , we have .

###### Proof.

The claim holds trivially if . When , summing up (7) over all internal nodes in and gives the equality. ∎

Let