1 Introduction
In the Directed Steiner Tree (DST) problem, we are given an $n$-vertex digraph with a cost on each edge, a root vertex and a set of terminals. The goal is to find a minimum-cost out-arborescence rooted at the root vertex that contains a directed path from the root to every terminal. W.l.o.g. we assume that edge costs satisfy the triangle inequality.
The DST problem is a fundamental problem in the area of network design that is known for its bizarre behaviors. While constant-approximation algorithms have been known for its undirected counterpart (see, e.g., [3, 29, 31]), the best known polynomial-time approximation algorithm for this problem could achieve only an approximation ratio in time for any , due to the classical work of Charikar et al. [5]. Even allowing this algorithm to run in quasi-polynomial time, the best approximation ratio remains [5]. (Footnote 1: The original paper claims an approximation algorithm; however, their result was based on the initial statement of Zelikovsky’s height-reduction theorem in [32], which was later found to contain a subtle flaw and was restated by Helvig, Robin and Zelikovsky [19].) Since then, there have been efforts to improve either the running time or the approximation guarantee for this problem, e.g., using the primal-dual method [33], the Sum-of-Squares (a.k.a. Lasserre) hierarchy [30], and the Sherali-Adams and Lovász-Schrijver hierarchies [12]. Despite all these efforts, there has been no significant improvement over the course of the last two decades for either polynomial-time or quasi-polynomial-time algorithms. In fact, it is known from the work of Halperin and Krauthgamer [17] that unless , it is not possible to achieve an approximation ratio , for any constant , and this lower bound applies to both polynomial-time and quasi-polynomial-time algorithms. This means that there is a huge gap between the upper bound of and the lower bound of for polynomial-time algorithms. All efforts have failed to obtain even an approximation algorithm that runs in polynomial time.
For the class of quasi-polynomial-time algorithms, the approximation ratio of is arguably disappointing. This is because its closely related special case, namely, the Group Steiner Tree (GST) problem, is known to admit a quasi-polynomial-time approximation algorithm on general graphs due to the work of Chekuri and Pal [6]. A natural question would be whether such an approximation ratio could be achieved in quasi-polynomial time for DST as well. Nevertheless, achieving this improvement with the known techniques seems to be impossible. Indeed, all previous algorithms for DST [5, 30, 12] rely on the well-known Zelikovsky’s height-reduction theorem [32, 19]. These algorithms (implicitly) reduce DST to GST on trees, which loses an approximation factor in the process. Furthermore, the hardness of Halperin and Krauthgamer [17] carries over to GST on trees. We remark that algorithms for many related problems (see, e.g., [10, 15]) rely on the same height-reduction theorem.
1.1 Our Results and Techniques
The purpose of this work is to close the gap between the lower and upper bounds on the approximability of DST in quasi-polynomial time. Our main result is as follows.
Theorem 1.1.
There is a randomized approximation algorithm for DST with running time .
By analyzing the proofs in [17], we also show that this bound is asymptotically tight under stronger assumptions; please see more discussion in Appendix C.
Theorem 1.2.
There is no quasi-polynomial-time algorithm for DST that achieves an approximation ratio unless or the Projection Game Conjecture is false.
Our upper bound is based on two main ingredients. The first one is a quasi-polynomial-time approximation-preserving reduction to a novel Label-Consistent Subtree (LCST) problem. Roughly speaking, in LCST we are given a rooted tree plus node labels of two types, global and local. A feasible solution consists of a subtree that satisfies proper constraints on the labels. Intuitively, local labels are used to guarantee that a feasible solution induces an arborescence rooted at in the original problem, while global labels are used to enforce that all the terminals are included in such an arborescence. In our reduction the tree has size and height , with global labels. For comparison, Zelikovsky’s height-reduction theorem [32], used in all prior work on DST, reduces (implicitly) the latter problem to a GST instance over a tree of height . However, this reduction alone loses a factor in the approximation (while our reduction is approximation-preserving).
Our second ingredient is a quasi-polynomial-time approximate LP-rounding algorithm for LCST instances arising from the previous reduction. Here we exploit the LP-hierarchy framework developed by Rothvoß [30] (and later simplified by Friggstad et al. [12]). We define a proper LP relaxation for the problem, and solve an level Sherali-Adams lifting of this LP for a parameter . We then round the resulting fractional solution level by level from the root to the leaves. At each level we maintain a small set of labels that must be provided by the subtree. By randomly rounding label-based variables and conditioning, we push the set of labels all the way down to the leaves, guaranteeing that the output tree is always label-consistent. Thanks to the limited height of the tree and to the small number of labels along root-to-leaf paths, a polylogarithmic number of lifting levels is sufficient to perform the mentioned conditioning up to the leaves. As in [30], the probability that each global label appears in the tree we directly construct is only . We need to repeat the process times in order to make sure all labels are included with high probability, leading to the claimed approximation ratio. Our result gives one more application of using LP/SDP hierarchies to obtain improved approximation algorithms, in addition to a few other ones (see, e.g., [2, 8, 9, 25, 14]).
We believe that our basic strategy of combining a label-based reduction with a round-and-condition rounding strategy as mentioned above might find applications to other problems, and it might therefore be of independent interest.
1.2 Comparison to Previous Work
Our algorithm is inspired by two results. The first is the recursive greedy algorithm of Chekuri and Pal for GST [6], and the second is the hierarchy-based LP-rounding technique of Rothvoß [30].
As mentioned, the algorithm of Chekuri and Pal is the first one that yields an approximation ratio of for GST, which is a special case of DST, in quasi-polynomial time. This is almost tight for the class of quasi-polynomial-time algorithms. Their algorithm exploits the fact that any optimal solution can be shortcut into a path of length , while paying only a factor of 2 (such a path exists in the metric closure of the input graph). This simple observation allows them to derive a recursive greedy algorithm. In more detail, they try to identify a vertex that separates the optimal path into two equal-size subpaths by iterating over all the vertices; then they recursively (and approximately) solve two subproblems and pick the best approximate subsolution greedily. Their analysis, however, requires the fact that both recursive calls end at the same depth (because the lengths of the two subpaths differ by at most one).
We imitate the recursive greedy algorithm by recursively splitting the optimal solution via balanced tree separators. The same approach as in [6], unfortunately, does not quite work out for us since subproblem sizes may differ by a multiplicative factor. This process, somehow, gives us a decision tree that contains a branch-decomposition of every solution, which is sufficient to devise an approximation algorithm. Note, however, that not every subtree of this decision tree can be transformed into a connected graph, and thus, it is not guaranteed that we can find a feasible DST solution from this decision tree. We introduce node-labels and label-consistent constraints specifically to solve this issue.
The label-consistency requirement could not be handled simply by applying DST algorithms as a black box. This brings us to the second component, which is inspired by the framework developed by Rothvoß [30]. While the framework was originally developed for the Sum-of-Squares hierarchy, it was shown by Friggstad et al. [12] that it also applies to Sherali-Adams, which is a weaker hierarchy. We apply the framework of Rothvoß to our Sherali-Adams lifted LP, taking the label-consistency requirement into account.
1.3 Related Work
We already mentioned some of the main results about DST and GST. For GST there is a polynomial-time algorithm by Garg et al. [13] that achieves an approximation factor of , where is the number of groups. Their algorithm first maps the input instance into a tree instance by invoking the Probabilistic Metric-Tree Embeddings [1, 11], thus losing a factor in the approximation ratio. They then apply an elegant LP-based randomized rounding algorithm to the instance on a tree. A well-known open problem is whether it is possible to avoid the factor in the approximation ratio. This was later achieved by Chekuri and Pal [6]; however, their algorithm runs in quasi-polynomial time.
Some works were devoted to the survivable network variants of DST and GST, namely DST and GST, respectively. Here one requires to have edge-disjoint directed (resp., undirected) paths from the root to each terminal (resp., group). Cheriyan et al. [7] showed that DST admits no approximation algorithm, for any , unless . Laekhanukit [23] showed that the problem admits no approximation for any constant , unless . Nevertheless, the negative results do not rule out the possibility of achieving reasonable approximation factors for small values of . In particular, Grandoni and Laekhanukit [15] (exploiting some ideas in [24]) recently devised a polylogarithmic approximation algorithm for DST that runs in quasi-polynomial time.
Concerning GST, Gupta et al. [16] presented a approximation algorithm for GST. The same problem admits an approximation algorithm, where is the largest cardinality of a group [21]. Chalermsook et al. [4] presented an LP-rounding bicriteria approximation algorithm for GST that returns a subgraph with cost times the optimum while guaranteeing a connectivity of at least . They also showed that GST is hard to approximate to within a factor of , for some fixed constant , and if is large enough, then the problem is at least as hard as the Label-Cover problem, meaning that GST admits no approximation algorithm, for any constant , unless .
2 Preliminaries
Given a graph , we denote by and the vertex and edge set of , respectively. Throughout this paper, we treat a rooted tree as an out-arborescence; that is, edges are directed towards the leaves. Given a rooted tree , we use to denote its root. For any rooted tree and , we shall use to denote the subtree of containing and all descendants of . For a directed edge , we use and to denote the head and tail of . Generally, we will use the term vertex to mean a vertex of a DST instance, and we will use the term node to mean a vertex in an instance of the Label-Consistent Subtree problem, defined below:
Label-Consistent Subtree (LCST).
The new problem we introduce is the Label-Consistent Subtree (LCST) problem. The input consists of a rooted tree of size and height , a node cost vector , and a set of labels, among which there are global labels . The other labels are called local labels. Each node has two label sets: a set of demand labels, and a set of service labels.
We say that a subtree of with is label-consistent if for every vertex and , there is a descendant of in such that . The goal of the LCST problem is to find a label-consistent subtree of of minimum cost that contains all global labels, i.e., for every , there is a with .
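The label-consistency condition can be made concrete with a small checker. The following is a minimal sketch, not part of the paper's algorithm: the dictionary-based tree representation and the names `children`, `demand`, `service` and `chosen` are assumptions of this example, and a node is treated as a descendant of itself, which is also an assumption.

```python
def is_label_consistent(children, root, demand, service, chosen):
    """Check whether the node set `chosen` is a label-consistent subtree.

    children: dict node -> list of child nodes of the input tree
    demand, service: dict node -> set of labels (missing key = empty set)
    chosen: set of nodes proposed as the subtree
    """
    if root not in chosen:
        return False
    # The chosen nodes must form a subtree hanging from the root.
    parent = {c: u for u, cs in children.items() for c in cs}
    if any(u != root and parent.get(u) not in chosen for u in chosen):
        return False

    ok = True

    def supplied(u):
        # Labels served by u or by any chosen descendant of u.
        nonlocal ok
        labels = set(service.get(u, ()))
        for c in children.get(u, ()):
            if c in chosen:
                labels |= supplied(c)
        if not set(demand.get(u, ())) <= labels:
            ok = False  # some demand label of u is not served below u
        return labels

    supplied(root)
    return ok


# Hypothetical toy instance: r demands x, a demands y.
children = {"r": ["a", "b"], "a": ["a1", "a2"]}
demand = {"r": {"x"}, "a": {"y"}}
service = {"a1": {"y"}, "a2": {"x"}, "b": {"x"}}

print(is_label_consistent(children, "r", demand, service, {"r", "b"}))
print(is_label_consistent(children, "r", demand, service, {"r", "a", "a1"}))
```

In this toy instance the first subtree is consistent because `b` serves the label demanded by `r`, while the second is not: `a` is satisfied by `a1`, but nothing below `r` serves `x`.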
In Section 4, we give an time approximation algorithm for the LCST problem, where . Thus, we require to be small in order to derive a quasi-polynomial-time algorithm; fortunately, this is the case for the instance reduced from DST.
One may generalize LCST to general graphs, say Label-Consistent Steiner Subgraph (LCSS).
Balanced Tree Partition.
A main tool in our reduction is the following standard balanced-tree-partition lemma (with proof given in Appendix A for completeness).
Lemma 2.1 (Balanced-Tree-Partition).
For any , for any vertex tree rooted at a vertex , there exists a vertex such that can be decomposed into two trees and rooted at and , respectively, in such a way that , and and . In other words, and are subtrees that form a balanced partition of (the edges of) .
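One way to find such a separator vertex is sketched below. This is an illustration under stated assumptions, not the proof from Appendix A: the tree is given as a hypothetical `children` dictionary, balance is measured in edges, and the target (each part receives between one third and two thirds of the edges) is this example's own choice and need not match the exact constants of the lemma.

```python
def balanced_partition(children, root):
    """Split a rooted tree into two edge-disjoint parts T1 (rooted at `root`)
    and T2 (rooted at the returned vertex v).

    children: dict vertex -> list of child vertices.
    Returns (v, picked, size2): T2 consists of v together with the branches
    hanging from the children in `picked`; size2 is its number of edges.
    """
    # Pre-order listing; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    edges = {}  # edges[u] = number of edges in the subtree rooted at u
    for u in reversed(order):
        edges[u] = sum(edges[c] + 1 for c in children.get(u, []))

    m = edges[root]
    if m < 2:
        raise ValueError("need at least two edges to split")

    # Walk down while some branch (a child subtree plus the edge leading
    # to that child) holds more than two thirds of all edges.
    v = root
    while True:
        heavy = [c for c in children.get(v, []) if 3 * (edges[c] + 1) > 2 * m]
        if not heavy:
            break
        v = heavy[0]

    # Greedily collect the largest branches at v until T2 has >= m/3 edges.
    picked, size2 = [], 0
    for c in sorted(children.get(v, []), key=lambda c: edges[c], reverse=True):
        if 3 * size2 >= m:
            break
        picked.append(c)
        size2 += edges[c] + 1
    return v, picked, size2


# A path with 6 edges: 0 -> 1 -> ... -> 6.
path = {i: [i + 1] for i in range(6)}
print(balanced_partition(path, 0))
```

Every branch at the final vertex has at most two thirds of the edges, so either the largest branch alone is large enough or all branches are smaller than one third and the greedy sum cannot overshoot two thirds.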
Sherali-Adams Hierarchy.
In this section, we give some basic facts about the Sherali-Adams hierarchy that we will need. Assume we have a linear program polytope defined by . We assume that are part of the linear constraints. The set of integral feasible solutions is defined as . It is convenient to think of each as an event, and in a solution , indicates whether the event happens or not.
The idea of the Sherali-Adams hierarchy is to strengthen the original LP by adding more variables and constraints. Of course, each should still be a feasible solution to the strengthened LP (when extended to a vector in the higher-dimensional space). For some , the th round of Sherali-Adams lift of the linear program has variables , for every . For every solution , is supposed to indicate whether all the events in happen or not in the solution ; that is, . Thus each can be naturally extended to a 0/1-vector in the higher-dimensional space defined by all the variables.
To derive the set of constraints, let us focus on the th constraint in the original linear program. Consider two subsets such that . Then the following constraint is valid for ; i.e., for all , the constraint is satisfied:
To linearize the above constraint, we expand the left side of the above inequality and replace each monomial with the corresponding variable. Then, we obtain the following:
(1) 
The th round of Sherali-Adams lift contains the above constraint for all such that , and the trivial constraint that . For a polytope and an integer , we use to denote the polytope obtained by the th round Sherali-Adams lift of . For every , we identify the variable in the original LP and in a lifted LP.
Let for some linear program on variables and . Let be an event such that ; then we can define a solution obtained from by “conditioning” on the event . For every , is defined as . We shall show that will be in (Property 2).
It is useful to consider the ideal case where corresponds to a convex combination of integral solutions in . Then we can view as a distribution over . Conditioning on the event over the solution corresponds to conditioning on over the distribution . With this view, it is not hard to imagine that the statements in the following claim (which we prove in the appendix) should hold:
Claim 2.2.
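The "ideal case" view of conditioning can be checked on a toy example. The sketch below assumes the standard Sherali-Adams conditioning rule, in which the conditioned value of a set S is the lifted value of S together with the conditioning event, divided by the value of that event; the function names `lift` and `condition` and the three-event distribution are hypothetical choices for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def lift(dist, n, rounds):
    """Lifted vector of a distribution over integral solutions.

    dist: list of (probability, set of events that happen) pairs.
    Returns x with x[S] = Pr[all events in S happen] for every |S| <= rounds.
    """
    x = {}
    for size in range(rounds + 1):
        for S in map(frozenset, combinations(range(n), size)):
            x[S] = sum((p for p, sol in dist if S <= sol), Fraction(0))
    return x


def condition(x, i, rounds):
    """Condition a `rounds`-level vector on event i (one level is lost)."""
    xi = x[frozenset([i])]
    if xi == 0:
        raise ValueError("cannot condition on an event of value 0")
    return {S: x[S | {i}] / xi for S in x if len(S) <= rounds - 1}


half, quarter = Fraction(1, 2), Fraction(1, 4)
dist = [(half, {0, 1}), (quarter, {0}), (quarter, {2})]
x = lift(dist, n=3, rounds=2)
y = condition(x, 0, rounds=2)

# Conditioning the distribution itself on event 0 and lifting again.
cond_dist = [(Fraction(2, 3), {0, 1}), (Fraction(1, 3), {0})]
print(y == lift(cond_dist, n=3, rounds=1))
```

The comparison prints `True`: conditioning the lifted vector on event 0 gives the same values as first conditioning the distribution on event 0 and then lifting with one fewer round.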
3 Reducing Directed Steiner Tree to Label-Consistent Subtree
In this section, we present a reduction from DST to LCST. In Section 3.1, we define a decomposition tree, which corresponds to a recursive partitioning of a Steiner tree of . We show that the DST problem is equivalent to finding a small-cost decomposition tree. Due to the balanced-partition lemma (Lemma 2.1), we can guarantee that decomposition trees have depth , a crucial property needed to obtain a quasi-polynomial-time algorithm. Then in Section 3.2 we show that the task of finding a small-cost decomposition tree can be reduced to an LCST instance on a tree of depth . Roughly speaking, for a decomposition tree to be valid, we require that the separator vertex appears in both parts of a partition: as a root in one part and possibly a non-root in the other. This can be captured by the label-consistency requirement.
We shall use to denote a Steiner tree in the original graph , and to denote vertices in . We use to denote a decomposition tree, and to denote nodes of a decomposition tree. will be used for the input tree of the LCST instance. We use for a subtree of and for nodes in . The convention extends to variants of these notations as well.
3.1 Decomposition Trees
We now define decomposition trees. Recall that in the DST problem, we are given a graph , a root , and a set of terminals.
Definition 3.1.
A decomposition tree is a rooted tree where each node is associated with a vertex and each leaf node is associated with an edge . Moreover, the following conditions are satisfied:


.

For every leaf of , we have .

For every non-leaf of and every child of with the following holds. There is a child of with such that for some leaf . In particular, this implies that has at least one child with .
The cost of a decomposition tree is defined as .
We say a vertex is involved in a subtree of a decomposition tree if either or there is a leaf of such that . So the second sentence in Property 3 can be changed to the following: There is a child of with such that is involved in .
We show that the DST problem can be reduced to the problem of finding a small-cost decomposition tree of depth . This is done in two directions.
From Directed Steiner Tree to Decomposition Tree.
We first show that the optimum directed Steiner tree of connecting to all terminals in gives a good decomposition tree of cost at most that of , which we denote by . Since we assumed costs of edges in satisfy the triangle inequality, we can assume every vertex has at least two children in . This implies . The decomposition tree can be constructed by applying Lemma 2.1 on recursively until we obtain trees with single edges. Formally, we set , where is defined in Algorithm 1. Notice that the algorithm is only for the purpose of analysis and is not a part of our algorithm for DST.
Claim 3.2.
is a full binary decomposition tree of height and cost that involves all vertices in . Moreover, for every , there is exactly one leaf of with .
From Decomposition Tree to Directed Steiner Tree.
Now we show the other direction of the reduction. The lemma we shall prove is the following:
Lemma 3.3.
Given a decomposition tree that involves all terminals in , we can efficiently construct a directed Steiner tree in connecting to all terminals in with cost at most .
Thus, our goal is to find a decomposition tree of small cost involving all terminals in . To do so, we construct an instance of the LCST problem.
3.2 Construction of LCST Instance
Let be the term in Claim 3.2 that upper bounds the height of . In the reduction, we shall “collapse” every levels of a decomposition tree into one level; this is used to obtain the improvement of in the approximation ratio. It motivates the definition of a twig, which corresponds to a full binary tree of depth at most that can appear as a part of a decomposition tree:
Definition 3.4.
A twig is a rooted full binary tree of depth at most , where

each is associated with a , such that for every internal node in , at least one child of has , and

each leaf of may or may not be associated with a value ; if is defined then .
With the twigs defined, our LCST instance is constructed by calling , where is defined in Algorithm 2. See Figure 2 for illustration of one recursion of .
Remark 3.5.
The and values of nodes in are irrelevant for the LCST instance. They will, however, help us in mapping the decomposition tree to its corresponding solution to LCST.
Notice that there are two types of nodes in : (1) nodes are those created in Step 1 and (2) nodes are those created in Step 4. We always use (, resp.) and its variants to denote nodes (nodes resp.).
We give some intuition behind the construction of . We can partition the edges of a decomposition tree into an depth tree of twigs. For each in the tree, we apply the following operation. First, we replace with a node with . Second, we insert a virtual parent of with between this and its actual parent. Then it is fairly straightforward to see that we can find a copy of this resulting tree in . Thus, we reduced the problem of finding (and thus ) to the problem of finding a subtree of . The label-consistency requirements shall guarantee that will correspond to a valid . In particular, the demand label for a node created in Step 1 guarantees that if is selected then we shall select at least one child of . The demand labels created in Step 11 for a node guarantee that if is selected, then all its children must be selected, while the demand labels created in Step 15 guarantee Property 3 of . The set of global labels is exactly . In Step 8, we add a global label to if contains a leaf with .
A simple observation we can make is the following:
Claim 3.6.
is a rooted tree with vertices and height , where .
Also, it is easy to see that a node will have exactly one demand label, while a node can have up to demand labels. So, we have .
We then show that the problem of finding a decomposition tree can be reduced to that of finding a label-consistent subtree of . Again, this is done in two directions.
From Decomposition Tree to Label-Consistent Subtree.
To show that there is a good labelconsistent subtree of , we need to construct a tree of twigs from . This is done as follows. For every , and every internal node in of depth , we create a twig rooted at containing all descendants of at depth . Let be the set of twigs created. A rooted tree over can be naturally defined: a twig is a parent of if and only if is a leaf in . So, has depth at most .
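The cutting of a decomposition tree into twigs can be sketched as follows. This is a hedged illustration: the collapse parameter `g`, the `children` dictionary and the function name are assumptions of the example, which places a twig root at every internal node whose depth is a multiple of `g` and lets the twig extend `g` levels below its root.

```python
def cut_into_twigs(children, root, g):
    """Cut a rooted tree into twigs of depth at most g.

    children: dict node -> list of child nodes (leaves may be absent).
    Returns (twigs, twig_parent): twigs maps a twig root to the list of
    nodes in that twig; twig_parent maps a twig root to the root of the
    twig that contains it as a leaf.
    """
    twigs, twig_parent = {}, {}
    stack = [root]
    while stack:
        top = stack.pop()
        nodes, frontier = [top], [top]
        for _ in range(g):  # descend g levels below the twig root
            frontier = [c for u in frontier for c in children.get(u, [])]
            nodes.extend(frontier)
        twigs[top] = nodes
        for leaf in frontier:  # nodes exactly g levels below `top`
            if children.get(leaf):  # still internal: it roots a new twig
                twig_parent[leaf] = top
                stack.append(leaf)
    return twigs, twig_parent


# Full binary tree of depth 4 with heap numbering (children of i: 2i, 2i+1).
tree = {i: [2 * i, 2 * i + 1] for i in range(1, 16)}
twigs, twig_parent = cut_into_twigs(tree, 1, g=2)
print(sorted(twigs))
print(sorted(twigs[4]))
```

With `g = 2` the depth-4 tree is cut into five twigs: one rooted at node 1 and one rooted at each of the nodes 4, 5, 6 and 7, so the tree of twigs has depth 1.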
can be found naturally by calling (with being empty initially), where is defined in Algorithm 3, and the trees are as defined in Algorithm 2. The recursive procedure takes two parameters: a node in and a twig . It is guaranteed that : The root recursion satisfies this condition since ; in Step 4, we also have . The tree can be constructed as has depth at most . Again, this algorithm is only for the purpose of analysis and is not a part of our algorithm for DST. We prove in the appendix the following lemma.
Lemma 3.7.
is a label-consistent subtree of with cost exactly . Moreover, all global labels in are supplied by .
From Label-Consistent Subtree to Decomposition Tree.
The following lemma gives the other direction, and its proof will be deferred to the appendix.
Lemma 3.8.
Given any feasible solution to the LCST instance , in time we can construct a decomposition tree with . Moreover, if a global label is supplied by , then involves .
Wrapping up.
We prove the following theorem in the next section. Recall that and are respectively the size and height of the input tree to the LCST instance, and is the number of global labels.
Theorem 3.9.
There is an time approximation algorithm for the Label-Consistent Subtree problem where .
With this theorem at hand, we can now finish our approximation for DST that runs in quasi-polynomial time. Given a DST instance, we shall construct the LCST instance of size and height as in Algorithm 2. Notice that for the LCST instance, we have . By Claim 3.2 and Lemma 3.7, there is a solution to the LCST instance of cost at most . Applying Theorem 3.9, we can obtain a feasible solution of cost at most in time (as ). Applying Lemma 3.8 and Lemma 3.3, we can obtain a directed Steiner tree in of cost at most connecting to all terminals in . This gives a approximation for DST in running time , finishing the proof of Theorem 1.1.
4 Approximation Algorithm for Label-Consistent Subtree
The goal of this section is to prove Theorem 3.9. Since we are not dealing with the original DST problem any more, we use for trees and for nodes in this section.
4.1 Redefining the LCST Problem
We shall first simplify the input instance w.l.o.g. in the following ways, which will make our presentation much cleaner. Indeed, some properties are already satisfied by the LCST instance reduced from the DST problem; however, we want to make Theorem 3.9 as general as possible and thus we do not make these assumptions in the theorem statement.


We can assume for every two distinct nodes and , and are disjoint. If some local label appears in for different nodes , we can make copies of and let each copy be contained in for exactly one . We can replace the appearance of in some with the copies.

We can assume the demand labels are only at the internal nodes. Suppose a leaf has . If , then can be removed from ; otherwise can never be selected thus can be removed from .

We can assume that the service labels are only at the leaves and each leaf contains exactly one service label. A leaf without a service label can be removed. For a non-leaf with , we can attach leaves of cost to and distribute the service labels to the newly added leaves. Similarly, if a leaf has , we can attach new leaves to .
Notice that the above operations do not change the set of global labels and .
With the above operations and simplifications, we can redefine the LCST instance. Let and respectively be the sets of leaves and internal nodes of . For every node , let be the set of children of . For every , let be the set of descendants of that are leaves.
For every , let be the unique label in . From now on we shall not use the notation anymore. Thus, a rooted subtree of with is label-consistent if, for every and , there is a node with .
The goal of the problem is to find the minimum cost label-consistent subtree of that provides all the global labels, i.e., that satisfies for all there exists a with . Recall that we are given a node-cost vector . The cost of a subtree of , denoted as , is defined as .
We consider the change in the size and height of after we applied the above operations. Abusing notation slightly, we shall use and to store the size and height of the old (i.e., the before we apply the operations), and and to be the size and height of the new (i.e., the after we apply the operations). Notice that we only added leaves to . Thus, we have . The number of internal nodes in the new is at most . A leaf is relevant only when it provides a label that is in for some ancestor of . If a node has many leaf children with the same service label, we only need to keep the one with the smallest cost. Since each has and the height of the old is , we can assume that the number of leaves in the new is at most . So .
Let be the optimum tree for the given instance. Let be the cost of the , i.e., . (Footnote 2: We remark that it is easy to check whether a valid solution exists or not: an is useless if for some there is no with . We repeatedly remove useless nodes and their descendants until no such nodes exist. There is a valid solution iff the remaining provides all labels in . So we can assume the instance has a valid solution.) As every local label appears only once in , we can assume that for every , there is at most one node with : if there are multiple such nodes , we can keep one without violating the label-consistency condition or the requirement that all global labels are provided. Thus additionally we can assume satisfies the following conditions:


For every , there is exactly one node such that .

For every , there is at most one node such that .
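The feasibility check mentioned in the remark on useless nodes above can be sketched in the simplified model (demands at internal nodes, one service label per leaf). The sketch is illustrative only: the names `children`, `demand`, `leaf_label` and both function names are hypothetical, and a single bottom-up pass is used in place of repeated removal, which reaches the same result because discarding a node can only affect its ancestors.

```python
def surviving_labels(children, demand, leaf_label, u):
    """Labels supplied below u once useless nodes are discarded.

    Returns None if u itself is useless, i.e. some demand label of u is
    not supplied by any surviving leaf below it.
    """
    if not children.get(u):  # u is a leaf
        return {leaf_label[u]} if u in leaf_label else set()
    labels = set()
    for c in children[u]:
        below = surviving_labels(children, demand, leaf_label, c)
        if below is not None:  # a useless child is dropped with its subtree
            labels |= below
    return labels if set(demand.get(u, ())) <= labels else None


def has_valid_solution(children, root, demand, leaf_label, global_labels):
    """True iff the pruned tree still supplies every global label."""
    labels = surviving_labels(children, demand, leaf_label, root)
    return labels is not None and set(global_labels) <= labels


# Hypothetical instance: node a demands p, which no leaf below a supplies.
children = {"r": ["a", "b"], "a": ["l1"], "b": ["l2", "l3"]}
demand = {"a": {"p"}, "b": {"q"}}
leaf_label = {"l1": "g1", "l2": "q", "l3": "g2"}

print(has_valid_solution(children, "r", demand, leaf_label, {"g2"}))
print(has_valid_solution(children, "r", demand, leaf_label, {"g1", "g2"}))
```

Here node `a` is useless, so the leaf carrying `g1` disappears with it: the instance is feasible when only `g2` is global and infeasible when `g1` is required as well.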
The main theorem we shall prove is the following
Theorem 4.2.
There is an time algorithm that outputs a random label-consistent tree such that, , and for every , we have .
Proof of Theorem 3.9.
We run times the algorithm stated in Theorem 4.2 and let be the union of all the trees produced. It is easy to see that is always label-consistent. The expected cost of is
If the term is sufficiently large, by the union bound, we can obtain
(2) 
We repeatedly run the above procedure until happens and output the tree satisfying the property. Let be this tree. Then we have due to (2). In expectation we only need to run the procedure twice.
Thus, we obtain an approximation algorithm for LCST. The running time of the algorithm is . Recall that and are the height and size of before we applied the operations; thus the theorem follows. ∎
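The repetition count in the proof can be explored numerically. The sketch below assumes a hypothetical per-run inclusion probability `p` for each global label and a target failure probability of one half (suggested by the remark that two attempts suffice in expectation); the concrete values are made up for illustration and are not taken from the paper.

```python
import math


def repetitions(p, k, fail=0.5):
    """Number of independent runs after which the union bound guarantees
    that all k labels are covered with probability at least 1 - fail,
    assuming each run covers each label with probability at least p."""
    # A label is missed by N runs with probability (1 - p)^N <= exp(-p N),
    # so N >= ln(k / fail) / p makes the union bound at most `fail`.
    return math.ceil(math.log(k / fail) / p)


def union_bound_failure(p, k, runs):
    """Union bound on the probability that some label is never covered."""
    return k * (1 - p) ** runs


p, k = 0.1, 100  # hypothetical values
runs = repetitions(p, k)
print(runs, union_bound_failure(p, k, runs))
```

The number of runs scales as the logarithm of the number of labels divided by the per-run success probability, which is where the extra logarithmic factor in the cost of the union of the trees comes from.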
Thus, our goal is to prove Theorem 4.2. Our algorithm is very similar to that of [30] for GST on trees. We solve the lifted LP relaxation for the LCST problem and then round the fractional solution via a recursive procedure. In the procedure, we focus on some subtree , and we are given a set of labels that must appear in , where is our output tree. We are also given a lifted LP solution ; we can restrict on the tree . The set of labels appears in fully according to . Then, for every , we randomly choose a child of that is responsible for this and then apply some conditioning operations on . We recursively call the procedure for the children of . This way, we can guarantee that the tree we output is always label-consistent. Finally, we show that each global label appears in with large probability, using a technique that is very similar to that of [30].
4.2 Basic LP Relaxation
The remaining part of the section is dedicated to the proof of Theorem 4.2. We formulate an LP relaxation that aims at finding the , where the variables of the LP are indexed by . We view every element in also as an event. Supposedly, an event happens if and only if , and an event happens if and only if and has a node with label (such a node is unique if it exists by Properties 1 and 2). For every , is supposed to indicate whether event happens or not. Then the following linear constraints are valid:
(3) holds since is a rooted subtree of with , (4) holds by definition of events, (5) follows from the fact that is label-consistent, and (6) holds trivially. (7) follows from Properties 1 and 2. (8) holds trivially and (9) follows from Property 1.
Let be the polytope containing all vectors satisfying constraints (3) to (9). The following simple observation can be made:
Claim 4.3.
For every , , and , we have .
Proof.
The claim holds trivially if . When , summing up (7) over all internal nodes in and gives the equality. ∎
4.3 Rounding a Lifted Fractional Solution
Let