# Parameterized Approximation Schemes for Steiner Trees with Small Number of Steiner Vertices

We study the Steiner Tree problem, in which a set of terminal vertices needs to be connected in the cheapest possible way in an edge-weighted graph. This problem has been extensively studied from the viewpoints of approximation and parameterization. In particular, Steiner Tree is known to be APX-hard on the one hand, and W[2]-hard on the other when parameterized by the number of non-terminals (Steiner vertices) in the optimum solution. In contrast, we give an efficient parameterized approximation scheme (EPAS), which circumvents both hardness results. Moreover, our methods imply the existence of a polynomial size approximate kernelization scheme (PSAKS) for the assumed parameter. We further study the parameterized approximability of other variants of Steiner Tree, such as Directed Steiner Tree and Steiner Forest. For neither of these is an EPAS likely to exist for the studied parameter: for Steiner Forest an easy observation shows that the problem is APX-hard, even if the input graph contains no Steiner vertices. For Directed Steiner Tree we prove that computing a constant approximation for this parameter is W[1]-hard. Nevertheless, we show that an EPAS exists for Unweighted Directed Steiner Tree. We also prove that there is an EPAS and a PSAKS for Steiner Forest if, in addition to the number of Steiner vertices, the number of connected components of an optimal solution is considered to be a parameter.

## 1 Introduction

In this paper we study several variants of the Steiner Tree problem. In its most basic form this optimization problem takes as input an undirected graph $G=(V,E)$ with edge weights $w(e)\in\mathbb{R}^+_0$ for every $e\in E$, and a set $R\subseteq V$ of terminals. The non-terminals in $V\setminus R$ are called Steiner vertices. A Steiner tree is a tree $T\subseteq G$, which spans all terminals in $R$ and may contain some of the Steiner vertices. The objective is to minimize the total weight $\sum_{e\in E(T)} w(e)$ of the computed Steiner tree $T$. This fundamental optimization problem is one of the 21 original NP-hard problems listed by Karp  in his seminal paper from 1972, and has been intensively studied since then. The Steiner Tree problem and its variants have applications in network design, circuit layouts, and phylogenetic tree reconstruction, among others (see survey ).

Two popular ways to handle the seeming intractability of NP-hard problems are to design approximation  and parameterized  algorithms. For the former, an $\alpha$-approximation is computed in polynomial time for some factor $\alpha$ specific to the algorithm, i.e., the solution is always at most a multiplicative factor of $\alpha$ worse than the optimum of the input instance. The Steiner Tree problem, even in its basic form as defined above, is APX-hard , i.e., it is NP-hard to obtain an approximation factor of $96/95 \approx 1.01$. However, a factor of $\ln(4)+\varepsilon < 1.39$ can be achieved in polynomial time , which is the currently best factor known for this runtime.

For parameterized algorithms, an instance is given together with a parameter $p$ describing some property of the input. The idea is to isolate the exponential runtime of an NP-hard problem to the parameter. That is, the optimum solution is computed in time $f(p)\cdot n^{O(1)}$, where $f$ is a computable function independent of the input size $n$. If such an algorithm exists, we call the problem fixed-parameter tractable (FPT) for parameter $p$. Here, the choice of the parameter is crucial, and a problem may be FPT for some parameters, but not for others. A well-studied parameter for the Steiner Tree problem is the number of terminals $|R|$. It is known since the classical result of Dreyfus and Wagner  that the Steiner Tree problem is FPT for this parameter, as the problem can be solved in time $3^{|R|}\cdot n^{O(1)}$ if $n=|V|$. A more recent algorithm by Fuchs et al.  obtains runtime $(2+\delta)^{|R|}\cdot n^{O_\delta(1)}$ for any constant $\delta>0$. This can be improved to $2^{|R|}\cdot n^{O(1)}$ if the input graph is unweighted using the results of Björklund et al. . A somewhat complementary and less-studied parameter to the number of terminals is the number of Steiner vertices in the optimum solution, i.e., $p=|V(T)\setminus R|$ if $T$ is an optimum Steiner tree. It is known  that Steiner Tree is W[2]-hard for parameter $p$ and therefore is unlikely to be FPT, in contrast to the parameter $|R|$. This parameter has previously been studied mainly in the context of unweighted graphs. The problem remains W[2]-hard in this special case, and therefore the focus has been on designing parameterized algorithms for restricted graph classes, such as planar or $d$-degenerate graphs [25, 31].

In contrast to this, our question is: what can be done in the most general case, in which the class of input graphs is unrestricted and edges may have weights? Our main result is that we can overcome the APX-hardness of Steiner Tree on one hand, and on the other hand also the W[2]-hardness for our parameter of choice $p$, by combining the two paradigms of approximation and parametrization (this area has recently received growing interest, cf. the Parameterized Approximation Algorithms Workshop). We show that there is an efficient parameterized approximation scheme (EPAS), which for any $\varepsilon>0$ computes a $(1+\varepsilon)$-approximation in time $f(p,\varepsilon)\cdot n^{O(1)}$ for a function $f$ independent of $n$. Note that here we consider the approximation factor of the algorithm as a parameter as well, which accounts for the “efficiency” of the approximation scheme (analogous to an efficient polynomial time approximation scheme or EPTAS). In fact, as summarized in the following theorem, our algorithm computes an approximation to the cheapest tree having at most $p$ Steiner vertices, even if better solutions with more Steiner vertices exist.

###### Theorem 1.

There is an algorithm for Steiner Tree, which given an edge-weighted undirected graph $G=(V,E)$, terminal set $R\subseteq V$, $\varepsilon>0$, and integer $p$, computes a $(1+\varepsilon)$-approximation to the cheapest Steiner tree $T\subseteq G$ with $|V(T)\setminus R|\le p$ in time . Note: if the input to this optimization problem is malformed (e.g., if $p$ is smaller than the number of Steiner vertices of any feasible solution), then the output of the algorithm can be arbitrary (cf. ).

Many variants of the Steiner Tree problem exist, and we explore the applicability of our techniques to some common ones. For the Directed Steiner Tree problem the aim is to compute an arborescence, i.e., a directed graph obtained by orienting the edges of a tree so that exactly one vertex called the root has in-degree zero (which means that all vertices are reachable from the root). More concretely, the input consists of a directed graph $G=(V,A)$ with arc weights $w(a)\in\mathbb{R}^+_0$ for every $a\in A$, a terminal set $R\subseteq V$, and a specified terminal $r\in R$. A Steiner arborescence is an arborescence $T\subseteq G$ with root $r$ containing all terminals $R$. The objective is to find a Steiner arborescence minimizing the weight $\sum_{a\in A(T)} w(a)$. This problem is notoriously hard to approximate: no $O(\log^{2-\varepsilon} n)$-approximation exists unless $\mathrm{NP}\subseteq\mathrm{ZTIME}(n^{\mathrm{polylog}(n)})$ . But even for the Unweighted Directed Steiner Tree problem, in which each arc has unit weight, a fairly simple reduction from the Set Cover problem implies that no $((1-\varepsilon)\ln n)$-approximation algorithm is possible unless $\mathrm{P}=\mathrm{NP}$ [21, 13]. At the same time, even Unweighted Directed Steiner Tree is W[2]-hard for our considered parameter $p$ [28, 25], just as for the undirected case. For this reason, all previous results have focused on restricted inputs: Jones et al.  prove that when combining the parameter $p$ with the size of the largest excluded topological minor of the input graph, Unweighted Directed Steiner Tree is FPT. They also show that if the input graph is acyclic and $d$-degenerate, the problem is FPT for the combined parameter $p$ and $d$.

Our focus again is on general unrestricted inputs. We are able to extend our techniques to the unweighted directed setting and obtain an EPAS, as summarized in the following theorem. Here the cost of a Steiner arborescence is the number of arcs it contains.

###### Theorem 2.

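There is an algorithm for Unweighted Directed Steiner Tree, which given an unweighted directed graph $G=(V,A)$, terminal set $R\subseteq V$, root $r\in R$, $\varepsilon>0$, and integer $p$, computes a $(1+\varepsilon)$-approximation to the cheapest Steiner arborescence $T\subseteq G$ with $|V(T)\setminus R|\le p$ in time . The note on malformed inputs following Theorem 1 applies here as well.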

Can our techniques be utilized for the even more general case when arcs have weights? Interestingly, in contrast to the above theorem we can show that in general the Directed Steiner Tree problem most likely does not admit such approximation schemes, even when allowing “non-efficient” runtimes of the form $f(p,\varepsilon)\cdot n^{g(\varepsilon)}$ for any computable functions $f$ and $g$. This follows from the next theorem, since setting $\varepsilon$ to any constant, the existence of such a $(1+\varepsilon)$-approximation algorithm would imply $\mathrm{W[1]}=\mathrm{FPT}$.

###### Theorem 3.

For any computable function $f$, it is W[1]-hard to compute an $f(p)$-approximation of the optimum Steiner arborescence $T$ for Directed Steiner Tree parameterized by $p=|V(T)\setminus R|$, if the input graph is arc-weighted.

Another variant of Steiner Tree is the Node Weighted Steiner Tree problem, in which the Steiner vertices have weights, instead of the edges. The aim is to minimize the total weight of the Steiner vertices in the computed solution. A reduction similar to the one used to prove Theorem 3 (from Dominating Set) shows that also in this case computing any $f(p)$-approximation is W[1]-hard, even if all Steiner vertices have unit weight.

Other common variants of Steiner Tree include the Prize Collecting Steiner Tree and Steiner Forest problems. The latter takes as input an edge-weighted undirected graph $G=(V,E)$ and a list $\{s_1,t_1\},\ldots,\{s_k,t_k\}$ of terminal pairs, i.e., $R=\bigcup_{i=1}^{k}\{s_i,t_i\}$. A Steiner forest is a forest $F$ in $G$ for which each $\{s_i,t_i\}$ pair is in the same connected component, and the objective is to minimize the total weight of the forest $F$. For this variant it is not hard to see that parametrizing by $p$ cannot yield any approximation scheme, as a simple reduction from Steiner Tree shows that the problem is APX-hard even if the input has no Steiner vertices (see Section 2.1). For the Prize Collecting Steiner Tree problem, the input is again a terminal set in an edge-weighted graph, but the terminals have additional costs. A solution tree is allowed to leave out a terminal but has to pay its cost in return (cf. ). It is also not hard to see that this problem is APX-hard, even if there are no Steiner vertices at all.

These simple results show that our techniques to obtain approximation schemes reach their limit quite soon: with the exception of Unweighted Directed Steiner Tree, most common variants of Steiner Tree seem not to admit approximation schemes for our parameter $p$. We are however able to generalize our EPAS to Steiner Forest if we combine $p$ with the number $c$ of connected components in the optimum solution. In fact, our main result of Theorem 1 is a corollary of the next theorem, using only the first part of the above mentioned reduction from Steiner Tree (cf. Section 2.1). Due to this, it is not possible to have a parameterized approximation scheme for the parameter $c$ alone, as such an algorithm would imply a polynomial time approximation scheme for the APX-hard Steiner Tree problem. Hence the following result necessarily needs to combine the parameters $p$ and $c$.

###### Theorem 4.

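There is an algorithm for Steiner Forest, which given an edge-weighted undirected graph $G=(V,E)$, a list $\{s_1,t_1\},\ldots,\{s_k,t_k\}$ of terminal pairs, $\varepsilon>0$, and integers $p,c$, computes a $(1+\varepsilon)$-approximation to the cheapest Steiner forest $F\subseteq G$ with at most $c$ connected components and $|V(F)\setminus R|\le p$ where $R=\bigcup_{i=1}^{k}\{s_i,t_i\}$, in time . The note on malformed inputs following Theorem 1 applies here as well.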

A topic tightly connected to parameterized algorithms is kernelization. We here use the framework of Lokshtanov et al. , who also give a thorough introduction to the topic (see Section 2.2 for formal definitions). Loosely speaking, a kernelization algorithm runs in polynomial time, and, given an instance of a parameterized problem, computes another instance of the same problem, such that the size of the latter instance is at most $f(p)$ for some computable function $f$ in the parameter $p$ of the input instance. The computed instance is called the kernel, and for an optimization problem it must be possible to efficiently convert an optimum solution to the kernel into an optimum solution to the input instance.

A fundamental result of parameterized complexity says that a problem is FPT if and only if it has a kernelization algorithm . This means that for our parameter $p$, most likely Steiner Tree does not have a kernelization algorithm, as it is W[2]-hard. For this reason, the focus of kernelization results has previously again shifted to special cases. By a folklore result, Steiner Tree is FPT for our parameter $p$ if the input graph is planar (cf. ). Of particular interest are polynomial kernels, which have size polynomial in the input parameter. The idea is that computing the kernel in this case is an efficient preprocessing procedure for the problem, such that exhaustive search algorithms can be used on the kernel. Suchý  proved that Unweighted Steiner Tree parameterized by $p$ admits a polynomial kernel if the input graph is planar.

Our aspiration again is to obtain results for inputs that are as general as possible, i.e., on unrestricted edge-weighted input graphs. We prove that Steiner Tree has a polynomial lossy (approximate) kernel, despite the fact that the problem is W[2]-hard: an $\alpha$-approximate kernelization algorithm is a kernelization algorithm that computes a new instance for which a given $\beta$-approximation can be converted into an $\alpha\beta$-approximation for the input instance in polynomial time. The new instance is now called a (polynomial) approximate kernel, and its size is again bounded as a function (a polynomial) of the parameter of the input instance.

Just as for our parameterized approximation schemes in Theorems 4 and 1, we prove the existence of a lossy kernel for Steiner Tree by a generalization to Steiner Forest where we combine the parameter $p$ with the number $c$ of connected components in the optimum solution. Also, our lossy kernel can approximate the optimum arbitrarily well: we prove that for our parameter the Steiner Forest problem admits a polynomial size approximate kernelization scheme (PSAKS), i.e., for every $\varepsilon>0$ there is a $(1+\varepsilon)$-approximate kernelization algorithm that computes a polynomial approximate kernel. An easy corollary then is that Steiner Tree parameterized only by $p$ also has a PSAKS, by setting $c=1$ in Theorem 5 and using the above mentioned reduction from Steiner Tree to Steiner Forest (cf. Section 2.1).

###### Theorem 5.

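There is a $(1+\varepsilon)$-approximate kernelization algorithm for Steiner Forest, which given an edge-weighted undirected graph $G=(V,E)$, a list $\{s_1,t_1\},\ldots,\{s_k,t_k\}$ of terminal pairs, and integers $p,c$, computes an approximate kernel of size , if for the optimum Steiner forest $F\subseteq G$, where $R=\bigcup_{i=1}^{k}\{s_i,t_i\}$, the number of connected components of $F$ is at most $c$, and $|V(F)\setminus R|\le p$. The note on malformed inputs following Theorem 1 applies here as well.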

Analogous to approximation schemes, it is possible to distinguish between efficient and non-efficient kernelization schemes: a PSAKS is size efficient if the size of the approximate kernel is bounded by $f(\varepsilon)\cdot p^{O(1)}$, where $p$ is the parameter and $f$ is a computable function independent of $p$. Our bound on the approximate kernel size in Theorem 5 implies that we do not obtain a size efficient PSAKS for either Steiner Forest or Steiner Tree. This is in contrast to the existence of efficient approximation schemes for the same parameters in Theorems 4 and 1. We leave open whether a size efficient PSAKS can be found in either case. Interestingly, we also do not obtain any PSAKS for the Unweighted Directed Steiner Tree problem, even though by Theorem 2 an EPAS exists. In fact this is not surprising given the following theorem.

###### Theorem 6.

No -approximate kernelization algorithm exists for Unweighted Directed Steiner Tree parameterized by the number of Steiner vertices in the optimum Steiner arborescence for any , unless .

### 1.1 Used techniques

Our algorithms are based on the intuition that a Steiner tree containing only few Steiner vertices but many terminals must either contain a large component induced by terminals, or a Steiner vertex with many terminal neighbors forming a large star. A high-level description of our algorithms for Unweighted Directed Steiner Tree and Steiner Forest therefore is as follows. In each step a tree is found in the graph in polynomial time, which connects some terminals using few Steiner vertices. We save this tree as part of the approximate solution and then contract it in the graph. The vertex resulting from the contraction is declared a terminal and the process repeats for the new graph. Previous results [25, 31] have also built on this straightforward procedure in order to obtain FPT algorithms and polynomial kernels for special cases of Unweighted Directed Steiner Tree and Unweighted Steiner Tree. In particular, in the unweighted undirected setting it is a well-known fact (cf. ) that contracting an adjacent pair of terminals is always a safe option, as there always exists an optimum Steiner tree containing this edge. However, this immediately breaks down if the input graph is edge-weighted, as an edge between terminals might be very costly and should therefore not be contained in any (approximate) solution.

Instead we employ more subtle contraction rules, which use the following intuition. Every time we contract a tree with $\ell$ terminals we decrease the number of terminals by $\ell-1$ (as the vertex arising from a contraction is a terminal). Our ultimate goal would be to reduce the number of terminals to one; at this point, the edges that we contracted during the whole run connect all the terminals. Decreasing the number of terminals by one can therefore be seen as a “unit of work”. We will pick a tree with the lowest cost per unit of work done, and prove that as long as there are sufficiently many terminals left in the graph, these contractions only lose a $(1+\varepsilon)$-factor compared to the optimum. As soon as the number of terminals falls below a certain threshold depending on the given parameter, we can use an FPT algorithm computing the optimum solution in the remaining graph. This algorithm is parametrized by the number of terminals, which now is bounded by our parameter. For the variants of Steiner Tree considered in our positive results, such FPT algorithms can easily be obtained from the ones for Steiner Tree [16, 3, 19]. Adding this exact solution to the previously contracted trees gives a feasible solution that is a $(1+\varepsilon)$-approximation.

Each step in which a tree is contracted in the graph can be seen as a reduction rule as used for kernelization algorithms. Typically, a proof for a kernelization algorithm will define a set of reduction rules and then show that the instance resulting from applying the rules exhaustively has size bounded as a function in the parameter. To obtain an $\alpha$-approximate kernelization algorithm, additionally it is shown that each reduction rule is $\alpha$-safe. Roughly speaking, this means that at most a factor of $\alpha$ is lost when applying any number of $\alpha$-safe reduction rules (see Section 2.2 for formal definitions).

Contracting edges in a directed graph may introduce new paths, which did not exist before. Therefore, for the Unweighted Directed Steiner Tree problem, we need to carefully choose the arborescence to contract. In order to prove Theorem 2 we show that each contraction is a $(1+\varepsilon)$-safe reduction rule. However, the total size of the graph resulting from exhaustively applying the contractions is not necessarily bounded as a function of our parameter. Thus we do not obtain an approximate kernel.

For Steiner Forest the situation is in a sense the opposite. Choosing a tree to contract follows a fairly simple rule. On the downside however, the contractions we perform are not necessarily $(1+\varepsilon)$-safe reduction rules. In fact there are examples in which a single contraction will lose a large factor compared to the optimum cost. We are still able to show however, that after performing all contractions exhaustively, any $\beta$-approximation to the resulting instance can be converted into a $(1+\varepsilon)\beta$-approximation to the original input instance. Even though the total size of the resulting instance again cannot be bounded in terms of our parameter, for Steiner Forest we can go on to obtain a PSAKS. For this we utilize a result of Lokshtanov et al. , which shows how to obtain a PSAKS for Steiner Tree if the parameter is the number of terminals. This result can be extended to Steiner Forest, and since our instance has a number of terminals bounded in our parameter after applying all contractions, we obtain Theorem 5.

To obtain our inapproximability result of Theorem 3, we use a reduction from the Dominating Set problem. It was recently shown by Srikanta et al.  that this problem does not admit parameterized $f(k)$-approximation algorithms for any function $f$, if the parameter $k$ is the solution size, unless $\mathrm{W[1]}=\mathrm{FPT}$. We are able to exploit this to also show that no such algorithm exists for Directed Steiner Tree with arc weights, under the same assumption. To prove Theorem 6 we use a cross composition from the Set Cover problem, for which Dinur and Steurer  proved that it is NP-hard to compute a $((1-\varepsilon)\ln n)$-approximation. We are able to preserve only a constant gap, thus we leave open whether stronger non-constant lower bounds are obtainable, or whether Unweighted Directed Steiner Tree has a polynomial size $\alpha$-approximate kernel for some constant $\alpha$.

### 1.2 Related work

As the Steiner Tree problem and its variants have been studied for decades, the literature on this topic is huge. We only present a selection of related work here that has not yet been mentioned above.

For general input graphs, Zelikovsky  gave the first polynomial time approximation algorithm for Steiner Tree with a better ratio than $2$ (which can easily be obtained by computing an MST on the terminal set). His algorithm is based on repeatedly contracting stars with three terminals each, in the metric closure of the graph, which yields an $11/6$-approximation. This line of work led to the Borchers and Du  Theorem, which states that for every Steiner Tree instance with terminal set $R$ and every $\varepsilon>0$ there exists a set of sub-trees (so-called full components) on at most $2^{\lceil 1/\varepsilon\rceil}$ terminals from $R$ each, such that their union forms a Steiner tree for $R$ of cost at most $1+\varepsilon$ times the optimum. As a consequence, it is possible to compute all full components with at most $2^{\lceil 1/\varepsilon\rceil}$ terminals (using an FPT algorithm parametrized by the number of terminals [16, 19]), and then find a subset of the precomputed solutions, in order to approximate the optimum. This method is the basis of most modern Steiner Tree approximation algorithms, and is for instance leveraged in the currently best $(\ln(4)+\varepsilon)$-approximation algorithm of Byrka et al. . The Borchers and Du  Theorem can also be interpreted in terms of approximate kernelization schemes, as Lokshtanov et al.  point out (cf. proof of Theorem 5). It is interesting to note that our algorithms are also based on finding good sub-trees. However, while computing optimum full components is NP-hard, the sub-trees we compute in each step can be found in polynomial time, regardless of how many terminals they contain.

For planar graphs  it was shown that an EPTAS exists for Steiner Tree. For Steiner Forest a $2$-approximation can be computed in polynomial time on general inputs , but an EPTAS also exists if the input is planar . If the Unweighted Steiner Tree problem is parametrized by the solution size, it is known  that no polynomial (exact) kernel exists, unless $\mathrm{NP}\subseteq\mathrm{coNP/poly}$. If the input is restricted to planar or bounded-genus graphs it was shown that polynomial kernels do exist for this parametrization . It was later shown  that for planar graphs this is even true for our smaller parameter $p$.

For the Directed Steiner Tree problem it is a long-standing open problem whether a polylogarithmic approximation can be computed in polynomial time. It is known that an -approximation can be computed in polynomial time , and an -approximation in quasi-polynomial time . Feldmann and Marx  consider the Directed Steiner Network problem, which is the directed variant of Steiner Forest (i.e., a generalization of Directed Steiner Tree). They give a dichotomy result, proving that the problem parameterized by $|R|$ is FPT whenever the terminal pairs induce a graph that is a caterpillar with a constant number of additional edges, and otherwise the problem is W[1]-hard. Among the W[1]-hard cases is the Strongly Connected Steiner Subgraph problem (for which the hardness was originally established by Guo et al. ), in which all terminals need to be strongly connected. For this problem a $2$-approximation is obtainable  when parametrizing by $|R|$, and a recent result shows that this is best possible  under the Gap Exponential Time Hypothesis.

In the same paper, Chitnis et al.  also consider the Bidirected Steiner Network problem, which is the directed variant of Steiner Forest on bidirected input graphs, i.e., directed graphs in which for every edge the reverse edge exists as well and has the same cost. These graphs model inputs that lie between the undirected and directed settings. From Theorems 5 and 1, it is not hard to see that the Bidirected Steiner Tree problem (i.e., Directed Steiner Tree on bidirected inputs) has both an EPAS and a PSAKS for our parameter $p$, by reducing the problem to the undirected setting. Since the PSAKS for parameter $p$ follows from the PSAKS for parameter $|R|$ given by Lokshtanov et al. , it is interesting to note that for parameter $|R|$, Chitnis et al.  provide both a PSAKS and a parameterized approximation scheme for the Bidirected Steiner Network problem whenever the optimum solution is planar. This is achieved by generalizing the Borchers and Du  Theorem to this setting. As this is a generalization of Bidirected Steiner Tree, it is natural to ask whether corresponding algorithms also exist for our parameter $p$ in the more general setting considered in .

## 2 Preliminaries

### 2.1 Reducing Steiner tree to Steiner forest

By a folklore result, we may reduce the Steiner Tree problem to Steiner Forest. For this we pick an arbitrary terminal $r$ of the Steiner Tree instance, and for every other terminal $t$ of this instance, introduce a terminal pair $\{r,t\}$ for Steiner Forest.

If we want to construct an instance without Steiner vertices, we can add a new vertex $v'$ for every Steiner vertex $v$ of Steiner Tree and add an edge $vv'$ of cost $0$. Additionally we introduce a terminal pair $\{v,v'\}$ to our Steiner Forest instance. Hence in the constructed Steiner Forest instance every vertex is a terminal (i.e., there are no Steiner vertices), but an optimum Steiner forest in the constructed graph costs exactly as much as an optimum Steiner tree in the original graph. As Steiner Tree is APX-hard, the same is true for Steiner Forest, even if all vertices are terminals.
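The reduction of this subsection can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: the graph is stored as a dictionary mapping `frozenset({u, v})` to the edge weight, and the function name `steiner_tree_to_forest`, the flag `eliminate_steiner`, and the twin-vertex naming scheme are hypothetical.

```python
def steiner_tree_to_forest(edges, terminals, eliminate_steiner=False):
    """Turn a Steiner Tree instance into a Steiner Forest instance.

    edges: dict mapping frozenset({u, v}) to a non-negative weight (assumed format).
    terminals: list of terminal vertices; the first one serves as the arbitrary root.
    Returns the (possibly extended) edge dict and the list of terminal pairs.
    """
    root, *others = terminals
    # First part of the reduction: pair the root with every other terminal.
    pairs = [(root, t) for t in others]
    new_edges = dict(edges)
    if eliminate_steiner:
        # Second part: attach a zero-cost twin to every Steiner vertex and
        # demand that the vertex be connected to its twin.
        vertices = {v for e in edges for v in e}
        for v in sorted(vertices - set(terminals), key=repr):
            twin = (v, "twin")  # hypothetical name for the new vertex
            new_edges[frozenset({v, twin})] = 0
            pairs.append((v, twin))
    return new_edges, pairs
```

With `eliminate_steiner=True` every vertex of the output occurs in some terminal pair, while the added edges have cost zero, so the optimum value is unchanged.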

### 2.2 Lossy kernels

We give a brief introduction to the lossy kernel framework as introduced by Lokshtanov et al. . See the latter reference for a thorough introduction to the topic.

For an optimization problem, a polynomial time pre-processing algorithm is a pair of polynomial time algorithms: the reduction algorithm $\mathcal{R}$ and the solution lifting algorithm $\mathcal{L}$. The former takes an instance $I$ with parameter $p$ of a given problem as input, and outputs another instance $I'$ with parameter $p'$. The solution lifting algorithm converts a solution for the instance $I'$ to a solution of the input instance $I$: given a solution $s'$ to $I'$, $\mathcal{L}$ computes a solution $s$ for $I$, such that $s$ is optimal for $I$ if $s'$ is optimal for $I'$. If additionally the output of $\mathcal{R}$ is bounded as a function of $p$, i.e., when $|I'|+p'\le f(p)$ for some computable function $f$ independent of $|I|$, then the pair given by $\mathcal{R}$ and $\mathcal{L}$ is called a kernelization algorithm, and $I'$ together with parameter $p'$ is the kernel. If the reduction and solution lifting algorithms get an input that is not an instance of the problem (for example if the parameter does not correctly describe some property of the optimum solution), then the outputs of the algorithms are undefined and can be arbitrary.

An $\alpha$-approximate polynomial time pre-processing algorithm is again a pair of a reduction algorithm $\mathcal{R}$ and a solution lifting algorithm $\mathcal{L}$, both running in time polynomial in the input size. The reduction and solution lifting algorithms are as before, but there is a different property on the output of the latter: if the given solution $s'$ to the instance $I'$ computed by $\mathcal{R}$ is a $\beta$-approximation, then the output of $\mathcal{L}$ is a solution $s$ that is an $\alpha\beta$-approximation for the original instance $I$. Analogous to before, an $\alpha$-approximate kernelization algorithm is an $\alpha$-approximate polynomial time pre-processing algorithm for which the size of the output of the reduction algorithm is bounded in terms of $p$ only. The output of $\mathcal{R}$ is in this case called an approximate kernel, and it is polynomial if its size is bounded by a polynomial in $p$.

In the context of lossy kernels a reduction rule is a reduction algorithm $\mathcal{R}$. It is called $\alpha$-safe if a solution lifting algorithm $\mathcal{L}$ exists, which together with $\mathcal{R}$ forms a strict $\alpha$-approximate polynomial time pre-processing algorithm. This means that if $s'$ is a $\beta$-approximation for the instance $I'$ computed by $\mathcal{R}$, then $\mathcal{L}$ computes a $\max\{\alpha,\beta\}$-approximation for the input instance. As shown in , the advantage of considering this stricter definition is that, as usual, reduction rules can be applied exhaustively, until a stable point is reached in which none of the rules would change the instance any longer: the algorithm resulting from applying these rules together with their corresponding solution lifting algorithms forms a strict $\alpha$-approximate polynomial time pre-processing algorithm (which is not necessarily the case when using the non-strict definition; see ).
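To see why the strict definition composes well, the following small Python sketch tracks the worst-case guarantee after lifting a solution back through a chain of reduction rules. It assumes the guarantees of the framework of Lokshtanov et al., namely $\alpha\beta$ for the non-strict and $\max\{\alpha,\beta\}$ for the strict definition; the function names are hypothetical and the code is only meant as an illustration of the arithmetic.

```python
from fractions import Fraction


def lifted_ratio(alpha, beta, strict):
    """Guarantee after lifting a beta-approximation through one alpha-rule."""
    return max(alpha, beta) if strict else alpha * beta


def chain_rules(alpha, num_rules, beta, strict):
    """Guarantee after lifting through num_rules consecutive alpha-rules."""
    for _ in range(num_rules):
        beta = lifted_ratio(alpha, beta, strict)
    return beta
```

For example, with $\alpha = 11/10$ and an optimal solution of the final instance ($\beta=1$), three strict rules still give a factor of $11/10$, while the non-strict bound degrades to $(11/10)^3$.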

## 3 The weighted undirected Steiner forest and Steiner tree problems

In this section we describe an approximate polynomial time preprocessing algorithm that returns an instance of Steiner Forest containing at most terminals if there is a Steiner forest with at most $p$ Steiner vertices and at most $c$ connected components. We can use this algorithm in two ways. Either we can proceed with a kernelization derived from Lokshtanov et al.  and obtain a polynomial size lossy kernel (Theorem 5), or we can run an exact FPT algorithm derived from Fuchs et al.  on the reduced instance, obtaining an EPAS running in single exponential time with respect to the parameters (Theorems 1 and 4). In both cases we use the combined parameter $p+c$.

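Steiner Forest
Input: A graph $G=(V,E)$, with edge weights $w(e)\in\mathbb{R}^+_0$ for each $e\in E$, and a list $\{s_1,t_1\},\ldots,\{s_k,t_k\}$ of pairs of terminals.
Solution: A Steiner forest $F\subseteq G$ containing an $s_i$-$t_i$ path for every $i\in\{1,\ldots,k\}$.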

We first rescale all weights so that every edge has weight strictly greater than $1$. Then, in each step of our algorithm we pick a star, add it to the solution, and contract the star in the current graph. We repeat this procedure until the number of terminals falls below a specified bound depending on $\varepsilon$, $p$, and $c$. To describe how we pick the star to be contracted in each step, we need to introduce the ratio of a star. Let $C\subseteq E$ be a set of edges of a star, i.e., all edges of $C$ are incident to a common vertex which is the center of the star, and denote by $Q$ the set of terminals incident to $C$. Provided $|Q|\ge 2$, we define the ratio of $C$ as $w(C)/(|Q|-1)$, where $w(C)=\sum_{e\in C} w(e)$. Note that we allow $C$ to contain only a single edge if it joins two terminals, and that due to rescaling of edge weights each star has ratio strictly greater than $1$. Observe also that the ratio of a star is similar to the average weight of a star. However, the ratio is skewed due to the subtraction of $1$ in the denominator. In particular, for two stars of the same average weight, the one with more terminals will have the smaller ratio. Thus, in this sense, picking a star with small ratio favours large stars.
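As a small numeric illustration of this skew, the ratio can be computed as the total weight of the star divided by the number of incident terminals minus one. The following Python sketch assumes exactly this reading of the definition; the helper name `star_ratio` is hypothetical.

```python
from fractions import Fraction


def star_ratio(edge_weights, num_terminals):
    """Ratio of a star: total edge weight per terminal removed by contraction.

    edge_weights: weights of the edges of the star.
    num_terminals: number of terminals incident to the star (center included
    if the center itself is a terminal); must be at least 2.
    """
    if num_terminals < 2:
        raise ValueError("the ratio is only defined for at least two terminals")
    return Fraction(sum(edge_weights)) / (num_terminals - 1)


# Two stars with a Steiner center and average edge weight 2:
small = star_ratio([2, 2], 2)        # two terminal leaves
large = star_ratio([2, 2, 2, 2], 4)  # four terminal leaves
```

Here `small` evaluates to 4 and `large` to 8/3, so the larger star is preferred although both have the same average edge weight.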

In every step, our algorithm contracts a star with the best available ratio (i.e., the lowest ratio among all stars connecting at least two terminals). Due to the following lemma, a star with the best ratio has a simple form: it consists of the cheapest edges incident to its center vertex and some terminal. As there are $n$ possible center vertices and at most $n$ incident edges to each center, which can be sorted in time $O(n\log n)$, the best ratio star can be found in time $O(n^2\log n)$. Later we show that there is a star with at least two terminals in every step, provided that the number of terminals is more than .

###### Lemma .

Let $v$ be a vertex and denote by $q_1,\ldots,q_k$ the terminals adjacent to $v$, where $w(vq_1)\le w(vq_2)\le\cdots\le w(vq_k)$, i.e., the terminals are ordered non-decreasingly by the weight of the corresponding edge $vq_j$. The star with the best ratio having $v$ as its center has edge set $\{vq_1,\ldots,vq_\ell\}$ for some $\ell\le k$.

###### Proof.

Let $C$ be an edge set of a star with center vertex $v$. First note that if this star contains a Steiner vertex $u$ as a leaf, the edge $vu$ can be removed from $C$ in order to decrease the ratio $w(C)/(|Q|-1)$, since only the terminals of the star are counted in the denominator. Also if $C$ does not contain some edge $vq_j$ but an edge $vq_{j'}$ with $j'>j$, then we may switch the edge $vq_{j'}$ for $vq_j$ in $C$ in order to optimize the ratio: the denominator stays the same, but the numerator cannot increase, as the terminals are ordered non-decreasingly according to the weights $w(vq_j)$. ∎
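The lemma suggests a simple way to find the best-ratio star for a fixed center: sort the edges to adjacent terminals by weight and evaluate every prefix. The following Python sketch illustrates this under assumptions that are not from the paper: the adjacency of the center is given as a dictionary from neighbor to edge weight, and the name `best_star_at` is hypothetical.

```python
from fractions import Fraction


def best_star_at(center, incident, terminals):
    """Best-ratio star centered at `center`, or None if no star has two terminals.

    incident: dict mapping each neighbor of `center` to the edge weight (assumed format).
    terminals: set of terminal vertices.
    Returns (ratio, leaves), where leaves are the terminal leaves of the star.
    """
    # Only edges to terminals are useful; sort them by non-decreasing weight.
    leaves = sorted(
        ((w, u) for u, w in incident.items() if u in terminals),
        key=lambda pair: pair[0],
    )
    base = 1 if center in terminals else 0  # the center counts if it is a terminal
    best = None
    total = 0
    for count, (w, _) in enumerate(leaves, start=1):
        total += w
        num_terminals = base + count
        if num_terminals < 2:
            continue  # ratio undefined for fewer than two terminals
        ratio = Fraction(total) / (num_terminals - 1)
        if best is None or ratio < best[0]:
            best = (ratio, [u for _, u in leaves[:count]])
    return best
```

Running this for every vertex as the center and keeping the overall minimum yields a best-ratio star of the current graph.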

To analyse our algorithm we need to keep track of the different graphs resulting from each contraction step . Initially we set to the input graph, and in each step we obtain a new graph from by contracting a set of edges in , such that forms a star of minimum ratio in . That is, we obtain from by identifying all vertices incident to edges in , removing all resulting loops, and among the resulting parallel edges we delete all but the lightest one with respect to their weights. We also adjust the terminal pairs in a natural way: let be the vertex of resulting from contracting . If had a terminal pair such that is incident to some edge of while is not, then we introduce the terminal pair for . Also every terminal pair of for which neither nor  is incident to any edge of is introduced as a terminal pair of . Any terminal pair for which both and are incident to edges of is going to be connected by a path in the computed solution, as it will contain . Hence, such a terminal pair can be safely removed.
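The contraction step described above can be sketched in Python as follows. The representation (edges as vertex pairs mapped to weights, terminal pairs as tuples) and the names `contract_star`, `star_vertices`, and `merged` are assumptions made for this illustration and do not come from the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def contract_star(
    weights: Dict[Edge, float],
    pairs: Iterable[Edge],
    star_vertices: Set[str],
    merged: str,
) -> Tuple[Dict[Edge, float], Set[Edge]]:
    """Contract all vertices of a star into the single vertex `merged`.

    Loops are removed, only the lightest of any parallel edges is kept,
    and terminal pairs are redirected to the merged vertex. Pairs with
    both endpoints inside the star are dropped, since the star connects them.
    """

    def image(v: str) -> str:
        return merged if v in star_vertices else v

    new_weights: Dict[Edge, float] = {}
    for (u, v), w in weights.items():
        a, b = image(u), image(v)
        if a == b:
            continue  # loop created by the contraction
        key = (a, b) if a <= b else (b, a)
        if key not in new_weights or w < new_weights[key]:
            new_weights[key] = w  # keep the lightest parallel edge

    new_pairs: Set[Edge] = set()
    for s, t in pairs:
        a, b = image(s), image(t)
        if a != b:
            new_pairs.add((a, b) if a <= b else (b, a))
    return new_weights, new_pairs
```

Storing each undirected edge under a sorted key is one simple way to detect parallel edges after identification; other graph representations would work equally well.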

The algorithm stops contracting best-ratio stars when there are fewer than  terminals left; the exact value of depends on , , and the desired approximation factor. It will satisfy and we specify it later. If the algorithm stops in step , the solution lifting algorithm takes a feasible solution of and returns the union of and . Such a solution is clearly feasible, since we adapted the terminal pairs accordingly after each contraction.

For the purpose of analysis, we consider a solution in the current graph that originates from a solution of the original instance , but may contain edges that are heavier than those in . More concretely, denote by a solution in with at most Steiner vertices and at most components, i.e., is a Steiner forest containing every - path. We remark that may or may not be an optimum solution of . Given for , we modify this solution to obtain a new feasible solution on the terminal pairs of . Note that the edges of the contracted star might not be part of . We still mimic the contraction of the star in : to obtain from , we identify all leaves of (which are terminals by Section 3 and thus part of the solution ) and possibly also the center of if it is in . This results in a vertex . We now want to delete edges incident to in such a way that we are left with an acyclic feasible solution. If we delete an inclusion-wise minimal feedback edge set, we clearly get a feasible solution. Let denote the set of terminals incident to . We choose a feedback edge set for which every edge was incident to a vertex of before the contraction in , i.e., an edge of corresponding to an edge of never connects two Steiner vertices. Note that such an inclusion-wise minimal feedback edge set always exists: if we delete all edges of incident to except and then contract , we get an acyclic graph. See Fig. 1 for an illustration.

The resulting graph is , which now forms a forest connecting all terminal pairs of . Note that for each edge in there is a corresponding edge in , which however may be lighter in , as from each bundle of parallel edges in we keep the lightest one, but this edge may not exist in .

We now observe that there is always a star with at least two terminals and thus the algorithm always selects some star.

###### Lemma .

Provided that there are at least terminals in , there is a star with at least two terminals in .

###### Proof.

Note that it is sufficient to find such a star in as edges in are also present in (even if their weight may be smaller). If there is an edge between two terminals in , then we are done as itself is a star. Otherwise, all terminals are incident to Steiner vertices only. Thus there must be a Steiner vertex incident to at least two terminals in , since contains at most Steiner vertices but more than terminals. ∎

To show that our algorithm only loses an -factor compared to the cost of the solution , we will compare the cost of the edges contracted by our algorithm to the set of deleted edges of . Note that there are at least edges in , since we contracted terminals in the forest with at most connected components to obtain , and a forest on vertices and components has edges. We decrease the number of vertices of by at least (one more if the center of the star with edge set was a Steiner vertex present in ), and we decrease the number of components by at most . Note also that for any two time steps , the sets and , but also the sets and , are disjoint. Thus if for every , then our algorithm computes a -approximation. Unfortunately, this is not always the case: there are contractions for which this condition does not hold (see Fig. 2) and we have to account for them differently.

###### Definition .

If we say that the contracted edge set in step  is good; otherwise is bad. Moreover, if has strictly more components than , we say that is multiple-component, otherwise it is single-component.

Our goal is to show that the total weight of bad contractions is bounded by an -fraction of the weight of . We start by proving that if the set of terminals in is sufficiently large, then the contraction is good. Intuitively, this means that skewing the ratio such that large stars are favoured (compared to just picking the star with smallest average weight) tends to result in good contractions. We define

$$\lambda := \frac{(1+\varepsilon)(p+c)}{\varepsilon}.$$

###### Lemma .

If , then the contracted edge set is good.

###### Proof.

For brevity, we drop the index . Let be the ratio of the contracted star, and let be the number of deleted edges in that connect two terminals. Note that any such edge has weight at least , since it spans a star with two terminals, which has ratio equal to its weight, and since each edge in (of which is a subset) can only be heavier than the corresponding edge in the current graph .

Let be the Steiner vertices adjacent to edges in , and let be the number of edges in incident to one such Steiner vertex (see Fig. 3). Since is a feedback edge set in which any edge was incident to a terminal in before the contraction, there is no edge in which connects two Steiner vertices. Consider the star spanned by the edges of incident to . If , the ratio of this star is at least , since its edges are at least as heavy as the corresponding edges in and the algorithm chose a star with the minimum ratio in . Thus, the weight of edges in incident to is at least . In the case where , the lower bound on the weight holds trivially.

Any edge in not incident to any Steiner vertex connects two terminals. Therefore, we have as any edge in is incident to a terminal in and we thus do not count any edge twice. Also recall that . Since contains at most Steiner vertices we have , and we obtain

$$w(D) \geq r\ell' + \sum_{i=1}^{q} r(\ell_i - 1) = r\Big(\ell' + \sum_{i=1}^{q} \ell_i - q\Big) \geq r(|Q| - p - c).$$

Finally, using we bound by as follows:

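$$\begin{aligned}
(1+\varepsilon)\,w(D) &\geq (1+\varepsilon)\,r\,(|Q| - p - c) \\
&= r|Q| + r\big(\varepsilon|Q| - (1+\varepsilon)(p+c)\big) \\
&\geq w(C) + r\Big(\varepsilon \cdot \frac{(1+\varepsilon)(p+c)}{\varepsilon} - (1+\varepsilon)(p+c)\Big) = w(C).
\end{aligned}$$

∎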
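The role of the threshold in this proof can be checked numerically. The sketch below assumes λ = (1+ε)(p+c)/ε and uses exact rational arithmetic; the helper names `lam` and `slack` are hypothetical. The quantity `slack` equals ε|Q| − (1+ε)(p+c), which is the term that has to be non-negative in the last chain of inequalities.

```python
from fractions import Fraction


def lam(eps: Fraction, p: int, c: int) -> Fraction:
    """Assumed threshold lambda = (1 + eps)(p + c) / eps."""
    return (1 + eps) * (p + c) / eps


def slack(eps: Fraction, p: int, c: int, q: int) -> Fraction:
    """(1 + eps)(q - p - c) - q, i.e. eps*q - (1 + eps)(p + c).

    This is non-negative exactly when q >= lambda, which is the case in
    which the proof concludes (1 + eps) * w(D) >= w(C).
    """
    return (1 + eps) * (q - p - c) - q


if __name__ == "__main__":
    eps, p, c = Fraction(1, 2), 3, 1
    threshold = lam(eps, p, c)
    for q in range(10, 15):
        print(q, q >= threshold, slack(eps, p, c, q))
```

With ε = 1/2, p = 3, and c = 1 the threshold evaluates to 12, and the slack changes sign exactly there.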

Note that there may be a lot of contractions with . However, we show that only a bounded number of them is actually bad. The key idea is to consider contractions with ratio in an interval for some and integer . Due to the rescaling of weights every star belongs to an interval with . The following crucial lemma of our analysis shows that the number of bad single-component contractions in each such interval is bounded in terms of and , if is a function of . In particular, let , so that . We call an edge set with ratio in the -th interval, i.e., with , an -contraction, and define

$$\kappa := \frac{(1+\delta)\,p}{\delta} + p.$$

###### Lemma .

For any integer the number of bad single-component -contractions is at most .

###### Proof.

Let us focus on bad single-component -contractions only, which we simply call bad -contractions for brevity. Suppose for a contradiction that the number of bad -contractions is larger than . Let be the first step with a bad -contraction, i.e., is the minimum among all for which and and the contraction is single-component. The plan is to show that at step there is a “light” star in with ratio at most and consequently the algorithm would do a -contraction for some . This leads to a contradiction, since we assumed that in step the contraction has ratio in interval . Note that it is sufficient to find such a light star in as for each edge in there is an edge in the graph between the same vertices of the same weight or even lighter.

We claim that for each step in which the algorithm does a bad -contraction there is an edge with weight at most . We have as is bad and as the ratio of is in interval . Putting it together and using the definition of we obtain

$$w(D_t) < \frac{(1+\delta)^{i+1}}{1+\varepsilon}\,(|Q_t| - 1) = (1+\delta)^{i-1}\,(|Q_t| - 1).$$

Because is single-component, we have and therefore there is an edge with weight at most , which proves the claim.

Note that the edge also exists at time step , as and is obtained from by a sequence of edge contractions and deletions. At time it cannot be that connects two terminals, since we assume that the algorithm picked a star of ratio more than in step  (recall that each edge connecting two terminals is a star with ratio equal to its weight). It may happen though that connects two Steiner vertices in step . We discard any such edge that connects two Steiner vertices in step . That is, let be the set of light edges  that lead between a Steiner vertex and a terminal in step . Note that edges and for steps with bad -contractions are distinct, because all edges in are deleted from . There are at most edges connecting two Steiner vertices in , since is a forest and the solution from which is derived contained at most Steiner vertices. As we assume that there are more than bad single-component -contractions, we have .

At step there must be a Steiner vertex in incident to at least edges in . Consider a star with as the center and with edges from that are incident to ; we have . The ratio of this star is at most . Since (by a routine calculation) we get that the ratio of is at most which is a contradiction to the assumption that the algorithm does an -contraction in step . ∎
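The routine calculation mentioned in the proof can be spelled out as follows. This is a sketch under assumptions: $\ell$ is a symbol introduced here for the number of light edges in the star, each of weight at most $(1+\delta)^{i-1}$, and the threshold $\ell \geq (1+\delta)/\delta$ is what the pigeonhole argument would give if more than $\kappa - p = (1+\delta)p/\delta$ light edges are distributed among at most $p$ Steiner vertices.

$$\begin{aligned}
\ell \geq \frac{1+\delta}{\delta}
&\iff \delta\ell \geq 1 + \delta
\iff (1+\delta)(\ell - 1) \geq \ell
\iff \frac{\ell}{\ell - 1} \leq 1 + \delta, \\
\text{and hence}\quad
\frac{\ell\,(1+\delta)^{i-1}}{\ell - 1} &\leq (1+\delta)\,(1+\delta)^{i-1} = (1+\delta)^{i}.
\end{aligned}$$

Under these assumptions the star has ratio at most $(1+\delta)^{i}$, which lies below the $i$-th interval.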

We also need a bound on the number of bad multiple-component edge sets.

###### Lemma .

The number of steps in which a bad multiple-component edge set is contracted is at most .

###### Proof.

If is a bad multiple-component edge set, must have at least one component fewer than . Since has at most components, the bound follows. ∎

We remark that the proofs of Section 3 and 3 do not use that the number of terminals in a bad -contraction is bounded by , as shown in Section 3. Instead we bound the total weight of bad contractions in terms of . For this let be the largest interval of any contraction during the whole run of the algorithm, i.e., the ratio of every contracted star is at most . As there are at most bad single-component contractions in each interval and bad multiple-component contractions and the interval size grows exponentially, we can upper bound the total weight of bad contractions in terms of and . We can also lower bound the weight of in terms of and the lower bound on the number of terminals in the graph. If is large enough then the total weight of edge sets of bad contractions is at most . These ideas are summarized in the next lemma.

###### Lemma .

Let be the largest interval of any contraction during the whole run of the algorithm and let be the total weight of edge sets of bad contractions. Then, the following holds.

1. .

2. Let

$$\tau := \frac{(\kappa + c) \cdot \lambda \cdot (1+\delta)^{2}}{\varepsilon\delta} + 2p + c.$$

Then .

###### Proof.

(1) By Section 3, there are fewer than bad multiple-component contractions. Each of them has at most terminals by Section 3 and has ratio at most by the choice of . Thus, the total weight of all bad multiple-component contractions can be bounded by .

Note that it follows from Sections 3 and 3 that the total weight of bad single-component -contractions is at most . The bound on the total weight of bad contractions follows by summing over all intervals in which the algorithm does a contraction:

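$$\kappa \cdot \lambda \cdot \sum_{i \leq j} (1+\delta)^{i+1} + c \cdot \lambda \cdot (1+\delta)^{j} = \kappa \cdot \lambda \cdot \frac{(1+\delta)^{j+2} - 1}{(1+\delta) - 1} + c \cdot \lambda \cdot (1+\delta)^{j} \leq \frac{(\kappa + c) \cdot \lambda \cdot (1+\delta)^{j+2}}{\delta}.$$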

This proves (1).

(2) When our algorithm contracted a star having ratio in the largest interval in some step , all stars in with Steiner vertices as centers had ratios at least . Thus if is the number of terminals incident to in , then these terminals together with form a star of weight at least . Similarly, all edges between terminals in have weight at least ; let be the number of such edges.

Since there are at least terminals in step (otherwise the algorithm would have terminated), and at most of edges in connect two Steiner vertices, we have as . The total weight of edges in is thus at least

$$\ell' r + \sum_{i=1}^{q} r \cdot (\ell_i - 1) \geq r \cdot (\tau - 2p - c) \geq (1+\delta)^{j} \cdot (\tau - 2p - c).$$

This shows (2) as .

(3) By (2) and using the value of we have

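$$\varepsilon \cdot w(F^*_0) \geq \varepsilon\,(1+\delta)^{j} \cdot (\tau - 2p - c) \geq \varepsilon\,(1+\delta)^{j} \cdot \frac{(\kappa + c) \cdot \lambda \cdot (1+\delta)^{2}}{\varepsilon\delta} = \frac{(\kappa + c) \cdot \lambda \cdot (1+\delta)^{j+2}}{\delta},$$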

which is the upper bound on the total weight of bad edge sets by (1). Thus (3) holds as well. ∎
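The way the constants fit together in part (3) can be verified with exact arithmetic. The sketch below assumes λ = (1+ε)(p+c)/ε, κ = (1+δ)p/δ + p, and τ = (κ+c)·λ·(1+δ)²/(εδ) + 2p + c, with δ chosen so that (1+δ)² = 1+ε; the function name `parameters` and the sample values are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def parameters(
    eps: Fraction, delta: Fraction, p: int, c: int
) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (lambda, kappa, tau) under the assumed definitions."""
    lam = (1 + eps) * (p + c) / eps
    kappa = (1 + delta) * p / delta + p
    tau = (kappa + c) * lam * (1 + delta) ** 2 / (eps * delta) + 2 * p + c
    return lam, kappa, tau


if __name__ == "__main__":
    # eps = 5/4 gives 1 + eps = 9/4, so delta = 1/2 is rational.
    eps, delta, p, c = Fraction(5, 4), Fraction(1, 2), 2, 1
    lam, kappa, tau = parameters(eps, delta, p, c)
    for j in range(4):
        lower = eps * (1 + delta) ** j * (tau - 2 * p - c)
        upper = (kappa + c) * lam * (1 + delta) ** (j + 2) / delta
        # The lower bound on eps * w(F*_0) matches the upper bound on
        # the total weight of bad contractions for every interval j.
        print(j, lower == upper)
```

The choice of τ is exactly what makes the two bounds coincide, so the comparison holds for every interval index j.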

The above lemma can now be used to prove that all the contractions put together (by scaling ) form a -approximate pre-processing procedure with respect to (cf. Section 2.2).

###### Lemma .

The algorithm outputs an instance with terminals and (together with the solution lifting algorithm) it is a -approximate polynomial time pre-processing algorithm with respect to .

###### Proof.

By Section 3 each step of the algorithm can be executed since . Thus the upper bound on the number of terminals follows directly from the description of the algorithm. To bound the running time, we already noted that finding a minimum ratio star to contract can be done in time. Since such a star with at least two vertices is contracted in each step to form the next graph , the total time used for contractions until only terminals are left is polynomial in .

Let us focus on the -approximate part. Let be the graph left after the last contraction step , and let be a Steiner forest for the remaining terminal pairs. The solution lifting algorithm simply adds all contracted edge sets to in order to compute a Steiner forest in the input graph . We need to show that, if is a -approximation to the optimum in , the resulting forest is a -approximation to the optimum of .

Let us call a step of the algorithm good (bad) if the corresponding contracted edge set is good (bad). As all sets are disjoint, using Sections 3 and 3 the weight of can be bounded by