1 Introduction
One of the most classic problems in network design is arguably the Steiner Tree problem. Given a weighted graph and a set of terminals, one has to compute the cheapest tree that connects all the terminals. A straightforward generalization of this problem is the so-called Steiner Forest problem, in which one is given a set of pairs of vertices that are required to be connected. One has to buy the cheapest forest that connects all the pairs of terminals. In a seminal paper, Imase and Waxman [imase1991dynamic] introduced the online version of the Steiner Tree problem. In this version, the terminals are revealed one by one (in a possibly adversarial order), and the algorithm has to connect all previously arrived terminals before seeing the next one. The challenge lies in the fact that the algorithm is not allowed to remove edges that were bought before. Imase and Waxman provided tight bounds in this scenario by proving that (1) the natural greedy algorithm, which simply connects the latest arrived terminal to the closest previously arrived terminal, is $O(\log k)$-competitive and (2) no algorithm can be better than $\Omega(\log k)$-competitive. Shortly after, Westbrook and Yan [westbrook1995performance] introduced the online Steiner Forest problem. In this variation, the pairs are revealed one after another (again in a possibly adversarial order). At any point in time, the algorithm has to maintain a feasible solution to the instance made of the pairs that arrived so far, without discarding previously selected edges. As it is a more general problem, the negative result of Imase and Waxman shows that no algorithm can be better than $\Omega(\log k)$-competitive. From now on, with a slight abuse of notation, we will use $k$ to denote either the number of terminals in the online Steiner Tree problem or the number of pairs in the online Steiner Forest problem. A natural generalization of the greedy algorithm of Imase and Waxman for online Steiner Tree to the case of Steiner Forest can be described informally as follows:
Upon the arrival of a new pair $\{s_i, t_i\}$, connect $s_i$ and $t_i$ with the shortest path in the current metric, contract the metric along the chosen path and wait for the next pair.
We mention that there are some subtleties about how exactly the metric is contracted, but for the sake of clarity, we postpone these details to later in the introduction. The reader might think for now that greedy contracts the edges that it selects (i.e. for any edge selected by greedy, its weight is set to 0). For now, an algorithm will be considered as “greedy” if it always buys the shortest path in the current metric and nothing else. Westbrook and Yan [westbrook1995performance] showed that a wide class of greedy algorithms are $O(\sqrt{k} \log k)$-competitive. This bound was quickly improved by Awerbuch, Azar, and Bartal [SODA96], who showed with an elegant dual fitting argument that greedy algorithms are in fact $O(\log^2 k)$-competitive and conjectured that the right bound should be $O(\log k)$. Since then, their conjecture has remained open. To the best of our knowledge, all lower bounds for greedy that appeared so far in the literature (see [imase1991dynamic, alon1992line, SODA96, chen2010designing]) are instances where the underlying optimum forest is a single tree. Surprisingly, even in this case, nothing better than the $O(\log^2 k)$ upper bound is known. We note that the competitive analysis of Imase and Waxman for online Steiner Tree does not extend to the online Steiner Forest problem, even if we additionally assume that the offline optimum is a single tree. Indeed, the solution constructed by greedy may not be a single connected component even if the offline optimum is. This problem is highlighted by the lower bounds that appear in [SODA96, chen2010designing], which show the limitations of the current analysis techniques. We defer a more detailed discussion of this phenomenon to Appendix A.1.
Previous work on Steiner Forest.
Apart from the two results [SODA96, westbrook1995performance] mentioned above, many papers are related to this problem. The competitive ratio of greedy was first mentioned again in a list of open problems by Fiat and Woeginger [fiat1998online]. Around the same time, Berman and Coulston [berman1997line] designed a more complex (non-greedy) algorithm which they showed to be $O(\log k)$-competitive. In a nutshell, their algorithm constructs a dual solution in an online manner and uses this dual solution to guide the algorithm on which edges to buy. However, their algorithm is not greedy because it can buy additional edges that could be helpful in the future but are useless right now. Later, Chen, Roughgarden, and Valiant [chen2010designing] applied the result of Awerbuch et al. regarding the greedy algorithm to design network protocols for good equilibrium. Interestingly, they mention that the non-greedy algorithm of Berman and Coulston could not be applied in their setting. Umboh [umboh2014online] introduced a framework based upon tree embeddings to obtain new results in online network design but also to obtain new proofs of known results. For instance, he obtains new proofs that greedy is $O(\log k)$-competitive in the case of online Steiner Tree and that the Berman-Coulston algorithm is $O(\log k)$-competitive in the case of online Steiner Forest. However, his framework does not imply any improved bound for greedy algorithms in online Steiner Forest. More recently, the performance of greedy algorithms for online Steiner Forest was further raised as an “important open problem” in [dehghani2018greedy] and also cited in [PanigrahiLecture].
Further related work.
Online Steiner Tree (or Forest) problems have also attracted attention in various special cases such as outerplanar graphs [matsubayashi2021non] and Euclidean metrics [angelopoulos2009competitiveness], or in more general settings such as Steiner Tree in directed graphs [angelopoulos2007improved, angelopoulos2008near, faloutsos2002effect], node-weighted instances [naor2011online, hajiaghayi2013online], and prize-collecting or degree-bounded variants [qian2011logn, hajiaghayi2014near, dehghani2016online]. The fact that the competitive ratio of the greedy algorithm is still an open question is reminiscent of a similar situation for the offline Steiner Tree/Forest problems. In the offline case, it was known for a very long time that greedy (i.e. compute a minimum spanning tree of the metric completion on terminals) gives a constant factor approximation to the optimum Steiner Tree. It was only recently proved by Gupta and Kumar [gupta2015greedy] that a greedy-like algorithm also yields a constant factor approximation in the case of offline Steiner Forest.
1.1 Our results
As announced in the introduction, there are some subtleties about how the metric is contracted when running greedy. Hence we will first define three variants of greedy that contract the metric in slightly different ways. However, we emphasize that the best current upper bound for all these three variants is the upper bound of Awerbuch et al. Furthermore, these three variants behave exactly the same on the problematic examples of [SODA96, chen2010designing]. In particular, all the discussion so far applies to any of these three variants. Our main theorem will apply to all three variants of greedy, but we will be able to obtain more specialized results for some variants. We believe these specialized results will further demonstrate the interest of our main theorem. Before getting into the precise definition, it is worthwhile to mention that Gupta and Kumar [gupta2015greedy] also discussed some subtleties about contracting the metric (in the offline case) and also defined several algorithms based on that. Hence it is not the first time that altering the contraction procedure has been considered.
Definition of the greedy algorithm.
After greedy has connected the latest arrived pair $\{s_i, t_i\}$, it is clear that the distance between $s_i$ and $t_i$ can be set to $0$ in the current metric. This is the only property of the metric contraction that the proof of Awerbuch et al. uses to obtain a good upper bound. As long as we obtain a new graph in which $s_i$ and $t_i$ are at distance $0$, the upper bound applies. In our paper, each greedy algorithm will formally maintain a graph that accurately describes the current metric. In any case, greedy will always take the shortest path in the current metric to connect a newly arrived pair. After connecting the $i$-th pair, the edge set of this graph will consist of the original edges together with a set of edges of weight $0$ over the same vertex set, which we will call shortcuts. It will be clear from definition that we will have the natural condition that . The cost incurred by greedy will always be the sum of lengths of all the shortest paths taken for connecting the pairs. We now proceed to define the three contraction rules formally.
Rule 1: When greedy connects a pair through a path, add the following shortcuts:
For every two vertices on this path, add an edge between them of weight $0$.
Rule 1 is what Awerbuch et al. intended in their original paper. It can be seen as simply contracting all the edges on the path taken.
Rule 2: When greedy connects a pair $\{s_i, t_i\}$, add the edge $\{s_i, t_i\}$ with weight 0.
Rule 2 is actually how the metric is contracted in [gupta2015greedy] for their main algorithm. Rule 2 might seem much weaker than Rule 1 as fewer shortcuts are added. One can see that for any two vertices, the distance between them can only be smaller when using Rule 1 rather than Rule 2. Hence the cost of greedy equipped with Rule 2 is always an upper bound on the cost of greedy equipped with Rule 1. However, the proof of [SODA96] already applies in this case. Hence the greedy algorithm that uses Rule 2 is already $O(\log^2 k)$-competitive.
Rule 3: When greedy connects a pair through a path, consider the subsequence of this path in which we keep only the vertices that appeared in a previous pair (i.e. previously arrived terminals). Then, add the following shortcuts:
For every two vertices of this subsequence, add an edge between them of weight $0$.
Intuitively, Rule 3 is in between Rules 1 and 2. It is also reminiscent of the contraction rule of the second algorithm in [gupta2015greedy]. Again, using this rule, it is clear that we obtain shorter paths than with Rule 2. The $O(\log^2 k)$ upper bound also applies when using Rule 3. The formal definition of follows naturally for any .
As a shorthand, we will denote by the greedy algorithm at hand. If we do not specify which contraction rule we use, it will implicitly mean that the statement that follows holds for any of our three rules.
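To make the definition concrete, the following is a minimal sketch of greedy equipped with Rule 2, written under our own assumptions: the graph is given as a list of weighted undirected edges, vertices are comparable labels such as strings, and the function names `shortest_distance` and `greedy_rule2` are hypothetical. The sketch only tracks the cost paid for each pair, not the path itself.

```python
import heapq
from collections import defaultdict


def shortest_distance(adj, source, target):
    """Dijkstra: length of a shortest source-target path (inf if none exists)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_rule2(edges, pairs):
    """Cost paid by greedy for each pair, in arrival order, under Rule 2."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    costs = []
    for s, t in pairs:
        # Greedy pays the shortest-path distance in the current metric.
        costs.append(shortest_distance(adj, s, t))
        # Rule 2: a single zero-weight shortcut between the two endpoints.
        adj[s].append((t, 0.0))
        adj[t].append((s, 0.0))
    return costs


edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
costs = greedy_rule2(edges, [("a", "d"), ("b", "c")])
```

On this path graph, the first pair costs 3 and the second pair still costs 1 under Rule 2, whereas contracting the whole path as in Rule 1 would make the second pair free. This illustrates why Rule 2 adds fewer shortcuts than Rule 1.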
Our main result.
We can now present our main result and discuss some of its consequences. We introduce an intuitive measure of the efficiency of . In general, it might be that the cost incurred by the pair is much smaller than . Indeed, because of additional shortcuts, it might be that the ratio is unbounded when . We will define this ratio as the contraction of pair and denote it . We have that
for all pairs , in any instance . Intuitively, a very high contraction means that did a good job at reusing edges bought before. Following this remark, one can note that all known lower bounds in [imase1991dynamic, alon1992line, SODA96, chen2010designing] have a contraction of exactly 1 for all pairs in the instance (in the case of Steiner Tree instances in [imase1991dynamic, alon1992line], one can always choose one of the two endpoints of each pair so that this is the case). This seems to confirm the intuitive reasoning that a hard instance should have most of the pairs with low contraction.
After this remark, we use the shorthand to denote the greedy algorithm at hand (using any of our three contraction rules). We denote by the cost incurred by because of pairs of contraction strictly less than (i.e. pairs with ) when running instance . Note that we do not count the cost incurred by because of pairs with contraction higher than . Furthermore, we will denote the total cost incurred by . Finally, denote by the cost of the offline optimum. Our main result is the following.
[Main theorem] Fix a sequence with for all . Let be a greedy algorithm running on instance . Then,
As mentioned, this theorem applies to greedy with any of our three contraction rules. This already implies the theorem of Awerbuch et al. as it is straightforward to see that pairs with contraction at least can make greedy pay at most . To see this, simply denote by these pairs with contraction more than and the greedy cost of the most expensive pair in . Then greedy pays for those pairs in at most while must pay at least to connect the most expensive pair in . With this observation and plugging in in our bound, we obtain an upper bound of on the competitive ratio.
A consequence of our result is that if one wants to have an lower bound for some small , it must be that the lower bound on the cost incurred by greedy comes from pairs with contraction at least . This already changes the perspective on how to obtain a stronger lower bound (if it exists) and shows that all previous lower bounds (that have contraction for all pairs) cannot give anything stronger than . The fact that the cost should come from pairs with high contraction seems quite counterintuitive, and we believe this is strong evidence that the old conjecture of Awerbuch et al. should be true. Unfortunately, it is not clear to us how to formalize such an intuition. However, this result still has a number of additional consequences that are interesting. In the following, denotes the optimum tree solution to instance (i.e. we restrict the solution to be a single connected component).
Let be the greedy algorithm using Rule 3. Then,
As mentioned in the introduction, even in the case of a single tree spanning the whole graph, nothing better than the general $O(\log^2 k)$ upper bound was known, and all lower bounds in the literature are instances where the optimum is a single component. Theorem 1.1 also makes us hopeful for another reason. Gupta and Kumar essentially showed that in the offline case, one can assume that the optimum is a single tree (at the cost of losing a constant factor). Then they showed that under this condition, their algorithm is a constant factor approximation. However, extending this proof idea to the online case does not seem straightforward. By using techniques from [gupta2015greedy] in combination with ours, we can obtain the following last result.
Let be the greedy algorithm using contraction rule 3. Denote by the cost paid by when connecting pair and define (note that we take the distance in the original graph ). If either of the two sequences or is nonincreasing then
This last result can be derived by using our main theorem, combined with the same potential function argument as in [gupta2015greedy]. The assumption is that either the costs paid by or the shortest paths in the original metric are nonincreasing over time. This last result is interesting for the following reason: all lower bounds in the literature use the same strategy to fool greedy. First, request two terminals that are very far apart; greedy connects them through a path, and then the endpoints of the second pair are closer but not on the path that greedy bought. Then one can repeat this process with the endpoints of the pairs getting closer and closer. This forces greedy to buy another path for every new pair. Crafting a difficult instance that does not follow this behavior seems very difficult, especially in the light of the result of [gupta2015greedy]. They show that their greedy-like algorithm, which connects the closest terminals that are not yet satisfied (i.e. the opposite behavior of the assumption of Theorem 1.1), is a constant factor approximation for the offline Steiner Forest.
The takeaway message of all our results combined is that an lower bound cannot be any of the following:

An instance in which all pairs have a small contraction, or

an instance where a tree is a good solution, or

an instance with nonincreasing costs or shortest paths.
However, all the known examples providing an lower bounds satisfy these conditions. We believe our results change the perspective on this problem, and it would be surprising if an lower bound exists for some .
The rest of the paper is organized as follows. In Subsection 1.2, we present an overview of our techniques that lead to new results. In Section 2 we present the proof of our main result, Theorem 1.1. In Section 3, we present our proof of Theorem 1.1 and Theorem 1.1. Finally in Section 4 we discuss the questions left open in this work.
1.2 Our Techniques
In this subsection, we give an overview of the techniques used to prove Theorem 1.1. We give the necessary technical details while keeping the exposition rather informal. The complete and formal proof of Theorem 1.1 appears in Section 2.
Previous techniques.
Before going into our techniques, we will briefly mention the main ingredients of previous proofs. For ease of notation, we will denote by the set of terminals that appear in at least one pair of . For a terminal , denote by its mate which is the terminal that should be connected to. Note that we can assume without loss of generality that all terminals have exactly one mate, by duplicating vertices if this is not the case. For ease of presentation, we will assume until the end of the section that for each pair greedy pays exactly the distance of and in the initial graph, i.e. no previously arrived pair helps greedy in paying less for the newly arrived terminal pair . Put otherwise, we assume that the contraction of all the pairs is equal to 1.
By standard arguments, we can assume that and that for each pair of terminal greedy pays a cost that belongs to the set , with only the loss of a constant factor. Indeed, we can rescale all the edges in the graph by the factor and for the second assumption, note that by standard geometric grouping, we can assume that greedy pays a cost that belongs to the set . It is then straightforward to see that the total cost incurred by greedy for pairs cheaper than is at most . Based on this observation, we will partition into disjoint sets where each set contains pairs of terminals for which greedy paid . These sets will be called cost classes. Moreover in order to introduce the first observation let be an open ball with center the terminal and radius . Then a classic observation is the following.
Let be a collection of balls around terminals, such that (1) all the balls are pairwise disjoint and (2) each ball is centered at some terminal and its radius satisfies . Then . The proof of this observation is straightforward, and we will come back to it later in Section 2.
These balls can be viewed as a solution to the dual of the natural linear programming relaxation of the Steiner Forest problem; hence we will refer to these balls as dual balls. We continue by restating (informally) a key lemma in the analysis of [SODA96].
[[SODA96]] For any cost class (associated with cost ), it is possible to place disjoint dual balls in such that these balls are centered around terminals that belong to and they all have a radius of
Moreover, at the cost of losing a constant factor we can assume that every pair in has at least one dual ball centered around or .
By taking Lemma 1.2 together with Observation 1.2, we obtain that greedy pays at most $O(\log k)$ times the dual solution for each cost class. This proves that greedy is $O(\log k)$-competitive for each cost class . Hence we see that the previous proof technique has mainly two ingredients:

(1) Partition the set into disjoint cost classes such that and for each pair in greedy paid .

(2) Prove that greedy is $O(\log k)$-competitive for each cost class separately by building a dual solution.
By (1) and (2) we get that greedy is $O(\log^2 k)$-competitive. Interestingly, in the case of Steiner Tree, it is possible to improve the second step by showing that greedy is $O(1)$-competitive for each cost class, hence the $O(\log k)$ competitive ratio in general (see [alon1992line]). Unfortunately, this is impossible in the case of Steiner Forest. Even if the underlying optimum is a spanning tree, it might be that greedy is already $\Omega(\log k)$-competitive for a single cost class (see Appendix A.1 for an example). We also note that the Berman-Coulston algorithm [berman1997line] is designed so that the algorithm is $O(1)$-competitive for each cost class, so the analysis of this more complex algorithm cannot apply to greedy.
Our new approach.
The first step of our new proof relies on a different partitioning of the set . Indeed we will partition into classes such that each class is defined as follows.
Note that we have groups with this partitioning, each containing cost classes. Inside each group, the cost classes have the nice property that they are well separated, that is, the multiplicative gap between two consecutive costs is . We will make good use of this property to disentangle the interactions between pairs that have different costs. Using these techniques we prove that the competitive ratio of greedy for each set is ending up with a competitive ratio of overall. The main technical challenge lies in proving such a result.
If we use Lemma 1.2 to place dual balls around pairs in each cost class (hence creating several collections of balls) it might be that two dual balls that belong to different sets and overlap. This is the critical issue in the previous proofs, and we will proceed differently. For simplicity, assume we have two cost classes in our set . Let be the cost of the larger class and the cost of the smaller class (hence ). We place the dual balls only for the biggest cost class (using Lemma 1.2). Intuitively, the worst case in the analysis will be when all pairs of the smaller cost lie inside the dual balls from the bigger cost class (as depicted in Figure 1). If this happens, it will be impossible to place dual balls for the small cost class without intersecting the bigger balls already placed. To overcome this issue, we consider a ball from the big cost class, and we look at the number of pairs from the small cost class that lie inside this ball. Let the number of such pairs be . We have two cases.

which is the easy case. In this case, instead of charging the cost of the big pairs to the dual ball we can instead charge this cost to the smaller pairs inside . By Lemma 1.2, the cost that was initially charged to the ball was . Hence if we evenly distribute this cost among all the small pairs inside , each pair will get a cost of roughly.
Since the cost was transferred to smaller pairs, we can also safely delete the big dual ball , hence making this space available to place the smaller balls. Note that smaller pairs can be charged at most once in this way because the balls in the dual solution for big pairs are pairwise disjoint. This case is depicted in the top left corner of Figure 2.

. This is the most challenging case and is depicted in the bottom right corner of Figure 2. We cannot proceed as in the previous case as we cannot guarantee that small pairs do not get charged too much. Here lies the crux of our proof. First, by rescaling slightly the ball , we can assume that almost all the small pairs in are far from the border of (a pair is far from the border if one of or is at a distance much bigger than from the border of ). For simplicity we assume that all the small pairs are far from the border of . From here we construct an instance as follows. Consider the graph that is induced by vertices inside . The instance will be composed of the set containing all the pairs of small cost that are inside , and the metric will be the graph . Here the assumption that the contraction is 1 implies that both endpoints of each pair in should be inside (one endpoint well inside the ball and one outside would cost too much). It also implies that greedy behaves exactly the same for the pairs in in instance as it was behaving for these pairs in instance , that is, for each pair greedy buys exactly the same path to connect whether instance or is running. Recall that we assumed
hence we have
We know by previous results that greedy is competitive on a single cost class; hence the competitive ratio of greedy on instance will be bounded by
Now the crucial question: What is the value of ? As we defined the graph now, it is not clear. But because we assumed all the small pairs are far from the border of , we can allow ourselves to modify the metric on the border of without changing the behavior of greedy. If we consider the set of vertices that lie exactly on the border of we will add an edge of length 0 between any pair of vertices in . This does not change the behavior of greedy on instance because these extra edges are already too far from the pairs in to be used (see Figure 2, bottom right corner). The interesting fact is now that
where we denote by the cost of edges bought by inside the ball .
These observations suggest a top-down approach where we first try to place dual balls around big pairs. We then proceed with the case distinction described above. Then we move to the next cost class while ignoring all the pairs that got into case (2). We repeat this until we reach the bottom of the cost hierarchy. We end up with dual balls that have different radii but are all pairwise disjoint (because we ignored the pairs that were in case (2) of any iteration). During this process, each pair got into case (1) at most times, hence the total additional charge is . It remains to handle all the pairs that were ignored. The idea is now that these pairs can be partitioned into disjoint instances with not too many pairs (recall that we have an upper bound on in case (2)) and such that the optimum solution is at most what pays inside the ball that created this instance. For instance, in the case of two consecutive cost classes (hence ), the total cost of ignored pairs would be:
But because the pairs in are pairwise disjoint we have . Hence in total the ignored pairs cost at most to greedy. Because we have cost classes inside a set , it feels that a competitive ratio for this set is now possible. Of course we took two consecutive cost classes so that but this is intuitively the worst case in the analysis. All this is formally handled via a delicate induction that is done in Section 2.
2 Proof of Theorem 1.1
This section is devoted to the proof of Theorem 1.1. Recall that this theorem applies to the three variants of greedy as defined in the introduction. Hence in this section, will denote for any . This section is organized as follows. In Subsection 2.1, we introduce some basic definitions that will be needed. In Subsection 2.2, we detail some results implied by previous work as well as some preprocessing of the instance needed for the rest of the proof. Namely, we recall the concept of dual fitting used by [SODA96]. In addition, we preprocess the instance so that the different costs greedy pays upon the arrival of different pairs are well-structured (i.e. there is a geometric grouping and a big gap between two consecutive cost classes). In Subsection 2.3, we give an overview of the main body of the proof, and finally, in Subsections 2.4 and 2.5, we finish the proof.
2.1 Problem definition and notation
We will consider a slightly more general problem than Online Steiner Forest. Formally, we are given a weighted graph with weight function . Along with graph we are given an ordered sequence of pairs of vertices revealed one by one, and an ordered sequence of sets of additional weighted edges. These edges will be made available to the online algorithm over time as follows. Before revealing the first pair , the set of edges in is added in the graph to form the graph . These edges in will remain available to the greedy algorithm until the end. Then buys some path and contracts the metric according to its contraction rule as defined in the Introduction to obtain graph . Next, before revealing , we add the edge set into the graph to obtain the graph (hence updates the metric accordingly). Then sees the pair and so on. In general, if denotes the current metric available to greedy after reading pairs , we first add the edges of to the graph and after this connects the pair via the shortest path contracting the metric according to the chosen contraction rule. We call this variant online Steiner Forest in decreasing metrics. This generalizes the classic Online Steiner Forest which is the special case where for all .
The goal is to compare the cost incurred by on the instance to the cost of the optimum Steiner forest in the graph with pairs . We insist that the offline optimum is not allowed to use edges from while the algorithm can use these edges in after they are revealed to it. We will denote the optimum cost by . The size of an instance is the number of pairs in . It will be denoted in the following (hence ). For each pair , we will naturally call the endpoints of the pair the two vertices .
For any subset , we will denote by the cost incurred by algorithm on instance because of pairs in . By a slight abuse of notation, for a single pair , we will denote by the cost that pays upon arrival of .
The contraction of a pair with respect to instance and algorithm will be the ratio of the shortest path distance between the two endpoints of the pair in and the actual cost paid by for this pair when running instance . Note that the shortest path is taken in the original graph , without the help of edges in . Formally, if we denote by the contraction of pair , we have
with the convention that if . Given a fixed and an instance , we denote by the set of pairs of that have contraction less than when running (i.e. the pairs with ).
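As a small illustration of this definition, the sketch below computes the contraction of a pair from two numbers (its shortest-path distance in the original graph and the cost greedy actually paid) and sums the cost of the pairs whose contraction is strictly below a threshold. The function names are hypothetical, and returning infinity when the paid cost is zero is our assumption about the convention, which may differ from the one intended here.

```python
import math


def contraction(original_distance, paid_cost):
    """Ratio of the original shortest-path distance to the cost greedy paid."""
    if paid_cost == 0:
        # Assumed convention: a pair connected for free has infinite contraction.
        return math.inf
    return original_distance / paid_cost


def low_contraction_cost(records, alpha):
    """Total cost paid for pairs with contraction strictly less than alpha.

    records: iterable of (original_distance, paid_cost), one entry per pair.
    """
    return sum(paid for dist, paid in records if contraction(dist, paid) < alpha)


# Four pairs with contractions 1, 1, 4 and infinity respectively.
records = [(3, 3), (1, 1), (4, 1), (2, 0)]
cheap_part = low_contraction_cost(records, 2)
```

With threshold 2, only the first two pairs are counted, so the low-contraction cost is 4 while the total cost paid is 5.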
For simplicity we will assume that every edge is of weight exactly for some arbitrarily small . If the graph does not satisfy this, we subdivide all the edges into chains of smaller edges. Of course, this increases the number of edges and vertices in the graph, but since our competitive ratio is only a function of the number of pairs of terminals, this subdivision will not hurt our analysis. It is also clear that subdividing edges changes neither the optimum solution nor the behavior of the greedy algorithm. It also does not change any of the parameters we just defined above. This assumption will be used for simplicity when constructing balls in the graph; we will assume that no edge in the graph has one endpoint inside and the other endpoint outside a ball (i.e. edges do not “jump over” the border of any ball).
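The subdivision step can be sketched as follows, under the simplifying assumption (ours) that all weights are positive integer multiples of a common unit. The function name `subdivide` and the labels of the fresh internal vertices are hypothetical.

```python
def subdivide(edges, unit):
    """Replace each edge (u, v, w) by a chain of w // unit edges of weight unit.

    Assumes every weight w is a positive integer multiple of `unit`.
    """
    new_edges = []
    for idx, (u, v, w) in enumerate(edges):
        if w <= 0 or w % unit != 0:
            raise ValueError("weights must be positive multiples of unit")
        steps = w // unit
        prev = u
        for step in range(1, steps):
            mid = ("sub", idx, step)  # fresh internal vertex of the chain
            new_edges.append((prev, mid, unit))
            prev = mid
        new_edges.append((prev, v, unit))
    return new_edges


chain = subdivide([("a", "b", 3)], 1)
```

The chain has the same total length as the edge it replaces, so distances between original vertices are unchanged.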
2.2 Preliminary results and preprocessing of the instance
We describe here some key concepts that will be useful in the rest of the proof. We first introduce the following definition that gives much more structure to the instance .
[Canonical instance] For any , any instance of online Steiner Forest in decreasing metrics is said to be canonical with respect to a greedy algorithm if the following holds:

There exists some real number such that for any pair , there exists an integer such that
(i.e. we have some geometric grouping of costs and two consecutive cost classes are separated by a multiplicative factor of at least ). We will say that cost classes are well separated, and define .

All pairs in have contraction at most when running on instance . We say that all pairs have low contraction.

For any , the set of additional edges contains exactly one edge with the same endpoints as the pair and whose weight is exactly (i.e. we can assume connected the pair by simply using the single edge in ).
This definition suggests that we partition the set of pairs into cost classes where is the subset of pairs in that cost exactly . Note that there are at most distinct cost classes. Given this definition, we first claim the following lemma. Intuitively, the lemma states that worst-case instances can be reduced to canonical instances (for some big ) at a multiplicative loss of .
For any instance of size , any greedy algorithm and any , there exists an canonical instance of size such that
By standard geometric grouping arguments we can assume that there are at most cost classes such that the greedy algorithm pays a cost of for all pairs in . This first transformation already appeared in [PanigrahiLecture] and loses a constant factor. Then we consider these cost classes but we keep only the pairs of contraction at most . Fix We partition the pairs as follows:
for all . Since there are groups, one of them represents at least a fraction of the total cost. Keep only this group and transform the instance by adding additional edges to as follows. Assume that we kept the group then index the pairs in by order of arrival i.e. . For each pair , we define the set of additional edges as a single edge whose endpoints are exactly the endpoints of the pair and whose length is exactly what paid for this pair in the original instance . This formally describes the instance . We claim that for any pair selected, the greedy algorithm pays exactly the same cost for this pair regardless of which instance or is running. We can prove this simple fact by induction on the number of pairs that have already arrived in . If no pair has arrived this is clear. Now consider the next pair to arrive. Note that when running instance , a path of length exactly is available to connect since we added an edge in of exactly this length connecting the endpoints of . We claim that there cannot be a shorter path. Indeed, by induction we assumed that paid the same in instance and for previously arrived pairs; hence it must be that the greedy algorithm used the additional edges in to connect previously arrived pairs. Because is for some , it must be that the shortcuts added by on instance so far are exactly edges of length 0 with endpoints at the endpoints of pairs that arrived before . Note that when running greedy on instance , the endpoints of previously arrived pairs must be at distance 0 when the new pair arrives. Hence all the shortcuts available to when receiving the pair in instance are also available when receiving the pair in instance . In particular, the shortest path taken by for pair in instance can only be longer than the path taken for pair in instance .
One can see that in total we lose a multiplicative factor of at most during the reduction. Finally, it is also clear that and since the graph has not changed and we keep in only a subset of the pairs in . This ends the proof of the lemma.
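The cost-preservation claim in this proof can be checked on a toy example. The sketch below is a simplified model and not the paper's exact setting: it assumes the contraction rule that adds a zero-length edge between the endpoints of each served pair (the form of shortcut the proof attributes to canonical instances), and it assumes that the additional edge of a pair becomes available when that pair arrives. The names `run_greedy`, `shortest_distance` and `extra_edges` are hypothetical.

```python
import heapq


def add_edge(adj, u, v, w):
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))


def shortest_distance(adj, src, dst):
    """Dijkstra's algorithm, returning the distance from src to dst."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")


def run_greedy(edges, pairs, extra_edges=None):
    """Return the cost greedy pays for each pair, in arrival order.

    extra_edges[i], if given, is the length of the single additional edge
    revealed together with pairs[i] (assumed model of decreasing metrics).
    """
    adj = {}
    for u, v, w in edges:
        add_edge(adj, u, v, w)
    costs = []
    for i, (s, t) in enumerate(pairs):
        if extra_edges is not None:
            add_edge(adj, s, t, extra_edges[i])
        costs.append(shortest_distance(adj, s, t))
        add_edge(adj, s, t, 0.0)  # assumed contraction rule: zero-length shortcut
    return costs


path = [(i, i + 1, 1.0) for i in range(4)]  # unit-weight path 0-1-2-3-4
pairs = [(1, 2), (0, 3), (0, 4)]
original = run_greedy(path, pairs)

kept = [1, 2]  # keep only a subset of the pairs, as in the reduction
canonical = run_greedy(path,
                       [pairs[i] for i in kept],
                       [original[i] for i in kept])
```

In this example the pair (0, 3) benefits in the original run from the shortcut created by the discarded pair (1, 2); in the reduced instance that shortcut is gone, but the additional edge of length 2 restores exactly the same cost.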
The rest of the section will be devoted to the proof of the following theorem. Let be any greedy algorithm that uses one of our 3 contraction rules. Let be an canonical instance (of size ) of online Steiner Forest in decreasing metrics. Assume . Then,
Note that Theorem 2.2 together with Lemma 2.2 imply Theorem 1.1. To see this, consider any instance . By losing a multiplicative factor of and only considering the pairs in , we transform the instance into an canonical instance using Lemma 2.2. Then we apply Theorem 2.2 on instance and the total competitive ratio for pairs in will be which is exactly what we wanted to prove.
Dual fitting.
A key technical ingredient in the proof of Theorem 2.2 will be dual fitting, which was also used in [alon1992line, SODA96] and is a common technique in competitive analysis. In the case of Steiner Forest, a natural way to do dual fitting without explicitly writing a linear program is to consider a set of balls in the graph . For some vertex and some radius , the ball is the open ball of center and radius , i.e.
Denote by the set of terminals which are the vertices that appear in at least one pair. For a terminal , denote by its mate which is the terminal that should be connected to. Note that we can assume without loss of generality that all terminals have only one mate, by duplicating vertices if this is not the case. Assume we have a collection of balls such that:

All of the balls in are pairwise disjoint, and

Each ball is centered at some terminal and its radius satisfies .
Then if we define the sum of radii of these balls it must be that
A reason for this is that any feasible solution to the Steiner Forest instance must connect to . If we look at any ball , then its center lies at a terminal , and we know that is not in . Therefore, to connect to , a feasible solution needs to buy at least a path from the center to the border of , which will have length at least . Since all balls are pairwise disjoint, we know that these paths will be disjoint, and we can sum the lower bounds on each ball.
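The lower bound argument can be sketched in code. The sketch assumes that the radius condition reads "the radius is at most the distance from the center to its mate" (which is what makes the mate lie outside the open ball), and it tests disjointness of two balls by requiring the distance between their centers to be at least the sum of their radii. The function names and the dictionary-based inputs are hypothetical.

```python
import heapq
from itertools import combinations


def distances_from(adj, src):
    """Dijkstra's algorithm, returning all distances from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def dual_lower_bound(adj, mate, radii):
    """Check the two ball conditions and return the sum of radii.

    mate:  terminal -> its mate
    radii: center terminal -> radius of the ball placed at that terminal
    """
    inf = float("inf")
    dist = {c: distances_from(adj, c) for c in radii}
    for c, r in radii.items():
        if r > dist[c].get(mate[c], inf):
            raise ValueError(f"ball at {c} contains the mate of its center")
    for a, b in combinations(radii, 2):
        if dist[a].get(b, inf) < radii[a] + radii[b]:
            raise ValueError(f"balls at {a} and {b} overlap")
    return sum(radii.values())


# Unit-weight path 0-1-2-3-4 with the single pair (0, 4).
adj = {}
for i in range(4):
    adj.setdefault(i, []).append((i + 1, 1.0))
    adj.setdefault(i + 1, []).append((i, 1.0))

bound = dual_lower_bound(adj, {0: 4, 4: 0}, {0: 2.0, 4: 2.0})
```

Here the two balls of radius 2 certify a lower bound of 4, which matches the cost of the only feasible solution, the whole path.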
An alternative view of this is that the dual balls can be seen as a feasible solution (because the balls are pairwise disjoint) to the dual of the natural LP relaxation of the Steiner Forest problem. Then by weak duality, we know that any feasible solution has cost at least the cost of the dual. Hence we will also refer to a collection of balls as above as a dual solution. A dual solution is feasible if all the corresponding balls are pairwise disjoint and are all centered around the endpoints of some pairs. Using the proof technique of Awerbuch, Azar, and Bartal, we obtain the following lemma whose proof appears in Appendix A. [[SODA96]] Let be an instance of online Steiner Forest in decreasing metrics. Consider a cost class of pairs. Let be an arbitrary subset of . Let be the constant such that
for all and a greedy algorithm. Fix an arbitrary radius
Then for any such radius it is possible to construct a feasible dual solution such that:

All the balls have a radius equal to ,

,

each pair has at most one ball (denoted ) centered around one of its endpoints, and

all balls in are centered around endpoints of pairs in .
This lemma is actually the crux of the previous analysis that gives in general. We will use this lemma as a starting point for our improved analysis. We are now ready to start the overview of our main proof.
2.3 Overview of the proof
Recall that we aim to prove Theorem 2.2. By assumption we have that all pairs have contraction at most and that the cost classes are well separated. Recall that this means the multiplicative gap between two consecutive cost classes is at least for some . We use to denote the number of cost classes where class is denoted by for .
The goal will be to construct a feasible dual solution that has some special properties. This dual solution will be constructed by taking a subset of the dual balls in each of the dual solutions constructed with the technique of Awerbuch, Azar, and Bartal [SODA96]. In the end we will charge a portion of the cost that pays to the dual solution . The remaining portion of the cost pays will be handled by an inductive argument. To this end we will have a charging scheme, that will redistribute amongst the terminal pairs.
Precisely, the total cost that a pair carries will be
Hence we see the charge as an additional multiplicative factor on the cost of a given pair (initially, the charge is set to 1 for all pairs).
Note that we might sometimes decrease or increase the charge of a pair or transfer the cost, but we will always make sure that when the charge of a pair decreases, the charges of some other pairs increase accordingly so that no cost is lost. For any set of terminal pairs, we will let denote the total charged cost carried by pairs in . Formally,
In the proof the pairs in will be classified into three types: surviving pairs, charged pairs, and dangerous pairs. The surviving pairs are the pairs such that there is a ball centered around one of the endpoints. Intuitively these pairs are good for us since we can charge their cost directly to the dual ball . The other pairs will be by default classified as non-surviving. Non-surviving pairs are further partitioned into two subsets, charged or dangerous pairs. Charged pairs are those pairs that have their charge set to 0 (i.e. ). Intuitively, they are also in an excellent situation for us since it means that we were able to transfer their cost entirely to other pairs. We do not need to count them in our total cost anymore. Finally, the dangerous pairs are those pairs that have neither a charge equal to nor a dual ball in centered at one of the endpoints. These pairs will be handled carefully via an inductive argument since we cannot charge them to the dual solution nor to some other pair. To keep careful track of all these elements, we will store a triple where is the charge function as described above, is a feasible dual solution and a set of dangerous pairs. The family of dual balls will be a union of subsets of dual balls for . Each of the balls in will account for a subset of pairs in and have some radius of roughly where is the cost of pairs in and . This choice of radius comes from previous work, summarized in Lemma 2.2.
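The bookkeeping described above can be sketched as follows. This is an illustration under assumptions: the charged cost of a pair is taken to be its charge times its cost, and a transfer spreads the cost of the source pair over the target pairs proportionally to their current charged cost, so that every target charge grows by the same multiplicative factor. The proportional rule and all names (`charged_cost`, `transfer_charge`, `classify`) are hypothetical and not taken from the paper.

```python
def charged_cost(pairs, cost, charge):
    """Total charged cost carried by a set of pairs."""
    return sum(charge[p] * cost[p] for p in pairs)


def transfer_charge(source, targets, cost, charge):
    """Set the charge of `source` to 0 and move its charged cost to `targets`.

    Every target charge is multiplied by the same factor, chosen so that
    the total charged cost is conserved.
    """
    moved = charge[source] * cost[source]
    base = charged_cost(targets, cost, charge)
    if base <= 0:
        raise ValueError("targets carry no charged cost to scale")
    factor = 1.0 + moved / base
    for p in targets:
        charge[p] *= factor
    charge[source] = 0.0
    return factor


def classify(pair, charge, ball_of):
    """Three-way classification: a ball wins, then a zero charge, else dangerous."""
    if pair in ball_of:
        return "surviving"
    if charge[pair] == 0:
        return "charged"
    return "dangerous"


# One pair of cost 4 and eight cheaper pairs of cost 1, all with initial charge 1.
cost = {"p": 4.0, **{f"q{i}": 1.0 for i in range(8)}}
charge = {name: 1.0 for name in cost}
small = [f"q{i}" for i in range(8)]

before = charged_cost(cost, cost, charge)
factor = transfer_charge("p", small, cost, charge)
after = charged_cost(cost, cost, charge)
```

In this toy run the cheaper pairs already carry twice the cost of the pair being discharged, so each of their charges grows only by a factor of 1.5 while the total charged cost stays at 12.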
Note that in our procedure, it might be that some pairs are not yet classified into one of the three categories (surviving, charged, or dangerous). However, at the beginning of iteration , all pairs in cost classes will be classified. The procedure contains two main steps.
Step 1.
In this step, we start taking into account interactions between cost classes. Informally, we run an iterative procedure from to , where we try to build the dual solution from top to bottom. When we start iteration of this procedure we have a feasible dual solution composed of (of radius with specified above) centered around pairs in . All the pairs in that are not yet classified are guaranteed to be far from the dual balls already in place. We then look at pairs in that are not yet classified, and build a dual solution around these pairs using Lemma 2.2. Because unclassified pairs are far from previously placed balls in , it is guaranteed that this new dual solution will not overlap with the previous dual solution. We then proceed as follows. For any ball that we just added, we let denote the set of pairs of (note that we only consider pairs of smaller cost) such that one of its endpoints is at a distance at most
from the center of (denoted ). Here we note that for a technical reason, is only an upper bound on the real number of pairs (i.e. ). We will let denote the set of pairs of that have one endpoint at a distance of at least
from the center of . These pairs are on the border of hence the choice of notation. With a similar analogy, we will denote the interior of by . This is the complement of in , i.e.
Then for the current ball at hand, centered at an endpoint of , we look at the total charged cost of the pairs in and make a case distinction based on this value. If this cost is more than times the charged cost of the pair , then we can safely set the charge of to , delete the dual ball , and increase the charge of pairs in to account for this lost cost. Note that the charges of pairs in increase by at most a multiplicative ). This is pictured in the top left corner of Figure 2. Since the number of cost classes is at most such accumulation of charges is not a problem (note that the charge of each pair can increase at most once per cost class in this way, since we only charge pairs inside a dual ball and the dual balls in a single cost class are pairwise disjoint).
On the contrary, if the charged cost of pairs in is less than times the charged cost of the pair , we first halve the radius of to get . If the charged cost of the pairs inside is at most a constant factor times the charged cost of , we classify all the pairs in as charged and charge their cost to the pair . Note that the charge of only increases by a constant factor when doing this, and this happens at most once per pair (when we place the dual ball around ). In addition we update to be , and we classify the pair as surviving. If on the other hand, the charge of the pairs inside is larger than a specified constant factor times the charge of , we scale the radius of up until we reach a point where most of the cost in is carried by and not . Then we mark all the pairs in as dangerous and add them to the set . We update to be the ball and classify as surviving. This case is pictured in the bottom right corner of Figure 2. Note that if the ball is not deleted, then all the pairs in will be classified as either charged or dangerous. In particular, we will never try to place a dual ball around these pairs in the following iterations. We do this procedure for all the balls and then move to iteration . This step is handled in Subsection 2.4.
Step 2.
After Step 1, we end with a feasible dual solution consisting of the balls placed around surviving pairs, a charge function , and a set of dangerous pairs. Additionally, we guarantee that no pair is overcharged.
The pairs that are not dangerous are easily accounted for by the dual solution . Indeed, the surviving pairs still have a dual ball around an endpoint, and the charged pairs have their cost entirely redistributed to other pairs. The only problem might come from dangerous pairs. However, because of how we constructed the dual solution and the set , we will be able to cluster the dangerous pairs into disjoint subinstances that are contained in dual balls corresponding to bigger cost classes. These instances are disjoint, and the crux of the argument is to show a statement of the form:
If the greedy algorithm were to run separately on each subinstance, then the cost greedy would pay for these pairs would be the same cost that it was paying for these pairs in the bigger instance .
Hence we can argue that the total cost incurred for dangerous pairs is at most the sum of costs paid by greedy on each subinstance separately. This helps because we only put pairs in in the case that their charged cost was bounded by times the charged cost of the pair that created the ball that contains them. As a result, we have a strong upper bound on the number of pairs in each smaller subinstance . To finish the proof, we need to bound the cost of the offline optimum for each subinstance . We note that because all the pairs in are in the interior of (i.e. far from the border of ), we can modify the metric of the graph at the border of . This will not change the behavior of greedy for the pairs in because the border is way too far from the interior of for greedy to be tempted to use the modified metric (recall that greedy always takes the shortest path). We will define a new graph , which is the graph induced by vertices in . We also say that all the vertices exactly on the border of are all at a distance from each other (see bottom right corner of Figure 2). With this modification, it becomes clear that the offline optimum cost on instance is at most the cost paid by inside , which we will denote by . Hence the offline optimum cost for each subinstance is at most what the global optimum pays locally inside the ball that created the subinstance. Since all balls in are disjoint, these areas never overlap; hence the sum of all local optima is at most the global optimum of instance . Using this observation, we handle the cost incurred by pairs in via a delicate induction hypothesis on the number of cost classes in the instance. This induction is described formally in Subsection 2.5.
2.4 Building a balanced dual solution
We formalize here Step 1 of the previous subsection. We give a formal definition of all the properties that our triple should satisfy. Note that we are also given an upper bound on the real number of pairs (this is for technical reasons for handling the induction in the next subsection). Recall that for a ball we denote by the set of pairs of such that one of its endpoints is at distance at most
from the center of . We also have similar definitions for and . Now we can state the main definition of this subsection. Intuitively, conditions (a) and (b) state that is a feasible dual solution whose dual balls have radii large enough. Condition (c) states that the total charged cost of dangerous pairs inside the ball is never much more than times the charged cost of the pair that created the ball . Similarly, condition (d) states that the charged cost of dangerous pairs on the border of is not more than times the charged cost of dangerous pairs strictly inside . The last condition (e) states that no pair was charged too many times. [Balanced dual solution] A balanced dual solution for an canonical instance with respect to algorithm is a quadruple such that:

All balls are pairwise disjoint and . Moreover, is partitioned into subcollections of balls such that,

For every , each ball in has a radius that satisfies , with the cost associated to cost class and .

For every ball

For any ball

For any , any pair ,
The main result of this subsection will be that it is always possible to find a balanced dual solution. Given an canonical instance with respect to , a balanced dual solution always exists provided that , the number of cost classes satisfies , and the number of pairs satisfies . We build the solution quadruple with an iterative procedure from to (recall that is the number of cost classes in ) that will maintain the following invariants at the beginning of any iteration :

For any , the dual balls in are already fixed and all pairs in are already classified as either surviving, charged, or dangerous. The dual balls of satisfy conditions (b), (c), (d) of Definition 2.4.

For any , the charges of pairs in satisfy condition (e) of Definition 2.4.

For any , the pairs in can be classified as either dangerous or charged, or not be classified yet. No pair of is classified as surviving yet. We also have .

For any , all pairs in classified as either dangerous or charged satisfy condition (e) in Definition 2.4. The pairs that are not yet classified satisfy