Weighted matching is one of the founding problems in combinatorial optimization and played an important role in establishing the area. The work by Edmonds on this problem greatly influenced the role of polyhedral theory in algorithm design. The problem has also found applications in several domains [26, 1, 22, 6, 2]. Routing problems are a particularly important area of application, and matching procedures often appear as subroutines of other important algorithms, the most notable being Christofides’ algorithm for the traveling salesperson problem.
An important aspect of devising solution methods for optimization problems is studying the sensitivity of the solution towards small changes in the input. This sensitivity analysis has a long history and plays an important role in practice. Min-cost matching is a problem that has particularly sensitive optimal solutions. Assume for example that nodes lie on the real line at points and for some and all , see Fig. 1. The min-cost matching, for costs equal to the distance on the line, is simply the edges . However, even under a minor modification of the input, e.g., if two new nodes appear at points and , the optimal solution changes all of its edges, and furthermore the cost decreases by a factor. Rearranging many edges in an existing solution is often undesirable and may incur large costs, for example in an application context where the matching edges imply physical connections or binding commitments between nodes. A natural question in this context is whether we can avoid such a large number of rearrangements by constructing a robust solution that is only slightly more expensive. In other words, we are interested in studying the trade-off between robustness and the cost of solutions.
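The effect can be reproduced on a small hypothetical instance in the spirit of Fig. 1. The coordinates below (segments of length 100 separated by gaps of length 1) are illustrative assumptions, not the values used in the text. The sketch relies on the fact that, on the line, a min-cost perfect matching pairs consecutive points in sorted order.

```python
def line_matching(points):
    """Min-cost perfect matching on the real line: pair consecutive sorted points."""
    pts = sorted(points)
    assert len(pts) % 2 == 0, "need an even number of points"
    return {(pts[i], pts[i + 1]) for i in range(0, len(pts), 2)}


def cost(matching):
    """Total length of all matching edges."""
    return sum(b - a for a, b in matching)


# Hypothetical instance: n segments of length 100, separated by gaps of length 1.
n, length, gap = 4, 100, 1
stage1 = []
for i in range(n):
    left = i * (length + gap)
    stage1 += [left, left + length]

m_before = line_matching(stage1)

# Two arrivals just outside the two extreme points.
arrivals = [min(stage1) - gap, max(stage1) + gap]
m_after = line_matching(stage1 + arrivals)

shared = m_before & m_after
print(cost(m_before), cost(m_after), len(shared))
```

In this instance the optimum drops from 400 to 5 and the two optimal matchings share no edge, so keeping the first-stage optimum forces either a large cost or a complete rearrangement.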
We consider a two-stage robust model with recourse. Assume we are given an underlying metric space . The input for the first stage is a complete graph whose node set is a finite, even subset of . The cost of an edge is given by the corresponding cost in the metric space. In a second stage we get an extended complete graph containing all nodes in plus additional nodes. As before, costs of edges in are given by the underlying metric. In the first stage we must create a perfect matching for . In the second stage, after is revealed, we must adapt our solution by constructing a new perfect matching for , called the second stage reply. We say that a solution is two-stage -robust if for any instantiation of the second stage there exists a solution such that two conditions hold. First, the total cost of edges in must satisfy for , where denotes a min-cost perfect matching in . Second, it must hold that .
An algorithm is two-stage -robust if, given and , it returns a two-stage -robust matching and, given the set of new arrivals, a corresponding second stage reply. We refer to as the competitive factor and as the recourse factor of the algorithm. Our main goal is to balance cost and recourse, and thus we aim to obtain algorithms where and are constants.
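As a minimal sketch of how such a guarantee could be checked on one concrete second-stage instance, the following helper tests a cost condition and a recourse condition. It assumes that the cost condition compares each stage's matching against `alpha` times the respective optimum and that recourse is measured by the number of first-stage edges missing from the reply; the function name, the parameter `max_deleted` (the allowed number of deletions for this instance), and the line metric are hypothetical choices made for illustration.

```python
def line_cost(e):
    """Cost of an edge (u, v) between two points on the real line."""
    return abs(e[0] - e[1])


def check_two_stage(m1, m2, opt1_cost, opt2_cost, alpha, max_deleted, cost=line_cost):
    """Check one first-stage matching m1 and one second-stage reply m2.

    Assumed conditions (illustrative, see lead-in):
      * cost(m1) <= alpha * opt1_cost and cost(m2) <= alpha * opt2_cost
      * at most max_deleted edges of m1 are absent from m2
    """
    c1 = sum(cost(e) for e in m1)
    c2 = sum(cost(e) for e in m2)
    cost_ok = c1 <= alpha * opt1_cost and c2 <= alpha * opt2_cost
    recourse_ok = len(m1 - m2) <= max_deleted
    return cost_ok and recourse_ok


# Stage 1 points: 0, 100, 101, 201, 202, 302 (optimum cost 300).
# Stage 2 adds -1 and 303 (optimum cost 4).
reply = {(-1, 0), (100, 101), (201, 202), (302, 303)}

prepared = {(0, 302), (100, 101), (201, 202)}   # slightly more expensive, cost 304
optimal = {(0, 100), (101, 201), (202, 302)}    # first-stage optimum, cost 300

print(check_two_stage(prepared, reply, 300, 4, alpha=2, max_deleted=1))
print(check_two_stage(optimal, reply, 300, 4, alpha=2, max_deleted=1))
```

The prepared matching passes because only the long edge has to be deleted, whereas starting from the first-stage optimum requires deleting all three edges to reach the same reply.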
Our model is closely related to an online model with recourse. Consider a graph whose nodes are revealed online two by two. Our objective is to maintain a perfect matching at all times. As above, irrevocable decisions do not allow for constant competitive factors. This suggests a model where in each iteration we are allowed to modify a constant number of edges. An -competitive algorithm that deletes at most edges per iteration can be easily transformed into a two-stage -robust algorithm. Thus, we can think that our two-stage model is a first step for understanding this more involved online model.
Our Results and Techniques.
We distinguish two variants of the model. In the -known case we assume that in Stage 1 we already know the number of new nodes that will arrive in Stage 2. For this case we present a simple two-stage -robust algorithm.
Let be a metric space, with even, and be the complete graph on . For known in advance, there is a perfect matching in that is two-stage -robust for arrivals. Such a matching and corresponding second stage reply can be computed in time .
The example in Fig. 1 illustrates a worst case scenario for the strategy of choosing as the first stage matching for . The reason for this is that the nodes arriving in Stage 2 induce a path in that incurs a significant drop in the optimum value. Our algorithm is designed towards preparing for such bad scenarios. To this end, we define the notion of gain for a path with respect to a matching as follows:
In Stage 1, our algorithm chooses disjoint -alternating paths of maximum total gain with respect to . For each such path we modify by removing and adding , where is the edge that connects the endpoints of . Our choice of paths of maximum gain implies that is larger than . Therefore we can bound the cost of the solution in the first stage against that of and also infer that most of its cost is concentrated on the edges . For the second stage we construct a solution for the new instance by removing the edges of the form and adding new edges on top of the remaining solution. The algorithm is described in detail in Section 2.
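The modification along a single path can be sketched as follows. The sketch assumes that the gain of a path is the cost of its matching edges minus the cost of its non-matching edges, and that the path starts and ends with a matching edge; the helper names `edge`, `gain`, and `reroute` and the line metric are hypothetical and only serve the illustration.

```python
def edge(u, v):
    """Undirected edge between two points."""
    return frozenset((u, v))


def line_cost(e):
    u, v = tuple(e)
    return abs(u - v)


def path_edges(path):
    """Edges of a path given as a list of consecutive vertices."""
    return [edge(path[i], path[i + 1]) for i in range(len(path) - 1)]


def gain(path, matching, cost=line_cost):
    """Assumed gain: cost of matching edges on the path minus cost of the others."""
    return sum(cost(e) if e in matching else -cost(e) for e in path_edges(path))


def reroute(path, matching):
    """Swap matching and non-matching edges along the path, then connect its endpoints."""
    on_path = set(path_edges(path))
    return (matching ^ on_path) | {edge(path[0], path[-1])}


# Hypothetical instance on the line.
m = {edge(0, 100), edge(101, 201), edge(202, 302)}
p = [0, 100, 101, 201, 202, 302]

m1 = reroute(p, m)
print(gain(p, m), sum(line_cost(e) for e in m1))
```

After rerouting, the matching consists of the two short edges and one long edge joining the endpoints of the path; almost all of its cost sits on that single long edge, which is the edge a second-stage reply would remove.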
For the case where is unknown the situation is considerably more involved as a first stage solution must work for any number of arriving nodes simultaneously. In this setting we restrict our study to the real line and give an algorithm that is two-stage -robust.
Let and , with even, and let be the complete graph on . Then there is a perfect matching in that is two-stage -robust. Such a matching, as well as the second stage reply, can be computed in time .
The first stage solution is constructed iteratively, starting from the optimal solution. We will choose a path greedily such that it maximizes among all alternating paths that are heavy, i.e., the cost of is a factor 2 more expensive than the cost of . Then is modified by augmenting along and adding edge , which we fix to be in the final solution. We iterate until only consists of fixed edges. As we are on the line, each path corresponds to an interval and we can show that the constructed paths form a laminar family. Furthermore, our choice of heavy paths implies that their lengths satisfy an exponential decay property. This allows us to bound the cost of the first stage solution. For the second stage, we observe that the symmetric difference induces a set of intervals on the line. For each such interval, we remove on average at most two edges from the first stage matching and repair the solution with an optimal matching for the exposed vertices. A careful choice of the removed edges, together with the greedy construction of the first stage solution, gives us constant factor guarantees for the total cost of -edges inside these intervals. We can use this to argue that the cost of the resulting second stage solution is within a constant factor of the optimum. See Sections 3 and 4 for a detailed description of this case.
Intense research has been done on several variants of the online bipartite matching problem [17, 16, 18, 4, 21]. In this setting we are given a known set of servers while a set of clients arrive online. In the online bipartite metric matching problem servers and clients correspond to points from a metric space. Upon arrival, each client must be matched to a server irrevocably, at cost equal to their distance. For general metric spaces, there is a tight bound of on the competitiveness factor of deterministic online algorithms, where is the number of servers [18, 16]. Recently, Raghvendra presented a deterministic algorithm with the same competitiveness factor, that in addition is -competitive in the random arrival model. Also, its analysis can be parameterized for any metric space depending on the length of a TSP tour and its diameter. For the special case of the metric on the line, Raghvendra recently refined the analysis of the competitive ratio to . This gives a deterministic algorithm that matches the previously best known bound by Gupta and Lewi, which was attained by a randomized algorithm. As the lower bound of could not be improved for 20 years, the question whether there exists a constant competitive algorithm for the line remains open.
The online matching with recourse problem considers an unweighted bipartite graph. Upon arrival, a client has to be matched to a server and can be reallocated later. The task is to minimize the number of reallocations under the condition that a maximum matching is always maintained. The problem was introduced by Grove, Kao and Krishnan. Chaudhuri et al. showed that for the random arrival model a simple greedy algorithm uses reallocations with high probability and proved that this analysis is tight. Recently, Bernstein, Holm and Rotenberg showed that the greedy algorithm needs allocations in the adversarial model, leaving a small gap to the lower bound of . Gupta, Kumar and Stein consider a related problem where servers can be matched to more than one client, aiming to minimize the maximum number of clients that are assigned to a server. They achieve a constant competitive factor while doing in total reassignments.
Online min-cost problems with reassignments have been studied in other contexts. For example, in the online Steiner tree problem with recourse, a set of points in a metric space arrives online. We must maintain Steiner trees of low cost by performing at most a constant (amortized) number of edge changes per iteration. While the pure online setting with no reassignment only allows for competitive factors, just one edge deletion per iteration is enough to obtain a constant competitive algorithm; see also [13, 20].
The concept of recoverable robustness is also related to our setting. In this context the perfect matching problem on unweighted graphs was considered by Dourado et al. They seek to find perfect matchings which, after the failure of some edges, can be recovered to a perfect matching by making only a small number of modifications. They establish computational hardness results for the question whether a given graph admits a robust recoverable perfect matching.
2 Known Number of Arrivals
In this section, we consider the setting where is already known in Stage 1. Let be the graph given in Stage 1 (with edge costs induced by an arbitrary metric) and let be a min-cost perfect matching in . Without loss of generality assume that , as otherwise, we can remove all edges of in Stage 2.
Algorithm 1.1 works as follows: (i) Let be disjoint, -alternating paths maximizing . (ii) Set . (iii) Return
It is easy to see that each path starts and ends with an edge from and . As a consequence, is a perfect matching and
Using and we obtain
Now consider the arrival of new vertices, resulting in the graph with min-cost matching . Note that is a -join, where is the set of endpoints of the paths and the newly arrived vertices.
Algorithm 1.2 works as follows: (i) Let be the maximal paths from . (ii) Return .
Note that consists of alternating paths , from which we remove the starting and ending -edge. Then these paths would have been a feasible choice for , implying that the total gain of the ’s is at most that of the ’s. We conclude that
Applying , we obtain
As , we conclude that is indeed two-stage -robust. We remark that the matchings described in this section can be computed efficiently by solving a minimum weight -join problem in an extension of . This concludes our proof of Theorem 1.1.
3 Unknown Number of Arrivals – Stage 1
In this section, we consider the case that the underlying metric corresponds to the real line. This implies that there is a Hamiltonian path in such that for all , where is the subpath of between nodes and . We will refer to as the line and call the subpaths of intervals. The restriction to the metric on the line results in a uniquely defined min-cost perfect matching with a special structure.
Lemma 1
is the unique perfect matching contained in .
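The statement can be checked by brute force on a small example. The following sketch enumerates all perfect matchings of six hypothetical points on the line (the coordinates are arbitrary illustrative values) and compares the cheapest ones with the matching that uses every other edge of the sorted order, i.e., the perfect matching contained in the line.

```python
def perfect_matchings(nodes):
    """Yield every perfect matching of a sorted list of distinct points."""
    if not nodes:
        yield frozenset()
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in perfect_matchings(remaining):
            yield m | {(first, partner)}


def cost(matching):
    return sum(abs(u - v) for u, v in matching)


# Hypothetical points on the real line.
points = sorted([3, 10, 11, 25, 26, 40])

# The perfect matching contained in the line: every other edge of the sorted order.
path_matching = frozenset((points[i], points[i + 1]) for i in range(0, len(points), 2))

all_matchings = list(perfect_matchings(points))
best = min(cost(m) for m in all_matchings)
optimal = [m for m in all_matchings if cost(m) == best]

print(len(all_matchings), best, optimal == [path_matching])
```

Among the 15 perfect matchings of this instance, exactly one attains the minimum cost, and it is the one contained in the line.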
When the number of arrivals is not known in the first stage, the approach for constructing the first stage matching introduced in Section 2 does not suffice anymore. Fig. 2 illustrates a class of instances for which Algorithm 1.1 cannot achieve -robustness, no matter how we choose . For a matching , define . The example in Fig. 2 can be generalized to show that we cannot restrict ourselves to constructing matchings with the property that is bounded by a constant.
In view of the above example, we adapt the approach from Section 2 as follows. Instead of creating a fixed number of paths, our algorithm now iteratively and greedily selects a path of maximum gain with respect to a dynamically changing matching (initially ). In order to bound the total cost incurred by adding edges of the form , we only consider paths for which contributes a significant part to the total cost of .
We say that is -heavy if
We say that is -light if
Algorithm 2.1 works as follows:
(i) Initialization: Set and .
(ii) While : Let be an -heavy -alternating path maximizing and update and .
(iii) Return .
Note that in each iteration, the path starts and ends with an edge from as it is gain-maximizing (if ended with an edge that is not in , we could simply remove that edge and obtain a path of higher gain). Therefore it is easy to see that is always a perfect matching, and in each iteration the cardinality of decreases by .
Now number the iterations of the while loop in Algorithm 2.1 from to . Let be the state of at the beginning of iteration . Let be the path chosen in iteration and let be the corresponding edge added to . The central result in this section is that the paths form a laminar family of intervals on the line. See Fig. 3 for an illustration of the proof idea and Section 0.A.2 for the complete proof.
Lemma 2
for all .
For all with , either or .
Lemma 2 induces a tree structure on the paths selected by Algorithm 2.1. We define the directed tree as follows. We let and define . For we add the arc to if and there is no with . It is easy to see that is an out-tree with root . We let be the unique --path in . We define the set of children of by . Furthermore, let and be the set of heavy and light nodes in the tree, respectively. These names are justified by the following lemma. See Fig. 4 a)-b) for an illustration.
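The tree construction can be sketched for an arbitrary laminar family of intervals. The sketch below assumes that the root is an artificial node `0` standing for the whole line and that all intervals are distinct; the function names and the example intervals are hypothetical. The depth of each node is computed as well, since the argument that follows reasons about the parity of the number of enclosing intervals; no claim is made here about which parity class is called heavy.

```python
def laminar_tree(intervals, root=0):
    """Map each interval index to the index of the smallest interval strictly containing it.

    intervals: dict index -> (lo, hi); the family is assumed to be laminar,
    i.e., any two intervals are nested or disjoint. Intervals without an
    enclosing interval are attached to the artificial root.
    """
    def contains(outer, inner):
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    parent = {}
    for j, pj in intervals.items():
        enclosing = [i for i, pi in intervals.items() if contains(pi, pj)]
        if enclosing:
            parent[j] = min(enclosing, key=lambda i: intervals[i][1] - intervals[i][0])
        else:
            parent[j] = root
    return parent


def depth(node, parent, root=0):
    """Number of arcs on the path from the root to the given node."""
    d = 0
    while node != root:
        node = parent[node]
        d += 1
    return d


# Hypothetical laminar family.
family = {1: (0, 10), 2: (1, 4), 3: (5, 9), 4: (2, 3)}
parent = laminar_tree(family)
children = {}
for j, i in parent.items():
    children.setdefault(i, []).append(j)

print(parent, children, {j: depth(j, parent) for j in family})
```

Choosing the smallest enclosing interval as the parent mirrors the condition that no other selected interval lies strictly between a node and its parent.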
Lemma 3
If , then and, in particular, is -heavy. If , then and, in particular, is -light.
Let . From Lemma 2 we know that for every iteration with it holds that either or . In the first case it holds that , in the latter case it holds that . Moreover, it is easy to see that and holds if and only if . If , this implies that there exist an even number of iterations for which holds. Hence, we obtain
If , this implies that there exist an odd number of iterations for which holds. Hence, we can deduce that
The fact that nested paths are alternatingly -heavy and -light implies an exponential decay property. As a consequence we can bound the cost of .
Lemma 4
Let . Then .
Let . Then
where the first inequality follows from the fact that is -heavy; the second inequality follows from the fact that for and the fact that the intervals for all children are disjoint; the last inequality follows from the fact that is -heavy. ∎
Lemma 5
Note that . For , let
Observe that Lemma 4 implies that for all . Furthermore , because . Hence
4 Unknown Number of Arrivals – Stage 2
We now discuss how to react to the arrival of additional vertices. We let be the min-cost perfect matching in the resulting graph and define
We call the elements of requests.
An important consequence of our restriction to the metric space on the line is that (in fact, each of the maximal paths of is contained in after removing its first and last edge).
Lemma 6
and each starts and ends with an edge of .
For simplification of the analysis we make the following assumptions. In Section 0.B.9 we show that they are without loss of generality.
For all and all , either or , or .
For all , if , then the first and last edge of are in .
From the set of requests, we will determine a subset of at most edges that we delete from . To this end, we assign each request to a light node in as follows. For we define , i.e., is the inclusionwise minimal interval of a light node containing . For , let
Furthermore, we also keep track of the gaps between the requests in as follows. For , let
For notational convenience we also define . Note that for all . However, may contain a request from descendants of . See Fig. 4 c) for an illustration of the assignment.
For , let and . Note that and . Before we can state the algorithm for computing the second stage reply, we need one final lemma.
Lemma 7
Let . For every , there is a with . For every , there is an with .
Algorithm 2.2 works as follows:
(i) Create the matching by removing the following edges from for each :
The edge if and .
For each the edge where .
For each the edge where .
(ii) Let be a min-cost matching on all vertices not covered by in . Return .
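Step (ii) can be sketched for the metric on the line, where a min-cost matching on the uncovered vertices pairs them consecutively in sorted order. The sketch takes the matching that remains after step (i) as given and does not reproduce the rules for selecting the removed edges; the function name and the instance are hypothetical.

```python
def repair_on_line(kept_edges, all_points):
    """Complete a partial matching on the line to a perfect matching.

    kept_edges: edges (u, v) that survived the removal step.
    all_points: every vertex of the second-stage graph (points on the real line).
    The uncovered points are matched consecutively in sorted order.
    """
    covered = {v for e in kept_edges for v in e}
    exposed = sorted(p for p in all_points if p not in covered)
    assert len(exposed) % 2 == 0, "an even number of points must remain uncovered"
    patch = {(exposed[i], exposed[i + 1]) for i in range(0, len(exposed), 2)}
    return set(kept_edges) | patch


# Hypothetical instance: first-stage matching {(0, 302), (100, 101), (201, 202)},
# arrivals at -1 and 303, and the long edge (0, 302) removed in step (i).
kept = {(100, 101), (201, 202)}
points = [-1, 0, 100, 101, 201, 202, 302, 303]

reply = repair_on_line(kept, points)
print(sorted(reply), sum(v - u for u, v in reply))
```

Here a single deletion exposes the points 0 and 302, which are then matched to the two arrivals at total cost 4.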
Let be indices of the edges removed in step (i). It is not hard to see that for each and therefore , bounding the recourse of Algorithm 2.2 as intended.
Lemma 8
Now let the nodes corresponding to edges that have not been removed and
the nodes that correspond to maximal intervals that have not been removed.
The following lemma is a consequence of the exponential decay property. It shows that in order to establish a bound on the cost of , it is enough to bound the cost of all paths for .
Lemma 9
It remains to bound the cost of the paths associated with the tree nodes in . We establish a charging scheme by partitioning the line into three areas :
For , let . We define .
For and , let .
We define .
We define .
Consider a set for some . Recall that is the index of the smallest light interval constructed by Algorithm 2.1 containing and that is the first child interval of created by Algorithm 2.1 that intersects . From the choice of and the greedy construction of as a path of maximum -gain we can conclude that is not -heavy; see Fig. 5 for an illustration. Therefore . Note that , because . Hence we obtain the following lemma.
Lemma 10
Let . Then .
A similar argument implies the same bound for all sets of the type for some and .
Lemma 11
Let . Then .
Furthermore, one can show that the sets of the form and and the set form a partition of . We define by
Lemma 12
We are now able to bound the cost of each path for against its local budget .
Lemma 13
Let . Then .
For intuition, we give a short sketch of the proof of Lemma 13. Note that because is -heavy. Hence it suffices to show that a significant portion of is contained in the support of . Consider an edge . If then there must be a request with . Note that , since was not removed by Algorithm 2.2. Thus, or is a descendant or ancestor of . The complete proof, given in Section 0.B.7, establishes that cannot be an ancestor of (under Assumption A) and bounds the total cost of -edges contained in child intervals of .
Lemma 14
Let . Then .
Appendix 0.A Omitted proofs from Section 3
For convenience, we define the projection that maps an edge to the corresponding subpath . We first establish some useful helper lemmas.
Let and let be two -heavy (-light, respectively) sets with . Then is -heavy (-light, respectively).
Let and be -heavy sets with . It follows that