# Exact and Approximation Algorithms for Many-To-Many Point Matching in the Plane

Given two sets S and T of points in the plane, of total size n, a many-to-many matching between S and T is a set of pairs (p,q) such that p∈ S, q∈ T and for each r∈ S∪ T, r appears in at least one such pair. The cost of a pair (p,q) is the (Euclidean) distance between p and q. In the minimum-cost many-to-many matching problem, the goal is to compute a many-to-many matching such that the sum of the costs of the pairs is minimized. This problem is a restricted version of minimum-weight edge cover in a bipartite graph, and hence can be solved in O(n^3) time. In a more restricted setting where all the points are on a line, the problem can be solved in O(nlog n) time [Colannino, Damian, Hurtado, Langerman, Meijer, Ramaswami, Souvaine, Toussaint; Graphs Comb., 2007]. However, no progress has been made in the general planar case in improving the cubic time bound. In this paper, we obtain an O(n^2· poly(log n)) time exact algorithm and an O( n^3/2· poly(log n)) time (1+ϵ)-approximation in the planar case. Our results affirmatively address an open problem posed in [Colannino et al., Graphs Comb., 2007].

## Authors

• 17 publications
• 25 publications
• 19 publications
09/13/2019

### Linear Size Planar Manhattan Network for Convex Point Sets

Let G = (V, E) be an edge-weighted geometric graph such that every edge ...
07/15/2020

### An Õ(n^5/4) Time ε-Approximation Algorithm for RMS Matching in a Plane

The 2-Wasserstein distance (or RMS distance) is a useful measure of simi...
03/22/2019

### Efficient Algorithms for Geometric Partial Matching

Let A and B be two point sets in the plane of sizes r and n respectively...
10/24/2018

### Approximate Minimum-Weight Matching with Outliers under Translation

Our goal is to compare two planar point sets by finding subsets of a giv...
11/25/2016

### A Distance Function for Comparing Straight-Edge Geometric Figures

This paper defines a distance function that measures the dissimilarity b...
11/24/2019

### On Maximum-Sum Matchings of Points

Huemer et al. (Discrete Mathematics, 2019) proved that for any two point...
12/31/2020

### Faster Distance-Based Representative Skyline and k-Center Along Pareto Front in the Plane

We consider the problem of computing the distance-based representative s...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Let be a simple bipartite graph where each edge has a non-negative real weight, and no vertex is isolated. The many-to-many matching problem on is to find a subset of edges of minimum total weight such that for each vertex there is an edge in incident on . This is often referred to as the minimum-weight edge cover problem. A standard method to compute a minimum-weight edge cover of in polynomial-time is to reduce the problem to the minimum-weight perfect matching problem on an equivalent graph, see [16, 3, 4, 5]. Since the reduction takes time, the running time is the same as the fastest known algorithm for computing a minimum-weight perfect matching. A faster -approximation algorithm is proposed in [5].

Motivated by the computational problems in musical rhythm theory, Colannino et al. [3] studied the many-to-many matching problem in a geometric setting. Suppose we are given two sets and of points in the plane, of total size . A many-to-many matching between and is a set of pairs such that , and for each , appears in at least one such pair. The cost of a pair is the (Euclidean) distance between and . The cost of a set of pairs is the sum of the costs of the pairs in the set. In the geometric minimum-cost many-to-many matching problem, the goal is to compute a many-to-many matching such that the cost of the corresponding set of pairs is minimized. For points on a line, an time dynamic-programming algorithm is proposed in [3]. In the same paper, the importance of the many-to-many matching problem for points in the plane is stated in the context of melody matching. Furthermore, the authors state the task of generalizing their work to this version in the plane as an important open problem, especially since geometry has helped in designing efficient algorithms for the computation of the minimum-weight perfect matching for points in the plane (see, e.g., [17, 18]). Several variants of the -dimensional many-to-many matching problem are considered in [13, 14, 15].

### 1.1 Our Results and Techniques

In this work, we design several exact and approximation algorithms for minimum-cost many-to-many matching in the plane, thus affirmatively addressing the open problem posed in [3]. First, we obtain an time222We use the notation to denote a polynomial function exact algorithm for this problem using a connection to minimum-weight perfect matching. We note that our time-bound matches the time bound for solving minimum-weight bipartite matching for points in the plane. Next, for any , we obtain a -approximation algorithm for this problem with improved running time for some small constant . We also obtain a simple -approximation in time.

Next, we give an overview of our techniques. A major reason behind the scarcity of results for geometric many-to-many-matching is the lack of techniques to directly approach this problem. The algorithm known for general graphs reduces the problem to (minimum-weight) bipartite perfect matching and uses a graph matching algorithm to solve the problem on the new instance. However, this standard reduction [9] from many-to-many matching to regular matching changes the weights of the edges in a convoluted manner. Hence, even if one starts with the planar Euclidean distances, the new interpoint distances in the constructed instance cannot be embedded in the plane or even in any metric space. As the algorithms for planar bipartite (perfect) matching heavily exploit the properties of the plane, they cannot be employed for solving the new instance of bipartite perfect matching. Thus, even though there is a wealth of literature for planar bipartite matching, no progress has been made in understanding the structure of many-to-many matching in the plane.

In our approach, we use a rather unconventional connection to bipartite perfect matching. First, we use a different reduction to convert our instance of many-to-many matching in the plane to an equivalent instance of bipartite perfect matching. The new instance cannot be embedded in the plane, however it does not modify the original distances between the points. Rather the new bipartite graph constructed is the union of (i) the original geometric bipartite graph, (ii) a new bipartite clique all of whose edges have the same weight and (iii) a linear number of additional edges. Thus, even though we could not use a planar matching algorithm directly, we could successfully use ideas from the literature of planar matching exploiting the structure of the constructed graph. A similar reduction was used in [4] in the context of computation of link distance of graphs. However, we cannot use this reduction directly, as we cannot afford to explicitly store the constructed graph which contains edges. Recall that we are aiming for an time bound. Nevertheless, we exploit the structure of this graph to implicitly store it in space.

In Section 3, we show that the Hungarian algorithm can be implemented on our constructed graph in time using ideas from planar matching. To obtain a subquadratic time bound, we implement the algorithm due to Gabow and Tarjan [6]. A straightforward implementation of this algorithm might need time. We note that this algorithm has been used in several works on planar matching for obtaining efficient algorithms [12, 18, 2]. Our algorithm is closest to the one in [18] among these algorithms from a moral point of view. The difference is that their algorithm is for planar points. However, we deal with a graph which is partly embeddable. Nevertheless, in Section 5, we show that using additional ideas and data structures, the Gabow-Tarjan algorithm can be implemented in time, albeit with a -factor loss on the quality of the solution. In Section 4, we describe our 2-approximation which the -approximation algorithm uses as a subroutine for preprocessing.

## 2 Preliminaries

In the bipartite perfect matching problem, we are given an edge-weighted bipartite graph containing a perfect matching, and the goal is to find a perfect matching having the minimum cost or sum of the edge-weights. For our convenience, sometimes we would assume that the edges of are not given explicitly. For example, and might be two sets of points in the plane, and is the complete bipartite graph induced by the bipartition . In this case, the points in can be used to implicitly represent the graph . In our case, we will use similar implicit representation of input graphs for designing subquadratic algorithm. Now, we have the following lemma, which reduces our problem to an equivalent instance of bipartite perfect matching.

Given an instance of minimum-cost many-to-many matching, one can compute in time an instance of bipartite perfect matching such that (i) if there is a many-to-many matching for of cost , there is a perfect matching for of cost at most , and (ii) if there is a perfect matching for of cost , there is a many-to-many matching for of cost .

###### Proof.

Given the instance consisting of the two sets of points and , we construct a bipartite graph , where , , contains copies of the points in , contains copies of the points in . contains all the edges of and , and also the ones in and . The weight of each edge is the distance between the points and . The weight of each edge in is 0. The weight of any edge is the distance between and its closest neighbor in . The weight of any edge is the distance between and its closest neighbor in . See Figure 1 for an example. Note that can be represented implicitly in space, where . Let be the constructed instance of bipartite perfect matching that consists of . As the closest neighbors of points in the plane can be found in total time using Voronoi diagram (see Section 4), construction of takes time.

Now, suppose has a many-to-many matching . Consider the pairs in as the edges of a graph with vertices being the points in . We will use the term pairs and edges in interchangeably. First, note that wlog we can assume that does not contain any path of length 3. Otherwise, we can remove the middle edge of such a path from , and still remains a many-to-many matching. Thus, each component in is a star. We compute a perfect matching in from as follows. For each star in having only one edge with , add to . Also, add the edge to . Now, consider any star in ; wlog assume that . Add to . Also, add the edge to . For each , add the edge to . It is not hard to verify that all the vertices of are matched in . Also, as the weight of a star edge above is at least the weight of in by definition, the cost of is at most the cost of .

Next, suppose has a perfect matching ; we construct a many-to-many matching for . Consider any . If , add the pair to , where is the closest neighbor of in . Otherwise, is matched (in ) to some . In this case, simply add the pair to . Similarly, consider any . If , add the pair to , where is the closest neighbor of in . Otherwise, is matched (in ) to some . In this case, simply add the pair to . It is not hard to verify that is a many-to-many matching, and the cost of is same as the cost of . ∎

Now, consider any matching in a graph. An alternating path is a path whose edges alternate between matched and unmatched edges. Similarly, one can define alternating cycles and trees. A vertex is called free if it is not matched. An augmenting path is an alternating path which starts and ends at free vertices. Given an augmenting path w.r.t. a matching , we can augment by one edge if we remove the edges of from and add the edges in to . The new matching is denoted by . Throughout the paper, and denote the number of edges and vertices, respectively, unless otherwise specified. We denote the weight or cost of an edge by . In our discussions, a path can be treated as an ordered set of vertices or edges depending on the context.

## 3 An Exact Algorithm

Consider the instance obtained by the reduction in Lemma 2. In this section, we prove the following theorem.

Bipartite perfect matching can be solved exactly on in time , and hence there is an time exact algorithm for minimum-cost many-to-many matching.

In the rest of this section, we prove Theorem 3. To prove this theorem we use the Hungarian algorithm [11] for computing a (minimum-cost) bipartite perfect matching.

In the Hungarian algorithm, there is a dual variable corresponding to each vertex . Every feasible matching must satisfy the following two conditions.

 y(u)+y(v) ≤c(u,v) for every edge (u,v) (1) y(u)+y(v) =c(u,v) for every edge (u,v)∈M (2)

Given any matching and an augmenting path , the net-cost or augmentation cost is defined as,

 ϕ(P)=∑(u,v)∈P∖Mc(u,v)−∑(u,v)∈P∩Mc(u,v)

is basically the cost increment for augmenting along . The net-cost of any alternating cycle can be defined in the same way. The Hungarian algorithm starts with an empty matching . In every iteration, it computes an augmenting path of the minimum net-cost and augments the current matching. The algorithm halts once a perfect matching is found. If there is a perfect matching in the graph, this algorithm returns one after at most iterations. Moreover, an augmenting path can be found in time leading to running time in total.

It is possible to show that any perfect matching is a minimum-cost matching if and only if there is no negative net-cost alternating cycle with respect to it. Moreover, a feasible matching with dual values satisfies this property. Thus, it is sufficient to find a perfect matching that is feasible.

For finding an augmenting path, a Hungarian search procedure is employed. Hungarian search uses Dijkstra’s shortest path algorithm to find a minimum net-cost path in the graph where the value is subtracted from the weight of each edge . This along with the first feasibility condition ensure that each edge has a non-negative weight, and hence there is no negative cycle in the graph. So, one can correctly employ Dijkstra’s algorithm to find such a shortest path. Finally, one can show that the minimum net-cost augmenting path with original weights corresponds to a shortest path with the modified weights, and vice versa. After augmenting the current matching with the newly found path, the dual values are adjusted appropriately to ensure feasibility of the new matching.

Now, we describe in detail how the Hungarian search procedure is implemented in each iteration. An edge is called admissible if . It can be shown that it suffices to find an augmenting path consisting of only admissible edges. Let be the current matching and be the free vertices of . To obtain the desired augmenting path, a forest is grown whose roots are in . Each tree in is an augmenting tree rooted at a vertex in . Once

contains an augmenting path the search is completed. At any moment, let

be the vertices of in and be the vertices of not in . Initially, and . Also, let

 δ=minu∈R′,v∈B′{c(u,v)−y(u)−y(v)}.

Note that means there is an admissible edge. In each step, if , an admissible edge is selected where and . If is free, is added to and the desired augmenting path is found. Otherwise, let be matched to (which is not in by an invariant). In this case, the edges and are added to ; is added to and is removed from .

If at some moment, becomes more than 0 (no admissible edge), we perform dual adjustments. In particular, for each , is updated to and for each , is updated to . This ensures that at least one edge becomes admissible, e.g., the edge corresponding to which the value is achieved. Thus, eventually the search halts with an augmenting path in .

It can be shown that if can be computed efficiently, then the desired augmenting path can also be found efficiently [6, 1, 17]. For that purpose, another variable is maintained. Also, for each vertex , a weight is stored. In the beginning of each step, , . When the edges and are added to , is updated to and is updated to . Once becomes more than 0, is updated to .

Note that the weight of a vertex is updated only once when it is added to . We have the following observation.

[6, 1, 17] For each , the current dual value of , , is equal to . For each , the current dual value of , , is equal to .

It follows from the above observation that can be equivalently expressed as follows.

 δ=minu∈R′,v∈B′{c(u,v)−σu−σv}−Δ.

Hence, Hungarian search boils down to the following task ignoring the trivial details. We need to maintain two sets and . Initially, . In each step, a vertex is added to and a vertex is removed from . Additionally, each vertex has a weight . In every step, the goal is to maintain the bichromatic closest pair, which is the pair with the minimum value. In the following, we construct a data structure that can be used to perform the above task (a Hungarian search) in time. As we need at most such searches, Theorem 3 follows.

Recall that in our instance , we are given the graph , where contains copies of the vertices in , contains copies of the vertices in . , where , , , and . Let .

Our data structure is a collection of three data structures. First, we construct the dynamic bichromatic closest pair data structure from [8] for the two point sets and with the distance function for each pair . We refer to this data structure as . Initially, it contains only the points in . Next, we construct two max-heaps and for the vertices in and , respectively. Initially, contains vertices in and contains all the vertices in . The key value of each vertex is its weight . Using and , the pair with the maximum value can be found in time. We also construct another min-heap to store the edges in with key value for each pair . Initially, it is empty. In every iteration, it contains only those edges such that and . We also maintain a global closest pair over the three data structures , and with the minimum key value .

Using , we implement each step as follows. If , remove from . Also remove from if it is in . If , remove from . Also remove the edge with from if its in . If , add to . Also add the edge to if . If , add to . Also add the edge with to if .

It is not hard to verify that at each step the correct global closest pair is stored in . The correctness for and follow trivially. Also, as the weight of the edges in are same, it is sufficient to find a pair for which value is maximized. As contains all the edges in , equivalently it suffices to find a with the maximum value and a with the maximum value.

We note that the total number of steps in a Hungarian search is at most . Also, operations (insertions, deletions and searching) on can be performed in amortized time [8]. The construction time and space of are also . Moreover, the operations on all max-heaps can be performed in total time. It follows that Hungarian search can be implemented in time leading to the desired running time of our algorithm.

## 4 A 2-approximation in O(nlogn) Time

In this section, we prove the following theorem.

There is an time 2-approximation algorithm for minimum-cost many-to-many matching.

Our algorithm is as follows. For each point , we add the pair to the solution matching which minimizes over all . Similarly, for each point , we add the pair to which minimizes over all . Thus, for each point in a set, we basically add the pair corresponding to its nearest neighbor in the other set. By definition, the computed solution is a many-to-many matching. Next, we argue how to implement this algorithm in time. First, we compute the Voronoi diagram of the points in , compute a triangulation of the Voronoi cells and then construct Kirkpatrick’s planar point location data structure [10] using the triangulation. Using this data structure, for any , its nearest neighbor in can be computed in time. Also, the data structure uses space and construction time including the preprocessing time. Using a similar data structure for points in , we can compute the nearest neighbors of the points in in time. The following lemma completes the proof of Theorem 4.

The cost of the matching computed as in above is at most twice the optimal cost.

###### Proof.

For each point in (resp. in ), let be its nearest neighbor in (resp. in ). Note that the cost of , cost . Now, consider any optimal matching . For each , let be the cost of any pair in , i.e., a pair containing . Then, must be at least the distance between and its nearest neighbor in the other set, i.e., . It follows that, the cost of ,

 ∑p∈S∪Td(p,η(p))≤∑p∈S∪Tc∗(p)=∑p1∈Sc∗(p1)+∑p2∈Tc∗(p2)≤cost(M∗)+cost(M∗)=2⋅cost(M∗).

## 5 An Improved (1+ϵ)-approximation

Consider the instance obtained by the reduction in Lemma 2. In this section, we prove the following theorem.

A -approximate bipartite perfect matching for can be computed in time for some constant , and hence there is an time -approximation algorithm for minimum-cost many-to-many matching.

In the rest of this section, we prove Theorem 5. In particular, we show that the bipartite perfect matching algorithm by Gabow and Tarjan [6] can be implemented on the instance in the mentioned time.

The Gabow-Tarjan algorithm is based on a popular scheme called the bit-scaling paradigm. The algorithm is motivated by two classic matching algorithms: Hopcroft-Karp [7] for maximum cardinality bipartite matching with running time and Hungarian algorithm [11] for minimum-cost bipartite matching with running time. The Hopcroft-Karp algorithm chooses an augmenting path of the shortest length. The Hungarian algorithm, as mentioned before, chooses an augmenting path whose augmentation cost is the minimum. When the weights on the edges are small an augmenting path of the shortest length approximates the latter path. The Gabow-Tarjan algorithm scales the weights in a manner so that all the effective weights are small. This helps to combine the ideas of the two algorithms, which leads towards an time algorithm for (minimum-cost) bipartite perfect matching, where is the largest edge weight.

Next, we describe the Gabow-Tarjan algorithm. This algorithm is based on the ideas of the Hungarian algorithm. However, here instead of a feasible matching we compute a 1-feasible matching. A matching is called 1-feasible if it satisfies the following two conditions.

 y(u)+y(v) ≤c(u,v)+1 for every edge (u,v) (3) y(u)+y(v) =c(u,v) for every edge (u,v)∈M (4)

Note that the only difference is that now the sum of dual variables can be 1 plus the cost of the edge. This additive error of 1 on every unmatched edge ensures that longer augmenting paths have larger cost. As an effect, the algorithm picks short augmenting paths as in the Hopcroft-Karp algorithm.

A 1-optimal matching is a perfect 1-feasible matching. Note that a 1-optimal matching costs more than the original optimal matching. However, as the error is at most +1 for every edge, one can show the following.

[6] Let be a 1-optimal matching and be any perfect matching. Then .

It follows from the above lemma that, to annihilate the error introduced, one can scale the weight of each edge by a factor of , and then with the scaled weights the cost of a 1-optimal matching is same as the cost of any optimal matching. Let be the scaled cost of , i.e., . Let be the maximum number of bits needed to represent any new weight.

The Gabow-Tarjan algorithm runs in different scales. In each scale (), the most significant bits of are used for defining the current cost of each edge . The dual values are also modified to maintain 1-feasibility of an already computed perfect matching in the following way: for every vertex . Then with the current edge costs and dual values, the algorithm computes a 1-optimal matching.

By the above claim, that any 1-optimal matching is also optimal with scaling factor , the 1-optimal matching computed by the algorithm at -th scale must be optimal.

To find a 1-optimal matching on a particular scale a procedure called match is employed which we describe below. Before that we need a definition. Consider any 1-feasible matching . An edge is called eligible if it is in or . It can be shown that for the purpose of computing a 1-optimal matching it suffices to consider the augmenting paths which consist of eligible edges only.

The match procedure Initialize all the dual variables to 0 and to . Repeat the following two steps until a perfect matching is obtained in step 1. Find a maximal set of augmenting paths of eligible edges. For each path , augment the current matching along to obtain a new matching which is also denoted by . For each vertex , decrease by 1. (This is to ensure that the new matching also is 1-feasible.) If is perfect, terminate. Employ a Hungarian search to adjust the values of the dual variables (by keeping 1-feasible), and find an augmenting path of eligible edges.

Note that the number of free vertices in every iteration of match is at least 1 less than that in the previous iteration, as step 2 always ends with finding an augmenting path. By also showing that the dual value of a variable is increased by at least 1 in every call of Hungarian search, they proved that iterations are sufficient to obtain a 1-optimal matching. It can be shown that each iteration of match can be executed in general bipartite graphs in time leading to the complexity of in each scale. As we have scales, in total the running time is .

Next, we use the Gabow-Tarjan algorithm to compute a perfect matching for our instance of bipartite perfect matching. In particular, we show that by exploiting the structure of our instance, it is possible to implement every iteration of match in time. First, we show that one can consider a modified instance with bounded aspect ratio of the weights, for the purpose of computing a -approximation.

Let OPT be the optimal cost of perfect matching on the instance . Recall that the instance consists of the graph . Given the implicit representation of , we compute a 2-approximate solution for minimum-cost many-to-many matching on the two sets of points and using the algorithm in Theorem 4. Let be the cost of this solution. By Lemma 2, . We construct a new instance which consists of the implicit representation of the same graph . Additionally, we assume that any edge with weight more than in will not be part of the solution, and each edge has cost at least . Note that it is not possible to explicitly set the weight of each and every edge if we are allowed to spend time. Thus, for the time being, we make the above assumptions implicitly. Later, we will make them explicit in our algorithm. Note that the construction time of is dominated by the time of the 2-approximation algorithm, which is . We obtain the following lemma.

[] Given , one can compute in time another instance of bipartite perfect matching, such that (i) the weight of every edge is in where , (ii) has a perfect matching of weight at most , and (iii) any perfect matching in of weight is also a perfect matching in of weight at most .

###### Proof.

(i) follows by the above construction of . (ii) follows from the facts that OPT does not contain any edge of weight more than and the weight of an edge in OPT is increased by at most in . (iii) follows, as each edge in is also present in with possibly equal or lesser cost. ∎

Henceforth, we solve the problem on . Note that the minimum edge weight in is and the maximum is . By scaling the weights by , we can assume wlog that the edge weights in are in . Moreover, we can assume that each edge weight is rounded up to the nearest integer at least . We can afford to remove the fractions, as each fraction costs less than 1, which is at most w.r.t. the original weights. For our convenience, we also divide the weights into classes as follows. For each weight (an integer) with , is rounded to the largest integer in the range , where . We denote this largest integer corresponding to the -th weight class by . We note that the above weight scalings are performed implicitly. It is not hard to verify that these still preserve a -approximate solution, which is sufficient for our purpose. Henceforth, we treat as a constant and hide function of in time complexity as a constant in notation. It will not be hard to verify that the dependency on that we hide is for some true small constant .

To implement the Gabow-Tarjan algorithm, we show how the match procedure can be implemented efficiently. To implement step 1 of match, we store the information about the input graph in a data structure that we refer to as MATCH. We allow the following operation on MATCH. In the following, we denote the current matching by .

#### Find_Maximal_Aps.

Find a maximal set of vertex disjoint augmenting paths of eligible edges with respect to .

Given the MATCH data structure, we implement the step 1 of match as follows. The dual values are stored in an array indexed by the vertices. Note that the dual values remain fixed in step 1 while the augmenting paths are found. Afterwards, the dual values are updated. We first make a call to FINDMAXIMALAPS to obtain a maximal set of paths. For each , we augment along to obtain a new matching . Also, for each , we decrease by 1.

In Section 6, we show how to construct and maintain MATCH so that the above subroutine can be performed in time . The building time of MATCH is , and it takes space. Thus, by noting that contains disjoint paths, step 1 can be implemented in time and space.

We also need another data structure which will help us implement step 2 of match, which we refer to as the Hungarian search data structure. As described before, Hungarian search boils down to maintaining a bichromatic closest pair of two sets with the minimum value. However, here we have to be more careful, as for an unmatched eligible edge , . In contrast, in the Hungarian algorithm, we had in that case. Hence, we have to consider the value as the distance of such an unmatched pair . This apparently makes our life harder, as now we have to deal with two types of distance functions: one for matched pairs and one for unmatched pairs. However, we use the following observation to consider only one type of distances, which follows from our description of Hungarian search in Section 3.

Consider any matched edge with and . In the Hungarian search, if , i.e., is already in the forest, then must also be in the forest, i.e., .

Recall that for computing , we look into the pairs where and . By the above observation, it suffices to probe only unmatched edges. Hence, we can again work with only one distance function for the purpose of computing . As the term is common in all distances, we also drop that, and work with our old distance function.

In Section 6, we show how to construct and maintain Hungarian search data structure so that the task of maintaining closest pair can be performed in total time. Moreover, the data structure uses construction time and space.

Using the MATCH and Hungarian search data structures, one can implement the Gabow-Tarjan algorithm on the instance in time , i.e., a minimum cost bipartite perfect matching in can be computed in time .

## 6 Data Structures

### 6.1 The MATCH Data Structure

We would like to construct a data structure where given a matching at a fixed scale, a maximal set of augmenting paths can be computed efficiently. In particular, we are given the graph , where contains copies of the vertices in , contains copies of the vertices in . , where , , , and . Let . Also, note that each edge weight is an integer for , where is a constant. Let be the set of edges in with weights . We define a bi-clique cover for each as a collection where , , all the edges in are in and . The size of is . Given the points in and , using standard range searching data structures, one can compute such a bi-clique cover of size in time. We note that bi-clique covers are also used for the algorithm in [18]. Let . Thus can be computed in time and space.

In MATCH we store the bi-clique covers in corresponding to the edges in . Also, we store the edges in along with their weights in an array indexed by the vertices of and the edges in and their weights in an array indexed by the vertices of . For finding an augmenting path efficiently, we need to store additional information. Before describing that, we describe in more detail how the augmenting paths are found.

The maximal set of augmenting paths are found by a careful implementation of depth first search. In this implementation, vertices can be labeled as marked. Initially all vertices are unmarked. We select any free unmarked vertex of and initialize a path at that vertex. is extended from the last vertex (in as an invariant) as follows. We probe an eligible edge . If is already marked, the next eligible edge is considered. If no such edge exists, the last two edges (one unmatched and one matched) are deleted from . If becomes empty, a new path is initialized. For the remaining cases, the following subroutine is called.

#### Augmenting_Path(v).

If is unmarked and free, we have found an augmenting path; is marked, is added to and a new path is initialized. In this case, return DONE. If is unmarked, but matched with another vertex , and are added to ; are marked and the extension continues from (in ).

[] In the above procedure, we always maintain the invariant that when we extend a path it suffices to probe only unmatched edges.

###### Proof.

In AUGMENTINGPATH, we always maintain the invariants that when we extend a path from the last vertex , (i) and (ii) if is not the first vertex of , the last edge is a matched edge.

The above claim helps us restrict the search from a vertex of . Now, whenever we extend the path from the last vertex , there are two cases. Either contains a single vertex , and in this case is free; we probe an unmatched eligible edge . Or, contains at least two edges, and in this case the last edge of is a matched edge by the above observation; we need to probe an unmatched eligible edge . Thus, it suffices to probe only unmatched edges for the purpose of extension of . ∎

Note that in the above procedure we cannot afford to probe all the unmatched edges. So, we have to implement the above step carefully. First, note that an edge () is never scanned twice. When is probed the first time, if is unmarked, it becomes marked in all the cases. Also, once is marked, is never used to extend . Thus, we can eliminate from further probing.

From the above discussion, as and contain edges in total they can be probed in time. However, and contain edges and thus for probing them we need specialized data structures. Next, we describe those.

Let be the weight of at the current scale. is the dual value of the vertex which remains fixed throughout the augmenting paths finding process. Again consider the bi-clique covers in . For each , denotes the weight of the edges in -th class. Also, let be the weight class to which the edges in belong, i.e., all of their weights are . For each such and , we store in match the vertices of in a Red-Black tree with as the key of each such vertex . Moreover, for each , we keep an ordered set of indexes . Similarly, define the index set for . We also store the vertices of in a Red-Black tree with as the key of .

MATCH uses construction time and space.

The above space bound follows from the fact that the space complexity of MATCH is dominated by the space needed for the Red-Black trees, which is , as a Red-Black tree uses linear space. The time bound follows trivially.

#### Find_Maximal_Aps.

Let be the set of free vertices and be the set of vertices that are already marked. Initially . Let be the set of augmenting paths found so far, which is initialized to . For each vertex , set its -failed flag to 0. While there is a vertex , do the following.

• Initialize a path at .

• While is not empty, do the following.

• Let be the current augmenting path that we need to extend.

• (Case 1. ) Access the array to find whether the copy of in , i.e., , is marked and is eligible.

• If is unmarked and is eligible, call the subroutine AUGMENTINGPATH (). If this subroutine returns DONE, terminate this while loop. Otherwise, jump to the next iteration.

• Otherwise, search the Red-Black tree where is the first index in , to find a vertex with key value . If such a vertex is found, call the subroutine AUGMENTINGPATH(). Remove from all the Red-Black trees with indexes in , as it is marked in the subroutine. If this subroutine returns DONE, terminate this while loop. Otherwise, jump to the next iteration. If no such vertex is found in , remove the index from and repeat the above step (performed for ) for the next index in . If becomes empty, remove and from , and continue to the next iteration.

• (Case 2. ) Access the array to find whether the original copy of in , say , is marked and is eligible. If is unmarked and is eligible, call the subroutine AUGMENTINGPATH(). If this subroutine returns DONE, terminate this while loop. Otherwise, jump to the next iteration. If -failed is not set, i.e., it is 0 for , search the Red-Black tree to find a vertex with key value . If such a vertex is found, call the subroutine AUGMENTINGPATH(). Remove from , as it is marked in the subroutine. If this subroutine returns DONE, terminate this while loop. Otherwise, jump to the next iteration. If no such vertex is found in , set -failed flag for to 1, remove and from , and continue to the next iteration.

The above procedure is self-explanatory. Next, we prove its correctness and bound the implementation time.

[] FINDMAXIMALAPS correctly computes a maximal set of disjoint augmenting paths.

###### Proof.

The proof of the lemma follows from the proof of correctness of the Gabow-Tarjan algorithm, assuming all eligible edges are considered if needed, while extending the path from . Consider any eligible edge . Here we show that is considered for extension of . If is marked, was already included at some augmenting path. As we are looking for disjoint paths, we do not need to consider the edge , and hence the correctness follows in this case. Thus, wlog, we can assume that is unmarked. First, assume . Then if , is explicitly considered by the subroutine. Otherwise, . In this case, suppose be an index such that . Note that initially is in . Thus, if needed, can be searched, and as is unmarked, . Now, as is eligible and unmatched (by Observation 6.1), . Thus, the key of , must be equal to and can be found eventually while searching . Now, assume . Then if , again is explicitly considered by the subroutine. Otherwise, . In this case, is in the tree , as is unmarked. Now, as is eligible and unmatched (by Observation 6.1), . Thus, the key of , must be equal to and can be found eventually while searching . ∎

[] FINDMAXIMALAPS can be implemented in time.

###### Proof.

For any vertex , let be the total time spent over the course of the procedure to extend paths that end at , in particular, to find eligible edges of the form with . Also, for , let be the total time spent to remove from the Red-Black tree(s) once it is marked. Note that the time of FINDMAXIMALAPS is dominated by the time . Let us analyze at first. In the first case, . As mentioned before, probing of the edges in can be done in time, using the arrays and . So, let us focus on the remaining edges. Note that when we try to find an eligible edge for , we search in , where . Either no desired vertex is found, in which case, is removed from . Otherwise, a vertex is found, but it is deleted from all Red-Black trees, and so it cannot appear in future search. Thus, removal of indexes from and failed searches cost time in total. The total number of vertices in that successfully appear in the result of the searches corresponding to all free vertices in is . Hence,

 ∑u∈R0Lu=∑u∈R0O(|I(u)|logn)+∑(i,j)O(|Qij|logn)=∑(i,j)O((|Pij|+|Qij|)logn).

The last inequality follows, as the sum of the sizes of the index sets in is bounded by the sum of the sizes of the sets in .

Now, consider the case when . In this case, when we try to find an eligible edge for with , we search in . Either no such is found, in which case, -failed flag for is set to 1, and is never searched w.r.t. again. Or, such a vertex is found, in which case, it is deleted from