1 Introduction
Matching in graphs is one of the most well-studied problems in economics and computer science. It is a ubiquitous problem with application areas ranging from transportation networks Aggarwal et al. (1995) to social networks Chu et al. (2013) and computational chemistry Frohlich et al. (2005). For instance, in the case of social networks, tag suggestion and localization problems for images can be transformed into weighted bipartite graph matching problems Chu et al. (2013). Similarly, the processes of de-anonymization and privacy inference in social networks can be reduced to finding the maximum weighted bipartite matching of the corresponding knowledge graph Qian et al. (2016).
Given a graph $G = (V, E)$, a matching $M \subseteq E$ is a subset of edges such that no two edges in $M$ share a common vertex. The matching is perfect if every $v \in V$ belongs to an edge of $M$. For weighted graphs, it is often required to compute a perfect matching that is optimal with respect to some criterion. Given a real weight $w(e)$ for each edge $e \in E$, a minimum cost matching (MCM) minimizes $\sum_{e \in M} w(e)$ among all feasible perfect matchings $M$ for $G$. Similarly, a bottleneck matching (BM) minimizes $\max_{e \in M} w(e)$, while a uniform matching (UM) minimizes $\max_{e \in M} w(e) - \min_{e \in M} w(e)$. A minimum deviation matching (MDM) minimizes $\max_{e \in M} w(e) - \frac{1}{|M|} \sum_{e \in M} w(e)$.
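As a small illustration of the four criteria, the following sketch evaluates each objective for a given perfect matching. It assumes the usual definitions (MCM: total weight; BM: heaviest edge; UM: heaviest minus lightest edge; MDM: heaviest edge minus the average edge weight). The function name `matching_objectives` and the toy weights are hypothetical and chosen only for this example.

```python
from statistics import mean


def matching_objectives(matching, weight):
    """Evaluate the four objectives for one perfect matching.

    matching: list of edges (u, v)
    weight:   dict mapping frozenset({u, v}) to a real edge weight
    """
    w = [weight[frozenset(edge)] for edge in matching]
    return {
        "MCM": sum(w),            # total cost of the matching
        "BM": max(w),             # weight of the heaviest matched edge
        "UM": max(w) - min(w),    # spread between heaviest and lightest edge
        "MDM": max(w) - mean(w),  # heaviest edge minus average (assumed definition)
    }


# Toy complete graph on four vertices (weights are illustrative only).
weight = {
    frozenset(("a", "b")): 1.0,
    frozenset(("c", "d")): 4.0,
    frozenset(("a", "c")): 2.0,
    frozenset(("b", "d")): 2.0,
    frozenset(("a", "d")): 3.0,
    frozenset(("b", "c")): 3.0,
}
obj1 = matching_objectives([("a", "b"), ("c", "d")], weight)
obj2 = matching_objectives([("a", "c"), ("b", "d")], weight)
```

On this toy instance, the second matching is better under all four criteria, although in general the criteria can disagree about which perfect matching is optimal.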
A plethora of work has been done to develop efficient algorithms for obtaining optimal solutions for various matchings. However, for many real-world problems with larger graphs Beier and Sibeyn (2000); Monien et al. (2005), the fastest available matching algorithms are too slow. For example, the best known algorithm for the minimum cost matching problem runs in $O(n^3)$ time for a dense graph with $n$ vertices Kuhn (1955). For massive graphs, this is quite inefficient. Thus, designing efficient approximate algorithms permits the solution of large instances of matching problems that arise in practical situations.
Table 1: Running time complexities of optimal matching algorithms in Euclidean and non-Euclidean settings.

| Graph Type | Matching Type | Algorithm | Algorithm Type | Complexity |
|---|---|---|---|---|
| Bipartite | Minimum Cost (MCM) | Kuhn (1955) | Non-Euclidean | |
| Bipartite | Minimum Cost (MCM) | Agarwal et al. (1995) | Euclidean | |
| Bipartite | Bottleneck (BM) | Punnen and Nair (1994) | Non-Euclidean | |
| Bipartite | Bottleneck (BM) | Efrat et al. (2001) | Euclidean | |
| Bipartite | Uniform (UM) | Martello et al. (1984) | Non-Euclidean | |
| Bipartite | Uniform (UM) | Efrat et al. (2001) | Euclidean | |
| Bipartite | Minimum Deviation (MDM) | Efrat (1998) | Non-Euclidean | |
| Bipartite | Minimum Deviation (MDM) | Efrat et al. (2001) | Euclidean | |
| Non-bipartite | Minimum Cost (MCM) | Gabow (1990) | Non-Euclidean | |
| Non-bipartite | Minimum Cost (MCM) | Varadarajan (1998) | Euclidean | |
| Non-bipartite | Bottleneck (BM) | Gabow and Tarjan (1988) | Non-Euclidean | |
| Non-bipartite | Bottleneck (BM) | Efrat and Katz (2000) | Euclidean | |
There exist many constant-factor approximation algorithms for the maximum weight matching (MWM) problem. However, there is no such algorithm for the minimum cost matching (MCM) problem. The greedy heuristic for the MCM problem attempts to construct a minimum cost perfect matching by starting with an empty matching and iteratively adding a minimum weight edge between two exposed nodes. While the running times of various greedy matching algorithms are generally linear in the number of edges ($O(m)$), the corresponding approximation ratios are high. The greedy heuristic runs in polynomial time, but the cost of its solution can exceed the optimum cost by a factor that grows with the number of vertices Reingold and Tarjan (1981). Grigoriadis and Kalantari (1988) developed a heuristic whose error bound relative to the optimum cost likewise depends on the number of vertices. Thus, one should focus on designing efficient approximate algorithms with good performance guarantees.
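The greedy heuristic described above can be sketched as follows. This is a minimal sort-based version written for illustration, not necessarily the implementation analyzed in the cited works: sorting all edges once and scanning them in order is equivalent to repeatedly picking a minimum weight edge between two exposed nodes. The function name `greedy_matching` and the four-point instance are assumptions made for this example.

```python
from itertools import combinations


def greedy_matching(weight):
    """Greedy perfect matching on a complete graph with an even number of nodes.

    weight: dict mapping an edge (u, v) to its weight
    """
    matched = set()   # nodes that are no longer exposed
    matching = []
    # Scan edges from lightest to heaviest; keep an edge only if both
    # endpoints are still exposed.
    for (u, v), _ in sorted(weight.items(), key=lambda item: item[1]):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Four points on a line (illustrative positions); weights are distances.
positions = [0, 10, 19, 29]
weight = {(u, v): abs(positions[u] - positions[v])
          for u, v in combinations(range(4), 2)}

greedy = greedy_matching(weight)
greedy_cost = sum(weight[e] for e in greedy)
optimal_cost = weight[(0, 1)] + weight[(2, 3)]
```

Here greedy first takes the short middle edge, which forces it to pair the two outer points, giving a cost of 38 against an optimum of 20. The same mechanism, applied recursively, is what drives the poor worst-case behavior of the heuristic.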
While the running times of optimal matching algorithms are high for non-Euclidean graphs, the available optimal matching algorithms are substantially faster in the Euclidean case, i.e., when the vertices of the graph are points in $\mathbb{R}^d$ and edge weights correspond to Euclidean distances. For example, the best known algorithm for the bottleneck matching problem is asymptotically faster on a bipartite Euclidean graph than on its non-Euclidean counterpart. Thus, the following natural question arises: can we leverage Euclidean matching techniques to obtain near-optimal solutions to non-Euclidean matching problems?
In this work, we propose a network-embedding-based algorithm to obtain approximate solutions to non-Euclidean matching problems. More precisely, using existing linear-time network embedding techniques, we embed the vertices of the non-Euclidean graph into points in $\mathbb{R}^d$ such that the neighborhoods of the vertices are approximately preserved. We then run the faster available Euclidean matching algorithms on the embedded vertices. To the best of our knowledge, this is the first work that applies network embedding to solve various matching problems. Empirical results show the efficacy of our proposed algorithm.
2 Technical Preliminaries
We consider the problem of finding approximate solutions to optimal matching problems in complete bipartite and general graphs. (When the graph is not complete, we assume it to be sufficiently dense. In such a case, we can add edges with infinite cost to make the graph complete without affecting the overall running time complexity.) We denote the graph as $G = (V, E)$, where $V$ denotes the vertex set and $E$ denotes the edge set. We assign a real weight $w(e)$ to each edge $e \in E$. A matching $M$ is defined to be a set of edges such that no vertex of $G$ is incident to more than one edge of $M$. Denote $n = |V|$ and $m = |E|$. Our goal is to compute a perfect matching that is near-optimal with respect to an optimality criterion defined in the previous section.
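The completion step for graphs that are dense but not complete can be sketched as below for the general (non-bipartite) case. The helper name `complete_with_infinite_edges` and the edge-dictionary representation are assumptions made for this illustration.

```python
import math
from itertools import combinations


def complete_with_infinite_edges(vertices, weight):
    """Return a weight dict for the complete graph on `vertices`.

    weight: dict mapping an edge (u, v) to its real weight; an edge may be
            stored in either orientation. Missing edges get infinite cost,
            so no finite-cost matching will ever use them.
    """
    full = {}
    for u, v in combinations(sorted(vertices), 2):
        full[(u, v)] = weight.get((u, v), weight.get((v, u), math.inf))
    return full


# Illustrative sparse input on four vertices.
sparse = {(0, 1): 1.0, (3, 2): 2.0}
full = complete_with_infinite_edges([0, 1, 2, 3], sparse)
```

The loop touches every vertex pair once, so for a dense input its cost is proportional to the number of edges already present.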
2.1 Geometry Helps in Matching
A summary of the running time complexities of various optimal matching algorithms in Euclidean and non-Euclidean settings is given in Table 1. Note that bipartite matching is a special case of general graph matching. In a Euclidean setting, however, computing an optimal bipartite matching is more challenging than computing an optimal matching on a complete non-bipartite graph. For example, MCM on a single set of points can be computed asymptotically faster than the best known algorithm computes MCM between two point sets of the same size. Also note that there are considerable savings in running time in the Euclidean setting as compared to the non-Euclidean setting. For instance, MCM on a Euclidean non-bipartite complete graph can be computed asymptotically faster than MCM on a non-Euclidean non-bipartite complete graph. Similar savings can be observed for many other matching problems, such as BM, UM and MDM. Below, we transform the non-Euclidean matching problem into a Euclidean one through network embedding.
3 Our Approach: Matching through Embedding
Given a graph $G = (V, E)$, we aim to learn a representation of $G$ in a low-dimensional vector space $\mathbb{R}^d$, i.e., find a map $f: V \to \mathbb{R}^d$ such that the neighborhoods of nodes are approximately preserved. This allows us to execute matching algorithms on the set of vectors $\{f(v) : v \in V\}$ instead of on $G$ itself. Below, we present two network embedding techniques available in the literature.
DeepWalk Perozzi and Skiena (2014): As a homogeneous network embedding method, DeepWalk performs uniform random walks to obtain a corpus of vertex sequences. word2vec is then applied to the corpus to learn vertex embeddings.

node2vec Grover and Leskovec (2016): This method extends DeepWalk by performing biased random walks to generate the corpus of vertex sequences. The return parameter $p$ and the in-out parameter $q$ control the bias and can be set to different values.
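The corpus-generation step shared by both methods can be sketched as follows for the uniform (DeepWalk-style) case. This is a simplified illustration under stated assumptions: edge weights and the node2vec bias parameters are ignored, and the subsequent word2vec training step is omitted. The function name `uniform_random_walks` and the adjacency-list input format are hypothetical.

```python
import random


def uniform_random_walks(adj, num_walks, walk_length, seed=0):
    """Generate a corpus of vertex sequences by uniform random walks.

    adj:         dict mapping each node to a list of its neighbors
    num_walks:   number of walks started from every node
    walk_length: number of vertices in each walk
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:   # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Illustrative 4-cycle.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = uniform_random_walks(adj, num_walks=3, walk_length=5)
```

Each walk plays the role of a sentence and each vertex the role of a word when the corpus is passed to a skip-gram model. A biased variant would replace the uniform `rng.choice` with a sampling step that depends on the previously visited vertex.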
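We construct the embedded graph $G' = (V', E')$ as follows. We let $V' = \{f(v) : v \in V\}$ be the images of the vertices under the embedding map $f$. Also, we let $E' = \{(f(u), f(v)) : (u, v) \in E\}$. We then define the weight of an edge $(f(u), f(v)) \in E'$ as $w'(f(u), f(v)) = \lVert f(u) - f(v) \rVert_2$. Clearly, $G'$ is a Euclidean graph in dimension $d$. We then apply available optimal Euclidean matching algorithms on $G'$ to obtain a matching $M'$. We output the corresponding matching $M = \{(u, v) : (f(u), f(v)) \in M'\}$ as the approximate solution to the optimal matching problem on $G$.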
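The pipeline above can be sketched end to end on a tiny instance. The sketch assumes the embedding is already available as a dictionary from vertices to coordinate tuples, and it uses an exponential-time brute-force search as a stand-in for the optimal Euclidean matching algorithm, so it is only meant for very small inputs. All names (`min_cost_perfect_matching`, `match_through_embedding`) and all numbers are hypothetical.

```python
import math


def min_cost_perfect_matching(nodes, cost):
    """Exact minimum cost perfect matching by brute force (tiny inputs only)."""
    nodes = list(nodes)
    if not nodes:
        return 0.0, []
    u, rest = nodes[0], nodes[1:]
    best_cost, best = math.inf, None
    for i, v in enumerate(rest):
        # Pair u with v, then solve the remaining nodes recursively.
        sub_cost, sub = min_cost_perfect_matching(rest[:i] + rest[i + 1:], cost)
        total = cost(u, v) + sub_cost
        if total < best_cost:
            best_cost, best = total, [(u, v)] + sub
    return best_cost, best


def match_through_embedding(embedding, original_weight):
    """Match on Euclidean distances between embedded vertices, then
    evaluate the resulting matching with the original edge weights."""
    nodes = sorted(embedding)
    _, matching = min_cost_perfect_matching(
        nodes, lambda u, v: math.dist(embedding[u], embedding[v]))
    cost_on_g = sum(original_weight[frozenset(e)] for e in matching)
    return matching, cost_on_g


# Assumed embedding: vertices 0,1 lie close together, as do vertices 2,3.
embedding = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (5.0, 5.0), 3: (5.0, 5.2)}
original_weight = {frozenset(p): 10 for p in [(0, 2), (0, 3), (1, 2), (1, 3)]}
original_weight[frozenset((0, 1))] = 1
original_weight[frozenset((2, 3))] = 1

matching, cost_on_g = match_through_embedding(embedding, original_weight)
```

The quality of the output depends entirely on how well the embedding reflects the original weights: the matching is optimal for the embedded distances, but its cost is always measured on the original graph.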
Remark 1.
Note that the above-mentioned embedding techniques can learn the representation of $G$ in linear time [1]. Thus, the overall time complexity of the proposed algorithm is equal to that of the corresponding Euclidean matching algorithm.
4 Empirical Results
In this section, we present experiments on several synthetic datasets, for which we vary the distribution of edge weights. We compare the performance of our proposed heuristic algorithm to that of the greedy heuristic under two models for generating edge weights. First, we consider an adversarial model. Under this model, edge weights are generated according to Reingold and Tarjan (1981), as shown in Figure 1. The model is parameterized by the number of clusters of nodes and by the largest distance between adjacent nodes. Under the adversarial model, the cost of the solution found by the greedy heuristic exceeds the optimum cost by a factor that grows with the number of nodes Reingold and Tarjan (1981). Next, we consider the Lomax distribution (a long-tail distribution) Lomax (1954) for generating the weights, since such distributions are widely used in real-world applications Wang et al. (2019). We vary the Lomax distribution parameter over a set of values.
The parameter settings are motivated by Wang et al. (2019); Grover and Leskovec (2016). We use the same number of walks per node and the same random walk length for both DeepWalk and node2vec. The number of embedding dimensions, the return parameter ($p$), and the in-out parameter ($q$) are set to default values. The experiments are repeated multiple times for each parameter setting, and confidence interval plots are reported. Unless stated otherwise, $n = 100$ and $m = 4950$ for non-bipartite graphs. We focus on MCM and BM algorithms in the non-bipartite setting. We obtain similar results for bipartite graphs and hence omit them here.
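A Lomax-weighted complete graph of the size used here can be generated with the sketch below, which draws weights by inverse-transform sampling from the Lomax CDF $F(x) = 1 - (1 + x/\lambda)^{-\alpha}$. Treating the varied parameter as the shape $\alpha$, fixing the scale $\lambda = 1$, and the value $\alpha = 2$ are all assumptions for illustration; the function names are hypothetical.

```python
import random
from itertools import combinations


def lomax_sample(rng, alpha, scale=1.0):
    """Draw one Lomax(alpha, scale) variate by inverting the CDF."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return scale * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def lomax_complete_graph(n, alpha, seed=0):
    """Assign an independent Lomax weight to every edge of the complete graph."""
    rng = random.Random(seed)
    return {(u, v): lomax_sample(rng, alpha)
            for u, v in combinations(range(n), 2)}


# n = 100 gives m = n(n - 1)/2 = 4950 edges; alpha = 2.0 is an assumed value.
weights = lomax_complete_graph(n=100, alpha=2.0)
num_edges = len(weights)
```

Weights drawn this way need not satisfy the triangle inequality, so the resulting graph is a natural test case for the non-Euclidean setting.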
4.1 Effect of Edge Weight Distribution
We first present the results for MCM and BM under the adversarial model, shown in Figures 2(a) and 2(b), respectively. We plot the approximation ratios of the matching algorithms as a function of the number of nodes in the graph. For both MCM and BM, the proposed node2vec- and DeepWalk-based algorithms yield a lower approximation ratio, and hence better performance, than the greedy heuristic. Also, as the number of nodes increases, the performance of the greedy heuristic degrades while that of the embedding-based techniques improves. One possible explanation for the better performance of the embedding-based techniques is their ability to learn the inherent one-dimensional structure of the adversarial model. However, the trend is reversed for MCM under the Lomax distribution model, as shown in Figure 2(c). In this case, greedy beats both the DeepWalk- and node2vec-based algorithms. Note that as the Lomax distribution parameter increases, both embedding-based techniques gradually perform better and finally converge to the greedy solution.
4.2 Parameter Sensitivity
The embedding-based matching algorithms involve a number of parameters. Below, we examine the effect of different parameters on the overall performance of the embedding-based techniques under the adversarial model, as shown in Figures 2(d)-(f). All parameters assume default values, except for the parameter being examined. We measure the approximation ratio as a function of the number of walks per node, the length of each walk, and the number of dimensions. As expected, the performance of both node2vec and DeepWalk improves as each of these parameters increases. Again, node2vec beats DeepWalk across all parameter values.
5 Conclusion
In this paper, we developed approximate solutions for various matching problems on dense graphs. More precisely, we proposed a network-embedding-based heuristic algorithm that uses existing network embedding techniques. We also performed simulations on synthetic datasets to compare the empirical approximation ratios of the proposed and existing matching algorithms. Future directions include studying the effect of embedding techniques that are not based on random walks on the overall performance of the proposed algorithm.
6 Acknowledgment
This research was sponsored by the U.S. ARL and the U.K. MoD under Agreement Number W911NF-16-3-0001 and by the NSF under Grant CNS-1617437. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation, U.S. ARL or the U.K. MoD. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.
References
 [1] Cited by: Remark 1.
 [2] (1995) Vertical decomposition of shallow levels in 3-dimensional arrangements and its applications. In Proc. 11th Annu. ACM Sympos. Comput. Geom., pp. 39–50.
 [3] (1995) Efficient minimum cost matching using quadrangle inequality. Journal of Algorithms 19, pp. 116–143.
 [4] (2000) Quality Matching and Local Improvement for Multilevel Graph Partitioning. Parallel Computing 26, pp. 1609–1634.
 [5] (2013) Tag suggestion and localization for images by bipartite graph matching. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013. ISBN 9789869000604.
 [6] (2001) Geometry helps in bottleneck matching and related problems. Algorithmica 31, pp. 1–28.
 [7] (2000) Computing Euclidean bottleneck matchings in higher dimensions. Information Processing Letters 75, pp. 169–174.
 [8] (1998) Geometric Location Optimization. Ph.D. Dissertation, Tel-Aviv University.
 [9] (2005) Assignment kernels for chemical compounds. IJCNN.
 [10] (1988) Algorithms for Two Bottleneck Optimization Problems. J. Algorithms 9, pp. 411–417.
 [11] (1990) Data Structures for Weighted Matching and Nearest Common Ancestors with Linking. Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 434–443.
 [12] (1988) A new class of heuristic algorithms for weighted perfect matching. J. Assoc. Comput. Mach. 35, pp. 769–776.
 [13] (2016) node2vec: Scalable Feature Learning for Networks. KDD, pp. 855–864.
 [14] (1955) The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly 2, pp. 83–97.
 [15] (1954) Business Failures; Another example of the analysis of failure data. Journal of the American Statistical Association 49, pp. 847–852.
 [16] (1984) Balanced optimization problems. Operations Research Letters 3, pp. 275–278.
 [17] (2005) Assignment kernels for chemical compounds. IJCNN.
 [18] (2014) DeepWalk: Online Learning of Social Representations. KDD.
 [19] (1994) Improved Complexity Bound for the Maximum Cardinality Bottleneck Bipartite Matching Problem. Discrete Applied Mathematics 55, pp. 91–93.
 [20] (2016) De-anonymizing social networks and inferring private attributes using knowledge graphs. Proceedings - IEEE INFOCOM 2016-July. ISBN 9781467399531, ISSN 0743-166X.
 [21] (1981) On a greedy heuristic for complete matching. SIAM J. Comput. 10, pp. 676–681.
 [22] (1998) A divide-and-conquer algorithm for min-cost perfect matching in the plane. In Proc. FOCS.
 [23] (2019) Adaptive dynamic bipartite graph matching: A reinforcement learning approach. Proceedings - International Conference on Data Engineering 2019-April, pp. 1478–1489. ISBN 9781538674741, ISSN 1084-4627.