Matching through Embedding in Dense Graphs

11/13/2020 ∙ by Nitish K. Panigrahy, et al. ∙ University of Massachusetts Amherst ∙ Raytheon

Finding optimal matchings in dense graphs is of general interest and of particular importance in social, transportation and biological networks. While developing optimal solutions for various matching problems is important, the running times of the fastest available optimal matching algorithms are too costly. However, when the vertices of the graph are point sets in $\mathbb{R}^d$ and edge weights correspond to Euclidean distances, the available optimal matching algorithms are substantially faster. In this paper, we propose a novel network-embedding-based heuristic algorithm to solve various matching problems in dense graphs. In particular, using existing network embedding techniques, we first find a low-dimensional representation of the graph vertices in $\mathbb{R}^d$ and then run faster available matching algorithms on the embedded vertices. To the best of our knowledge, this is the first work that applies network embedding to solve various matching problems. Experimental results validate the efficacy of our proposed algorithm.


1 Introduction

Matching in graphs is one of the most well-studied problems in economics and computer science. It is a ubiquitous problem with application areas ranging from transportation networks Aggarwal et al. (1995) to social networks Chu et al. (2013) and computational chemistry Frohlich et al. (2005). For instance, in the case of social networks, tag suggestion and localization problems for images can be transformed into weighted bipartite graph matching problems Chu et al. (2013). Similarly, the processes of de-anonymization and privacy inference in social networks can be reduced to finding the maximum weighted bipartite matching of the corresponding knowledge graph Qian et al. (2016).

Given a graph $G = (V, E)$, a matching $M \subseteq E$ is a subset of edges such that no two edges in $M$ share a common vertex. The matching is perfect if every $v \in V$ belongs to an edge of $M$. For weighted graphs, it is often required to compute a perfect matching that is optimal with respect to some criterion. Given a real weight $w_e$ for each edge $e \in E$, a minimum cost matching (MCM) minimizes $\sum_{e \in M} w_e$ among all feasible perfect matchings $M$ for $G$. Similarly, a bottleneck matching (BM) minimizes $\max_{e \in M} w_e$, while a uniform matching (UM) minimizes $\max_{e \in M} w_e - \min_{e \in M} w_e$. A minimum deviation matching (MDM) minimizes $\max_{e \in M} w_e - \frac{1}{|M|} \sum_{e \in M} w_e$.
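The four criteria can be made concrete with a short Python sketch. It assumes the standard definitions (MCM: total weight; BM: heaviest edge; UM: heaviest minus lightest edge; MDM: heaviest edge minus average edge weight). The function name `matching_objectives` and the dictionary layout of the weights are illustrative choices, not part of the paper.

```python
from statistics import mean


def matching_objectives(weights, matching):
    """Evaluate one perfect matching under the four criteria.

    weights:  dict mapping frozenset({u, v}) -> real edge weight
    matching: list of (u, v) pairs, assumed to form a perfect matching
    """
    w = [weights[frozenset(e)] for e in matching]
    return {
        "MCM": sum(w),            # total cost of the matching
        "BM": max(w),             # heaviest (bottleneck) edge
        "UM": max(w) - min(w),    # spread between heaviest and lightest edge
        "MDM": max(w) - mean(w),  # heaviest edge minus average edge weight
    }


# Hypothetical complete graph on four vertices.
weights = {
    frozenset({0, 1}): 1.0, frozenset({0, 2}): 2.0, frozenset({0, 3}): 5.0,
    frozenset({1, 2}): 3.0, frozenset({1, 3}): 2.5, frozenset({2, 3}): 4.0,
}
values = matching_objectives(weights, [(0, 1), (2, 3)])
```

Each optimal matching problem then asks for the perfect matching that minimizes the corresponding entry.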

A plethora of work has been done to develop efficient algorithms for obtaining optimal solutions for various matchings. However, for many real-world problems with larger graphs Beier and Sibeyn (2000); Monien et al. (2005), the running times of the fastest available matching algorithms are too costly. For example, the best known algorithm for the minimum cost matching problem runs in $O(n^3)$ time for a dense graph with $n$ vertices Kuhn (1955). For massive graphs, this is quite inefficient. Thus, designing efficient approximate algorithms permits the solution of large instances of matching problems that arise in practical situations.

Graph Type      Matching Type              Algorithm                  Algorithm Type   Complexity
Bipartite       Minimum Cost (MCM)         Kuhn (1955)                Non-Euclidean
Bipartite       Minimum Cost (MCM)         Agarwal et al. (1995)      Euclidean-
Bipartite       Bottleneck (BM)            Punnen and Nair (1994)     Non-Euclidean
Bipartite       Bottleneck (BM)            Efrat et al. (2001)        Euclidean-
Bipartite       Uniform (UM)               Martello et al. (1984)     Non-Euclidean
Bipartite       Uniform (UM)               Efrat et al. (2001)        Euclidean
Bipartite       Minimum Deviation (MDM)    Efrat (1998)               Non-Euclidean
Bipartite       Minimum Deviation (MDM)    Efrat et al. (2001)        Euclidean
Non-bipartite   Minimum Cost (MCM)         Gabow (1990)               Non-Euclidean
Non-bipartite   Minimum Cost (MCM)         Varadarajan (1998)         Euclidean-
Non-bipartite   Bottleneck (BM)            Gabow and Tarjan (1988)    Non-Euclidean
Non-bipartite   Bottleneck (BM)            Efrat and Katz (2000)      Euclidean-
Table 1: Running time complexities of various optimal matching algorithms.

There exist many constant-factor approximation algorithms for the maximum weight matching (MWM) problem. However, there is no such algorithm for the minimum cost matching (MCM) problem. The greedy heuristic for the MCM problem attempts to construct a minimum cost perfect matching by starting with an empty matching and iteratively adding a minimum weight edge between two exposed nodes. While the running time complexities of various greedy matching algorithms are generally linear in the number of edges ($m$), the corresponding approximation ratios are high. The greedy heuristic runs in time and finds a solution with cost at most times the optimum cost Reingold and Tarjan (1981). Grigoriadis and Kalantari (1988) developed an heuristic that constructs a matching with cost at most times the optimum cost. Thus one should focus on designing efficient approximate algorithms with good performance guarantees.
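The greedy heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a complete graph on vertices `0..n-1` with `n` even and a caller-supplied `weight(u, v)` function; both names are hypothetical. The sketch sorts all edges first, which is a simple way to find the next minimum weight edge and is not necessarily how the cited implementations work.

```python
def greedy_matching(n, weight):
    """Greedy perfect matching on the complete graph over vertices 0..n-1.

    Repeatedly takes the lightest remaining edge whose two endpoints are
    both still exposed (unmatched). n is assumed to be even.
    """
    edges = sorted(
        (weight(u, v), u, v) for u in range(n) for v in range(u + 1, n)
    )
    exposed = set(range(n))
    matching = []
    for _, u, v in edges:
        if u in exposed and v in exposed:
            matching.append((u, v))
            exposed -= {u, v}
    return matching


# Four points on a line; edge weight is the distance between the points.
positions = [0, 2, 3, 5]


def line_weight(u, v):
    return abs(positions[u] - positions[v])


greedy = greedy_matching(4, line_weight)
greedy_cost = sum(line_weight(u, v) for u, v in greedy)
```

On this instance greedy first takes the edge between the two middle points and is then forced to match the two outer points, for a total cost of 6, whereas matching neighbours pairwise costs 4. This small example shows how a locally cheapest edge can block a better global solution.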

While the running time complexities of optimal matching algorithms are high for non-Euclidean graphs, the available optimal matching algorithms are substantially faster for the Euclidean case, i.e., when the vertices of the graph are point sets in $\mathbb{R}^d$ and edge weights correspond to Euclidean distances. For example, the best known algorithm for the bottleneck matching problem runs in time for a bipartite Euclidean graph as compared to an for its non-Euclidean counterpart. Thus the following natural question arises: can we leverage Euclidean matching techniques to obtain near-optimal solutions to non-Euclidean matching problems?

In this work, we propose a network-embedding-based algorithm to obtain approximate solutions to non-Euclidean matching problems. More precisely, using existing linear-time network embedding techniques, we embed the vertices of the non-Euclidean graph into points in $\mathbb{R}^d$ such that the neighborhoods of the vertices are approximately preserved. We then run faster available Euclidean matching algorithms on the embedded vertices. To the best of our knowledge, this is the first work that applies network embedding to solve various matching problems. Empirical results show the efficacy of our proposed algorithm.

2 Technical Preliminaries

We consider the problem of finding approximate solutions to optimal matching problems in complete bipartite and general graphs. (We assume the graph to be sufficiently dense when it is not complete. In that case, we can add edges with infinite cost to make the graph complete without affecting the overall running time complexity.) We denote the graph as $G = (V, E)$, where $V$ denotes the vertex set and $E$ denotes the edge set. We assign a real weight $w_e$ to each edge $e \in E$. A matching $M$ is defined to be a set of edges such that no vertex of $G$ is incident to more than one edge of $M$. Denote $n = |V|$ and $m = |E|$. Our goal is to compute a perfect matching that is near-optimal with respect to the optimality criteria defined in the previous section.
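The completion step for a dense but incomplete graph can be sketched as follows. This is only an illustration: the function name `complete_graph_weights` and the dictionary-of-frozensets layout are assumptions, and vertices are taken to be labelled `0..n-1`.

```python
import math


def complete_graph_weights(n, weights):
    """Return weights for the complete graph on 0..n-1.

    Edges missing from `weights` receive infinite cost, so no finite-cost
    perfect matching will ever use them.
    """
    return {
        frozenset({u, v}): weights.get(frozenset({u, v}), math.inf)
        for u in range(n)
        for v in range(u + 1, n)
    }


# Hypothetical graph with two known edges on four vertices.
sparse = {frozenset({0, 1}): 2.0, frozenset({2, 3}): 1.5}
dense = complete_graph_weights(4, sparse)
```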

2.1 Geometry Helps in Matching

A summary of the running time complexities of various optimal matching algorithms in the Euclidean and non-Euclidean settings is given in Table 1. Note that bipartite matching is a special case of general graph matching. Computing an optimal bipartite matching is more challenging than computing an optimal matching on a complete non-bipartite graph in a Euclidean setting. For example, MCM on a set of points can be computed in time, while the best known algorithm for computing MCM between two point sets of size each takes time. Also note that there are considerable savings in running time in the Euclidean setting as compared to the non-Euclidean setting. For instance, MCM on a non-Euclidean non-bipartite complete graph runs in time whereas MCM on a Euclidean non-bipartite complete graph runs in time. Similar savings can be observed for many other matching problems, such as BM, UM and MDM. Below we transform the non-Euclidean matching problem into a Euclidean one through network embedding.

3 Our Approach: Matching through Embedding

Given a graph $G = (V, E)$, we aim to learn a representation of $G$ in a low-dimensional vector space $\mathbb{R}^d$, i.e., find a map $f: V \to \mathbb{R}^d$ such that the neighborhoods of nodes are approximately preserved. This allows us to execute matching algorithms on the set of vectors $\{f(v) : v \in V\}$ instead of on $G$ itself. Below we present two network embedding techniques available in the literature.

  • DeepWalk Perozzi and Skiena (2014): As a homogeneous network embedding method, DeepWalk performs uniform random walks to obtain a corpus of vertex sequences. word2vec is then applied to the corpus to learn vertex embeddings.

  • node2vec Grover and Leskovec (2016): This method extends DeepWalk by performing biased random walks to generate the corpus of vertex sequences. The hyper-parameters $p$ (return) and $q$ (in-out) can be set to different values.
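The walk-generation step shared by both methods can be sketched in pure Python. This is a simplified illustration on an unweighted graph, not the reference implementation of either method: the function names, the adjacency-dictionary layout and the seed handling are assumptions. With $p = q = 1$ every step is uniform, which corresponds to DeepWalk; other values give the node2vec bias.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One second-order random walk; adj maps each vertex to a set of neighbours."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # The first step has no previous vertex, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        # Unnormalised node2vec weights: 1/p to return to prev, 1 for a
        # common neighbour of prev, 1/q for a vertex farther from prev.
        bias = [
            1.0 / p if x == prev else 1.0 if x in adj[prev] else 1.0 / q
            for x in nbrs
        ]
        walk.append(rng.choices(nbrs, weights=bias, k=1)[0])
    return walk


def build_corpus(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every vertex, visiting vertices in random order."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        corpus.extend(biased_walk(adj, v, length, p, q, rng) for v in nodes)
    return corpus


# Hypothetical small graph.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
corpus = build_corpus(adj, num_walks=2, length=5, p=0.5, q=2.0)
```

The resulting corpus of vertex sequences would then be handed to a word2vec implementation, treating each walk as a sentence and each vertex as a word.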

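We construct the embedded graph $G' = (V', E')$ as follows, where $f: V \to \mathbb{R}^d$ denotes the learned embedding map. We let $V' = \{f(v) : v \in V\}$. Also, we let $E' = \{(f(u), f(v)) : (u, v) \in E\}$. We then define the weight of an edge $(f(u), f(v)) \in E'$ as $\|f(u) - f(v)\|_2$. Clearly $G'$ is a Euclidean graph in dimension $d$. We then apply available optimal Euclidean matching algorithms on $G'$ to obtain a matching $M'$. We output $M = \{(u, v) : (f(u), f(v)) \in M'\}$ as the approximate solution to the optimal matching problem on $G$.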
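The overall procedure can be sketched as follows, assuming the embedding has already been learned and is available as a dictionary from vertices to coordinate tuples. The exhaustive solver below is only a stand-in for a fast optimal Euclidean matching algorithm and is practical for very small inputs; all function names are hypothetical.

```python
import math


def min_cost_perfect_matching(vertices, weight):
    """Exact MCM by exhaustive recursion (a placeholder for a fast Euclidean solver)."""
    if not vertices:
        return 0.0, []
    u, rest = vertices[0], vertices[1:]
    best = (math.inf, [])
    for i, v in enumerate(rest):
        cost, sub = min_cost_perfect_matching(rest[:i] + rest[i + 1:], weight)
        cost += weight(u, v)
        if cost < best[0]:
            best = (cost, [(u, v)] + sub)
    return best


def match_through_embedding(vertices, embedding):
    """Match on Euclidean distances between embedded points.

    Returns pairs of original vertices, i.e. the matching mapped back
    from the embedded graph to the input graph.
    """
    def embedded_weight(u, v):
        return math.dist(embedding[u], embedding[v])

    _, matching = min_cost_perfect_matching(list(vertices), embedded_weight)
    return matching


# Hypothetical two-dimensional embedding of four vertices.
embedding = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (5.0, 0.0), "d": (5.0, 1.0)}
matching = match_through_embedding(["a", "b", "c", "d"], embedding)
```

The quality of the returned matching should be judged on the original edge weights of the input graph, since the embedded distances only approximate them.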

Remark 1.

Note that the embedding techniques mentioned above can learn the representation of $G$ in linear time. Thus the overall time complexity of the proposed algorithm is equal to that of the corresponding Euclidean matching algorithm.

4 Empirical Results

In this section we present experiments on several synthetic datasets.

Figure 1: Adversarial model for assigning edge weights with Reingold and Tarjan (1981).
Figure 2: Performance comparison of our proposed algorithm to that of greedy for (a) MCM and (b) BM under the adversarial model and (c) MCM under the Lomax model. (d)-(f) Parameter sensitivity of the proposed algorithm under the adversarial model.

For synthetic datasets, we vary the distributions of edge weights. We compare the performance of our proposed heuristic algorithm to that of the greedy heuristic under two models for generating edge weights. First, we consider an adversarial model. Under this model, edge weights are generated according to Reingold and Tarjan (1981), as shown in Figure 1. We set with being the number of clusters of nodes and the largest distance between adjacent nodes is set to Under the adversarial model, the greedy heuristic finds a solution with cost times the optimum cost Reingold and Tarjan (1981). Next, we consider the Lomax distribution (a long-tail distribution) Lomax (1954) for generating the weights, since it is widely used in real-world applications Wang et al. (2019). We chose the Lomax distribution parameter from the set . The parameter settings are motivated by Wang et al. (2019); Grover and Leskovec (2016). We set the number of walks per node and length of each random walk to for both DeepWalk and node2vec. The number of embedding dimensions, the return parameter ($p$) and the in-out parameter ($q$) are set to and respectively. The experiments are repeated times for each parameter setting and the confidence interval plots are reported. Unless stated otherwise, $n = 100$ and $m = 4950$ for non-bipartite graphs in our experiments. We focus on MCM and BM algorithms in the non-bipartite setting. We get similar results for bipartite graphs and hence omit them here.
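Lomax-distributed edge weights for a complete graph can be generated with inverse-transform sampling, as in the sketch below. It assumes the usual parameterization with shape $\alpha$ and scale $\lambda$, whose distribution function is $F(x) = 1 - (1 + x/\lambda)^{-\alpha}$. The function names, the default scale of 1 and the seed are illustrative assumptions, not settings taken from the experiments.

```python
import random


def lomax_sample(alpha, scale=1.0, rng=random):
    """Draw one Lomax (Pareto type II) variate by inverting its CDF."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return scale * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def lomax_complete_graph(n, alpha, seed=0):
    """Independent Lomax weights for every edge of the complete graph on 0..n-1."""
    rng = random.Random(seed)
    return {
        frozenset({u, v}): lomax_sample(alpha, rng=rng)
        for u in range(n)
        for v in range(u + 1, n)
    }


weights = lomax_complete_graph(100, alpha=2.0)
```

With 100 vertices the dictionary holds one weight for each of the 4950 edges of the complete graph.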

4.1 Effect of Edge Weight Distribution

We first present the results for MCM and BM under the adversarial model, as shown in Figures 2(a) and 2(b) respectively. We plot the approximation ratios of various matching algorithms as a function of the number of nodes in the graph. For both MCM and BM, the proposed node2vec and DeepWalk based algorithms yield a lower approximation ratio, and hence better performance, than the greedy heuristic. Also, as the number of nodes increases, the performance of the greedy heuristic degrades while that of the embedding-based techniques improves. One possible explanation for the better performance of the embedding-based techniques is their ability to learn the inherent one-dimensional structure of the adversarial model. However, the trend is reversed for MCM under the Lomax distribution model, as shown in Figure 2(c). In this case, greedy beats both the DeepWalk and node2vec based algorithms. Note that, as the Lomax distribution parameter increases, both embedding-based techniques gradually perform better and finally converge to the greedy solution.

4.2 Parameter Sensitivity

The embedding-based matching algorithms involve a number of parameters. Below, we examine the effect of different parameters on the overall performance of the embedding-based techniques under the adversarial model, as shown in Figures 2(d)-(f). All parameters assume their default values except for the one being examined. We measure the approximation ratio as a function of the number of walks per node, the length of each walk, and the number of dimensions. As expected, the performance of both node2vec and DeepWalk improves as each of these three parameters increases. Again, node2vec beats DeepWalk across all parameter values.

5 Conclusion

In this paper, we developed approximate solutions for various matching problems on dense graphs. More precisely, we proposed a network-embedding-based heuristic algorithm using existing network embedding techniques. We also performed simulations on synthetic datasets to compare the empirical approximation ratios of the proposed and existing matching algorithms. Future work includes studying the effect of embedding techniques that are not based on random walks on the overall performance of the proposed algorithm.

6 Acknowledgment

This research was sponsored by the U.S. ARL and the U.K. MoD under Agreement Number W911NF-16-3-0001 and by the NSF under Grant CNS-1617437. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation, U.S. ARL or the U.K. MoD. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.

References

  • [1] P. K. Agarwal, A. Efrat, and M. Sharir (1995) Vertical decomposition of shallow levels in 3-dimensional arrangements and its applications. In Proc. 11th Annu. ACM Sympos. Comput. Geom., pp. 39–50.
  • [2] A. Aggarwal, A. Barnoy, S. Khuller, D. Kravets, and B. Schieber (1995) Efficient minimum cost matching using quadrangle inequality. Journal of Algorithms 19, pp. 116–143.
  • [3] R. Beier and J. F. Sibeyn (2000) Quality Matching and Local Improvement for Multilevel Graph-Partitioning. Parallel Computing 26, pp. 1609–1634.
  • [4] W. T. Chu, C. J. Li, and J. Y. Yu (2013) Tag suggestion and localization for images by bipartite graph matching. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013. ISBN 9789869000604.
  • [5] A. Efrat, M. Itai, and M. Katz (2001) Geometry helps in bottleneck matching and related problems. Algorithmica 31, pp. 1–28.
  • [6] A. Efrat and M. Katz (2000) Computing Euclidean bottleneck matchings in higher dimensions. Information Processing Letters 75, pp. 169–174.
  • [7] A. Efrat (1998) Geometric Location Optimization. Ph.D. Dissertation, Tel-Aviv University.
  • [8] H. Frohlich, J. K. Wegner, and A. Zell (2005) Assignment kernels for chemical compounds. IJCNN.
  • [9] H. N. Gabow and R. E. Tarjan (1988) Algorithms for Two Bottleneck Optimization Problems. J. Algorithms 9, pp. 411–417.
  • [10] H. N. Gabow (1990) Data Structures for Weighted Matching and Nearest Common Ancestors with Linking. Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 434–443.
  • [11] M. D. Grigoriadis and B. Kalantari (1988) A new class of heuristic algorithms for weighted perfect matching. J. Assoc. Comput. Mach. 35, pp. 769–776.
  • [12] A. Grover and J. Leskovec (2016) node2vec: Scalable Feature Learning for Networks. KDD, pp. 855–864.
  • [13] H. W. Kuhn (1955) The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly 2, pp. 83–97.
  • [14] K. S. Lomax (1954) Business Failures; Another example of the analysis of failure data. Journal of the American Statistical Association 49, pp. 847–852.
  • [15] S. Martello, W. R. Pulleyblank, P. Toth, and D. Werra (1984) Balanced optimization problems. Operations Research Letters 3, pp. 275–278.
  • [16] B. Monien, R. Preis, and R. Diekmann (2005) Assignment kernels for chemical compounds. IJCNN.
  • [17] B. Perozzi and S. Skiena (2014) DeepWalk: Online Learning of Social Representations. KDD.
  • [18] A. P. Punnen and K. P. Nair (1994) Improved Complexity Bound for the Maximum Cardinality Bottleneck Bipartite Matching Problem. Discrete Applied Mathematics 55, pp. 91–93.
  • [19] J. Qian, X. Y. Li, C. Zhang, and L. Chen (2016) De-anonymizing social networks and inferring private attributes using knowledge graphs. Proceedings - IEEE INFOCOM 2016-July. ISBN 9781467399531, ISSN 0743166X.
  • [20] E. M. Reingold and R. E. Tarjan (1981) On a greedy heuristic for complete matching. SIAM J. Comput. 10, pp. 676–681.
  • [21] K. R. Varadarajan (1998) A divide-and-conquer algorithm for min-cost perfect matching in the plane. In Proc. FOCS.
  • [22] Y. Wang, Y. Tong, C. Long, P. Xu, K. Xu, and W. Lv (2019) Adaptive dynamic bipartite graph matching: A reinforcement learning approach. Proceedings - International Conference on Data Engineering 2019-April, pp. 1478–1489. ISBN 9781538674741, ISSN 10844627.