The problem of assigning elements of one set to elements of another set is motivated by important real-world scenarios such as assigning students to universities and applicants to jobs. In many of these applications, members of one or both of the sets rank each other in order of preference. The goal is to compute an assignment that is “optimal” with respect to the preferences.
In this paper we focus on the one-sided preference list model, where members of one set rank a subset of the elements in the other set in a linear order (that is, preferences are assumed to be strict). Several notions of optimality, such as Pareto-optimality, rank-maximality, fairness and popularity, have been considered in the literature (we give formal definitions of each of these notions later). For each of these notions, efficient algorithms to compute the corresponding optimal matching are known. [Abraham et al.2004] describe an algorithm that computes a maximum cardinality Pareto-optimal matching. [Abraham et al.2007] present an algorithm to compute a popular matching, while [Irving et al.2004, Huang et al.2016] propose algorithms that optimize the head and the tail of the matching profile (rank-maximal and fair matchings, respectively). Maximizing one metric could, however, result in poor performance on others. When comparing two matchings, it is difficult to capture their quality with a single scalar value. They can be compared using a variety of metrics, such as cardinality or the number of matched rank-1 edges, none of which can serve as the sole indicator of optimality.
Profile-based matchings, like rank-maximal or fair matchings, which optimize for the head or the tail of the profile, can turn out to be biased under certain circumstances. An alternative is to consider the Area Under Profile Curve Ratio (AUPCR) metric introduced in [Diebold and Bichler2017]. This metric is a weighted sum over matched edges, with the weight of an edge determined by its position in the applicant's preference list, so that better-ranked edges weigh more.
In this work, we first present a comprehensive experimental study of the well-studied notions of optimality and compare them using different measures of matching quality. We then describe the AUPCR metric, and propose algorithms to compute an AUPCR maximizing matching, and a maximum cardinality AUPCR maximizing matching.
Finally, we empirically evaluate different matching algorithms, using various metrics, on synthetic graphs generated from the generator models specified by [Michail2011]. The generated graphs fall into two categories, one having uniformly random preference lists and the other having highly correlated preference lists. Our analysis is inspired by the analysis of [Michail2011], and we additionally consider a ranking system in which the matchings are ranked based on multiple metrics. These rankings are subsequently aggregated to obtain a single rank, which we use as a coarse indicator of matching quality.
The AUPCR maximizing matching is experimentally shown to have good performance across evaluated metrics on the considered data-sets, and we believe this matching is well suited for practical applications.
Consider a set $A$ of applicants and a set $P$ of posts. Every applicant $a \in A$ has a preference list over a subset of the posts in $P$. This list is a linear order (strict list) and is called the preference list of $a$ over $P$. The problem is readily represented as a bipartite graph $G = (A \cup P, E)$, in which an edge $(a, p) \in E$ is present if $p$ is acceptable to $a$. Preferences of applicants are encoded by assigning ranks to edges: an edge $(a, p)$ has rank $i$ if $a$ considers $p$ its $i$-th most preferred post. A matching $M$ is a collection of edges such that no two edges share an endpoint. Let $n = |A| + |P|$ and $m = |E|$. We now formally define the different notions of optimality.
Maximum Cardinality Pareto Optimal Matching
A matching $M$ is said to be Pareto-optimal if there is no other matching $M'$ such that some applicant is better off in $M'$ while no applicant is worse off in $M'$ than in $M$ (an applicant is worse off in $M'$ if it is matched to a less preferred post than in $M$). Maximum cardinality Pareto-optimal matchings (POM) can be computed in $O(\sqrt{n}\,m)$ time using the algorithm given by [Abraham et al.2004].
Rank Maximal Matching
The notion of rank-maximal matchings was first introduced by Irving under the name of greedy matchings [Irving2003]. A rank-maximal matching is a matching in which the number of rank-one edges is maximized, subject to which the number of rank-two edges is maximized, and so on. Another way of defining rank-maximal matchings is through their signatures. Given that $r$ is the largest rank given to a choice across all preference lists, we define the signature of a matching $M$ as an $r$-tuple $(x_1, \ldots, x_r)$ where, for $1 \le i \le r$, $x_i$ represents the number of applicants matched to their $i$-th preference (the number of unmatched applicants is then $|A| - \sum_{i=1}^{r} x_i$). Let $(x_1, \ldots, x_r)$ and $(y_1, \ldots, y_r)$ denote the signatures of $M$ and $M'$ respectively. We say that $M \succ M'$ w.r.t. rank-maximality if there exists an index $k$ such that $1 \le k \le r$ and, for $1 \le i < k$, $x_i = y_i$ and $x_k > y_k$. A matching $M$ is rank-maximal if $M$ has the best signature w.r.t. rank-maximality. Through the rest of the paper, we denote this matching as RMM. For the purposes of our experimental evaluation, we implement a simple combinatorial algorithm [Irving et al.2004] to compute a rank-maximal matching. The running time of the algorithm is $O(\min(n + C, C\sqrt{n})\,m)$, where $C \le r$ is the largest rank of an edge used in a rank-maximal matching.
Popular Matching
To define popularity, we translate preferences of applicants over posts to preferences of applicants over matchings. An applicant $a$ prefers matching $M$ to $M'$ if either $a$ is matched in $M$ and unmatched in $M'$, or $a$ is matched in both $M$ and $M'$ but has a better rank in $M$ than in $M'$. A matching $M$ is more popular than $M'$ if the number of applicants who prefer $M$ to $M'$ is more than the number of those who prefer $M'$ to $M$. A matching $M$ is popular if there is no matching $M'$ that is more popular than $M$. A linear time algorithm to compute a maximum cardinality popular matching for strict preferences is given in [Abraham et al.2007]. The “more popular than” relation is not transitive, and hence it is possible that a popular matching does not exist. When a popular matching does not exist, one can attempt to obtain the least unpopular matching. We consider the unpopularity factor given in [McCutchen2008]. An algorithm given in [Huang et al.2011] finds a popular matching if it exists. Through the rest of the paper, we denote this matching as POPM.
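The pairwise comparison behind popularity can be made concrete with a short sketch. It assumes matchings are stored as dictionaries from applicants to posts and preference lists as Python lists with the most preferred post first; the names `prefers` and `more_popular` are illustrative and not taken from the cited algorithms. The toy instance, in which three applicants share one preference order, exhibits a cycle in the “more popular than” relation.

```python
from typing import Dict, List

Matching = Dict[str, str]        # applicant -> post
Prefs = Dict[str, List[str]]     # applicant -> posts, most preferred first


def prefers(a: str, m1: Matching, m2: Matching, prefs: Prefs) -> bool:
    """True if applicant a prefers matching m1 to matching m2."""
    p1, p2 = m1.get(a), m2.get(a)
    if p1 is None:               # unmatched in m1: cannot prefer m1
        return False
    if p2 is None:               # matched in m1, unmatched in m2
        return True
    return prefs[a].index(p1) < prefs[a].index(p2)


def more_popular(m1: Matching, m2: Matching, prefs: Prefs) -> bool:
    """True if more applicants prefer m1 to m2 than prefer m2 to m1."""
    for_m1 = sum(prefers(a, m1, m2, prefs) for a in prefs)
    for_m2 = sum(prefers(a, m2, m1, prefs) for a in prefs)
    return for_m1 > for_m2


# Toy instance (hypothetical): all applicants rank p1 > p2 > p3.
prefs = {a: ["p1", "p2", "p3"] for a in ("a1", "a2", "a3")}
m1 = {"a1": "p1", "a2": "p2", "a3": "p3"}
m2 = {"a2": "p1", "a3": "p2", "a1": "p3"}
m3 = {"a3": "p1", "a1": "p2", "a2": "p3"}

# m2 beats m1, m3 beats m2, and m1 beats m3: the relation cycles.
cycle = [more_popular(m2, m1, prefs),
         more_popular(m3, m2, prefs),
         more_popular(m1, m3, prefs)]
```

In this instance each of the three matchings is beaten by another one, which illustrates why the relation fails to be transitive.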
Fair Matching
Fair matchings can be considered as complementary to rank-maximal matchings. A fair matching is always a maximum cardinality matching; subject to this, it matches the least number of applicants to their last preferred post; subject to this, the least number of applicants to their second-last preferred post; and so on. Fair matchings can be conveniently defined using signatures. Let $(x_1, \ldots, x_r)$ and $(y_1, \ldots, y_r)$ denote the signatures of two matchings $M$ and $M'$ respectively, where $x_i$ (resp. $y_i$) is the number of applicants matched to their $i$-th preference and $r$ is the largest rank. We say that $M \succ M'$ w.r.t. fairness if there exists an index $k$ such that $1 \le k \le r$ and, for $k < i \le r$, $x_i = y_i$, and $x_k < y_k$. A matching is fair if it is of maximum cardinality and, subject to that, has the best signature according to the above criterion. Recently, [Huang et al.2016] gave a combinatorial algorithm to compute fair matchings. Through the rest of the paper, we denote this matching as FM.
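The two signature orders can be sketched in a few lines. This is a minimal illustration, assuming a signature is a tuple whose $i$-th entry counts the applicants matched to their $i$-th choice; the function names are hypothetical. Rank-maximality compares signatures lexicographically from the head (larger is better), while fairness first compares cardinality and then compares from the tail (smaller is better).

```python
from typing import Dict, List, Tuple


def signature(matching: Dict[str, str],
              prefs: Dict[str, List[str]], r: int) -> Tuple[int, ...]:
    """Entry i-1 counts the applicants matched to their i-th choice."""
    sig = [0] * r
    for applicant, post in matching.items():
        rank = prefs[applicant].index(post) + 1   # ranks are 1-based
        sig[rank - 1] += 1
    return tuple(sig)


def better_rank_maximal(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """x beats y if it is lexicographically larger, read from rank 1."""
    return tuple(x) > tuple(y)


def better_fair(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """Larger cardinality wins; on a tie, fewer applicants at the worst
    ranks wins (lexicographically smaller when read from the tail)."""
    if sum(x) != sum(y):
        return sum(x) > sum(y)
    return tuple(reversed(x)) < tuple(reversed(y))


# Toy instance (hypothetical) with largest rank r = 3.
prefs = {"a1": ["p1", "p2", "p3"], "a2": ["p1", "p2"], "a3": ["p2"]}
sig = signature({"a1": "p3", "a2": "p1", "a3": "p2"}, prefs, r=3)

# Two signatures of equal cardinality on which the two orders disagree.
x, y = (2, 0, 1), (1, 2, 0)
```

Here `x` is better with respect to rank-maximality (more rank-1 edges), while `y` is better with respect to fairness (no applicant at rank 3).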
AUPCR Maximizing Matching
Fair and Rank-maximal matchings are profile based matchings that are geared towards minimizing the tail or maximizing the head of the profile. However, optimizing for the peripheral portions of a profile may not be necessarily representative of a good matching in many practical settings. This encouraged us to look into a metric called Area Under Profile Curve Ratio (AUPCR) which, in a sense, seemed to capture the entire signature of a matching.
Formulation of AUPCR
The Area Under Profile Curve Ratio (AUPCR), introduced in the context of matchings by [Diebold and Bichler2017], is a measure of second-order stochastic dominance of the profile. It is a useful metric for comparing multiple signatures and is very similar in nature to the widely used Area Under the Curve of the Receiver Operating Characteristic [Hanley and McNeil1982].
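For a matching $M$ of a bipartite graph $G = (A \cup P, E)$, with $x_i$ representing the number of applicants matched to their $i$-th preference and $r$ the largest rank, AUPCR($M$) is defined as the ratio of the Area Under Profile Curve (AUPC) and the Total Area (TA), where

$$\mathrm{AUPC}(M) = \sum_{i=1}^{r} \sum_{j=1}^{i} x_j = \sum_{i=1}^{r} (r - i + 1)\, x_i \tag{1}$$

$$\mathrm{TA}(M) = r \cdot |A| \tag{2}$$

$$\mathrm{AUPCR}(M) = \frac{\mathrm{AUPC}(M)}{\mathrm{TA}(M)} \tag{3}$$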
One can visualize this quantity by considering Figure 1. For a given instance and the signature of a matching $M$, the area of the shaded region corresponds to AUPC($M$), while the area of the bounding rectangle corresponds to TA($M$). AUPCR($M$) is the ratio of the two.
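As a small worked sketch, the ratio can be computed directly from a signature. The code assumes that AUPC is the sum of the cumulative profile over ranks $1, \ldots, r$ and that the total area is $r$ times the number of applicants; this reading of the definition, the function names and the example signature are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Sequence


def aupc(sig: Sequence[int]) -> int:
    """Area under the cumulative profile curve of a signature."""
    area, matched_so_far = 0, 0
    for count in sig:                 # ranks 1, 2, ..., r in order
        matched_so_far += count       # applicants matched at this rank or better
        area += matched_so_far
    return area


def aupcr(sig: Sequence[int], num_applicants: int) -> Fraction:
    """AUPC divided by the area of the bounding r x |A| rectangle."""
    return Fraction(aupc(sig), len(sig) * num_applicants)


# Hypothetical example: 4 applicants, r = 3, one applicant unmatched.
sig = (2, 0, 1)
area = aupc(sig)                      # cumulative counts 2, 2, 3
ratio = aupcr(sig, num_applicants=4)

# Equivalent weighted form: an edge of rank i contributes r - i + 1.
weighted = sum((len(sig) - i) * c for i, c in enumerate(sig))
```

The weighted form makes explicit that an unmatched applicant contributes nothing, while a rank-1 edge contributes the full width $r$.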
A matching that maximizes this measure can loosely be seen as a “softer” version of the rank-maximal matching: it does not give up matching low-ranked edges entirely in order to match a large number of high-ranked edges. Based on this, we consider two problems:
AUPCR Maximizing Matching - the problem of finding a matching which maximizes the AUPCR metric. We denote such a matching as AMM.
Max Cardinality AMM - the problem of finding a matching with the maximum cardinality among all matchings with maximum AUPCR. We denote such a matching as MC-AMM.
In this paper, we formulate algorithms to address the above defined problems and show that the Max Cardinality AUPCR maximizing matching performs favorably on a variety of other standard metrics typically used to compare matchings in practical settings.
Algorithm - AUPCR Maximizing Matching
The problem of finding an AUPCR maximizing matching can be reduced to the problem of finding a maximum weighted perfect matching. Given a bipartite graph $G' = (V', E')$ and a weight $w(e)$ for each edge $e \in E'$, we define the weight of a matching $M'$ as $w(M') = \sum_{e \in M'} w(e)$. Then, the maximum weighted perfect matching problem is to find a matching $M'$ which matches all vertices in $V'$ ($M'$ is a perfect matching) and maximizes $w(M')$.
Given a bipartite graph $G = (A \cup P, E)$ with edges representing the preferences of $A$, we construct $G' = (V', E')$ as follows:
$V' = X \cup Y$ with $X = A \cup P'$ and $Y = P \cup A'$, where $A'$ are copies of $A$ and $P'$ are copies of $P$.
For each edge $(a, p) \in E$ of rank $k$, add an edge between the corresponding vertices of $A$ and $P$ with weight $r - k + 1$, where $r$ is the largest rank. Similarly, add an edge between the corresponding vertices of $A'$ and $P'$ with the same weight.
Add edges with weight 0 from vertices in $A$ to their copies in $A'$. Add similar edges between $P$ and $P'$. We refer to these edges as identity edges.
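The construction can be sketched end to end on a small instance. The code below is illustrative rather than the implementation used in the experiments: it assumes an edge of rank $k$ has weight $r - k + 1$, encodes the doubled graph as a square cost matrix (rows are applicants and post copies, columns are posts and applicant copies), and swaps in a textbook $O(n^3)$ Hungarian algorithm instead of the scaling algorithm cited for the running time. The function names `min_cost_assignment` and `amm` are hypothetical.

```python
from typing import Dict, List


def min_cost_assignment(cost: List[List[int]]) -> List[int]:
    """Hungarian algorithm on a square matrix; returns the column of each row."""
    n = len(cost)
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)      # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    cols = [0] * n
    for j in range(1, n + 1):
        cols[p[j] - 1] = j - 1
    return cols


def amm(prefs: Dict[str, List[str]], posts: List[str]) -> Dict[str, str]:
    """AUPCR maximizing matching via a max weight perfect matching in G'."""
    applicants = list(prefs)
    na, npo = len(applicants), len(posts)
    size = na + npo
    r = max(len(lst) for lst in prefs.values())
    big = r * size + 1                        # non-edge cost, never worth using
    idx = {q: j for j, q in enumerate(posts)}
    cost = [[big] * size for _ in range(size)]
    for i, a in enumerate(applicants):
        cost[i][npo + i] = 0                  # identity edge a -- a'
        for k, q in enumerate(prefs[a], start=1):
            cost[i][idx[q]] = -(r - k + 1)            # edge a -- p
            cost[na + idx[q]][npo + i] = -(r - k + 1)  # mirrored edge p' -- a'
    for j in range(npo):
        cost[na + j][j] = 0                   # identity edge p' -- p
    cols = min_cost_assignment(cost)
    return {a: posts[cols[i]] for i, a in enumerate(applicants) if cols[i] < npo}


prefs = {"a1": ["p1", "p2", "p3"], "a2": ["p1", "p2"], "a3": ["p1"]}
result = amm(prefs, ["p1", "p2", "p3"])
```

On this toy instance the only matching of weight 6 assigns `a3` to `p1`, `a2` to `p2` and `a1` to `p3`; every other matching has weight at most 5. Negating the weights turns the maximization into the minimization that the assignment routine solves.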
Proof of Correctness
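Claim. If $M'$ is a max weighted perfect matching in $G'$, then $M'$ restricted to $G$ is an AMM in $G$.
Let $M_1$ be the matching obtained by restricting $M'$ to $A \cup P$ and $M_2$ the matching obtained by restricting $M'$ to $A' \cup P'$. Since $M'$ is a perfect matching, all vertices of $G'$ must be matched. So, if a vertex in $A \cup P$ is not matched in $M_1$, it must be matched to its copy in $A' \cup P'$ via the identity edge. This means that its copy is also unmatched in $M_2$. So, $M_1$ and $M_2$ match the same set of vertices (up to copies). Since the identity edges have 0 weight,

$$w(M') = w(M_1) + w(M_2).$$

Since $M_1$ and $M_2$ match the same set of vertices, one can copy the edges matched in $M_1$ to $M_2$, and vice versa, without changing the identity edges; as $M'$ has maximum weight, neither copy can increase the weight. This means that $w(M_1) = w(M_2)$ and $w(M') = 2\,w(M_1)$. Conversely, every matching $M$ of $G$ extends in this way to a perfect matching of $G'$ of weight $2\,w(M)$, so maximizing $w(M')$ is equivalent to maximizing $w(M_1)$.
We also have

$$w(M_1) = \sum_{e \in M_1} \bigl(r - \mathrm{rank}(e) + 1\bigr) = \sum_{i=1}^{r} (r - i + 1)\, x_i = \mathrm{AUPC}(M_1),$$

where $\mathrm{rank}(e)$ is the rank of edge $e$ and $x_i$ is the number of edges of $M_1$ with rank $i$. Since the total area $r\,|A|$ is the same for every matching of $G$, this means that maximizing $w(M_1)$ maximizes AUPCR($M_1$).
Hence, if $M'$ is a maximum weight perfect matching in $G'$, $M_1$ is a max AUPCR matching in $G$. ∎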
Algorithm - Max Cardinality AMM
The problem of finding a Max Cardinality AMM can also be reduced to an instance of max weighted perfect matching. The reduction is the same as in the max AUPCR case, but we add a small negative weight of $-\epsilon$, with $0 < \epsilon < 1/|A|$, to the identity edges going from $A$ to $A'$.
Proof of Correctness
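Claim. If $M'$ is a max weighted perfect matching in $G'$, then $M'$ restricted to $G$ is a Max Cardinality AMM in $G$.
Let $M$ and $M_2$ be the restrictions of $M'$ to $A \cup P$ and $A' \cup P'$ respectively, and let $-\epsilon$, with $0 < \epsilon < 1/|A|$, be the weight of each identity edge from $A$ to $A'$. As before, we can prove that $w(M) = w(M_2)$. However, $w(M') = 2\,w(M) + w(I)$, where $I$ is the set of identity edges from $A$ to $A'$ in $M'$. If $M$ leaves $k$ vertices in $A$ unmatched and $l$ vertices in $P$ unmatched, then $M_2$ also leaves the same vertices unmatched. So, we have $k$ identity edges in $I$ and hence

$$w(M') = 2\,w(M) - k\epsilon.$$

Since $\epsilon < 1/|A|$ and $k \le |A|$, we have $k\epsilon < 1$ and

$$2\,w(M) - 1 < w(M') \le 2\,w(M).$$

Let $N'$ be a Max AUPCR matching extended to $G'$ and $N$ be its restriction to $G$. Since $w(M') \ge w(N') > 2\,w(N) - 1$, we get

$$0 \le w(N) - w(M) < \tfrac{1}{2}.$$

From the definition of AUPCR, we can see that if two matchings have different AUPCR, then their AUPC values are integers that differ by at least 1, so the difference in AUPCR is at least $1/(r\,|A|)$. So, $M$ and $N$ have the same AUPCR, which means that $M$ is an AUPCR maximizing matching in $G$.
The cardinality of $M$ is $|A| - k$. Writing $w(M')$ in terms of $|M|$,

$$w(M') = 2\,w(M) - \epsilon\,(|A| - |M|).$$

All AUPCR maximizing matchings have the same $w(M)$, which means that maximizing $w(M')$ maximizes $|M|$. So, $M$ is a maximum cardinality AUPCR maximizing matching in $G$. ∎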
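The effect of the penalty can be checked by brute force on a tiny instance. The sketch below does not build the doubled graph; it evaluates the objective $2\,w(M) - \epsilon\,(|A| - |M|)$ directly over all matchings, assuming an edge of rank $k$ has weight $r - k + 1$ and choosing $\epsilon = 1/(|A| + 1)$, which is one admissible value and an assumption of this example. The instance and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Iterator, List


def all_matchings(prefs: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Enumerate every matching (exponential; for tiny instances only)."""
    applicants = list(prefs)
    options = [[None] + prefs[a] for a in applicants]
    for choice in product(*options):
        taken = [q for q in choice if q is not None]
        if len(taken) == len(set(taken)):          # no post used twice
            yield {a: q for a, q in zip(applicants, choice) if q is not None}


def weight(matching: Dict[str, str], prefs: Dict[str, List[str]], r: int) -> int:
    """Sum of r - rank + 1 over matched edges (index is rank - 1)."""
    return sum(r - prefs[a].index(q) for a, q in matching.items())


def penalized(matching, prefs, r, eps):
    """Weight of the extended perfect matching: 2 w(M) - eps * (#unmatched applicants)."""
    return 2 * weight(matching, prefs, r) - eps * (len(prefs) - len(matching))


prefs = {"a1": ["p1"], "a2": ["p2", "p1", "p3"], "a3": ["p1", "p2"]}
r = 3
eps = Fraction(1, len(prefs) + 1)                  # satisfies eps * |A| < 1

matchings = list(all_matchings(prefs))
best = max(weight(m, prefs, r) for m in matchings)
amms = [m for m in matchings if weight(m, prefs, r) == best]
mc_amm = max(amms, key=len)                        # largest AMM, by definition
by_penalty = max(matchings, key=lambda m: penalized(m, prefs, r, eps))
```

This instance has three AUPCR maximizing matchings, two of cardinality 2 and one of cardinality 3, and the penalized objective selects the one of cardinality 3. Using exact fractions avoids any floating-point ambiguity when comparing penalized weights.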
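The maximum weighted matching algorithm of [Duan and Su2012] runs in $O(m\sqrt{n}\log N)$ time on a bipartite graph with $n$ vertices, $m$ edges and integer edge weights of magnitude at most $N$. Since both our algorithms construct a graph with $2n$ vertices and $2m + n$ edges and find a max weighted matching in it, their time complexity is $O((m + n)\sqrt{n}\log N)$.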
The matchings obtained from each algorithm are evaluated with respect to the following metrics.
Cardinality: The number of edges present in the matching.
Unpopularity measure: The unpopularity measure measures how far away a matching is from a popular(least unpopular) matching . Let be the number of applicants that prefer over . Then for matching is defined as the ratio of to the total number of applicants.
Rank 1: The number of matched rank-1 edges.
AUPCR: The AUPCR metric, a measure of second-order stochastic dominance of the profile, as defined in Equation 3.
Ranks less than half the preference list size (RHPL): The number of applicants who have been matched to a post with a rank better than or equal to half the length of their preference list.
Average rank: For a matching $M$, this is the average rank of all matched edges. Although this is similar to the AUPCR metric, the average rank is computed only over the matched edges, while AUPCR also accounts for unmatched applicants.
Worst rank: For a matching $M$, this is the highest (worst) rank among all matched edges in $M$.
Time: The time taken to find the matching. This is implementation dependent, and the algorithms used have been mentioned earlier along with their time complexities.
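Several of the metrics above depend only on the ranks of the matched edges and can be computed in one pass. The sketch below is a minimal illustration under the same assumed representation as before (a dictionary from applicants to posts, preference lists ordered from most to least preferred); the function name and dictionary keys are hypothetical, and the unpopularity measure, AUPCR and running time are left out.

```python
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def evaluate(matching: Dict[str, str],
             prefs: Dict[str, List[str]]) -> Dict[str, Optional[Number]]:
    """Rank-based quality metrics of a matching."""
    ranks = {a: prefs[a].index(q) + 1 for a, q in matching.items()}
    return {
        "cardinality": len(ranks),
        "rank1": sum(1 for k in ranks.values() if k == 1),
        # rank at most half the length of the applicant's own list
        "rhpl": sum(1 for a, k in ranks.items() if 2 * k <= len(prefs[a])),
        "average_rank": sum(ranks.values()) / len(ranks) if ranks else None,
        "worst_rank": max(ranks.values(), default=None),
    }


# Toy instance (hypothetical).
prefs = {
    "a1": ["p1", "p2", "p3", "p4"],
    "a2": ["p2", "p1"],
    "a3": ["p3", "p1", "p2"],
}
metrics = evaluate({"a1": "p2", "a2": "p1", "a3": "p3"}, prefs)
```

In this example `a1` and `a3` count towards RHPL (rank 2 of 4 and rank 1 of 3), while `a2` does not (rank 2 of 2).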
For our experiments, we consider two structured instance generators: Highly Correlated and Uniform Random. These generators are similar in nature to those of [Michail2011], but we consider only instances with strict preference lists. Although all the algorithms described above, except Maximum Cardinality Pareto Optimal, can also handle instances with ties, we made this choice to have a set of instances on which all the algorithms can be compared and analyzed. This choice is not too restrictive, since in practical scenarios preference lists are often strict.
Highly Correlated (HC)
These instances are generated based on a global preference ordering (say P) of the set of posts, one that all the applicants agree upon. An HC instance is parameterized by a density $p$ with $0 < p \le 1$. For every applicant-post pair, an edge is added with probability $p$. Once the graph has been constructed, the applicants rank the posts as per the global preference list: the best post, as per P, that an applicant is connected to is assigned rank 1, and so on.
Uniform Random (UNI)
Similar to HC, UNI instances are also parameterized by a density $p$ with $0 < p \le 1$. Every applicant has a preference list whose size is fixed by $p$. These preference lists are chosen uniformly at random from the set of permutations of posts. Let an applicant $a$'s adjacency list be $(q_1, q_2, \ldots)$. Then $q_1$ is ranked 1 by $a$, $q_2$ is ranked 2, and so on. Unlike HC, the preference list length is identical across all applicants.
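Both generators can be sketched with the standard library. The code is an illustration under stated assumptions, not the generator used in the experiments: posts are integers, the global ordering for HC is a random permutation, and the UNI list length is taken to be the density times the number of posts, rounded (the exact size rule is an assumption). The function names and the `seed` parameter are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def generate_hc(n_applicants: int, n_posts: int, p: float,
                seed: int = 0) -> Tuple[Dict[int, List[int]], List[int]]:
    """Highly Correlated: each edge exists with probability p and every
    applicant ranks its posts in the order of one global ordering."""
    rng = random.Random(seed)
    global_order = list(range(n_posts))
    rng.shuffle(global_order)                  # shared ordering of the posts
    prefs = {a: [q for q in global_order if rng.random() < p]
             for a in range(n_applicants)}
    return prefs, global_order


def generate_uni(n_applicants: int, n_posts: int, p: float,
                 seed: int = 0) -> Dict[int, List[int]]:
    """Uniform Random: every applicant gets a uniformly random ordered
    list of the same length (assumed here to be round(p * n_posts))."""
    rng = random.Random(seed)
    size = max(1, round(p * n_posts))
    return {a: rng.sample(range(n_posts), size) for a in range(n_applicants)}


hc_prefs, order = generate_hc(20, 20, 0.2, seed=1)
uni_prefs = generate_uni(20, 20, 0.2, seed=1)
```

In the HC sketch an applicant may end up with an empty list when no edge is sampled; how such applicants are handled is left open here. Seeding a private `random.Random` instance keeps repeated runs reproducible, which matters when results are averaged over several seeds.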
The number of applicants equals the number of posts in every graph, and this size is varied in fixed steps over a range of values. Orthogonally, the density parameter for HC and UNI is varied from 0.02 to 0.20 in steps of 0.02. The reason for this choice of range is that real-world datasets are not very dense in nature. Each instance is averaged over 50 random seeds. A further level of averaging across the different density values yields one value for each metric for each problem size (number of applicants).
The variant of the AUPCR maximizing matching that does not enforce maximum cardinality (AMM) is used. Surprisingly, this still yields a maximum cardinality matching without exception. For POPM, in cases where popular matchings do not exist, the least unpopular matching is used. The code was executed using the Amazon Web Services (AWS) EC2 service on a t2.micro instance (1 GB RAM, 1 CPU, Intel Xeon processor).
Comparing matchings based on rank means
For this analysis, we consider a set of the evaluation metrics which we believe characterizes preference matchings in general. For a given metric and graph instance, we rank the algorithms in terms of performance with the best one getting a rank of 1 and worst one getting a rank of 5. We then average this rank across all instances and this value corresponds to an entry in Table 2. The rank mean is computed by taking the average of the entries along the column. This value is intended to serve as a measure of overall performance.
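The aggregation described above can be sketched as follows. The input layout, the function names and the numbers are hypothetical and serve only to illustrate the procedure; in particular, the algorithm labels X, Y and Z are placeholders and the values are not experimental results. Ties are assumed to share the best available rank, which is one possible convention.

```python
from statistics import mean
from typing import Dict, List, Tuple

Scores = Dict[str, float]                       # algorithm -> metric value


def rank_algorithms(values: Scores, higher_is_better: bool) -> Dict[str, int]:
    """Rank 1 is best; tied algorithms share the smallest rank."""
    ordered = sorted(values.values(), reverse=higher_is_better)
    return {algo: ordered.index(v) + 1 for algo, v in values.items()}


def rank_means(results: Dict[str, Tuple[bool, List[Scores]]]):
    """Average the per-instance ranks for each metric, then average the
    resulting table entries over the metrics."""
    table = {}
    for metric, (higher_is_better, instances) in results.items():
        ranks = [rank_algorithms(inst, higher_is_better) for inst in instances]
        table[metric] = {algo: mean(rk[algo] for rk in ranks) for algo in ranks[0]}
    algos = list(next(iter(table.values())))
    overall = {algo: mean(table[m][algo] for m in table) for algo in algos}
    return table, overall


# Made-up values for three placeholder algorithms on two instances.
results = {
    "cardinality": (True, [{"X": 10, "Y": 8, "Z": 10}, {"X": 9, "Y": 9, "Z": 7}]),
    "average_rank": (False, [{"X": 2.0, "Y": 1.5, "Z": 3.0},
                             {"X": 2.5, "Y": 2.0, "Z": 2.2}]),
}
table, overall = rank_means(results)
```

Each metric carries a flag stating whether larger values are better, since metrics such as average rank and worst rank improve as they decrease.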
As seen from the table, each chosen metric has a subset of the algorithms performing best. It is, however, important to note that AMM performs competitively on almost all metrics. This observation is also supported by the fact that the rank mean attained by AMM is the lowest among all algorithms for both UNI and HC instances. This suggests empirically that AMM achieves a desirable balance, making it a compelling choice for many practical preference matching problems.
Comparing the Matchings on different metrics
Some interesting observations for individual metrics are as follows:
Cardinality: As expected, POM and FM have the largest cardinality, since they compute maximum cardinality matchings. However, it was observed that AMM, without exception, returned a maximum cardinality matching. While this may not be universally true (as shown in a later section), it is a useful property in practice.
RHPL: RHPL is a metric that none of the matchings explicitly optimizes for. It is notable that AMM maximizes this metric, which suggests that it is indeed a more general notion of optimality.
Rank 1: It was observed that popular and rank-maximal matchings have a similar, if not the same, number of rank-1 edges. While the head of the signature is maximized, both these matchings perform poorly on metrics that account for the entirety or the tail of the signature.
Time: Dictated by the time complexities of the respective algorithms, the running times were vastly different for FM and AMM compared to the other three matchings. In graphs with 900 vertices (in each partition), FM took 512.45 seconds and AMM executed in 204.78 seconds, while POPM and POM executed in less than 5 seconds.
The strongly positive empirical performance of AMM, in various metrics of importance as shown above, leads us to ask some interesting questions.
Is an AMM Pareto optimal?
Yes, an AMM is a Pareto optimal matching.
Claim. An AUPCR maximizing matching is Pareto optimal.
Assume to the contrary that an AUPCR maximizing matching $M$ is not Pareto optimal. This means there exists a matching $M'$ where every applicant is at least as well off in $M'$ as in $M$ and at least one applicant is better off in $M'$ than in $M$. Consider a vertex $a \in A$. Let $r_M(a)$ be the rank of the post that $a$ is matched to in $M$ ($r_M(a) = r + 1$ if $a$ is unmatched), and let $r_{M'}(a)$ be defined analogously. Then

$$\mathrm{AUPC}(M') - \mathrm{AUPC}(M) = \sum_{a \in A} \bigl(r + 1 - r_{M'}(a)\bigr) - \sum_{a \in A} \bigl(r + 1 - r_M(a)\bigr) = \sum_{a \in A} \bigl(r_M(a) - r_{M'}(a)\bigr) > 0.$$

The last inequality follows from the fact that every term of the summation is non-negative and at least one term is positive, by our assumption that $M$ is not Pareto optimal.
Since $\mathrm{AUPCR}(M') - \mathrm{AUPCR}(M) > 0$, $M$ is not an AUPCR maximizing matching, a contradiction, and so $M$ must be Pareto optimal. ∎
Is an AMM always a maximum cardinality matching?
Do all AMMs have the same cardinality?
All AMMs need not have the same cardinality. Consider the instance with = and = and the preferences given by
As shown in Figure 6 and Figure 7, both are AUPCR maximizing matchings, with an AUPCR of 0.833, but they have different cardinalities. This example also shows that multiple AMMs can exist for a given instance.
Is an AMM always more “rank maximal” than an FM?
An AMM matching need not be more rank-maximal than the fair matching. Consider the instance with = , = and the preferences given by
In this work, we introduce the notion of an AUPCR maximizing matching. We describe two variants, one maximizing the AUPCR and the other maximizing the cardinality subject to maximizing the AUPCR. We empirically evaluate our algorithms on standard synthetically generated datasets and highlight that the AUPCR maximizing matching achieves a much-needed middle ground with respect to the different notions of optimality. The overall performance of the AUPCR maximizing matching is superior to that of the other matchings when all metrics are used cumulatively for comparison. Extending the AUPCR maximizing matching and finding algorithms with reduced time complexity are left as future work.
- [Abraham et al.2004] Abraham, D. J.; Cechlárová, K.; Manlove, D. F.; and Mehlhorn, K. 2004. Pareto optimality in house allocation problems. In International Symposium on Algorithms and Computation, 3–15. Springer.
- [Abraham et al.2007] Abraham, D. J.; Irving, R. W.; Kavitha, T.; and Mehlhorn, K. 2007. Popular matchings. SIAM Journal on Computing 37(4):1030–1045.
- [Diebold and Bichler2017] Diebold, F., and Bichler, M. 2017. Matching with indifferences: A comparison of algorithms in the context of course allocation. European Journal of Operational Research 260(1):268–282.
- [Duan and Su2012] Duan, R., and Su, H.-H. 2012. A scaling algorithm for maximum weight matching in bipartite graphs. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, 1413–1424. SIAM.
- [Hanley and McNeil1982] Hanley, J. A., and McNeil, B. J. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36.
- [Huang et al.2011] Huang, C.-C.; Kavitha, T.; Michail, D.; and Nasre, M. 2011. Bounded unpopularity matchings. Algorithmica 61(3):738–757.
- [Huang et al.2016] Huang, C.-C.; Kavitha, T.; Mehlhorn, K.; and Michail, D. 2016. Fair matchings and related problems. Algorithmica 74(3):1184–1203.
- [Irving et al.2004] Irving, R. W.; Kavitha, T.; Mehlhorn, K.; Michail, D.; and Paluch, K. 2004. Rank-maximal matchings. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, 68–75. Society for Industrial and Applied Mathematics.
- [Irving2003] Irving, R. W. 2003. Greedy matchings. Technical Report TR-2003-136, University of Glasgow.
- [McCutchen2008] McCutchen, R. 2008. The least-unpopularity-factor and least-unpopularity-margin criteria for matching problems with one-sided preferences. LATIN 2008: Theoretical Informatics 593–604.
- [Michail2011] Michail, D. 2011. An experimental comparison of single-sided preference matching algorithms. Journal of Experimental Algorithmics (JEA) 16:1–4.