Background. The Stable Marriage problem (sm) was first introduced in Gale and Shapley’s seminal paper “College Admissions and the Stability of Marriage”. In an instance of sm we have two sets of agents, men and women (of equal number, henceforth ), such that each man ranks every woman in strict preference order, and vice versa. An extension to sm, known as the Stable Marriage problem with Incomplete lists (smi), allows each man (woman) to rank a subset of women (men).
Generalisations of smi in which one or both sets of agents may be multiply assigned have been extensively applied in the real world. The National Resident Matching Program (nrmp) is a long-standing matching scheme in the US (beginning in 1952) which assigns graduating medical students to hospitals. Other examples include the assignment of children to schools in Boston and the allocation of high-school students to university places in China.
Gale and Shapley described linear-time algorithms to find a stable matching in an instance of smi. These classical algorithms find either a man-optimal or a woman-optimal stable matching, in which every man (woman) is assigned to their best partner in any stable matching and every woman (man) is assigned to their worst partner in any stable matching. Favouring one set of agents over the other is often undesirable, and so we look at the notion of a “fair” matching in which the happiness of both sets of agents is taken into account.
There may be many stable matchings in any given instance of smi, and there are several different criteria that may be used to describe an optimal or “fair” stable matching. The rank of an agent in a stable matching is the position of ’s partner on ’s preference list, while the degree of is the highest rank of any agent in . We might wish to limit the number of agents with large rank. A minimum regret stable matching is a stable matching that minimises and can be found in O time. Another type of optimality criterion uses an arbitrary weight function to find a minimum (maximum) weight stable matching, which is a stable matching that has minimum (maximum) weight among the set of all stable matchings. A special case of this is the egalitarian stable matching, which minimises the sum of ranks of all agents. Irving et al. gave an algorithm to find an egalitarian stable matching in O time and discussed how to generalise their method to the minimum (and maximum) weight stable marriage problem. Feder later improved on this, showing that a minimum weight stable matching may be found in O time using weighted sat. A sex-equal stable matching seeks to minimise the difference in the sum of ranks between men and women. Finding a sex-equal stable matching was shown to be NP-hard. A median stable matching, defined formally in Section 2, is a stable matching in which each agent gains their median partner (if the partners of an agent for all stable matchings were lined up in order of preference). Computing the set of median stable matchings is #P-hard.
Other notions of fairness involve the profile of a matching, which is a vector representing the number of agents assigned to their first, second, third choices etc., in the matching. A rank-maximal stable matching is a stable matching whose profile is lexicographically maximum, i.e. it maximises the number of agents assigned to their first choice and, subject to that, their second choice, and so on. Meanwhile, a generous stable matching is a stable matching whose reverse profile is lexicographically minimum, i.e. it minimises the number of agents with rank , and subject to that, rank , and so on. Profile-based optimality criteria such as rank-maximality or generosity provide guarantees that do not exist with other optimality criteria, giving a distinct advantage to these approaches in certain scenarios.
Irving et al. describe the use of weights that are exponential in in order to find a rank-maximal stable matching using a maximum weight approach. This requires an additional factor of time complexity to take into account calculations over exponential weights, giving an overall time complexity of O (Footnote 3: Irving et al. actually state a time complexity of O; however, we believe that this time complexity bound is somewhat pessimistic and that a bound of O applies to this approach). The choice of max flow algorithm in Irving et al.’s approach is important. Irving et al. stated that the strongly polynomial O Sleator-Tarjan algorithm was the best option (at the time of writing). The Sleator-Tarjan algorithm is an adapted version of Dinic’s algorithm and finds a maximum flow in a network in O time. Since and [11, pg. 112], this translates to O for the maximum weight stable matching problem and an overall time complexity of O for the rank-maximal stable matching problem. However, in 2013 Orlin described an improved strongly polynomial max flow algorithm with an O (translating to O) time complexity, giving a total overall time complexity for finding a rank-maximal stable matching of O. Feder’s weighted sat approach has an overall O time complexity for finding a rank-maximal stable matching.
Neither Irving et al. nor Feder considered generous stable matchings; however, a generous stable matching may be found in a similar way to a rank-maximal stable matching with the use of weights that are exponential in .
Motivation. For the rank-maximal stable matching problem, Irving et al. suggest a weight of for each agent assigned to their th choice, and a similar approach can be taken to find a generous stable matching, as we demonstrate later in this paper. In both the rank-maximal and generous cases, the use of exponential weights introduces the possibility of overflow and accuracy errors upon implementation. This may occur as a consequence of limitations of data types: for example, the int and long primitive types restrict the range of integers that can be represented, and the double primitive type may introduce inaccuracies when the number of significant figures is greater than . Using a weight of for each agent assigned to their th choice as above, it may be that we need to distribute capacities of size across the network. As a theoretical example, the long data type has a maximum possible value of . Since , when we are dealing with flows or capacities of order , the largest possible without risking errors is . Alternative data types such as BigInteger do allow far larger values (currently the implementation limit of Java’s BigInteger is ), meaning we are more likely to be dependent on the size of computer memory than on this bound.
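To make the overflow concern concrete, the following minimal sketch finds the largest instance size whose largest capacity still fits in a signed 64-bit integer, and checks whether an integer survives conversion to a 64-bit float. It assumes, purely for illustration, that the largest capacity grows like n to the power n; the names `largest_safe_n` and `double_is_exact` and that growth rate are assumptions of this example rather than values taken from the discussion above.

```python
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit integer (e.g. Java's long)


def largest_safe_n(limit=LONG_MAX):
    """Largest n with n**n <= limit (assumed capacity growth: n**n)."""
    n = 1
    while (n + 1) ** (n + 1) <= limit:
        n += 1
    return n


def double_is_exact(value):
    """True if the integer survives a round trip through a 64-bit float."""
    return int(float(value)) == value


if __name__ == "__main__":
    print(largest_safe_n())
    print(double_is_exact(2**53), double_is_exact(2**53 + 1))
```

Python integers are arbitrary precision, so the sketch only simulates the fixed-width limits; in a language with primitive 64-bit types the same comparison would have to be made before the overflow occurs.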
When looking for a rank-maximal or generous stable matching, we describe an alternative approach to finding a maximum flow that does not require exponential weights. This approach is based on using polynomially-bounded weight vectors that involve profiles of matchings rather than exponentially-large scalars used to represent profiles. On the surface, performing operations over polynomially-bounded weight vectors rather than over equivalent exponential weights would appear not to improve the time or space complexity of the algorithm, since an exponential number would naturally be stored as an equivalent list of integers in memory. However, even for instances of smi with uniformly distributed preference lists, weight vectors of the flow network are not uniformly distributed, allowing us to explore vector compression that is unavailable in the exponential case. Lossless vector compression was performed by saving the index and value of each non-zero vector element. We then calculated the minimum space requirements (Footnote 4: space requirement calculations for both vector-based weights and exponential weights did not assume any particular implementation, but more generally indicated the minimum number of bits required theoretically) to store an array of indices and an array of values for this compressed vector. The degree of a vector is the position of the final non-zero element in . We compressed the exponential weight as far as reasonably possible by finding the maximum degree over all vector-based weights in the instance, before calculating the minimum space requirements to store a number of size . The average number of bits required to store the flow network when using the exponential weight representation of weights was compared to the polynomially-bounded vector-based representation of weights.
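The lossless compression described above, which keeps only the index and value of each non-zero element, can be sketched as follows. The function names and the particular bit-counting rule in `minimal_bits` (one sign bit per value, no per-element overhead) are assumptions of this example and are not claimed to match the accounting used in the experiments.

```python
def compress(vector):
    """Return (indices, values) of the non-zero elements of a weight vector."""
    indices = [i for i, x in enumerate(vector) if x != 0]
    values = [vector[i] for i in indices]
    return indices, values


def decompress(indices, values, length):
    """Rebuild the original vector from its compressed form."""
    vector = [0] * length
    for i, x in zip(indices, values):
        vector[i] = x
    return vector


def minimal_bits(indices, values):
    """Assumed accounting: bits for each index, plus magnitude and sign bits for each value."""
    index_bits = sum(max(1, i.bit_length()) for i in indices)
    value_bits = sum(abs(v).bit_length() + 1 for v in values)
    return index_bits + value_bits


if __name__ == "__main__":
    idx, val = compress([0, 3, 0, 0, -2, 0])
    print(idx, val, minimal_bits(idx, val))
```

The saving comes from sparsity: a vector with few non-zero entries costs space proportional to those entries, whereas the equivalent exponential scalar must be stored at its full magnitude.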
Figure 1-1 shows a plot comparing the average number of bits required to store capacities of the flow network using these two approaches for instance size up to . In this plot, circles represent the average number of bits required for different values of . The exact space requirements for vector-based weights for the flow network were found experimentally for randomly generated instances each of size . These instances were also used for experimental work for this paper and more information on their generation can be seen in Section 5. Additionally, instances were tested each for instances of size (Footnote 5: unlike the instances described in Section 5, calculation of the space requirements for instances of size was carried out on a machine running Ubuntu version with cores, GB RAM and Intel® Core™ i7-4790 processors, and compiled using Java version 1.7.0. A far larger timeout was given for these instances at hours for each run of the extended Gale-Shapley Algorithm and Minimal Differences Algorithm – see Section 2 for more information on these algorithms. Each instance was run on a single thread, and one instance of size timed out over these instances). Solid circles represent data points and these were used to calculate the best fit curves shown when assuming a second order polynomial model. confidence intervals for each representation are also displayed.
We can see clearly that the exponential representation requires more space on average than the compressed vector representation and that this difference increases as grows large. Above we extrapolate up until , showing the expected trend with an increasing . Note that the additional small number of data points at fit this model well. We can see that at , we expect the exponential approach to be around times more costly in terms of space than the compressed vector-based approach. These differences will naturally affect the time taken to perform operations over these weights in a max flow algorithm. Combining this with the fact that the time complexity of Irving et al.’s O algorithm to find a rank-maximal matching is dominated substantially by the maximum flow algorithm (no other part taking more than O), it is arguably important to ensure that the flow network fits comfortably in RAM.
In this paper we present an O algorithm to find a rank-maximal stable matching in an instance of smi using a vector-based weight approach rather than using exponential weights. We also show that a similar process can be used to find a generous stable matching in O time, where is the degree of the matching. In addition to theoretical contributions we also ran experiments using randomly generated sm instances. In these experiments we evaluate differences between egalitarian, sex-equal, median, rank-maximal and generous stable matchings, based on profile, cost and degree measures. In particular, we find that a generous stable matching typically outperforms a rank-maximal stable matching when considering egalitarian and sex-equal cost measures.
Related work. Work undertaken by Cheng et al. describes the happiness of an agent in a stable matching , defined , as a map from all agents over a given matching to . The map is said to have the independence property if it is only reliant upon information contained in . The Hospitals/Residents problem (hr) is a more general case of smi in which women may be assigned more than one man. Cheng et al. provide an algorithm for the family of variants of hr incorporating happiness functions that exhibit the independence property, to calculate egalitarian and minimum regret stable matchings. For the case that we are given an instance of smi, this algorithm has a time complexity of O where is the time it takes to calculate the weight of a matching. It is worth noting that the term of this time complexity is due to Irving et al.’s method of finding a minimum weight stable matching. This method also requires the use of exponential weights, which would be problematic for the reasons outlined above.
The House Allocation problem (ha) is an extension of smi in which women do not have preferences over men. The Capacitated House Allocation problem with Ties (chat) is an extension of ha in which women may be assigned more than one man and men may be indifferent between one or more women on their preference list. With one-sided preferences the notion of stability does not exist. Rank-maximality, however, may be described in an analogous way to smi, and there is an O algorithm to find the rank-maximal matching in an instance of chat, where is the total length of men’s preference lists and is the degree of the matching. We may also seek to find a generous maximum matching, in which as many men as possible are assigned and then, subject to that, we use a generous criterion analogous to the smi case. There is an O algorithm to find the generous maximum matching in chat.
Structure of the paper. Section 2 gives a formal definition of smi and various types of optimal stable matchings. Sections 3 and 4 describe the new approach to find a rank-maximal stable matching and a generous stable matching respectively, without the use of exponential weights. Our experimental evaluation is presented in Section 5, whilst future work is discussed in Section 6.
2 Preliminary results and definitions
2.1 Formal definition of smi
An instance of the Stable Marriage problem with Incomplete lists (smi) comprises a set of men and a set of women . Each man ranks a subset of women in preference order and vice versa. A man finds a woman acceptable if appears on ’s preference list and vice versa. A matching in this context is an assignment of men to women such that no man or woman is assigned to more than one person, and if , then finds acceptable and finds acceptable. An example smi instance with men and women is taken from Gusfield and Irving’s book [11, p. 69] and is given as Figure 2-2. A matching is stable if there is no man-woman pair who would rather be assigned to each other than to their assigned partners in (if any). By the “Rural Hospitals” Theorem [22, 21, 9], the same set of men and women are assigned in all stable matchings. We assume that the number of men and women is equal and is denoted .
It is well known that a stable matching in smi can be found in O time via the Gale-Shapley algorithm, where is the total length of all agents’ preference lists. This algorithm requires either men or women to be the proposers, with those of the opposite gender acting as receivers. This procedure naturally produces a proposer-optimal stable matching, in which members of the proposer group are assigned to their best possible partner in any stable matching. Unfortunately, it is also a receiver-pessimal stable matching, in which members of the receiver group are assigned to their worst partner in any stable matching.
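The proposer-optimal behaviour described above can be seen in a minimal sketch of the basic proposal procedure for incomplete lists. This is not the extended Gale-Shapley algorithm used later in the paper; the function name, the dictionary-based input format and the two-by-two instance in the usage lines are assumptions made for this example.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a proposer-optimal stable matching as a dict proposer -> receiver.

    Each argument maps an agent to a list of acceptable partners, best first.
    """
    # Position of each proposer on each receiver's list, for O(1) comparisons.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list position to try
    holder = {}                                   # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        while next_choice[p] < len(prefs):
            r = prefs[next_choice[p]]
            next_choice[p] += 1
            if p not in rank.get(r, {}):
                continue  # r does not find p acceptable
            current = holder.get(r)
            if current is None:
                holder[r] = p
                break
            if rank[r][p] < rank[r][current]:
                holder[r] = p
                free.append(current)  # the displaced proposer resumes proposing
                break
        # A proposer who exhausts their list stays unassigned.
    return {p: r for r, p in holder.items()}


if __name__ == "__main__":
    men = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
    women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
    print(gale_shapley(men, women))   # men propose
    print(gale_shapley(women, men))   # women propose
```

Swapping the two arguments swaps the roles of proposer and receiver, which is how both the man-optimal and woman-optimal matchings can be obtained from the same routine.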
It is natural therefore to want to find some notion of optimality which provides some sense of equality between men and women in a stable matching. This problem has been researched widely and a summary of the literature is now given.
2.2 Optimality in smi
Let be the rank of woman on man ’s list with an analogous definition for the rank of man on a woman’s list. Then the egalitarian weight function according to men is defined as,
Similarly, the egalitarian weight function according to women is defined as,
Our combined egalitarian weight function is then,
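One way to write the three weight functions just described is sketched below. The symbols are assumptions introduced for this illustration: $M$ is a stable matching, $(m_i, w_j)$ ranges over its pairs, $\mathrm{rank}(a, b)$ is the position of $b$ on $a$'s preference list, and $c_U$, $c_W$ and $c$ name the men's, women's and combined weights respectively.

```latex
c_U(M) = \sum_{(m_i, w_j) \in M} \mathrm{rank}(m_i, w_j), \qquad
c_W(M) = \sum_{(m_i, w_j) \in M} \mathrm{rank}(w_j, m_i), \qquad
c(M) = c_U(M) + c_W(M).
```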
Let be an instance of smi. One measure of optimality is known as the egalitarian stable matching which optimises the total happiness of all men and women over all stable matchings. An egalitarian stable matching is a stable matching such that is minimised taken over the set of stable matchings in . Let define some arbitrary weight function of stable matching . A matching is minimum (maximum) weight if is minimum (maximum) taken over all stable matchings in . Thus the minimum weight function is a generalisation of the egalitarian weight function . Irving et al.  showed that an egalitarian stable matching can be found in O time and a minimum weight stable matching in O time. Additionally, Irving et al.  described a simple transformation that allows the minimum weight stable matching algorithm to be used to find a maximum weight stable matching in the same time complexity. Feder  improved on their method, giving an O algorithm for finding a minimum weight stable matching.
A sex-equal stable matching in is a stable matching such that the difference is minimum. Kato  showed that the problem of finding a sex-equal stable matching is NP-hard.
The degree of a matching is the highest rank of any assigned pair in . Formally, may be defined as follows:
A minimum regret stable matching is then a stable matching in such that is minimised and can be found in O time .
A median stable matching may be described in the following way. Let denote the set of all stable matchings and denote the multiset of all women who are assigned to man in the matchings in (in general is a multiset as may have the same partner in more than one stable matching). Assume that is sorted according to ’s preference order (there may be repeated values) and let represent the th element of this list. Let denote the set of pairs obtained by assigning to for every (). Teo and Sethuraman showed the surprising result that is a stable matching for every such that . If is odd then the unique median stable matching is found when . However, if is even, then the set of median stable matchings are the stable matchings such that each man (woman) does no better (worse) than their partner when and no worse (better) than their partner found when . For the purposes of this paper, in particular the experimentation section, we define the median stable matching as the stable matching found when .
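The construction above, in which every man is given the i-th element of his sorted multiset of stable partners, can be sketched as follows. The function name, the input format (a list of stable matchings as dictionaries from man to woman) and the one-based index `i` are assumptions of this example; the sketch takes the full list of stable matchings as given and does not compute it.

```python
def ith_partner_assignment(stable_matchings, men_prefs, i):
    """Assign each man the i-th (1-indexed) entry of his sorted stable-partner list."""
    result = {}
    for man, prefs in men_prefs.items():
        if man not in stable_matchings[0]:
            continue  # unassigned in one stable matching means unassigned in all
        position = {w: k for k, w in enumerate(prefs)}
        # Multiset of partners over all stable matchings, best first.
        partners = sorted((M[man] for M in stable_matchings),
                          key=position.__getitem__)
        result[man] = partners[i - 1]
    return result


if __name__ == "__main__":
    men = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
    all_stable = [{"m1": "w1", "m2": "w2"}, {"m1": "w2", "m2": "w1"}]
    print(ith_partner_assignment(all_stable, men, 1))
    print(ith_partner_assignment(all_stable, men, 2))
```

Because the number of stable matchings can be exponential in the instance size, this direct enumeration is only practical for small instances.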
Define a rank-maximal matching in smi to be a matching in which the largest number of agents gain their first choice, then subject to that, their second choice and so on. More formally we define a profile as a finite vector of integers (positive or negative) and the profile of a matching as follows. Given a matching , let the profile of be given by the vector where
for some . Thus we define a stable matching in an instance of smi to be rank-maximal if is lexicographically maximum, taken over all stable matchings in . We define the reverse profile to be the vector . A stable matching in an instance of smi is generous if is lexicographically minimum, taken over all stable matchings in .
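The profile-based definitions above translate directly into code. The sketch below computes the profile of a matching and selects rank-maximal and generous candidates from a collection of profiles; it assumes equal numbers of men and women, matchings given as dictionaries from man to woman, and hypothetical function names chosen for this example.

```python
def profile(matching, men_prefs, women_prefs):
    """Profile of a matching: entry i counts the agents who obtain their (i+1)-th choice."""
    n = len(men_prefs)  # assumes equal numbers of men and women
    p = [0] * n
    for man, woman in matching.items():
        p[men_prefs[man].index(woman)] += 1
        p[women_prefs[woman].index(man)] += 1
    return tuple(p)


def degree(p):
    """Position (1-indexed) of the last non-zero entry of a profile, or 0 if none."""
    return max((i + 1 for i, x in enumerate(p) if x != 0), default=0)


def rank_maximal(profiles):
    """Lexicographically maximum profile (Python compares tuples lexicographically)."""
    return max(profiles)


def generous(profiles):
    """Profile whose reverse is lexicographically minimum."""
    return min(profiles, key=lambda p: p[::-1])


if __name__ == "__main__":
    candidates = [(3, 1, 0, 2), (3, 0, 3, 0)]
    print(rank_maximal(candidates), generous(candidates))
```

In the usage lines the two criteria pick different profiles: the first has more second choices, while the second has nobody at rank four.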
2.3 Finding a rank-maximal stable matching using exponential weights
Graphical structures. Irving et al.  define a rotation as a list of man-woman pairs in a stable matching , such that when their assignments are permuted (each man moving from to , where is incremented modulo ), we obtain another stable matching. A rotation is exposed in if all of the pairs in are in . Permuting the assignments of an exposed rotation is known as eliminating . A list of rotations of instance is given in Figure 2-3.
In order to describe profiles of rotations we must first describe arithmetic over profiles. Addition over profiles may be defined in the following way. Let and be profiles of length . Then the addition of to is taken pointwise over elements from . That is, . We define if for . Now suppose . Let be the first point at which these profiles differ, that is, suppose and for . Then we define if . Finally, we say if either or . It is trivial to show that an addition or comparison of two profiles would take O time in the worst case (since the length of any profile is bounded by ). Let be a profile, where . Then for ease of description we may shorten this profile to .
Suppose we have a rotation that, when eliminated, takes us from stable matching to stable matching , where and have profiles and respectively. Then the profile of is defined as the net change in profile between and , that is, . Hence, . It is easy to see that a particular rotation will give the same net change in profile regardless of which stable matching it is eliminated from. For a set of rotations , we define the profile over as .
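Profile arithmetic and the profile of a rotation can be sketched with tuples, whose built-in comparison in Python is already lexicographic from the first differing position. The function names and the example profiles are assumptions of this illustration.

```python
def add_profiles(p, q):
    """Pointwise sum of two profiles of equal length."""
    return tuple(a + b for a, b in zip(p, q))


def subtract_profiles(p, q):
    """Pointwise difference p - q of two profiles of equal length."""
    return tuple(a - b for a, b in zip(p, q))


def rotation_profile(before, after):
    """Net change in profile caused by eliminating a rotation."""
    return subtract_profiles(after, before)


def profile_of_set(rotation_profiles, length):
    """Sum of the profiles of a set of rotations."""
    total = (0,) * length
    for p in rotation_profiles:
        total = add_profiles(total, p)
    return total


if __name__ == "__main__":
    before, after = (2, 2, 0), (1, 2, 1)
    rho = rotation_profile(before, after)
    print(rho, add_profiles(before, rho) == after, after < before)
```

Each operation touches every entry once, so its cost is linear in the profile length, in line with the bound stated above.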
A rotation poset may be constructed as a directed graph which indicates the order in which rotations may be eliminated. Informally, if one rotation precedes another, , in the rotation poset then is not exposed until has been eliminated. A closed subset of the rotation poset may be defined as a set of rotations such that for every in , all of ’s predecessors are also in . It has been shown that there is a - correspondence between the closed subsets of the rotation poset and the set of all stable matchings [13, Theorem 3.1]. The rotation poset for , denoted , is shown in Figure 3(a).
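The notion of a closed subset can be checked mechanically. The sketch below tests closure and enumerates all closed subsets of a small hypothetical poset by brute force; the rotation labels and the `predecessors` mapping (each rotation to the set of its immediate predecessors) are assumptions of this example, and the exhaustive enumeration is only meant for tiny inputs.

```python
from itertools import combinations


def is_closed(subset, predecessors):
    """True if every rotation in the subset has all of its predecessors in the subset."""
    return all(predecessors.get(rho, set()) <= subset for rho in subset)


def closed_subsets(rotations, predecessors):
    """All closed subsets of the poset, found by testing every subset."""
    found = []
    for size in range(len(rotations) + 1):
        for combo in combinations(rotations, size):
            candidate = set(combo)
            if is_closed(candidate, predecessors):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical poset: a precedes b, while c is unrelated to both.
    preds = {"b": {"a"}}
    for subset in closed_subsets(["a", "b", "c"], preds):
        print(sorted(subset))
```

Checking immediate predecessors is enough, since a set closed under immediate predecessors is closed under all predecessors by transitivity.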
Irving et al.’s  method for finding a maximum weight stable matching involves finding a maximum weight closed subset of the rotation poset. In order to find this maximum weight closed subset of the rotation poset, other graphical structures need to be defined. A description of the creation of a rotation digraph now follows. First, retain each rotation from the rotation poset as a node. There are two types of predecessor relationships to consider.
Suppose pair . We have a directed edge in our digraph from to if is the unique rotation that moves to . In this case we say that is a type predecessor of .
Let be the rotation that moves below and be the rotation that moves above . Then we add a directed edge from to and say is a type predecessor of .
The rotation digraph for instance , denoted , is shown in Figure 3(b).
Using the rotation digraph structure, Gusfield and Irving  were able to enumerate all stable matchings in O, where is the set of all stable matchings. All stable matchings of instance are listed in Figure 2-5.
We must now convert the rotation digraph to a flow network . First we add two extra nodes; a source node and a sink node . An edge of capacity replaces each original edge in the digraph. Since we are finding a rank-maximal stable matching, capacities on other edges of are calculated by converting each profile of a rotation to a single exponential weight. We decide on a weight function of for each person assigned to their th choice. From this point onwards we refer to the use of this weight function as the high-weight scenario, and denote it as .
Given a profile such that and , define the high-weight function as,
Lemma 2.3 shows that when the above function is used, a matching of maximum weight will be a rank-maximal matching.
Let and be profiles such that and . Let denote the th term of and let denote the sum of terms for all such that . If , then . Additionally, if is the first point at which and differ, then .
Assume . Then must be at least larger than since each profile element is an integer by definition. A value of for will contribute to and so it follows that .
Since decreases as increases and , the maximum weight contribution that can make to is when .
Through the following series of inequalities,
it follows that as required. If is the first point at which and differ then it follows that . ∎
Let be an instance of smi and let be a stable matching in . If is maximum amongst all stable matchings of , where is the profile of , then is a rank-maximal stable matching.
Suppose is maximum amongst all stable matchings of . Now, assume for contradiction that is not rank-maximal. Then, there exists some stable matching in such that is lexicographically larger than . Let be the first point at which and differ. Since is lexicographically larger than we know that and by Proposition 2.2 it follows that .
But this contradicts the fact that is maximum over all stable matchings of . Therefore our assumption that is not rank-maximal is false, as required. ∎
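The idea behind the lemma, that a single scalar weight can reproduce the lexicographic order on profiles, can be checked exhaustively on a tiny case. In the sketch below the base `2 * n + 1` and the restriction of profile entries to the range 0 to 2n are assumptions chosen so that each profile behaves like a number written in that base; they are not claimed to be the exact weight function used above, and the function name is hypothetical.

```python
from itertools import product


def high_weight(p, base):
    """Scalar weight of a profile: entry i (1-indexed) is weighted by base**(n - i)."""
    n = len(p)
    return sum(x * base ** (n - i) for i, x in enumerate(p, start=1))


def order_is_preserved(n):
    """Check that weight order equals lexicographic order for all profiles with entries in 0..2n."""
    base = 2 * n + 1  # assumed base: strictly larger than any profile entry
    profiles = list(product(range(2 * n + 1), repeat=n))
    return all((high_weight(p, base) > high_weight(q, base)) == (p > q)
               for p in profiles for q in profiles)


if __name__ == "__main__":
    print(order_is_preserved(2))
```

The check also shows why the weights grow so quickly: the first entry is multiplied by a power of the base that is exponential in the profile length.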
We now continue describing Irving et al.’s technique for finding a maximum closed subset of the rotation poset. The rotations are divided into positive and negative nodes as follows. A rotation is positive if and negative if . A directed edge is added from the source to each negative node and is given a capacity equal to . A directed edge is also added between each positive node and with capacity . The high-weight flow network of instance is denoted and is shown in Figure 2-6.
Minimum cut of . In the flow network, we denote the flow over a node or edge as and an - cut as with capacity given by .
By the Max Flow-Min Cut Theorem  we need only find a maximum flow through in order to find a minimum cut in . Irving et al.  used the Sleator-Tarjan algorithm  to find a maximum flow. Several analogous definitions used in the Sleator-Tarjan algorithm are required when we move to our new approach and so are described below.
In order to search for augmenting paths we construct a new network known as the residual graph. Given a flow network and a flow in , the residual graph relative to and , denoted , is defined as follows. The vertex set of is equal to the vertex set of . An edge , known as a forward edge, is added to with capacity if and . Similarly an edge , known as a backwards edge, is added to with capacity if and . Using a breadth-first search in we may find an augmenting path or determine that none exists in O time. Once an augmenting path is found we augment in the following way:
The residual capacity is the minimum of the capacities of the edges in in ;
For each edge , if is a forwards edge, the flow through is increased by , whilst if is a backwards edge, the flow through is decreased by .
Ford and Fulkerson showed that if no augmenting path in can be found then the flow in is maximum. In Figure 2-6 we show the high-weight flow network with a maximum flow highlighted. There is one minimum cut, . Note that this must be a minimum cut since the flow over edge is equal to its capacity, and the flow over is limited entirely by the capacity of . For this cut we list every rotation such that . Then a maximum weight closed subset of the rotation poset is given by this set of rotations and their predecessors. has associated closed subset of which is precisely the maximum weight closed subset of . The man-optimal stable matching of is
By eliminating rotations from the man-optimal stable matching, we find the rank-maximal stable matching
The following Theorem summarises the work in this section.
Let be an instance of smi. A rank-maximal stable matching of can be found in O using weights that are exponential in .
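As a rough illustration of the pipeline summarised in this section, the sketch below finds a maximum weight closed subset of a small hypothetical rotation digraph by computing a minimum cut. It uses breadth-first augmenting paths on the residual graph rather than the Sleator-Tarjan algorithm, plain integer weights rather than the high-weight values, and it reads the closed subset off as the sink side of the cut found; the function name, the node labels and that read-off rule are assumptions made for this example.

```python
from collections import deque

INF = float("inf")


def max_weight_closed_subset(weights, predecessor_edges):
    """weights: rotation -> integer weight; predecessor_edges: pairs (pi, rho), pi precedes rho."""
    s, t = "_source", "_sink"
    cap = {s: {}, t: {}}
    for rho in weights:
        cap[rho] = {}

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual (backwards) edge

    for rho, w in weights.items():
        if w < 0:
            add_edge(s, rho, -w)  # source to each negative node
        elif w > 0:
            add_edge(rho, t, w)   # each positive node to the sink
    for pi, rho in predecessor_edges:
        add_edge(pi, rho, INF)    # digraph edges are never cut

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path, so the flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)  # residual capacity of the path
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta

    # Nodes not reachable from the source lie on the sink side of a minimum cut.
    return {rho for rho in weights if rho not in parent}


if __name__ == "__main__":
    w = {"a": -2, "b": 5, "c": -4, "d": 1}
    print(sorted(max_weight_closed_subset(w, [("a", "b"), ("c", "d")])))
```

In the usage lines, taking a and b together gains 3, whereas adding c and d would lose 3, so the closed subset returned contains only a and b.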
An alternative to high-weight values when looking for a rank-maximal stable matching is to use a new approach, involving polynomially-bounded weight vectors, to find a maximum weight closed subset of rotations. This is the focus of the rest of this paper.
3 Finding a rank-maximal stable matching using polynomially-bounded weight vectors
Following a similar strategy to Irving et al., we aim to show that we can return a rank-maximal stable matching in O time without the use of exponential weights. The process we follow is described below.
1. Calculate man-optimal and woman-optimal stable matchings using the Extended Gale-Shapley Algorithm – O time;
2. Find all rotations using the minimal differences algorithm – O time;
3. Build the rotation digraph and flow network – O time;
4. Find a minimum cut of the flow network in O time without reverting to high weights;
5. Use this cut to find a maximum profile closed subset of the rotations – O time;
6. Eliminate the rotations of from the man-optimal matching to find the rank-maximal stable matching.
In the next section we discuss the required adaptations to the high-weight procedure.
3.2 Vb-networks and vb-flows
In this section we look at steps in the strategy to find a rank-maximal stable matching without the use of exponential weights (Section 3.1) which either require adaptations or further explanation.
In Step 6 of our strategy we eliminate the rotations of a maximum profile closed subset of the rotation poset from the man-optimal stable matching. We now present Lemma 3.1, an analogue of Corollary 3.6.1 of , which shows that eliminating a maximum profile closed subset of the rotation poset from the man-optimal stable matching results in a rank-maximal stable matching.
Let be an instance of smi and let be the man-optimal stable matching in . A rank-maximal stable matching may be obtained by eliminating a maximum profile closed subset of the rotation poset from .
Let be the rotation poset of . By Gusfield and Irving [11, Theorem 2.5.7], there is a - correspondence between closed subsets of and the stable matchings of . Let be a maximum profile closed subset of the rotation poset and let be the unique corresponding stable matching. Then, . Suppose is not rank-maximal. Then there is a stable matching such that . As above, corresponds to a unique closed subset of the rotation poset. Also . But and so cannot be a maximum profile closed subset of , a contradiction. ∎
Steps 3 and 4 of our strategy are the only places where we are required to check that it is possible to directly substitute an operation involving large weights taking O time with a comparable profile operation taking O time.
The first deviation from Gusfield and Irving’s method (described in Section 1) is in the creation of a vector-based flow network (abbreviated to vb-flow network). For ease of description we denote this new vb-flow network as to distinguish it from the high-weight version . We now define a vb-capacity in which is of similar notation to that of a profile.
In a vb-flow network , the vector-based capacity (vb-capacity) of an edge is a vector , where is the number of men or women in and for .
As before we add a source and sink node to the rotation digraph. We replace each original digraph edge with an edge with vb-capacity ( repeated times). For convenience these edges are marked with ‘’ in the network flow diagram. The definition of a positive and negative rotation is also amended. Let have profile . Let be the first non-zero profile element where . We now define a positive rotation as a rotation where , and a negative rotation is one where . Define the absolute value operation, denoted , as follows. If , then leave all elements unchanged. If , then reverse the sign of all non-zero profile elements. Figure 3-7 shows the profile and absolute profile for each rotation of . Then we add a directed edge to the vb-flow network from to each negative rotation node with a vb-capacity of and a directed edge from each positive rotation node to with a vb-capacity of .
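The amended definitions of positive and negative rotations, and the absolute value operation on profiles, can be sketched as follows. Profiles are represented as tuples of integers, and the function names are assumptions of this example.

```python
def first_nonzero(p):
    """First non-zero element of a profile, or 0 if every element is zero."""
    return next((x for x in p if x != 0), 0)


def is_positive(p):
    """A rotation is positive if the first non-zero element of its profile is positive."""
    return first_nonzero(p) > 0


def is_negative(p):
    """A rotation is negative if the first non-zero element of its profile is negative."""
    return first_nonzero(p) < 0


def absolute(p):
    """Leave a positive profile unchanged; reverse the sign of every element of a negative one."""
    if first_nonzero(p) >= 0:
        return tuple(p)
    return tuple(-x for x in p)


if __name__ == "__main__":
    for rho in [(0, -1, 2), (1, -3, 0), (0, 0, 0)]:
        print(rho, is_positive(rho), is_negative(rho), absolute(rho))
```

Note that the absolute value of a profile may still contain negative entries: only the sign of the leading non-zero entry is guaranteed to be positive afterwards.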