Matching problems, which generally involve the assignment of a set of agents to another set of agents based on preferences, have wide applications in many real-world settings. One such application can be seen in an educational context, e.g., the allocation of pupils to schools, school-leavers to universities and students to projects. In the context of allocating students to projects, university lecturers propose a range of projects, and each student is required to provide a preference over the available projects that she finds acceptable. Lecturers may also have preferences over the students that find their project acceptable and/or the projects that they offer. There may also be upper bounds on the number of students that can be assigned to a particular project, and the number of students that a given lecturer is willing to supervise. The problem then is to allocate students to projects based on these preferences and capacity constraints – the so-called Student-Project Allocation problem (SPA) [11, 3].
Two major models of SPA exist in the literature: one permits preferences only from the students [2, 6, 10], while the other permits preferences from the students and lecturers [1, 8]. Given the large number of students that are typically involved in such an allocation process, many university departments seek to automate the allocation of students to projects. Examples include the School of Computing Science, University of Glasgow [10], the Faculty of Science, University of Southern Denmark [4], the Department of Computing Science, University of York [8], and elsewhere [2, 3, 6].
In general, we seek a matching, which is a set of agent pairs who find one another acceptable that satisfies the capacities of the agents involved. For matching problems where preferences exist from the two sets of agents involved (e.g., junior doctors and hospitals in the classical Hospitals-Residents problem (HR) [5], or students and lecturers in the context of SPA), it has been argued that the desired property for a matching one should seek is that of stability [13]. Informally, a stable matching ensures that no acceptable pair of agents who are not matched together would rather be assigned to each other than remain with their current assignees.
Abraham, Irving and Manlove [1] proposed two linear-time algorithms to find a stable matching in a variant of SPA where students have preferences over projects, whilst lecturers have preferences over students. The stable matching produced by the first algorithm is student-optimal (that is, students have the best possible projects that they could obtain in any stable matching) while the one produced by the second algorithm is lecturer-optimal (that is, lecturers have the best possible students that they could obtain in any stable matching).
Manlove and O’Malley [12] proposed another variant of SPA where both students and lecturers have preferences over projects, referred to as SPA-P. In their paper, they formulated an appropriate stability definition for SPA-P, and they showed that stable matchings in this context can have different sizes. In addition to stability, a very important requirement in practice is to match as many students to projects as possible. Manlove and O’Malley [12] proved that the problem of finding a maximum cardinality stable matching, denoted MAX-SPA-P, is NP-hard. Further, they gave a polynomial-time $2$-approximation algorithm for MAX-SPA-P. Subsequently, Iwama, Miyazaki and Yanagisawa [7] described an improved approximation algorithm with an upper bound of $\frac{3}{2}$, which builds on the one described in [12]. In addition, Iwama et al. [7] showed that MAX-SPA-P is not approximable within $\frac{21}{19} - \epsilon$, for any $\epsilon > 0$, unless P = NP. For the upper bound, they modified Manlove and O’Malley’s algorithm [12] using Király’s idea [9] for the approximation algorithm to find a maximum stable matching in a variant of the Stable Marriage problem.
Since the existing algorithms for MAX-SPA-P are only guaranteed to produce an approximate solution, we seek another technique to enable MAX-SPA-P to be solved optimally. Integer Programming (IP) is a powerful technique for producing optimal solutions to a range of NP-hard optimisation problems, with the aid of optimisation solvers, e.g., Gurobi [14], GLPK [15] and CPLEX [16]. These solvers can allow IP models to be solved in a reasonable amount of time, even with respect to problem instances that occur in practical applications.
In Sect. 3, we describe an IP model to enable MAX-SPA-P to be solved optimally, and present a correctness result. In Sect. 4, we present results arising from an empirical analysis that investigates how the solution produced by the approximation algorithms compares to the optimal solution obtained from our IP model, with respect to the size of the stable matchings constructed, on instances that are both randomly-generated and derived from real datasets. These real datasets are based on actual student preference data and manufactured lecturer preference data from previous runs of student-project allocation processes at the School of Computing Science, University of Glasgow. We also present results showing the time taken by the IP model to solve the problem instances optimally. Our main finding is that the -approximation algorithm finds stable matchings that are very close to having maximum cardinality. The next section gives a formal definition for SPA-P.
2 Definitions and Preliminaries
We give a formal definition for SPA-P as described in the literature [12]. An instance $I$ of SPA-P involves a set $S = \{s_1, s_2, \ldots, s_{n_1}\}$ of students, a set $P = \{p_1, p_2, \ldots, p_{n_2}\}$ of projects and a set $L = \{l_1, l_2, \ldots, l_{n_3}\}$ of lecturers. Each lecturer $l_k \in L$ offers a non-empty subset of projects, denoted by $P_k$. We assume that $P_1, P_2, \ldots, P_{n_3}$ partitions $P$ (that is, each project is offered by one lecturer). Also, each student $s_i \in S$ has an acceptable set of projects $A_i \subseteq P$. We call a pair $(s_i, p_j) \in S \times P$ an acceptable pair if $p_j \in A_i$. Moreover $s_i$ ranks $A_i$ in strict order of preference. Similarly, each lecturer $l_k$ ranks $P_k$ in strict order of preference. Finally, each project $p_j \in P$ and lecturer $l_k \in L$ has a positive capacity denoted by $c_j$ and $d_k$ respectively.
An assignment $M$ is a subset of $S \times P$ where $(s_i, p_j) \in M$ implies that $s_i$ finds $p_j$ acceptable (that is, $p_j \in A_i$). We define the size of $M$ as the number of (student, project) pairs in $M$, denoted $|M|$. If $(s_i, p_j) \in M$, we say that $s_i$ is assigned to $p_j$ and $p_j$ is assigned $s_i$. Furthermore, we denote the project assigned to student $s_i$ in $M$ as $M(s_i)$ (if $s_i$ is unassigned in $M$ then $M(s_i)$ is undefined). Similarly, we denote the set of students assigned to project $p_j$ in $M$ as $M(p_j)$. For ease of exposition, if $s_i$ is assigned to a project $p_j$ offered by lecturer $l_k$, we may also say that $s_i$ is assigned to $l_k$, and $l_k$ is assigned $s_i$. Thus we denote the set of students assigned to $l_k$ in $M$ as $M(l_k)$.
A project $p_j \in P$ is full, undersubscribed or oversubscribed in $M$ if $|M(p_j)|$ is equal to, less than or greater than $c_j$, respectively. The corresponding terms apply to each lecturer $l_k$ with respect to $d_k$. We say that a project $p_j \in P$ is non-empty if $|M(p_j)| > 0$.
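A matching $M$ is an assignment such that $|M(s_i)| \le 1$ for each $s_i \in S$, $|M(p_j)| \le c_j$ for each $p_j \in P$, and $|M(l_k)| \le d_k$ for each $l_k \in L$ (that is, each student is assigned to at most one project, and no project or lecturer is oversubscribed). Given a matching $M$, an acceptable pair $(s_i, p_j) \in (S \times P) \setminus M$ is a blocking pair of $M$ if the following conditions are satisfied:
1. either $s_i$ is unassigned in $M$ or $s_i$ prefers $p_j$ to $M(s_i)$, and $p_j$ is undersubscribed, and either
(a) $s_i \in M(l_k)$ and $l_k$ prefers $p_j$ to $M(s_i)$, or
(b) $s_i \notin M(l_k)$ and $l_k$ is undersubscribed, or
(c) $s_i \notin M(l_k)$ and $l_k$ prefers $p_j$ to his worst non-empty project,
where $l_k$ is the lecturer who offers $p_j$.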
If such a pair were to occur, it would undermine the integrity of the matching as the student and lecturer involved would rather be assigned together than remain in their current assignment. With respect to the SPA-P instance given in Fig. 1, is clearly a matching. It is obvious that each of students and is matched to her first ranked project in . Although is unassigned in , the lecturer offering (the only project that finds acceptable) is assumed to be indifferent among those students who find acceptable. Also is full in . Thus, we say that admits no blocking pair.
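The blocking-pair conditions (1 together with one of (a), (b), (c)) can be checked mechanically. The following is a minimal sketch under stated assumptions, not the stability checker used in the experiments: the function names, the dictionary-based data layout and the three-student instance are all hypothetical and chosen only for illustration (the instance is not the one in Fig. 1).

```python
def is_blocking_pair(s, p, M, student_pref, lecturer_pref,
                     offered_by, proj_cap, lect_cap):
    """Return True if the acceptable pair (s, p) blocks matching M.

    M maps each assigned student to a project; preference lists are
    ordered from most to least preferred. All names are illustrative.
    """
    prefs = student_pref[s]
    if p not in prefs or M.get(s) == p:
        return False
    current = M.get(s)
    # Condition 1: s is unassigned or prefers p to the current project ...
    if current is not None and prefs.index(current) < prefs.index(p):
        return False
    # ... and p is undersubscribed.
    if sum(1 for q in M.values() if q == p) >= proj_cap[p]:
        return False
    lecturer = offered_by[p]
    rank = lecturer_pref[lecturer].index
    assigned = [q for q in M.values() if offered_by[q] == lecturer]
    if current is not None and offered_by[current] == lecturer:
        return rank(p) < rank(current)               # case (a)
    if len(assigned) < lect_cap[lecturer]:
        return True                                  # case (b)
    # Lecturer is full: compare p with the worst non-empty project.
    return rank(p) < max(rank(q) for q in assigned)  # case (c)


# Hypothetical instance: l1 offers p1 and p2, l2 offers p3.
student_pref = {"s1": ["p1", "p2"], "s2": ["p1", "p3"], "s3": ["p3"]}
lecturer_pref = {"l1": ["p1", "p2"], "l2": ["p3"]}
offered_by = {"p1": "l1", "p2": "l1", "p3": "l2"}
proj_cap = {"p1": 1, "p2": 1, "p3": 1}
lect_cap = {"l1": 2, "l2": 1}


def blocking_pairs(M):
    """List every acceptable pair that blocks M in the instance above."""
    return [(s, p) for s in student_pref for p in student_pref[s]
            if is_blocking_pair(s, p, M, student_pref, lecturer_pref,
                                offered_by, proj_cap, lect_cap)]
```

In this toy instance, the matching `{"s1": "p2", "s2": "p3"}` is blocked by `("s1", "p1")` through case (a) and by `("s2", "p1")` through case (b), whereas `{"s1": "p1", "s2": "p3"}` admits no blocking pair.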
[Fig. 1: An instance of SPA-P, given as two columns: student preferences and lecturer preferences. The table contents are not recoverable.]
Another way in which a matching could be undermined is through a group of students acting together. Given a matching , a coalition is a set of students , for some such that each student () is assigned in and prefers to , where addition is performed modulo . With respect to Fig. 1, the matching admits a coalition , as students and would rather permute their assigned projects in so as to be better off. We note that the number of students assigned to each project and lecturer involved in any such swap remains the same after such a permutation. Moreover, the lecturers involved would have no incentive to prevent the switch from occurring since they are assumed to be indifferent between the students assigned to the projects they are offering. If a matching admits no coalition, we define such matching to be coalition-free.
Given an instance of SPA-P, we define a matching in to be stable if admits no blocking pair and is coalition-free. It turns out that with respect to this definition, stable matchings in can have different sizes. Clearly, each of the matchings and is stable in the SPA-P instance shown in Fig. 1. The varying sizes of the stable matchings produced naturally leads to the problem of finding a maximum cardinality stable matching given an instance of SPA-P, which we denote by MAX-SPA-P. In the next section, we describe our IP model to enable MAX-SPA-P to be solved optimally.
3 An IP model for MAX-SPA-P
Let be an instance of SPA-P involving a set of students, a set of projects and a set of lecturers. We construct an IP model of as follows. Firstly, we create binary variables for each acceptable pair such that indicates whether is assigned to in a solution or not. Henceforth, we denote by a solution in the IP model , and we denote by the matching derived from . If under then intuitively is assigned to in , otherwise is not assigned to in . In what follows, we give the constraints to ensure that the assignment obtained from a feasible solution in is a matching.
The feasibility of a matching can be ensured with the following three sets of constraints.
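One possible algebraic form of these three constraint sets is sketched below. It is reconstructed from the verbal description (each student is assigned to at most one project, and no project or lecturer is oversubscribed), and the notation is assumed: $x_{i,j}$ is the binary variable for the acceptable pair $(s_i, p_j)$ (taken as $0$, or simply omitted, when $p_j$ is not acceptable to $s_i$), $A_i$ is the acceptable set of student $s_i$, $P_k$ is the set of projects offered by lecturer $l_k$, $c_j$ and $d_k$ are the project and lecturer capacities, and $n_1$, $n_2$, $n_3$ are the numbers of students, projects and lecturers.

```latex
\begin{align}
\sum_{p_j \in A_i} x_{i,j} &\le 1 && (1 \le i \le n_1), \tag{1}\\
\sum_{i=1}^{n_1} x_{i,j} &\le c_j && (1 \le j \le n_2), \tag{2}\\
\sum_{i=1}^{n_1} \sum_{p_j \in P_k} x_{i,j} &\le d_k && (1 \le k \le n_3). \tag{3}
\end{align}
```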
We define , the rank of on ’s preference list, to be where is the number of projects that prefers to . An analogous definition holds for , the rank of on ’s preference list. With respect to an acceptable pair , we define , the set of projects that likes as much as . For a project offered by lecturer , we also define , the set of projects that are worse than on ’s preference list.
In what follows, we fix an arbitrary acceptable pair and we impose constraints to ensure that is not a blocking pair of the matching (that is, is not a type 1(a), type 1(b) or type 1(c) blocking pair of ). Firstly, let be the lecturer who offers .
Blocking Pair Constraints.
We define . Intuitively, if and only if is unassigned in or prefers to . Next we create a binary variable in such that corresponds to the case when is undersubscribed in . We enforce this condition by imposing the following constraint.
where . If is undersubscribed in then the RHS of (4) is at least , and this implies that . Otherwise, is not constrained. Now let . Intuitively, if in then is assigned to a project offered by in , where prefers to . The following constraint ensures that does not form a type 1(a) blocking pair of .
Note that if the sum of the binary variables in the LHS of (5) is less than or equal to , this implies that at least one of the variables, say , is . Thus the pair is not a type 1(a) blocking pair of .
Next we define Clearly, is assigned to a project offered by in if and only if in . Now we create a binary variable in such that in corresponds to the case when is undersubscribed in . We enforce this condition by imposing the following constraint.
where . If is undersubscribed in then the RHS of (6) is at least , and this implies that . Otherwise, is not constrained. The following constraint ensures that does not form a type 1(b) blocking pair of .
We define , the set of projects that likes as much as . Next, we create a binary variable in such that if is full and prefers to his worst non-empty project in . We enforce this by imposing the following constraint.
Finally, to avoid a type 1(c) blocking pair, we impose the following constraint.
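The blocking-pair constraints described in this subsection can be sketched as follows for a fixed acceptable pair $(s_i, p_j)$ with $p_j$ offered by $l_k$. This is a reconstruction from the prose, and every symbol name is an assumption: $x_{i,j}$ are the assignment variables, $S_{i,j}$ is the set of projects that $s_i$ likes as much as $p_j$, $T_{k,j}$ is the set of projects worse than $p_j$ on the list of $l_k$, $D_{k,j}$ is the set of projects that $l_k$ likes as much as $p_j$, $P_k$ is the set of projects offered by $l_k$, and $\alpha_j$, $\delta_k$, $\eta_{j,k}$ are the binary indicator variables for "project undersubscribed", "lecturer undersubscribed" and "lecturer full and prefers $p_j$ to his worst non-empty project".

```latex
\begin{align}
\theta_{i,j} &= 1 - \sum_{p_{j'} \in S_{i,j}} x_{i,j'}, \notag\\
c_j \alpha_j &\ge c_j - \sum_{i'=1}^{n_1} x_{i',j}, \tag{4}\\
\gamma_{i,j,k} &= \sum_{p_{j'} \in T_{k,j}} x_{i,j'}, \notag\\
\theta_{i,j} + \alpha_j + \gamma_{i,j,k} &\le 2, \tag{5}\\
\beta_{i,k} &= \sum_{p_{j'} \in P_k} x_{i,j'}, \notag\\
d_k \delta_k &\ge d_k - \sum_{i'=1}^{n_1} \sum_{p_{j'} \in P_k} x_{i',j'}, \tag{6}\\
\theta_{i,j} + \alpha_j + (1 - \beta_{i,k}) + \delta_k &\le 3, \tag{7}\\
d_k \eta_{j,k} &\ge d_k - \sum_{i'=1}^{n_1} \sum_{p_{j'} \in D_{k,j}} x_{i',j'}, \tag{8}\\
\theta_{i,j} + \alpha_j + (1 - \beta_{i,k}) + \eta_{j,k} &\le 3. \tag{9}
\end{align}
```

Under this reading, the right-hand sides of (4), (6) and (8) are at least $1$ exactly when the corresponding "undersubscribed" or "worse project in use" condition holds, which forces the indicator to $1$; constraints (5), (7) and (9) then forbid all conditions of a type 1(a), 1(b) or 1(c) blocking pair from holding at once.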
Next, we give the constraints to ensure that the matching obtained from a feasible solution in is coalition-free.
First, we introduce some additional notation. Given an instance of SPA-P and a matching in , we define the envy graph , where the vertex set is the set of students in , and the arc set . It is clear that the matching admits a coalition with respect to the instance given in Fig. 1. The resulting envy graph is illustrated below.
Clearly, contains a directed cycle if and only if admits a coalition. Moreover, is acyclic if and only if it admits a topological ordering. Now to ensure that the matching obtained from a feasible solution under is coalition-free, we will enforce to encode the envy graph and impose the condition that it must admit a topological ordering. In what follows, we build on our IP model of .
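The equivalence between coalitions, directed cycles and the absence of a topological ordering can be illustrated outside the IP model. The sketch below is illustrative only: it builds the envy graph of a matching and runs Kahn's algorithm, and the function names and the dictionary layout (`M` maps assigned students to projects, `student_pref` maps students to ordered preference lists) are hypothetical. Unassigned students are omitted, since they would be isolated vertices and cannot lie on a cycle.

```python
from collections import deque


def envy_graph(M, student_pref):
    """Arc s -> t whenever s prefers t's project to their own (both assigned)."""
    arcs = {s: set() for s in M}
    for s, own in M.items():
        prefs = student_pref[s]
        for t, other in M.items():
            if s != t and other in prefs and prefs.index(other) < prefs.index(own):
                arcs[s].add(t)
    return arcs


def topological_order(arcs):
    """Kahn's algorithm: return an ordering, or None if the graph has a cycle."""
    indegree = {v: 0 for v in arcs}
    for v in arcs:
        for w in arcs[v]:
            indegree[w] += 1
    queue = deque(v for v in arcs if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in arcs[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every vertex is output exactly when the graph is acyclic.
    return order if len(order) == len(arcs) else None


def is_coalition_free(M, student_pref):
    """A matching is coalition-free iff its envy graph has a topological order."""
    return topological_order(envy_graph(M, student_pref)) is not None
```

For example, with two students who each rank the other's assigned project first, the envy graph is a 2-cycle and `is_coalition_free` returns `False`.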
We create a binary variable for each , , such that the variables will correspond to the adjacency matrix of . For each and () and for each and () such that prefers to , we impose the following constraint.
If and and prefers to , then and we say envies . Otherwise, is not constrained. Next we enforce the condition that must have a topological ordering. To hold the label of each vertex in a topological ordering, we create an integer-valued variable corresponding to each student (and intuitively to each vertex in ). We wish to enforce the constraint that if (that is, ), then (that is, the label of vertex is smaller than the label of vertex ). This is achieved by imposing the following constraint for all and ().
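The two coalition constraints can be sketched as follows. This is a reconstruction from the prose with assumed symbol names: $x_{i,j}$ are the assignment variables, $e_{i,i'}$ is the binary adjacency variable for the arc from $s_i$ to $s_{i'}$, $v_i$ is the integer label of $s_i$ in the topological ordering, and $n_1$ is the number of students.

```latex
\begin{align}
e_{i,i'} + 1 &\ge x_{i,j} + x_{i',j'}
  && \text{for all } i \ne i' \text{ and all } p_j, p_{j'} \text{ such that } s_i \text{ prefers } p_{j'} \text{ to } p_j, \tag{10}\\
v_i &< v_{i'} + n_1 \,(1 - e_{i,i'})
  && \text{for all } i \ne i'. \tag{11}
\end{align}
```

Since the $v_i$ are integer-valued, the strict inequality in (11) can be passed to a solver as $v_i + 1 \le v_{i'} + n_1 (1 - e_{i,i'})$. When $e_{i,i'} = 0$ the constraint is slack for labels in the range $1, \ldots, n_1$.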
We define a collective notation for each variable involved in as follows.
The objective function given below is a summation of all the binary variables. It seeks to maximize the number of students assigned (that is, the cardinality of the matching).
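Written out under assumed notation ($x_{i,j}$ the assignment variable for the acceptable pair $(s_i, p_j)$, $A_i$ the acceptable set of $s_i$, $n_1$ the number of students), the objective described above would read:

```latex
\begin{equation}
\max \; \sum_{i=1}^{n_1} \sum_{p_j \in A_i} x_{i,j} \tag{12}
\end{equation}
```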
Finally, we have constructed an IP model of comprising the set of integer-valued variables , the set of constraints (1) - (11) and an objective function (12). Note that can then be used to solve MAX-SPA-P optimally. Given an instance of SPA-P formulated as an IP model using the above transformation, we have the following lemmas.
Lemma 1. A feasible solution to corresponds to a stable matching in , where .
Assume firstly that has a feasible solution . Let be the assignment in generated from . Clearly . We note that (1) ensures that each student is assigned in to at most one project. Moreover, (2) and (3) ensure that the capacity of each project and lecturer is not exceeded in . Thus is a matching. We will prove that (4) - (9) guarantee that admits no blocking pair.
Suppose for a contradiction that there exists some acceptable pair that forms a blocking pair of , where is the lecturer who offers . This implies that is either unassigned in or prefers to . In either of these cases, , and thus . Moreover, as is a blocking pair of , has to be undersubscribed in , and thus . This implies that the RHS of (4) is strictly greater than , and since is a feasible solution to , .
Now suppose is a type 1(a) blocking pair of . This implies for some , where prefers to . Thus , which implies that the LHS of (5) is strictly greater than . Thus is infeasible, a contradiction.
Next suppose is a type 1(b) blocking pair of . This implies and thus . Also, has to be undersubscribed in which implies that the RHS of (6) is strictly greater than , and thus . Hence the LHS of (7) is strictly greater than , a contradiction, since is a feasible solution.
Next suppose is a type 1(c) blocking pair of . This implies that and thus . Also is full in and prefers to , where is ’s worst non-empty project in . This implies that the RHS of (8) is strictly greater than , and thus . Hence the LHS of (9) is strictly greater than , and thus is infeasible, a contradiction.
Finally, we show that (10) and (11) ensure that is coalition-free. Suppose for a contradiction that admits a coalition , for some . This implies that for each , prefers to , where addition is taken modulo , and hence , by (10). It follows from (11) that , a contradiction. Hence is coalition-free, and thus is a stable matching. ∎
Lemma 2. A stable matching in corresponds to a feasible solution to , where .
Let be a stable matching in . First we set all the binary variables involved in to . For all , we set . Now, since is a matching, it is clear that (1) - (3) are satisfied. For any acceptable pair such that is unassigned in or prefers to , we set . For any project that is undersubscribed in , we set and thus (4) is satisfied. For (5) not to be satisfied, its LHS must be strictly greater than . This would only happen if there exists , where is the lecturer who offers , such that , and . This implies that is assigned in to a project offered by such that prefers to , is undersubscribed in , and prefers to . Thus is a type 1(a) blocking pair of , a contradiction to the stability of . Hence (5) is satisfied.
Now for any lecturer that is undersubscribed in , we set . Thus (6) is satisfied. Suppose (7) is not satisfied. This would only happen if there exists , where is the lecturer who offers , such that , , and . This implies that either is unassigned in or prefers to , , and each of and is undersubscribed. Thus is a type 1(b) blocking pair of , a contradiction to the stability of . Hence (7) is satisfied.
Suppose is a lecturer in and is any project on ’s preference list. Let be ’s worst non-empty project in . If is full in and prefers to , we set . Then (8) is satisfied. Now suppose (9) is not satisfied. This would only happen if there exists , where is the lecturer who offers , such that and . This implies that either is unassigned in or prefers to , , is undersubscribed and prefers to his worst non-empty project in . Thus is a type 1(c) blocking pair of , a contradiction to the stability of . Hence (9) is satisfied.
We denote by the envy graph of . Suppose and are any two distinct students in such that , and prefers to (that is, ), we set . Thus (10) is satisfied. Since is a stable matching, is coalition-free. This implies that is acyclic and has a topological ordering . For each (), let . Now suppose (11) is not satisfied. This implies that there exist vertices and in such that . This is only possible if since and . Hence , a contradiction to the fact that is a topological ordering of (since implies ). Hence , comprising the above assignment of values to the variables in , is a feasible solution to ; and clearly . ∎
Theorem 3.1.
A feasible solution to is optimal if and only if the corresponding stable matching in is of maximum cardinality.
Let be an optimal solution to . Then by Lemma 1, corresponds to a stable matching in such that . Suppose is not of maximum cardinality. Then there exists a stable matching in such that . By Lemma 2, corresponds to a feasible solution to such that . This is a contradiction, since is an optimal solution to . Hence is a maximum stable matching in . Similarly, if is a maximum stable matching in then corresponds to an optimal solution to . ∎
4 Empirical Analysis
In this section we present results from an empirical analysis that investigates how the sizes of the stable matchings produced by the approximation algorithms compare to the optimal solution obtained from our IP model, on SPA-P instances that are both randomly-generated and derived from real datasets.
4.1 Experimental Setup
There are clearly several parameters that can be varied, such as the number of students, projects and lecturers; the length of the students’ preference lists; as well as the total capacities of the projects and lecturers. For each range of values for the first two parameters, we generated a set of random SPA-P instances. In each set, we record the average size of a stable matching obtained from running the approximation algorithms and the IP model. Further, we consider the average time taken for the IP model to find an optimal solution.
By design, the approximation algorithms were randomised with respect to the sequence in which students apply to projects, and the choice of students to reject when projects and/or lecturers become full. In the light of this, for each dataset, we also run the approximation algorithms 100 times and record the size of the largest stable matching obtained over these runs. Our experiments therefore involve five algorithms: the optimal IP-based algorithm, the two approximation algorithms run once, and the two approximation algorithms run 100 times.
We performed our experiments on a machine with dual Intel Xeon CPU E5-2640 processors with 64GB of RAM, running Ubuntu 14.04. Each of the approximation algorithms was implemented in Java (source code available at https://github.com/sofiat-olaosebikan/spa-p-isco-2018). For our IP model, we carried out the implementation using the Gurobi optimisation solver, also in Java (same repository). For correctness testing on these implementations, we designed a stability checker which verifies that the matching returned by the approximation algorithms and the IP model does not admit a blocking pair or a coalition.
4.2 Experimental Results
4.2.1 Randomly-generated Datasets.
All the SPA-P instances we randomly generated involved students ( is henceforth referred to as the size of the instance), projects, lecturers and total project capacity which was randomly distributed amongst the projects. The capacity for each lecturer was chosen randomly to lie between the highest capacity of the projects offered by and the sum of the capacities of the projects that offers. In the first experiment, we present results obtained from comparing the performance of the IP model, with and without the coalition constraints in place.
We increased the number of students while maintaining a ratio of projects, lecturers, project capacities and lecturer capacities as described above. For various values of in increments of , we created randomly-generated instances. Each student’s preference list contained a minimum of and a maximum of projects. With respect to each value of , we obtained the average time taken for the IP solver to output a solution, both with and without the coalition constraints being enforced. The results, displayed in Table 1, show that the average time for the IP solver to output a solution is significantly lower when the coalition constraints are removed than when they are enforced.
In the remaining experiments, we thus remove the constraints that enforce the absence of a coalition in the solution. We are able to do this for the purposes of these experiments because the largest size of a stable matching is equal to the largest size of a matching that potentially admits a coalition but admits no blocking pair (this holds because the number of students assigned to each project and lecturer in the matching remains the same even after the students involved in such a coalition permute their assigned projects), and we were primarily concerned with measuring stable matching cardinalities. However, the absence of the coalition constraints should be borne in mind when interpreting the IP solver runtime data in what follows.
In the next two experiments, we discuss results obtained from running the five algorithms on randomly-generated datasets.
As in the previous experiment, we maintained the ratio of the number of students to projects, lecturers and total project capacity; as well as the length of the students’ preference lists. For various values of in increments of , we created randomly-generated instances. With respect to each value of , we obtained the average sizes of stable matchings constructed by the five algorithms run over the instances. The result displayed in Fig. 3 (and also in Fig. 4) shows the ratio of the average size of the stable matching produced by the approximation algorithms with respect to the maximum cardinality matching produced by the IP solver.
Figure 3 shows that each of the approximation algorithms produces stable matchings with a much higher cardinality from multiple runs, compared to running them only once. Also, the average time taken for the IP solver to find a maximum cardinality matching increases as the size of the instance increases, with a running time of less than one second for instance size , increasing roughly linearly to seconds for instance size (see Fig. 3).
In this experiment, we varied the length of each student’s preference list while maintaining a fixed number of students, projects, lecturers and total project capacity. For various values of (), we generated instances, each involving students, with each student’s preference list containing exactly projects. The result for all values of is displayed in Fig. 4. Figure 4 shows that as we increase the preference list length, the stable matchings produced by each of the approximation algorithms get close to having maximum cardinality. It also shows that with a preference list length greater than , the -approximation algorithm produces an optimal solution, even on a single run. Moreover, the average time taken for the IP solver to find a maximum matching increases as the length of the students’ preference lists increases, with a running time of two seconds when each student’s preference list is of length , increasing roughly linearly to seconds when each student’s preference list is of length (see Fig. 4).
4.2.2 Real Datasets.
The real datasets in this paper are based on actual student preference data and manufactured lecturer data from previous runs of student-project allocation processes at the School of Computing Science, University of Glasgow. Table 2 shows the properties of the real datasets, where and denotes the number of students, projects and lecturers respectively; and denotes the length of each student’s preference list. For all these datasets, each project has a capacity of . In the next experiment, we discuss how the lecturer preferences were generated. We also discuss the results obtained from running the five algorithms on the corresponding SPA-P instances.
We derived the lecturer preference data from the real datasets as follows. For each lecturer , and for each project offered by , we obtained the number of students that find acceptable. Next, we generated a strict preference list for by arranging ’s proposed projects in (i) a random manner, (ii) ascending order of , and (iii) descending order of , where (ii) and (iii) are taken over all projects that offers. Table 2 shows the size of stable matchings obtained from the five algorithms, where and denotes the solution obtained from the IP model, 100 runs of -approximation algorithm, single run of -approximation algorithm, 100 runs of -approximation algorithm, and single run of -approximation algorithm respectively. The results are essentially consistent with the findings in the previous experiments, that is, the -approximation algorithm produces stable matchings whose sizes are close to optimal.
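The three ways of manufacturing a lecturer's preference list can be sketched as below. This is an illustrative sketch, not the authors' generator: the function name, the `mode` strings and the data layout are hypothetical, and ties in popularity are broken by the input order of the projects, which is an assumption (the text does not say how ties were resolved).

```python
import random


def lecturer_preference_list(projects, student_pref, mode, rng=None):
    """Order one lecturer's projects by (i) random, (ii) ascending or
    (iii) descending number of students who find each project acceptable.

    projects: the projects offered by this lecturer.
    student_pref: maps each student to the list of acceptable projects.
    rng: optional random.Random instance for reproducible shuffles.
    """
    popularity = {p: sum(p in prefs for prefs in student_pref.values())
                  for p in projects}
    if mode == "random":
        order = list(projects)
        (rng or random).shuffle(order)
        return order
    if mode == "ascending":
        return sorted(projects, key=lambda p: popularity[p])
    if mode == "descending":
        return sorted(projects, key=lambda p: popularity[p], reverse=True)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical data: p1 is acceptable to three students, p3 to two, p2 to one.
example_students = {"s1": ["p1", "p2"], "s2": ["p1"],
                    "s3": ["p1", "p3"], "s4": ["p3"]}
example_projects = ["p2", "p1", "p3"]
```

Because Python's `sorted` is stable, projects with equal popularity keep their relative input order in both the ascending and descending modes.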
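[Table 1: rows are size of instance, av. time without coalition, and av. time with coalition. The table contents are not recoverable.]
[Table 2: properties of the real datasets and sizes of the stable matchings obtained, with column groups Random, Most popular and Least popular. The table contents are not recoverable.]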
4.3 Discussions and Concluding Remarks
The results presented in this section suggest that even as we increase the number of students, projects, lecturers, and the length of the students’ preference lists, each of the approximation algorithms finds stable matchings that are close to having maximum cardinality, outperforming their approximation factor. Perhaps most interesting is the -approximation algorithm, which finds stable matchings that are very close in size to optimal, even on a single run. These results also hold analogously for the instances derived from real datasets.
We remark that when we removed the coalition constraints, we were able to run the IP model on an instance size of , with the solver returning a maximum matching in an average time of seconds, over randomly-generated instances. This shows that the IP model (without enforcing the coalition constraints) can be run on SPA-P instances that appear in practice, to find maximum cardinality matchings that admit no blocking pair. Coalitions can then be eliminated in polynomial time by repeatedly constructing an envy graph, similar to the one described in [11, p.290], finding a directed cycle and letting the students in the cycle swap projects.
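The post-processing step just described (build the envy graph, find a directed cycle, let the students on it swap projects, repeat) might look as follows. This is a minimal sketch under assumptions: the function names and the dictionary layout are hypothetical, the cycle search is a plain recursive depth-first search suited to small examples, and the code does not itself re-check blocking pairs.

```python
def envy_graph(M, student_pref):
    """Arc s -> t whenever s prefers t's assigned project to their own."""
    arcs = {s: [] for s in M}
    for s, own in M.items():
        prefs = student_pref[s]
        for t, other in M.items():
            if s != t and other in prefs and prefs.index(other) < prefs.index(own):
                arcs[s].append(t)
    return arcs


def find_cycle(arcs):
    """Return a directed cycle [c0, c1, ...] with arcs c_i -> c_(i+1), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in arcs}
    path = []

    def visit(v):
        colour[v] = GREY
        path.append(v)
        for w in arcs[v]:
            if colour[w] == GREY:            # back arc closes a cycle
                return path[path.index(w):]
            if colour[w] == WHITE:
                cycle = visit(w)
                if cycle:
                    return cycle
        path.pop()
        colour[v] = BLACK
        return None

    for v in arcs:
        if colour[v] == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None


def eliminate_coalitions(M, student_pref):
    """Repeatedly swap projects along envy cycles until none remains."""
    M = dict(M)
    while True:
        cycle = find_cycle(envy_graph(M, student_pref))
        if cycle is None:
            return M
        projects = [M[s] for s in cycle]
        # Each student takes the project of the next student on the cycle.
        for i, s in enumerate(cycle):
            M[s] = projects[(i + 1) % len(cycle)]
```

Each swap moves every student on the cycle to a project they strictly prefer and leaves everyone else untouched, so the loop terminates; the number of students assigned to each project and lecturer is unchanged, so the size of the matching is preserved.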
References
[1] D.J. Abraham, R.W. Irving, and D.F. Manlove. Two algorithms for the Student-Project allocation problem. Journal of Discrete Algorithms, 5(1):79–91, 2007.
[2] A.A. Anwar and A.S. Bahaj. Student project allocation using integer programming. IEEE Transactions on Education, 46(3):359–367, 2003.
[3] R. Calvo-Serrano, G. Guillén-Gosálbez, S. Kohn, and A. Masters. Mathematical programming approach for optimally allocating students’ projects to academics in large cohorts. Education for Chemical Engineers, 20:11–21, 2017.
[4] M. Chiarandini, R. Fagerberg, and S. Gualandi. Handling preferences in student-project allocation. Annals of Operations Research, to appear, 2018.
[5] D. Gale and L.S. Shapley. College admissions and the stability of marriage. American Mathematical Monthly, 69:9–15, 1962.
[6] P.R. Harper, V. de Senna, I.T. Vieira, and A.K. Shahani. A genetic algorithm for the project assignment problem. Computers and Operations Research, 32:1255–1265, 2005.
[7] K. Iwama, S. Miyazaki, and H. Yanagisawa. Improved approximation bounds for the student-project allocation problem with preferences over projects. Journal of Discrete Algorithms, 13:59–66, 2012.
[8] D. Kazakov. Co-ordination of student-project allocation. Manuscript, University of York, Department of Computer Science. Available from http://www-users.cs.york.ac.uk/kazakov/papers/proj.pdf (last accessed 8 March 2018), 2001.
[9] Z. Király. Better and simpler approximation algorithms for the stable marriage problem. Algorithmica, 60:3–20, 2011.
[10] A. Kwanashie, R.W. Irving, D.F. Manlove, and C.T.S. Sng. Profile-based optimal matchings in the Student–Project Allocation problem. In Proceedings of IWOCA ’14: the 25th International Workshop on Combinatorial Algorithms, volume 8986 of Lecture Notes in Computer Science, pages 213–225. Springer, 2015.
[11] D.F. Manlove. Algorithmics of Matching Under Preferences. World Scientific, 2013.
[12] D.F. Manlove and G. O’Malley. Student project allocation with preferences over projects. Journal of Discrete Algorithms, 6:553–560, 2008.
[13] A.E. Roth. The evolution of the labor market for medical interns and residents: a case study in game theory. Journal of Political Economy, 92(6):991–1016, 1984.
[14] http://www.gurobi.com (Gurobi Optimization website). Accessed 09-01-2018.
[15] https://www.gnu.org/software/glpk (GNU Linear Programming Kit). Accessed 09-01-2018.
[16] http://www-03.ibm.com/software/products/en/ibmilogcpleoptistud/ (CPLEX Optimization Studio). Accessed 19-05-2017.