1 Introduction
Teamwork plays a vital role in educational activities, as students can work together to achieve shared learning goals. By working in groups, students communicate better and become more social and creative. Moreover, they can learn about leadership, higher-order thinking, conflict management, etc. [8, 7]. A common practice for group work is as follows: the instructor provides a list of topics, projects, or tasks (we use the term “topic” to refer to all of them), according to which the different non-overlapping groups of students should be formed. The grouping procedure can be performed randomly or based on students’ preferences [18], typically expressed as a ranking over the provided topics. Alternatively, the teacher just says: “Form groups by yourselves”; this case is neither random nor based on preferences with regard to the topic but is driven by the students’ social networks or by who happens to sit nearby. The important case, which is common in educational settings, is grouping with regard to preferences. In the classroom, this costs time, and a suitable algorithm could help. Therefore, in this work, we consider the case of grouping w.r.t. students’ preferences.
The grouping process should consider a variety of requirements. First, students’ preferences should be taken into account (i.e., the student satisfaction requirement). A grouping is considered satisfactory if it satisfies the students’ preferences as much as possible. It is important to give students an equal opportunity to obtain their preferred topics [16]. Second, the groups should be balanced in terms of their cardinalities so that all students share a similar amount of work (i.e., the group cardinality requirement); when groups have unequal sizes and the minority group is smaller than a critical size, the minority cohesion widens inequality [21]. Third, the instructor might be interested in fair-representation groups w.r.t. some protected attributes like gender or race [11] (i.e., the group fairness requirement), because a higher female ratio in the groups may have a positive effect on the groups’ performance [5].
These requirements have already been discussed in related work but are typically treated independently. For example, fairness has been discussed in the context of group assignments [7], the assignment of group members to tasks [18], or of students to projects [23]. The student satisfaction requirement is formulated as the total number of topics staffed [15] or the sum of the utilities of the topics assigned to students based on the ranking of preferences chosen by students [16]. The group cardinality requirement can be satisfied by a heuristic method [19] or a hierarchical clustering approach [12]. However, providing a grouping solution that simultaneously satisfies all these constraints is hard, and “in general, it is not possible to assign all students to their most preferred project” [23].
In this work, we introduce the problem of multi-fair capacitated (MFC) grouping, which aims to ensure fairness of the resulting groups in multiple aspects. We target fairness in terms of assignment (maximizing the student satisfaction through the objective function) and fairness w.r.t. protected attributes, as well as balanced cardinalities across the resulting groups (with a lower bound and an upper bound). We define two fairness constraints of the grouping based on the Nash social welfare notion [20] and the balance score [2]. We propose two approaches to solve the MFC grouping problem. The first is a heuristic, whereas, in the second, we reformulate the assignment step as a maximal knapsack problem.
The rest of our paper is structured as follows: we review the related work in Section 2. The multi-fair capacitated grouping problem is introduced in Section 3. Section 4 presents the solutions to the MFC problem. The experimental evaluation on several educational datasets is described in Section 5. Finally, Section 6 summarizes our conclusions and outlook.
2 Related work
In the education domain, Miles et al. [18] investigated the problem of assigning group members to tasks. They examined the viability of four methods to assign students to groups: random, ability, personal influence, and personal influence with justification. Considering a diversity of features such as skills, gender, and academic background, Krass et al. [11] investigated the problem of assigning students to multiple non-overlapping groups. The problem was solved by an integer programming model that minimizes the number of overlaps. In similar research [4], the authors assign students to groups based on their academic background and gender. However, students’ preferences were not considered.
To consider both efficiency and fairness, Magnanti et al. [16] solve an integer programming formulation with two objectives: maximizing the total utility computed from the rank of students’ preferences (efficiency) and minimizing the number of students assigned to projects that they do not prefer (fairness). Recently, Rezaeinia et al. [23] introduced a lexicographic approach to prioritize the goals. The efficiency objective is computed based on the utility, similar to [16]; however, the authors adapted Jain’s index [10] to measure the fairness of the assignment.
Related to our grouping problem is the problem of assigning reviewers to papers [9, 14, 25]. However, in the paper-reviewer assignment problem, each reviewer should be assigned several papers, and each paper should be assigned several reviewers [9]. In contrast, in the student grouping problem, we attempt to generate non-overlapping groups of students [11], and each student can be assigned to only one group [23].
The knapsack problem formulation has been used for finding good clustering assignments [12]. However, the minimum capacity of a group (cluster) is not ensured. Stahl et al. [24] introduced a fair knapsack model to balance the price asked by the data provider and the price suggested by the customer. The data vendors offer the data for an ask price, and customers can negotiate a bid price. The data quality is adjusted to match the price bargained by the customer and to ensure that the final selling price is fair. Next, Fluschnik et al. [6] proposed three concepts of fair knapsack (individually best, diverse, and fair knapsack) to solve the problem of choosing a subset of items whose total cost does not exceed a given budget while taking into account the preferences of the voters.
The Nash social welfare (Nash equilibrium) [20] was used as the solution concept for fairness in [6], i.e., fairness is ensured by the objective function. The group fairness definition for the knapsack problem was investigated recently by Patel et al. [22], where fairness is defined by several constraints. In their study, each item belongs to a category; the goal is to select a subset of items such that the total value of the selected items is maximized and the total weight does not exceed a given limit, while each category is fairly represented. The notion of group fairness is defined based on three criteria (the number of items, the total value of items, and the total weight of items in each category).
In this work, we introduce the MFC grouping problem that ensures fairness in multiple aspects. In particular, we guarantee fairness in terms of topic assignment (by the objective function) in parallel with cardinality (a lower bound and an upper bound on the group cardinality) and fairness w.r.t. protected attributes (by means of constraints). To the best of our knowledge, the proposed problem has not been studied before and, as already discussed, provides a useful tool to ensure fairness in educational activities.
3 Problem definition
Let $X = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ students and $T = \{t_1, t_2, \dots, t_m\}$ be a set of $m$ topics. For an integer $n$, we use $[n]$ to denote the set $\{1, 2, \dots, n\}$. Each student can choose $h$ topics as their preferences. We store the preferences of the students in a matrix, namely $\mathit{wishes}$, in which each row $\mathit{wishes}_i$ contains the list of topics that are preferred by student $x_i$. We use the matrix $V$ to record the students’ level of interest in the topics. The preference for topic $t_j$ chosen by student $x_i$ is represented by a number $v_{ij}$; the topic that student $x_i$ is most interested in has the highest value of $v_{ij}$. Likewise, each topic can be chosen by several students. A priority matrix $W$ contains the values $w_{ij}$, computed based on the time when the students register. The value $w_{ij}$ represents the registration of student $x_i$ for topic $t_j$; the first student to register for topic $t_j$ receives the highest value of $w_{ij}$. If topic $t_j$ is not chosen by student $x_i$, then $v_{ij} = 0$ and $w_{ij} = 0$.
Let $\mathit{welfare}$ be the aggregate function of matrices $V$ and $W$. For each student $x_i$ and topic $t_j$, we define a welfare value $\mathit{welfare}_{ij}$:
$$\mathit{welfare}_{ij} = \alpha \, v_{ij} + \beta \, w_{ij} \qquad (1)$$
In Eq. 1, $\alpha$ and $\beta$ are the parameters indicating the weight of each component. Figure 1 illustrates an example of matrices $V$, $W$, and $\mathit{welfare}$ for a dataset with 5 students and 4 topics. Matrix $V$ is computed based on the preferences of students: $v_{ij}$ is determined by the order of topic $t_j$ in the preference list of student $x_i$. The matrix $\mathit{welfare}$ is computed by Eq. 1.
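The welfare computation of Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `welfare_matrix`, the default weights of 0.5, and the toy matrices are assumptions made here for the example.

```python
def welfare_matrix(V, W, alpha=0.5, beta=0.5):
    """Combine preference matrix V and priority matrix W entry-wise.

    V[i][j] and W[i][j] refer to student i and topic j; 0 means "not chosen".
    alpha and beta are the component weights (0.5 each is an assumed default).
    """
    if len(V) != len(W) or any(len(a) != len(b) for a, b in zip(V, W)):
        raise ValueError("V and W must have the same shape")
    return [
        [alpha * v + beta * w for v, w in zip(v_row, w_row)]
        for v_row, w_row in zip(V, W)
    ]


# Hypothetical toy data: 3 students, 2 topics.
V = [[2, 0], [1, 2], [0, 2]]
W = [[3, 0], [2, 3], [0, 2]]
welfare = welfare_matrix(V, W)
```

A topic that a student did not choose keeps a welfare of zero, because both of its components are zero.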
The goal of a grouping problem is to distribute the $n$ students into $k$ disjoint groups $\mathcal{G} = \{G_1, \dots, G_k\}$, where $k \le m$, such that the students’ preferences w.r.t. the registration time are maximized, as formulated by the objective function:
$$\max \prod_{r \in I} \sum_{i=1}^{n} \mathit{welfare}_{ir} \, y_{ir} \qquad (2)$$
In other words, the goal is to maximize the product of the total welfare obtained from each group. In Eq. 2, $I \subseteq [m]$ with $|I| = k$ is the set of indexes of the selected topics, and $\mathit{welfare}_{ir}$ is the welfare of student $x_i$ for topic $t_r$ (Eq. 1). Variable $y_{ir}$ is the assignment flag of student $x_i$, where $y_{ir} = 1$ if $x_i$ is assigned to the group of topic $t_r$, and $y_{ir} = 0$ otherwise.
Similar to [6], Eq. 2 is a representation of the Nash social welfare (Nash equilibrium) function [20]; therefore, we call a grouping satisfactory if it maximizes the product in the objective function. (In [6], the corresponding product runs over the voters, each factor aggregates the extent to which a voter enjoys the items in the knapsack, and the knapsack is fair if that product is maximized.) Furthermore, we add one to each sum to avoid the case in which the sum of welfare in a certain group is zero, which would make the whole product zero. The objective function is rewritten as follows:
$$\max \prod_{r \in I} \Big( 1 + \sum_{i=1}^{n} \mathit{welfare}_{ir} \, y_{ir} \Big) \qquad (3)$$
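To make the objective concrete, the following sketch evaluates the product of (one plus the total welfare) over the groups of a given grouping, together with its logarithm, which is numerically safer for many groups. The function names, the dictionary-based representation of a grouping, and the toy welfare matrix are assumptions made for illustration only.

```python
import math


def nash_objective(welfare, groups):
    """Product over groups of (1 + total welfare of the group's members).

    welfare[i][r]: welfare of student i for topic r.
    groups: dict mapping a topic index r to the list of assigned student indices.
    """
    product = 1.0
    for topic, members in groups.items():
        product *= 1.0 + sum(welfare[i][topic] for i in members)
    return product


def log_nash_objective(welfare, groups):
    """Same objective on a log scale (a sum of logs instead of a product)."""
    return sum(
        math.log(1.0 + sum(welfare[i][topic] for i in members))
        for topic, members in groups.items()
    )


# Hypothetical toy data: 3 students, 2 topics.
toy_welfare = [[2.5, 0.0], [1.5, 2.5], [0.0, 2.0]]
toy_groups = {0: [0, 1], 1: [2]}
value = nash_objective(toy_welfare, toy_groups)  # (1 + 4.0) * (1 + 2.0)
```

Because of the added one, a group whose members have zero welfare for its topic contributes a factor of 1 instead of collapsing the product to zero.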
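Fairness of grouping in terms of protected attributes: Assume that each student is characterized by a binary protected attribute $P$, e.g., gender with the categories female and male. Let $\psi(x_i)$ denote the demographic category to which student $x_i$ belongs, i.e., male or female. The fairness of a group $G_r$ in terms of the balance score w.r.t. the protected attribute [2] is defined as the minimum ratio between the two categories:
$$\mathit{balance}(G_r) = \min\left( \frac{|\{x \in G_r : \psi(x) = \text{female}\}|}{|\{x \in G_r : \psi(x) = \text{male}\}|}, \; \frac{|\{x \in G_r : \psi(x) = \text{male}\}|}{|\{x \in G_r : \psi(x) = \text{female}\}|} \right) \qquad (4)$$
The fairness of a grouping $\mathcal{G}$ w.r.t. the protected attribute is computed as:
$$\mathit{balance}(\mathcal{G}) = \min_{G_r \in \mathcal{G}} \mathit{balance}(G_r) \qquad (5)$$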
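The balance score can be illustrated with a short Python sketch. It assumes a binary protected attribute and, following the fairlet notion of balance in [2], takes the balance of a grouping to be the minimum balance over its groups; the function names and the toy data are hypothetical.

```python
def group_balance(members, attribute):
    """Minimum ratio between the two categories inside one group.

    members: list of student indices; attribute[i]: category of student i.
    Assumes a binary attribute. Returns 0.0 if only one category is present.
    """
    counts = {}
    for i in members:
        counts[attribute[i]] = counts.get(attribute[i], 0) + 1
    if len(counts) < 2:
        return 0.0
    a, b = counts.values()
    return min(a / b, b / a)


def grouping_balance(groups, attribute):
    """Balance of a grouping, taken here as the minimum over its groups."""
    return min(group_balance(members, attribute) for members in groups)


# Hypothetical toy data: 6 students with a binary gender attribute.
gender = ["F", "M", "M", "F", "M", "M"]
score = grouping_balance([[0, 1], [3, 4, 5]], gender)
```

A perfectly balanced group scores 1, and a group that contains only one category scores 0, so a single unbalanced group dominates the score of the whole grouping.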
Capacitated grouping: Inspired by the capacitated clustering problem [19], we call a grouping capacitated if the cardinality of each group $G_r$, i.e., $|G_r|$, is between a given lower bound $C^l$ and an upper bound $C^u$.
We now introduce the multifair capacitated (MFC) grouping problem, which satisfies the capacity constraint and two fairness constraints.
Definition 1
MFC grouping problem
We describe the MFC problem as finding a grouping $\mathcal{G}$ that distributes a set of students $X$ into $k$ groups corresponding to $k$ topics and satisfies the following requirements: 1) the assignment is fair, i.e., it maximizes the objective function (see Eq. 3); 2) the balance of each group is maximized, i.e., the fairness constraint w.r.t. the protected attribute (see Eq. 5); 3) the cardinality of each group is between the lower bound $C^l$ and the upper bound $C^u$ (the cardinality constraint).
4 Methodology
In this section, we propose two approaches to solve the MFC grouping problem. The former is based on a heuristic approach (Section 4.1) while the latter is the reformulation of a knapsack problem (Section 4.2).
4.1 A heuristic approach
The main idea of our heuristic method is to assign each student to the topic that ranks highest among the student’s preferences. This approach is divided into two main phases, as presented in Algorithm 1.
In the first step, we maximize the students’ preferences by assigning them to the group with the highest priority on their desires. We consider each student and each preference accordingly (lines 30, 31). If many students choose the same topic, we will assign the current observed student to the topic if that student has the highest value. Moreover, the capacity of groups is also taken into account during the assignment procedure (lines 32, 33).
In the second step, we will adjust the assignment to satisfy the constraints (function GroupAdjustment). If there are any ungrouped students, we will try to assign them to the existing groups (line 2 to line 8). If all groups are full, we will choose the most prevalent topic preferred by the remaining ungrouped students and then assign them to such a topic (line 9 to line 16). The cardinality constraint is satisfied in the next step with some modifications to the groups’ members. We disband groups that have too few students and assign such ungrouped ones to other groups. This procedure is repeated until all groups have the desired capacity (line 18 to line 25).
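The first phase described above can be sketched as follows. This is a simplified illustration and not a transcription of Algorithm 1: it processes the preference ranks one after another, resolves competition for a topic by the welfare value, and respects only the upper cardinality bound. The function name `first_assignment` and all data are assumptions, and the adjustment phase (GroupAdjustment) is deliberately left out.

```python
def first_assignment(wishes, welfare, upper):
    """Greedy first phase: give students their best still-available wish.

    wishes[i]: topic indices of student i in decreasing order of preference.
    welfare[i][t]: welfare of student i for topic t.
    upper: maximum number of students per group.
    Returns (groups, unassigned) with groups as dict topic -> list of students.
    """
    n = len(wishes)
    groups = {}
    assigned = set()
    max_rank = max(len(w) for w in wishes)
    for rank in range(max_rank):
        # Collect, per topic, the unassigned students whose wish at this rank is that topic.
        candidates = {}
        for i in range(n):
            if i not in assigned and rank < len(wishes[i]):
                candidates.setdefault(wishes[i][rank], []).append(i)
        for topic, students in candidates.items():
            # Students with the highest welfare for the topic are served first.
            students.sort(key=lambda i: welfare[i][topic], reverse=True)
            free = upper - len(groups.get(topic, []))
            for i in students[:max(free, 0)]:
                groups.setdefault(topic, []).append(i)
                assigned.add(i)
    return groups, set(range(n)) - assigned


# Hypothetical toy data: 4 students, 2 topics, at most 2 students per group.
toy_wishes = [[0, 1], [0, 1], [0, 1], [1, 0]]
toy_welfare = [[3.0, 1.0], [2.5, 1.5], [2.0, 1.0], [1.0, 2.0]]
groups, unassigned = first_assignment(toy_wishes, toy_welfare, upper=2)
```

In this toy run, the third student loses the competition for topic 0 and falls back to the second wish; students who remain unassigned after all ranks would be handled by the adjustment phase.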
Computational complexity: The computational time of step 1 of the Algorithm 1 is while the processing stage for students who have not been assigned groups costs . In step 2, the group adjustment phase, the maximum running time is because the algorithm has to deal with every group having cardinality less than . In short, because , we can conclude the computational complexity of the algorithm is , where is the number of students and is the number of topics.
4.2 Knapsack-based approach
In the heuristic approach, we tend to assign students to their most favorite topics. This assignment can be detrimental in satisfying the preferences of other students because some of the remaining students will no longer have a topic to be assigned to even though they also have a high degree of interest in that topic. Therefore, assigning students to their second or third favorite topic could improve student satisfaction overall. Hence, we propose a new approach whereby we will search for the most suitable students for each topic. We will formulate the task of selecting the “best” students for a group of the MFC grouping problem as a maximal knapsack problem [17].
Let is a cardinality array, ; and the indexes of topics will be chosen for the resulting groups. For each topic , , i.e., is the index of the selected knapsack, the goal is to select a subset of students (), such that:
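$$\max \sum_{i=1}^{n} \mathit{welfare}_{ir} \, y_{ir} \quad \text{subject to} \quad C^l \le \sum_{i=1}^{n} y_{ir} \le C^u \qquad (6)$$
where $y_{ir} \in \{0, 1\}$ indicates whether student $x_i$ is selected for the group of topic $t_r$, $\mathit{welfare}_{ir}$ is the welfare value of Eq. 1, and $C^l$ and $C^u$ are the lower and upper bounds on the group cardinality.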
We formulate Eq. 6 as a maximal knapsack problem. In the knapsack problem, the goal is to find a set of items that maximizes the total value while the total weight is less than or equal to a given limit. In our case, for each selected topic, we find a set of students that maximizes the total welfare, while the number of selected students, i.e., the group cardinality, is in the range of the given bounds.
The pseudocode of our knapsack-based method is described in Algorithm 2, which is a two-phase approach. In the first step, for each topic, we find the most suitable candidates among the unassigned students by solving a vanilla knapsack problem [17]. To solve the maximal knapsack problem (Eq. 6), we use dynamic programming; the result is a group of students with the maximum total welfare and a cardinality within the given bounds (line 13 to line 18). The group adjustment step is similar to that of Algorithm 1 and performs a fine-tuning phase on the grouping.
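The selection step for a single topic can be sketched as a 0/1 knapsack with unit weights solved by dynamic programming. This is a minimal sketch under stated assumptions, not Algorithm 2 itself: every student is assumed to have weight 1, the value of a student is the welfare for the topic at hand, and the function name `select_group` and the toy numbers are hypothetical.

```python
def select_group(candidates, values, lower, upper):
    """Pick between `lower` and `upper` candidates with maximum total value.

    candidates: list of student indices still unassigned.
    values[i]: welfare of student i for the topic under consideration.
    Returns (total value, chosen students), or None if fewer than `lower`
    candidates are available.
    """
    neg_inf = float("-inf")
    cap = min(upper, len(candidates))
    # best[c]: best total value using exactly c of the candidates seen so far.
    best = [0.0] + [neg_inf] * cap
    chosen = [[] for _ in range(cap + 1)]
    for student in candidates:
        # Iterate downward so that each student is used at most once.
        for c in range(cap, 0, -1):
            if best[c - 1] != neg_inf and best[c - 1] + values[student] > best[c]:
                best[c] = best[c - 1] + values[student]
                chosen[c] = chosen[c - 1] + [student]
    feasible = [c for c in range(lower, cap + 1) if best[c] != neg_inf]
    if not feasible:
        return None
    c_star = max(feasible, key=lambda c: best[c])
    return best[c_star], chosen[c_star]


# Hypothetical toy data: welfare of 4 unassigned students for one topic.
result = select_group([0, 1, 2, 3], [2.5, 1.5, 0.0, 3.0], lower=2, upper=3)
```

With unit weights and non-negative values, the same result could be obtained by sorting the candidates by value; the dynamic program is shown because it mirrors the knapsack formulation and would carry over to non-uniform weights.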
Computational complexity: In the first assignment step, most of the computational time is devoted to the knapsack problem, which costs for each topic. Hence, the first assignment step consumes . Similar to Algorithm 1, the maximum running time the group adjustment step is . Because and , the computational complexity of the algorithm is .
5 Evaluation
In this section, we present our experiments and the performance of our proposed approaches on two educational datasets.
5.1 Datasets
We evaluate our proposed methods on two variations of a real dataset often used in educational data science [13] and a real data science dataset collected at our institute. An overview of the datasets is given in Table 1.
Table 1
Dataset | #instances | #attributes | Protected attribute | Balance
Real data science | 24 | 23 | Gender (F: 8, M: 16) | 0.5
Student - Mathematics | 395 | 33 | Gender (F: 208, M: 187) | 0.899
Student - Portuguese | 649 | 33 | Gender (F: 383, M: 266) | 0.695
Real data science dataset (https://github.com/tailequy/tailequy.github.io/tree/main/fairgrouping/data). This dataset was collected in a seminar on data science at our institute. Students have to register 3 desired topics out of 16 topics. The advisor assigns students to groups based on their preferences and the registration time. The data contain demographic information of students (attributes: ID, Name, Gender), their preferences (attributes: wish1, wish2, wish3), the registration time (attribute: Time), and the priority matrix W, which is represented by 16 attributes T1, …, T16.
UCI Student performance dataset (https://archive.ics.uci.edu/ml/datasets/Student+Performance). This dataset consists of demographic, social, and school-related attributes and students’ grades in secondary education at two Portuguese schools in 2005-2006 [3] for two subjects: Mathematics and Portuguese. Because the original dataset contains no information about topics and preferences of students, we create a semi-synthetic dataset by generating the preferences and the topics and merging them into the original version. The number of preferences h and the number of topics m are the main parameters of the data generator. For each student, we randomly generate h different favorite topics, which are stored in the corresponding columns. Then, for each topic, we list the students who selected that topic and randomly generate (different) priorities for them. This priority matrix is stored in the corresponding topic columns. Therefore, the semi-synthetic version contains additional attributes for the preferences and the priorities.
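The generation procedure described above can be sketched as follows. This is a hedged illustration, not the generator used for the experiments: the function name `generate_preferences`, the fixed seed, and the choice of drawing priorities as a random permutation of 1 to the number of students who selected a topic are assumptions made here.

```python
import random


def generate_preferences(n_students, n_topics, h, seed=0):
    """Generate h distinct wishes per student and distinct priorities per topic.

    Returns (wishes, W): wishes[i] is the list of h topic indices of student i,
    and W[i][t] is the priority of student i for topic t (0 if not chosen).
    """
    rng = random.Random(seed)
    wishes = [rng.sample(range(n_topics), h) for _ in range(n_students)]
    W = [[0] * n_topics for _ in range(n_students)]
    for t in range(n_topics):
        choosers = [i for i in range(n_students) if t in wishes[i]]
        # Assumed scheme: distinct priorities 1..len(choosers) in random order.
        priorities = rng.sample(range(1, len(choosers) + 1), len(choosers))
        for i, p in zip(choosers, priorities):
            W[i][t] = p
    return wishes, W


# Hypothetical small instance: 10 students, 5 topics, 3 wishes each.
wishes, W = generate_preferences(n_students=10, n_topics=5, h=3, seed=1)
```

The wish lists and the columns of W produced this way could then be appended to the original records as extra attributes.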
5.2 Experimental setups
5.2.1 Parameter selection
Similar to the settings of the real data science dataset, we set the number of wishes to $h = 3$ for the UCI student performance dataset. Naturally, a group should contain at least 2 students; therefore, the number of topics $m$ is chosen, separately for the Mathematics and Portuguese subjects of the UCI student dataset, such that each group of 2 members can be assigned to a topic. In addition, we set the parameters $\alpha = \beta$ (Eq. 1), i.e., each component has the same weight.
Parameters related to groups’ cardinality. Since the real data science is a very small dataset, our methods are evaluated with the lower bound in the range of . For the UCI student performance dataset, we set , as [26] suggests that the average number of students per group should not exceed 20. The upper bound is set as for all datasets.
5.2.2 Evaluation measures
We report our experimental results w.r.t. fairness in terms of grouping assignment, protected attribute and cardinality with the following measures:
Nash equilibrium. The Nash equilibrium is computed by the Eq. 3. However, the number of groups () is determined during the group assignment process, i.e., is different for the same set , for each method. Hence, we normalize the Nash equilibrium of the final group assignment by the following logarithmic function:
(7) 
Balance. The fairness in terms of the protected attribute (Eq. 5).
Satisfaction level of students’ wishes. It is measured by the ratio of the number of students who are satisfied, i.e., who are assigned to one of their preferred topics, to the total number of students $n$:
$$\mathit{Satisfaction} = \frac{\text{number of satisfied students}}{n} \qquad (8)$$
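The satisfaction level can be computed directly from an assignment and the wish lists. The sketch below is illustrative; the function name and the convention that an unassigned student is marked with `None` are assumptions.

```python
def satisfaction_level(assignment, wishes):
    """Fraction of students whose assigned topic is among their wishes.

    assignment[i]: topic assigned to student i, or None if unassigned.
    wishes[i]: list of topics preferred by student i.
    """
    if not wishes:
        return 0.0
    satisfied = sum(
        1 for topic, wish in zip(assignment, wishes) if topic in wish
    )
    return satisfied / len(wishes)


# Hypothetical toy data: 4 students, the last one is unassigned.
level = satisfaction_level([0, 1, 2, None], [[0, 1], [0, 2], [2, 3], [1, 2]])
```

An unassigned student counts as not satisfied, since `None` never appears in a wish list.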
5.3 Experimental results
5.3.1 Real data science
As demonstrated in Fig. 2a and Fig. 2b, the grouping results from the knapsack-based approach are better in terms of the Nash equilibrium. More students are allocated to groups according to their preferences when the group size is less than 6 persons. The satisfaction level decreases when the groups’ cardinality increases. This is understandable since students have only a limited number of preferences (3 topics). When the group’s cardinality increases, the desired topics become more diverse, and it is difficult to satisfy most students. In terms of fairness w.r.t. the protected attribute, the heuristic method outperforms the knapsack-based approach when there are at least 5 persons in a group (Fig. 2c). The number of groups and the groups’ cardinality are quite consistent for both methods, as illustrated in Fig. 2d and Fig. 2e.
5.3.2 UCI student performance - Mathematics
The knapsack-based approach outperforms the heuristic method in terms of both Nash equilibrium and satisfaction level in most experiments, as visualized in Fig. 3a and Fig. 3c. The satisfaction level tends to decrease as the number of students per group increases, for the same reason as in the real data science dataset. Regarding the fairness w.r.t. the protected attribute, the heuristic tends to achieve a higher balance score for the final group assignment in comparison to the knapsack-based method (Fig. 3b). When the groups’ cardinality is low (less than 5), the number of groups generated by the knapsack-based approach is smaller than the number of groups resulting from the heuristic method (Fig. 3d). This phenomenon can be explained by the knapsack-based approach’s tendency to create groups with a more flexible number of students, which is shown in Fig. 3e.
5.3.3 UCI student performance - Portuguese
As shown in Fig. 4a and Fig. 4c, the knapsack-based method once again demonstrates the ability to create groups that have a higher Nash equilibrium and level of satisfaction w.r.t. students’ wishes than the heuristic method. In terms of fairness w.r.t. gender, a higher balance score is observed in the groups generated by the knapsack-based technique (Fig. 4b). Regarding the cardinality and the number of groups, similarly to the results on the UCI student performance - Mathematics dataset, the knapsack-based approach divides students into more diverse groups in terms of cardinality (Fig. 4d, e).
Summary of results: In general, the knapsack-based approach outperforms the heuristic method in terms of the Nash equilibrium, the satisfaction level of students’ preferences, and fairness w.r.t. gender. However, in some cases, the knapsack-based approach tends to create fewer groups than the heuristic method, i.e., the groups’ cardinality is higher, which has both advantages and disadvantages. On the one hand, larger groups can produce more ideas in brainstorming and discussions [1]. On the other hand, the performance of a group may decline as the group’s size increases [27].
6 Conclusion and outlook
In this work, we introduced the MFC grouping problem that ensures fairness in multiple aspects. We aim to ensure 1) fairness in terms of assignment by maximizing student satisfaction and 2) a fair representation of students in each group according to protected attributes like gender or race. In parallel, we balance the cardinality of the resulting groups with a lower and an upper bound. We proposed two methods for the MFC grouping problem: the heuristic approach prioritizes the students’ preferences in the assignment, whereas the knapsack-based approach takes into account the students’ preferences and the cardinality of the groups during the assignment step, which is formulated as a maximal knapsack problem. Our experiments show that our methods are effective regarding student satisfaction and fairness w.r.t. the protected attribute while keeping the cardinality within the given bounds. In the future, we plan to extend our approach to more than one protected attribute, such as gender and race, as well as to further investigate the groups’ characteristics w.r.t. students’ abilities, communicative skills, etc., and other definitions covering different aspects of fairness in the educational environment.
Acknowledgment
The work of the first author is supported by the Ministry of Science and Culture of Lower Saxony, Germany, within the Ph.D. program “LernMINT: Data-assisted teaching in the MINT subjects”.
References
[1] (1970) Size, performance, and potential in brainstorming groups. Journal of Applied Psychology 54 (1p1), pp. 51.
[2] (2017) Fair clustering through fairlets. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 5036–5044.
[3] (2008) Using data mining to predict secondary school student performance. EUROSIS-ETI.
[4] (2007) Indiana University’s Kelley School of Business uses integer programming to form equitable, cohesive student teams. Interfaces 37 (3), pp. 265–276.
[5] (2001) Effect of gender composition on group performance. Gender, Work & Organization 8 (2), pp. 205–225.
[6] (2019) Fair knapsack. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1941–1948.
[7] (2003) How fair are group assignments? A survey of students and faculty and a modest proposal. Journal of Information Technology Education: Research 2 (1), pp. 367–378.
[8] (2006) Benefits and problems with student teams: suggestions for improving team projects. Journal of Education for Business 82 (1), pp. 11–19.
[9] (1999) The conference paper-reviewer assignment problem. Decision Sciences 30 (3), pp. 865–876.
[10] (1984) A quantitative measure of fairness and discrimination. Eastern Research Laboratory, Digital Equipment Corporation, Hudson, MA 21.
[11] (2006) The University of Toronto’s Rotman School of Management uses management science to create MBA study groups. Interfaces 36 (2), pp. 126–137.
[12] (2021) Fair-capacitated clustering. In Proceedings of the 14th International Conference on Educational Data Mining (EDM21), pp. 407–414.
[13] (2022) A survey on datasets for fairness-aware machine learning. WIREs Data Mining and Knowledge Discovery 12 (3).
[14] (2013) On good and fair paper-reviewer assignment. In 2013 IEEE 13th International Conference on Data Mining, pp. 1145–1150.
[15] (2008) Optimization support for senior design project assignments. Interfaces 38 (6), pp. 448–464.
[16] (2018) Allocating students to multidisciplinary capstone projects using discrete optimization. Interfaces 48 (3), pp. 204–216.
[17] (1896) On the partition of numbers. Proceedings of the London Mathematical Society 1 (1), pp. 486–490.
[18] (1998) The fairness of assigning group members to tasks. Group & Organization Management 23 (1), pp. 71–96.
[19] (1984) Solving capacitated clustering problems. European Journal of Operational Research 18 (3), pp. 339–348.
[20] (1950) The bargaining problem. Econometrica 18 (2), pp. 155–162.
[21] (2022) Group mixing drives inequality in face-to-face gatherings. Communications Physics 5 (1), pp. 1–9.
[22] (2021) Group fairness for knapsack problems. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1001–1009.
[23] (2021) Efficiency and fairness criteria in the assignment of students to projects. Annals of Operations Research, pp. 1–19.
[24] (2016) Fair knapsack pricing for data marketplaces. In East European Conference on Advances in Databases and Information Systems, pp. 46–59.
[25] (2021) PeerReview4All: fair and accurate reviewer assignment in peer review. Journal of Machine Learning Research 22 (163), pp. 1–66.
[26] (2017) Associating students and teachers for tutoring in higher education using clustering and data mining. Computer Applications in Engineering Education 25 (5), pp. 823–832.
[27] (1983) The relationships among group size, member ability, social decision schemes, and performance. Organizational Behavior and Human Performance 32 (2), pp. 145–159.