1 Introduction
Peer learning (PL) can be defined as the acquisition of knowledge and skills through learning among status equals RN27. In this method, people of the same social group act as amateur instructors in pairs and help each other to teach and learn RN28. Students who learn via this method understand the lesson content better, have higher motivation, and learn faster RN35. Given these positive research results, it makes sense to harness the benefits of peer learning by designing classroom activities that lead students to it RN31; whitman1988peer.
However, peer learning is not merely putting students together and hoping for the best. For instance, it may result in one person doing all the work, or it may fail to lead the students to interact enough to improve the task at hand; hence the need for a well-designed structure RN27. This paper proposes a novel game-theoretical mechanism for implementing peer learning.
With the aid of mathematical models, game theory analyzes cooperation and conflict between intelligent and rational decision makers myerson1991game. In this setting, each decision maker tries to increase their payoff through interaction with other decision makers. Today, many applications of game theory have been reported in various fields. Examples include auctions, bargaining, and collective decision-making in economics, voting in politics, and security and privacy in networking. In addition, as Cohen et al. cohen2018student state, game theory can be applied in education to enhance learning results. For instance, it has been applied to investigate the effect of strict or lenient scoring on the effort of students or instructors Correa87, to study the impact of the number of students in a class on their success RN2; RN3, to model effort-making by instructors and students RN4, to model the interactions of instructor and student RN6, to model cooperative or competitive behavior among instructors RN5, to evaluate students’ cooperative behavior RN7; RN12; su2018individuals, to develop digital game-based learning for the concept of the prisoner’s dilemma moniaga2017prototype, and to create competition among students RN16. Our study differs from these works: our contribution is to apply the prisoner’s dilemma (PD), a famous game in game theory, to prepare a peer learning environment and motivate students to participate actively in their own learning.
In peer learning, a successful effort by both of the students results in increased learning outcomes, but an unsuccessful effort by both parties results in lower learning outcomes. In this method, the noncontributing students are referred to as free riders, who enjoy the benefits of group activities without participation and responsibility in the group work. This behavior of students in peer learning is the same as that of the player in prisoner’s dilemma. Prisoner’s dilemma shows how it’s possible for two rational people to tend not to cooperate (defect) despite their cooperation possibly leading to a higher payoff for both of them. In PD, the payoff of a participant who puts in more effort will be less than the one who makes less effort. In this case, the person who makes less effort is an example of a free rider. On the other hand, in PD, effort is made to lead participants towards having better and greater cooperation. As in PD, greater effort made by both participants in peer learning results in increased learning improvement, which is the main goal of our suggested mechanism.
In this paper, we first present a novel mechanism based on peer learning and the prisoner’s dilemma. Secondly, as in a PD situation where the cooperation of both participants leads to a better result, our proposed mechanism encourages students to make a greater effort to enhance their own learning achievement. Finally, we demonstrate that the presented mechanism can enhance learning outcomes.
To investigate the effect of our mechanism on learning performance, it has been implemented on four groups of students in different courses. A statistical test has been used to analyze the results of pretest and posttest exams taken before and after mechanism implementation respectively. Since the test contains requirements such as data imputation, we will also explain how to utilize and implement the mechanism and how to analyze the results, in the section devoted to our methodology. The result of its implementation indicates that the proposed mechanism has a positive impact on personal learning outcomes.
The rest of the paper is organized as follows. After introducing peer learning and game theory in Section 2, we review the literature on applying game theory to learning in Section 3. Section 4 presents the suggested mechanism. Section 5 covers its implementation and data collection as well as the method of analyzing the collected data. The results of the analysis and of applying the mechanism are discussed in Section 6. The final sections provide a discussion and conclusion and outline future work.
1.1 Research Questions
This study particularly attempted to answer the following research questions:

Learning improvement: Does the proposed peer learning mechanism, enriched by game theory, enhance students’ learning outcomes?

Free rider prevention: Is PD_PL able to stop the free rider problem?

Subjective evaluation: How do students evaluate the PD_PL process?
2 Background
Since the design of the suggested mechanism is based on game theory and peer learning, we briefly review these concepts in this section.
2.1 Peer Learning
Peer learning is an educational practice in which students reach their goals by working together RN28. With this method, students rely not only on the instructor and the textbook syllabus, but also discuss their own opinions and those of others in the same group. Moreover, students can easily discuss their opinions with others in the absence of the instructor. In addition, questions and answers can be analyzed from several points of view other than the instructor’s RN31.
Topping et al. RN30 classify the advantages of peer learning from the viewpoints of students, instructors, and the educational system. We describe some of them in more detail as follows:
Advantages of the peer learning from the students’ point of view:

Higher academic achievement

Improvement in interpersonal relationships

Improvement in individuality and society (for instance, a greater feeling of self-worth and more positive attitudes toward the institution and learning)

More optimistic learning atmosphere

Motivational improvement (for example, more pleasurable and better opportunities to socialize with peers)
Advantages of peer learning from the instructors’ point of view:

Instructional development (For instance, raising educational services and observing individual student performance)

Classroom management (For example, reducing unsuitable academic and social behavior, opportunities to teach new appropriate classroom behaviors)

Simple implementation
Advantages of peer learning from the educational system point of view:

A host of strategies for improving student achievement

A means of raising educational reforms

A collection of interpositions to simplify addition, improve general classroom discipline, and avoid academic failure
As the proposed mechanism aims to increase learning through teammates, peer learning considerations should be taken into account. Topping RN29 lists the requirements to be clarified in peer learning. He states that the context, goals, curriculum area, participants, helping technique, contact method, materials, participants and their training technique, mechanism of process monitoring, assessment of students’ tactics, evaluation method, and how to get participants’ feedback should be determined. In the section devoted to our methodology (i.e., Section 5), we will explain how to implement such specifications in our mechanism.
2.2 Game Theory and Prisoner’s Dilemma
Game theory forms a mathematical model of cooperation and conflict between intelligent and rational decision makers myerson1991game. A rational decision maker tries to maximize his payoff against other decision makers. The decision maker in game theory is known as the player. In each game, players interact, and in every stage of the game each player chooses a strategy from their strategy set. The players receive their payoffs according to the payoff matrix.
Nash Equilibrium and Pareto Efficiency (Pareto Optimum) are two fundamental concepts in game theory, which we define as follows.
Definition 1.
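A strategy profile $s^* = (s_i^*, s_{-i}^*)$ constitutes a Nash Equilibrium (NE) if each player $i$ meets the condition below:
(1) $u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \text{for all } s_i \in S_i$
where $s_i$ and $s_{-i}$ are the strategy of player $i$ and the strategies of the remaining players respectively, $S_i$ is the strategy set of player $i$, and $u_i$ is the payoff function of player $i$.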
John Nash has shown that at least one mixed strategy Nash Equilibrium will exist for any finite game RN36. Some points have been mentioned regarding NE:

Since each player selects his/her best response to the other players’ choices, NE can be seen as an outcome of mutual best responses.

The NE definition states that, no player can increase his/her payoff by deviating unilaterally.

Accordingly, no player regrets his/her action when they play in a NE.
Definition 2.
A strategy profile of a game is (weakly) Pareto efficient if and only if there is no other strategy profile that would make all players better off myerson1991game.
In the proposed mechanism, we apply the prisoner’s dilemma, which is a famous game in game theory. It models the situation in which two people who behave rationally, and who know that cooperating would benefit both of them, nevertheless prefer not to cooperate. In this game, there are two players (Player 1 and Player 2), and each one has the strategy set {Cooperate, Defect}. Table 1 shows the payoff matrix of this game, in which RN26:
and
The rows of this table show the strategies of player 1 and the columns show the strategies of player 2. There are two values in each cell of the payoff matrix. For instance, if the strategy profile (Cooperate, Defect) is chosen, that is, player 1 chooses “Cooperate” and player 2 chooses “Defect”, then the payoff of player 1 is “a” and that of player 2 is “d”.
In this game, if both players choose the strategy “Cooperate”, the payoff of each player is “c”. This payoff is higher than what the players gain when both choose “Defect”. When one player chooses “Cooperate” and the other chooses “Defect”, the payoff of the player who chooses “Cooperate” is less than that of the player who chooses “Defect”. For example, if player 1 chooses to cooperate and player 2 chooses to defect, player 1 obtains payoff “a” and player 2 gains payoff “d”, and, as mentioned, “a” is lower than “d”.
In the prisoner’s dilemma, the NE is the strategy profile (Defect, Defect), because neither player gains a higher payoff by changing strategy unilaterally. For example, if player 1 switches from “Defect” to “Cooperate” while player 2 keeps defecting, he receives “a”, which is lower than the payoff he receives when both defect. The same holds for the other player.
In this game, the strategy profile (Cooperate, Cooperate) is Pareto efficient. That is, no other strategy profile gives both players a higher payoff. As an example, under the strategy profile (Cooperate, Defect), player 2 receives a higher payoff, but the payoff of player 1 is lower.
We will explain the relationship between the suggested mechanism and prisoner’s dilemma more in Section 4.
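The two definitions above can be illustrated with a minimal sketch that enumerates the pure strategy profiles of a two-player game and tests each one against the NE and weak Pareto efficiency conditions. The payoff values used here (5, 3, 1, 0) are hypothetical textbook-style numbers chosen only to satisfy the prisoner’s dilemma ordering; they are not the entries of Table 1.

```python
from itertools import product

STRATEGIES = ("Cooperate", "Defect")

# Hypothetical payoffs (player 1, player 2); illustrative values only,
# not taken from Table 1.
PAYOFFS = {
    ("Cooperate", "Cooperate"): (3, 3),
    ("Cooperate", "Defect"): (0, 5),
    ("Defect", "Cooperate"): (5, 0),
    ("Defect", "Defect"): (1, 1),
}


def is_nash_equilibrium(profile, payoffs=PAYOFFS):
    """True if neither player can gain by deviating unilaterally."""
    s1, s2 = profile
    u1, u2 = payoffs[profile]
    best_for_1 = all(payoffs[(alt, s2)][0] <= u1 for alt in STRATEGIES)
    best_for_2 = all(payoffs[(s1, alt)][1] <= u2 for alt in STRATEGIES)
    return best_for_1 and best_for_2


def is_weakly_pareto_efficient(profile, payoffs=PAYOFFS):
    """True if no other profile makes both players strictly better off."""
    u1, u2 = payoffs[profile]
    return not any(v1 > u1 and v2 > u2 for v1, v2 in payoffs.values())


profiles = list(product(STRATEGIES, repeat=2))
nash = [p for p in profiles if is_nash_equilibrium(p)]
efficient = [p for p in profiles if is_weakly_pareto_efficient(p)]
print("Nash equilibria:", nash)
print("Weakly Pareto-efficient profiles:", efficient)
```

With these assumed payoffs, (Defect, Defect) is the only pure NE, while (Cooperate, Cooperate) is weakly Pareto efficient and (Defect, Defect) is not, which is the tension the prisoner’s dilemma is meant to capture.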
3 Related Work
Reviewing the related research on using game theory in a learning context, we divide these studies into five groups. Some articles model the interaction between instructor and student Correa87; RN2; RN3; RN4; RN6; RN10; RN11; colman2018persistent. Other works model the interaction between instructors RN5. There are also articles that use game theory to evaluate and analyze students’ cooperative behavior RN7; RN12; RN13; RN14; RN9. Another group of articles uses game theory to train cooperative behavior RN15. Finally, RN16; noorani2018game apply game theory to increase competition between students.
In the following, we explain each group in detail.

Using game theory in modeling instructors’ and students’ interactions: Between 1987 and 2003, Hector Correa produced a series of works on using game theory in education. Initially, he examined the use of economic theory in education from a theoretical perspective Correa87. Then, in RN2; RN3, he investigated the relation between the number of students in a class and educational achievement. He indicated that as the number of students in a class rises, the success rate of the students falls. These models were not implemented in practice.
In 2003, Correa also modeled the interaction between instructors and students RN6. In this study, instructors and students were divided into capable/incapable and hardworking/lazy groups. The achievement function of students was formed as , in which and denoted the time allocated by a student and an instructor for a lesson respectively. The value of was the time allocated by an instructor to each student individually (as an example, the required time for marking the exam paper of a student). Finally, he applied simulation to calculate the values of to in the NE situation.
Moga et al. RN10 defined the strategy set {Cooperative, NonCooperative} for students, and {Using the classic teaching method, Using interaction model in teaching} for instructors. Then, they determined the payoff matrix and NE in order to choose the best educational policy.
Oltean et al. RN11 involved the financial outcome. They defined the strategy set {Study/ Don’t study} for students, and {Verify/ Don’t Verify} for instructor. In the “Verify”, the instructor assessed the students via very precise testing. In the “Don’t Verify”, the financial outcome of the institute was important, and the instructor had a more lenient approach to evaluating students. Instructors chose their strategy with respect to the value of three parameters (Professional Effort/Risk of Losing Money/Professional Prestige). On the other hand, students choose their strategies in respect to the parameters (Knowledge/Risk of losing the Degree/Professional Effort). Then the quality values (H, h, l, L) for the mentioned parameters were considered, in which . Finally, after defining the NE of each scenario, it was calculated.

Applying game theory in modeling the interaction between instructors: Correa analyzed cooperative and competitive behavior between instructors at an educational institute RN5. He defined as an achievement function of the instructor, in which was the number of students who enrolled in the lesson of instructor , and was the number of students who successfully passed the lesson.
Two instructors could cooperate so that they sought to maximize . In another approach, they could adopt competitive behavior so that finding the NE could solve the problem.

Applying game theory in evaluating and analyzing cooperative behavior: Waddell et al. RN7 used PD to analyze the cooperative or competitive behavior of students when they play PD against a known or unknown person. Hemesath RN12 investigated the effect of nationality, gender, and familiarity in PD. Molina et al. RN14 studied the relation between the gender of students and their strategy selection in PD. Gray et al. examined the effects of age on cooperation gray2017game. Fernández-Berrocal RN13 evaluated the relation between emotional intelligence and the choice of a cooperative or competitive strategy in iterated PD. The study concluded that people with higher emotional intelligence make more effective decisions and think about long-term gains.
Chiong et al. RN9 used evolutionary game theory to analyze the cooperation behavior of players in group work. In their experiment, active players of nonactive groups were transferred to active groups and nonactive players were transferred to nonactive groups. The result of their experiment showed that the active players remained active and nonactive players remained nonactive.

Applying game theory to train cooperative behavior: Fan RN15 used PD to train cooperative behavior. In this study, participants were grouped in pairs. Each participant was given two cards, one marked with a triangle and one with a circle. Each team member showed a card to their teammate simultaneously. Table 2 indicates the payoff matrix of this game. After some sessions, the experimenter gave a short lecture. This lecture (shown in Figure 1) was designed as a treatment variable by which students were explicitly told that it was a good thing to cooperate. The research showed that the proportion of cooperative individuals increased meaningfully immediately after the lecture.
Table 2: Payoff matrix of the card-showing game RN15
Applying game theory to motivate competition: Burguillo RN16 used PD to motivate competition between students in order to enhance their Java programming skills. The students had to use Java to create a network program that runs PD. The article noorani2018game proposed a game-theoretical approach to stimulate learners to take part in a competition to provide more useful explanations.
3.1 How does PD_PL differ from related work?
There are several distinctions between PD_PL and the mentioned studies. First, the main goal of PD_PL is to enhance learning using peer learning. Second, it models the interaction between students using the prisoner’s dilemma. Third, PD_PL uses PD’s payoff matrix to determine students’ scores. Finally, it was implemented and the results were reported; that is, this study does not rely solely on simulation or theoretical proof. Furthermore, PD_PL applies PD to peer learning, whereas RN16 sets PD as a Java programming exercise and does not use a mathematical model of game theory.
4 The Proposed Mechanism
We aim to design a mechanism based on game theory to enhance learning outcomes. As mentioned above, we apply the prisoner’s dilemma to the proposed mechanism. In PD, the strategy profiles (Defect, Defect) and (Cooperate, Cooperate) are the NE and the Pareto optimum respectively. The desired situation is to motivate players to move from the NE to the Pareto optimum to gain higher payoffs. The same holds true in peer learning, in which the effort of both students results in greater learning outcomes. We use the payoff as a reward to motivate students to make all-out efforts. Thus, we tell the students that the total payoff they gain will be taken into account in their midterm score. In this way, the mechanism motivates the students to make more effort to gain higher payoffs.
In addition, we run the mechanism over several sessions. Therefore, even if a student faces a free rider, he can negotiate with the teammate to make more effort or can change teammate in the next session. In the following, we describe the proposed mechanism in detail.
4.1 PD_PL: A mechanism based on prisoner’s dilemma and peer learning
Figure 2 illustrates the stages of PD_PL. In each session where the PD_PL is run, we ask the students to form a group of two students at their sole discretion. The mechanism is run at the end of some sessions after the instructor teaches the lesson. We give a sheet to students and ask them to briefly write about a given concept. This concept arises out of the lesson taught in that session; the writing of which does not require more than 5 to 10 minutes.
We emphasize to the students that the written text must be understandable to their teammate and that they should help their teammate to eliminate any probable misunderstanding. After the time determined by the instructor elapses, we ask the students to swap sheets with their partner. Then, we ask the students to study their teammate’s sheet. Afterward, the students return the sheet to its owner. We ask students to fill in the requested information at the bottom of the sheet in order to make sure they have studied their partner’s sheet. The requested items include an assessment of their teammate’s sheet, a self-assessment, and an assessment of familiarity with the teammate.
To evaluate the effect of PD_PL on learning improvement, we use a pretest before and a posttest after the mechanism’s implementation. Pretest and posttest are at the same degree of difficulty and both are related to the concept that we ask students to write about on the sheet.
We run the mechanism during several sessions. An important point is that the students do not know in which sessions the mechanism is run.
At the end of each PD_PL execution, the score of each student is calculated. This score is based on the student’s and the teammate’s sheets. Score calculation is explained in Section 4.2.
After each execution of the mechanism, we place the scores in Edmodo, an educational network. People can register as an instructor, student, or parent and use the different facilities of this network. These facilities include creating a new course, posting and responding to messages publicly and privately, making quizzes, preparing exams and exercises, observing the students’ activities, and uploading exercises.
Knowing their own and their teammate’s score, students may decide to change the teammate in the next session of mechanism execution, or make a decision about the amount of information that they write in their sheet.
4.2 Scoring Method
As mentioned above, we use a pretest and a posttest to evaluate student knowledge before and after executing the mechanism. The maximum score of the pretest, the posttest, and the sheet in each session of PD_PL implementation is 2.
In PD_PL, Equation (2) is used to calculate the score of each member of a group, in which and refer to the sheet scores of the student and his teammate, respectively.
(2) 
Table 3 shows some calculated scores concerning teammate sheets.
4.3 Relation between Prisoner’s Dilemma and PD_PL
Payoff is equal to benefit minus cost. As stated, if and are assumed to be the score of sheet of students and respectively, the score of each student is calculated according to Equation (2). So the payoff for each student is benefit (that is calculated by Equation (2)) minus the cost (). Hence, the payoff for student is , which is equal to . In the same way, the payoff of student is equal to
When one student writes more and the other (teammate) writes less, for example, , we have a PD condition:
then and
We conclude , which means the payoff for student is higher than that for student . It means the student (player) who makes the most effort gains a lower payoff than the student (player) who makes the least effort.
We also illustrate this condition with a simple numerical example in Table 4. As an example, if the score of a student is zero and the value of his teammate is two, the score of each one is . Therefore, the payoff for the student is equal to and the payoff for their teammate is .
Comparing Table 4 with Table 1, one can see that PD_PL is in fact a prisoner’s dilemma game. It means that when both students participate and cooperate and the score is , their payoff is , which is more than the situation where none of them makes any effort. In the case that one makes more effort and his sheet score is but his teammate obtains zero, the student’s payoff is and his teammate’s payoff is . That is, the student who makes more effort has a lower payoff than that of their teammate.
As mentioned, we run the mechanism in several sessions. Therefore, even if a higheffort student faces a free rider, he can persuade their teammate to make more effort in the next few sessions. He can even change the teammate to achieve a higher payoff. Hence, PD_PL motivates the students to make more efforts and achieve higher payoffs, resulting in higher learning achievement.
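The benefit-minus-cost argument of this section can be checked numerically with a small sketch. The scoring rule below, `shared_score`, is purely hypothetical: it awards both teammates three quarters of the sum of their sheet scores, and it is used only because it produces the prisoner’s dilemma ordering. It is an assumption for illustration and should not be read as Equation (2). The names `payoff` and `has_pd_ordering` are likewise illustrative.

```python
MAX_SHEET = 2  # maximum sheet score per session, as stated in Section 4.2


def shared_score(own_sheet, teammate_sheet, weight=0.75):
    """Hypothetical scoring rule (an assumption, not Equation (2)):
    both teammates receive `weight` times the sum of the two sheet scores."""
    return weight * (own_sheet + teammate_sheet)


def payoff(own_sheet, teammate_sheet, score=shared_score):
    """Payoff = benefit (the score received) minus cost (own sheet effort)."""
    return score(own_sheet, teammate_sheet) - own_sheet


def has_pd_ordering(score=shared_score, low=0, high=MAX_SHEET):
    """Check the prisoner's dilemma ordering of the four payoffs:
    free rider > mutual effort > mutual no-effort > exploited high-effort."""
    free_rider = payoff(low, high, score)
    mutual_effort = payoff(high, high, score)
    mutual_idle = payoff(low, low, score)
    exploited = payoff(high, low, score)
    return free_rider > mutual_effort > mutual_idle > exploited


print(payoff(0, 2), payoff(2, 0))   # free rider vs. high-effort student
print(has_pd_ordering())            # True for the assumed rule
print(has_pd_ordering(lambda own, mate: (own + mate) / 2))  # plain averaging
```

Under this assumed rule, the free rider’s payoff exceeds that of the high-effort teammate, and mutual effort beats mutual inactivity. A plain average of the two sheet scores does not yield this ordering, because mutual effort would then pay no more than mutual inactivity.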
4.4 Peer Learning Requirements in PD_PL
In Section 2.1, several requirements of peer learning were enumerated. Table 5 shows how we meet these requirements in our mechanism.
Context  Mechanism is designed for the educational and learning area 

Objectives  The goal is to improve learning 
Curriculum area  In addition to instructor training, midterm exam and final exam, the PD_PL facilitates learning improvement based on peer learning. 
Participants  The participants are a number of Computer Engineering students in courses “Fundamentals of programming”, “Discrete mathematics”, and “Compiler principles techniques and tools”. 
Helping technique  Students help each other by writing their knowledge on sheets and exchanging their sheets. 
Contact method  In every mechanism execution session, the students can contact each other. They can contact each other during the semester as well. This gives students the opportunity to choose a new teammate or persuade their previous teammate to make more effort in order to obtain a higher payoff in the next iterations of the mechanism. 
Materials  After each run of the mechanism, students need internet access to view their payoffs. 
Training  Before the start of the mechanism, students are trained in how the mechanism and the scoring method work. 
Process monitoring  The quality of the proposed mechanism is controlled by comparing the scores of pretests and posttests. The instructor monitors the mechanism execution as well. 
Assessment of students’ tactics  Students assess their teammates’ and their own sheets every time the mechanism is executed. 
Evaluation method  The effect of the mechanism is evaluated by comparing posttest results with pretest results. 
Feedback  Every time the mechanism is run, students can give their opinions about the method and the quality of the procedure in the educational network. Moreover, at the end, they are asked to fill in a questionnaire about the execution of the mechanism. The results of the questionnaire are reported in the results section. 
5 Methodology
In this section, we describe how to implement the mechanism and how to analyze the results. Figure 3 shows the complete process of PD_PL implementation.
At first, we asked the instructor to select a number of students randomly. Next, we explained the stages of PD_PL and the scoring method to the selected students. We told them only that our mechanism and scoring were set according to game theory. The majority of students had no knowledge of game theory, but some knew about it through the movie “A Beautiful Mind”.
In the first session, we taught the students how to register and log in to the educational network in order to view the scores after each PD_PL execution. In almost every session in which we ran PD_PL, we described the scoring method. The user manual and the scoring method were also placed in the educational network. After each PD_PL execution, the top three student scores of that session were also placed in the educational network.
In each selected session, after the instructor taught all the students (both those chosen and those not chosen to participate in the mechanism), we asked the selected students to stay in the classroom. Then, we asked the students to choose a teammate. The students were allowed to select a different teammate in different sessions. Then we handed the sheets to the students and asked them to write briefly about a requested concept. Figure 4 shows a sample sheet that we gave to students. The parts for the assessment of the teammate’s sheet, the self-assessment, and familiarity with the teammate can be seen at the bottom of the sheet.
5.1 Participants
The participants were 142 students of Computer Engineering who enrolled in the courses “Fundamentals of Programming” (two different groups), “Discrete Mathematics”, and “Compiler principles techniques and tools”. We ran the proposed mechanism in the autumn and spring semesters of 2016.
At the analysis stage, we eliminated 23 students who had participated in fewer than 3 sessions of PD_PL execution, leaving 119 students. The information on the remaining students is shown in Table 6.
For each selected course, we ran the mechanism in roughly half of the sessions of a semester, that is, in seven different sessions. As we chose four groups (two groups of “Fundamentals of Programming” and one group for each of the other courses), the mechanism was executed in 28 sessions in total. Table 7 shows the concepts used during these sessions for each selected course.
5.2 Data Analysis Method
Figure 5 shows the complete process of analyzing our collected data. First, the appropriate software for analyzing the data is chosen. Then, because we used the paired Hotelling’s T-Square to compare pretest and posttest data and this test cannot work with missing values, we explain the missing value imputation methods that we used. Finally, using the paired Hotelling’s T-Square, we investigate whether the proposed mechanism has a positive effect on learning.
A more detailed explanation of our analysis of the data collected is included below.
5.2.1 Selecting R as a tool of data analysis
To analyze the data, we used the R programming language, which is reported to be the most widely used software in data mining and data science RN37. R is open-source software. We used version 3.3.3 of R together with version 1.0.136 of RStudio, a graphical interface for R.
5.2.2 Missing Data Imputation
As noted, the mechanism was executed in 7 sessions of each course. Some students were absent from some sessions. We therefore faced missing values, and to use the paired Hotelling’s T-Square, the missing data had to be imputed. We applied four missing data imputation methods: mean, median, K-Nearest Neighbour (KNN), and Fuzzy K-Means Clustering (FKM). We used R to program each method. For KNN, we used the package offered in RN33. FKM imputation is an efficient method RN22 that clusters the data using an FKM algorithm and then fills in the missing data based on degrees of membership. In FKM, each piece of data belongs to all the clusters with different degrees of membership. Algorithm 1 shows the FKM method, in which each piece of data includes “S” fields. In our work, the number of data items equals 119, which is the number of students.
As we executed our mechanism in 7 sessions of each course, we have 7 pretests and 7 posttests for each group. Therefore, each student has 14 pretest and posttest scores, and so in our mechanism, the value of “S” is 14.
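The two simplest imputation methods mentioned above, mean and median, can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the function name `impute_columns`, the use of `None` to mark a missing score, the column-wise (per-test) imputation, and the sample data are all assumptions made for the example.

```python
from statistics import mean, median


def impute_columns(rows, method="mean"):
    """Replace each missing value (None) with the mean or median of the
    observed values in the same column (assumed: one column per test)."""
    fill = mean if method == "mean" else median
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fills.append(fill(observed))  # raises if a column has no observed value
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Made-up scores on a 0-2 scale; None marks a session the student missed.
scores = [
    [1.0, 2.0, None],
    [0.5, None, 1.5],
    [1.5, 1.0, 0.5],
    [2.0, 1.5, 2.0],
]
mean_filled = impute_columns(scores, "mean")
median_filled = impute_columns(scores, "median")
print(mean_filled[0][2], median_filled[0][2])
```

Mean imputation is sensitive to extreme scores, whereas median imputation is not, which is one reason both are commonly tried side by side.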
The algorithm initially chooses data randomly out of the total data as K clusters’ center (the second line of the algorithm). These centers are shown as . Then, the distance of each piece of data from these centers is calculated (line 9). In the next step, the degree of membership of each piece of data in the centers of the K clusters is calculated (line 11). The value refers to a that is usually equal to 2 RN23. When a data includes some missing fields, we use RN23 to calculate the distance:
(3)  
Such that  
Afterward, the new center of each cluster is defined using the weighted mean of the data in that cluster (line 16). The algorithm is repeated as long as the distance between two consecutive centers is higher than a defined threshold. When the distance falls below the threshold, the algorithm stops (line 19).
After clustering, the missing fields of the data are imputed according to their degrees of membership. As an example, to impute the missing field $j$ of a data item $x_i$, we can use the following equation.
(4) $x_{i,j} = \sum_{k=1}^{K} U(x_i, v_k)\, v_{k,j}$
where $v_{k,j}$ refers to field $j$ of the center of cluster $k$, and $U(x_i, v_k)$ refers to the degree of membership of $x_i$ in the center of cluster $k$.
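The FKM imputation steps described above (random initial centers, distances computed on observed fields, membership degrees, weighted center updates, a threshold-based stop, and membership-weighted imputation) can be sketched in pure Python. This is an illustrative sketch under stated assumptions, not the study’s R implementation: the partial distance is assumed to be the squared distance over observed fields rescaled by the share of observed fields, the fuzzifier is assumed to be 2, initial centers are assumed to be drawn from complete rows, and every row is assumed to have at least one observed field. All function names and the sample data are hypothetical.

```python
import random


def partial_sq_distance(x, v):
    """Squared distance over the observed fields of x, rescaled to all fields
    (assumed partial-distance form; x must have at least one observed field)."""
    observed = [(a - b) ** 2 for a, b in zip(x, v) if a is not None]
    return len(x) / len(observed) * sum(observed)


def memberships(x, centers, m=2.0):
    """Degrees of membership of x in every cluster; they sum to 1."""
    d2 = [partial_sq_distance(x, v) for v in centers]
    zeros = [c for c, d in enumerate(d2) if d == 0.0]
    if zeros:  # x coincides with one or more centers
        return [1.0 / len(zeros) if c in zeros else 0.0
                for c in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def fkm_impute(data, k=2, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Cluster with fuzzy k-means, then fill each missing field with the
    membership-weighted mean of the cluster centers' values for that field."""
    rng = random.Random(seed)
    complete = [row for row in data if None not in row]
    centers = [list(row) for row in rng.sample(complete, k)]
    n_fields = len(data[0])
    for _ in range(max_iter):
        u = [memberships(x, centers, m) for x in data]
        new_centers = []
        for c in range(k):
            center = []
            for j in range(n_fields):
                seen = [(u[i][c] ** m, x[j]) for i, x in enumerate(data)
                        if x[j] is not None]
                den = sum(w for w, _ in seen)
                num = sum(w * value for w, value in seen)
                center.append(num / den if den > 0 else centers[c][j])
            new_centers.append(center)
        shift = max(abs(a - b) for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:  # centers have stopped moving
            break
    u = [memberships(x, centers, m) for x in data]
    return [[sum(u[i][c] * centers[c][j] for c in range(k)) if v is None else v
             for j, v in enumerate(x)] for i, x in enumerate(data)]


# Made-up scores on a 0-2 scale; None marks a missing score.
scores = [
    [0.2, 0.4, 0.3], [0.3, 0.5, None], [0.1, 0.3, 0.2],
    [1.8, 1.9, 2.0], [1.7, None, 1.9], [1.9, 2.0, 1.8],
]
completed = fkm_impute(scores, k=2)
print(completed[1][2], completed[4][1])
```

Because every imputed value is a convex combination of cluster centers, and every center is a weighted mean of observed values, an imputed score always stays within the observed range of its field.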
5.2.3 Using Paired Hotelling’s T-Square for Data Analysis
Paired Hotelling’s T-Square is an extension of the paired t-test to the multivariate situation. The paired t-test is a statistical test that determines whether the mean difference between two sets of paired observations is zero. Suppose we are interested in evaluating the effectiveness of a training program. We can measure the performance of a sample of students before and after the training program, and examine the differences using a paired t-test.
The paired t-test has two opposing hypotheses: the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). $H_0$ assumes that the true mean difference between the paired samples is zero. The two-tailed $H_1$ assumes that the difference is not equal to zero. Suppose $(x_1, y_1), \dots, (x_n, y_n)$ denote a sample from a population, where $x_i$ and $y_i$ refer to the before and after observations. $d_i$ and $\bar{d}$ are defined as below:
(5)  $d_i = y_i - x_i, \quad \bar{d} = \bar{y} - \bar{x}$
The values of $\bar{x}$ and $\bar{y}$ refer to the means of $x$ and $y$. If relation 6 is true, we reject the null hypothesis at significance level $\alpha$ (with an $\alpha$ chance of being mistaken) RN18; RN19. The popular values of $\alpha$ are , or . In our research, we set $\alpha$ to .
(6)  $|t| > t_{n-1, \alpha/2}$
In relation 6, $t = \bar{d} / \sqrt{s_d^2 / n}$, where $s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2$ is the sample variance of the differences. The value of $t_{n-1, \alpha/2}$ denotes the upper $100(\alpha/2)$th percentile of the t-distribution with $n-1$ degrees of freedom.
As noted, for each group, PD_PL was executed in 7 sessions and on different concepts. If each concept is considered a variable, we face a multivariate problem. Therefore, we should investigate the effect of our mechanism on these different concepts by comparing the pretest scores with the posttest scores.
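For the univariate paired t-test described above, a minimal Python sketch of the test statistic follows. The scores are hypothetical, and the critical value 2.776 (two-tailed test, 4 degrees of freedom, significance level 0.05) is taken from a standard t table purely for illustration; it is not the significance level used in this study.

```python
from math import sqrt
from statistics import mean, variance


def paired_t_statistic(before, after):
    """t = mean(d) / sqrt(var(d) / n) for d = after - before."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    return mean(d) / sqrt(variance(d) / n)  # variance() uses n - 1


# Hypothetical pretest and posttest scores of five students.
before = [10, 12, 9, 14, 11]
after = [13, 14, 12, 15, 15]

t = paired_t_statistic(before, after)
t_crit = 2.776  # illustrative table value, see the note above
reject = abs(t) > t_crit
```

With one such test per concept, seven separate tests would be needed, which is the situation the multivariate test is meant to handle in a single step.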
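In Paired Hotelling’s T-Square, all scalar observations of the paired t-test are replaced with vectors of observations. When posttests and pretests are measured for $p$ variables ($p = 7$ in our mechanism), we compute, for each of the $n$ students, the vector of differences between the posttest vector $\mathbf{y}_i$ and the pretest vector $\mathbf{x}_i$ (posttests minus pretests):
(7)  $\mathbf{d}_i = \mathbf{y}_i - \mathbf{x}_i, \quad i = 1, \dots, n$
We calculate the $T^2$ value:
(8)  $T^2 = n \, \bar{\mathbf{d}}^{\top} \mathbf{S}_d^{-1} \bar{\mathbf{d}}$
Such that:
(9)  $\bar{\mathbf{d}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{d}_i, \quad \mathbf{S}_d = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{d}_i - \bar{\mathbf{d}})(\mathbf{d}_i - \bar{\mathbf{d}})^{\top}$
If relation 10 is true, we reject the null hypothesis at significance level $\alpha$.
(10)  $T^2 > \frac{(n-1)\,p}{n-p} F_{p, n-p, \alpha}$
Here $F_{p, n-p, \alpha}$ is the upper $\alpha$ quantile of the F-distribution with $p$ and $n-p$ degrees of freedom.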
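The multivariate statistic can be sketched in pure Python as below. The sketch assumes the textbook definition of the paired statistic, $T^2 = n \bar{d}^{\top} S^{-1} \bar{d}$, with rejection when $T^2$ exceeds $\frac{(n-1)p}{n-p}$ times an F quantile. The function names and scores are hypothetical, and the F quantile is left as an input (`f_quantile`) because it has to come from a table or a statistics package.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def paired_hotelling_t2(pre, post):
    """T^2 = n * d_bar' S^{-1} d_bar for d = post - pre."""
    diffs = [[b - a for a, b in zip(x, y)] for x, y in zip(pre, post)]
    n = len(diffs)
    d_bar = mean_vector(diffs)
    z = solve(covariance_matrix(diffs), d_bar)  # z = S^{-1} d_bar
    return n * sum(a * b for a, b in zip(d_bar, z))


def critical_value(n, p, f_quantile):
    """Threshold for T^2 given the upper F quantile with (p, n - p) d.f."""
    return (n - 1) * p / (n - p) * f_quantile


# Hypothetical scores: 4 students, 2 concepts.
pre = [[5, 5], [6, 4], [5, 6], [7, 5]]
post = [[6, 5], [6, 5], [7, 7], [8, 7]]
t2 = paired_hotelling_t2(pre, post)
```

Solving the linear system instead of inverting the covariance matrix explicitly keeps the sketch short; a singular covariance matrix, for instance when there are fewer students than variables, raises an error rather than returning a misleading value.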
5.2.4 Bonferroni Correction
Any time we reject a null hypothesis, it is possible that we are wrong: the null hypothesis might really be true, and our significant result might be a coincidence. The value of $\alpha$ means that there is an $\alpha$ chance of getting our observed result if the null hypothesis is true RN20.
The rejection of a true null hypothesis is called a Type I error. When we test multiple hypotheses in multivariate situations, $\alpha$ is the probability of making at least one Type I error across the hypotheses. The Bonferroni correction is a classic method for solving this problem RN20; RN21: it tests each individual hypothesis at a significance level of $\alpha/m$, where $\alpha$ is the chosen overall alpha level and $m$ is the number of hypotheses.
In our mechanism, we compare seven pretests to seven posttests, so we use $\alpha/7$ instead of $\alpha$.
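The reason dividing by the number of hypotheses works can be shown with the union bound. The derivation below is a standard argument, not taken from the paper; the symbols $m$ (number of hypotheses), $A_j$ (the event of a Type I error on hypothesis $j$), and $\alpha$ (overall level) are notation assumed here.

```latex
\begin{align*}
P(\text{at least one Type I error})
  &= P\Big(\bigcup_{j=1}^{m} A_j\Big)
   \le \sum_{j=1}^{m} P(A_j)
   \le \sum_{j=1}^{m} \frac{\alpha}{m}
   = \alpha, \\
\text{so with } m = 7: \quad
  & \text{each comparison is tested at level } \frac{\alpha}{7}.
\end{align*}
```

The bound holds whether or not the seven comparisons are independent, which is why the correction is safe but can be conservative.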
6 Results
In this section, we analyze the data from the PD_PL implementation and answer the research questions.
6.1 Learning Improvement
To answer research question 1, which concerned our mechanism’s effect on learning improvement, we use Paired Hotelling’s T-Square.
Table 8 shows descriptive statistics of the pretests and posttests for the 28 PD_PL executions. The table shows the number of students and the mean, minimum, maximum, and standard deviation of the scores for each session, per course and in total.
As Table 8 illustrates, the average posttest scores were higher than the average pretest scores in all sessions except the first session of “Fundamentals of Programming”. This session was the first one in which our mechanism was run, and we asked the students to write about two relatively dense concepts. This exhausted the students and consequently curtailed their performance. In the other sessions, in consultation with the instructor, we chose concepts that took between 5 and 10 minutes to write about.
Figure 6 shows the difference between the posttest and the pretest scores of each session per course and in total.
As noted, we used Paired Hotelling’s T-Square to examine whether there was a significant difference between the pretest and posttest scores. Missing data imputation was required before Paired Hotelling’s T-Square could be applied. With the FKM method, because the initial cluster centers are selected randomly, the degrees of membership of the data (including data that contain missing fields) in the centers may differ between runs. Therefore, we ran the FKM missing data imputation algorithm three times.
At first, we calculated the critical value used in relation 10. The value of $n$ refers to the number of data items, which is 119, and $p$ refers to the number of comparisons, which is 7 in our test.
Table 9 shows the value of Hotelling’s T-Square according to Equation (8) after using the different missing data imputation methods.
Method of missing data imputation    Hotelling’s T-Square value
Using Mean                           384.009
Using Median                         439.5052
KNN                                  365.9887
FKM (run 1)                          352.7527
FKM (run 2)                          353.3958
FKM (run 3)                          352.5303
As shown in Table 9, the value of Paired Hotelling’s T-Square was higher than the critical value in all situations. Therefore, we can reject $H_0$ and conclude that our mechanism (based on prisoner’s dilemma and peer learning) enhances learning.
In another analysis of the pretest and posttest results, we found that the proposed mechanism could increase learning outcome from to . We subtracted the mean pretest score from the mean posttest score for each course and session. The column “Mean of posttest minus Mean of pretest” in Table 10 shows the results. Positive values indicate higher posttest scores and consequent learning enhancement. Except for the first session of “Fundamentals of Programming”, the posttest scores were higher than the pretest scores, which means that PD_PL had a positive impact on learning outcomes. The column “Percentage” of Table 10 expresses the difference between the posttest and pretest scores as a percentage. Ignoring the first session of “Fundamentals of Programming”, the minimum learning improvement is and the maximum is .
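The per-session computation behind these two columns can be sketched as follows. The text does not state the base of the percentage, so this sketch assumes the difference is expressed relative to the maximum attainable score; the function name, the `max_score` parameter, and the sample scores are hypothetical.

```python
from statistics import mean


def improvement(pre_scores, post_scores, max_score):
    """Return (mean of posttest minus mean of pretest, percentage).

    The percentage base (max_score) is an assumption of this sketch.
    """
    diff = mean(post_scores) - mean(pre_scores)
    return diff, 100.0 * diff / max_score


# Hypothetical scores of three students in one session, marked out of 20.
diff, percent = improvement([8, 10, 12], [11, 13, 15], max_score=20)
```

A negative `diff` would correspond to a session in which the posttest mean fell below the pretest mean.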
6.2 Free rider prevention
The second research question was about our mechanism’s ability to prevent free riding. Sometimes students may not know that they are doing less than the norm, so seeing the scores and what their peers are doing may encourage them to make more effort in future sessions freerider. In PD_PL, students have the opportunity to assess how much they and their peers do. In addition, at the end of each run of the mechanism, students can see their teammates’ and their own scores in the educational network. On the other hand, students may really face free riders and have to do all the work. In this case, they may hope to persuade their teammate to make more effort in subsequent sessions. They can even change their teammate in later sessions to escape the free rider problem.
Since students may choose different teammates in different sessions, it is expected that after several sessions they can select a suitable partner to collaborate with in the learning process and subsequently obtain higher payoffs. Given that some students may be absent from some sessions, we did some preliminary research to investigate how many of the groups were formed with the same members in later sessions. We found in sessions 2 through 7 that and of the groups were formed by previous members for each respective session. For example, of the first session’s teammates were also in a group in session 2. An increase in the percentage of re-formed groups shows that students gradually tended to settle on one student as their teammate. Further research needs to be done to identify the parameters that affect group formation. We plan to develop this point in the next version of PD_PL.
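The share of re-formed groups between two consecutive sessions can be computed along the following lines. This is a sketch under assumptions: groups are pairs, a group counts as re-formed when the same two students were paired in the previous session, and the percentage is taken over the current session's groups. The function name and student identifiers are hypothetical.

```python
def reformation_rate(previous_pairs, current_pairs):
    """Percentage of current pairs that also existed in the previous session."""
    prev = {frozenset(p) for p in previous_pairs}  # order inside a pair is ignored
    cur = [frozenset(p) for p in current_pairs]
    repeated = sum(1 for p in cur if p in prev)
    return 100.0 * repeated / len(cur)


# Hypothetical pairings in two consecutive sessions.
session_1 = [("a", "b"), ("c", "d"), ("e", "f")]
session_2 = [("b", "a"), ("c", "e"), ("d", "f"), ("g", "h")]
rate = reformation_rate(session_1, session_2)
```

Students who were absent from either session simply do not appear in that session's pairs, so they never count as a re-formed group.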
6.3 Subjective evaluation
Students’ acceptance and usage are important measures of a mechanism’s success. In the early sessions of the PD_PL implementation, some students were worried about the amount of work that they had to do. Their initial impression was that taking part in the pretests and posttests and filling in the sheets might be tedious. In later sessions, when they understood the positive effect of PD_PL on their learning, they were more motivated to participate in the mechanism. Interestingly, the teacher stated that during sessions in which the mechanism was not run, the students asked about it and were willing to run it.
In Section 4.2, we stated some requirements of a peer learning environment. The last item mentioned was about collecting the students’ feedback on PD_PL. For this purpose, we prepared a questionnaire and asked the students to fill it in. Figure 7 shows the questions and a representation of the students’ answers.
Some conclusions can be drawn from Figure 7. For instance, the answers to questions 1 through 5 showed the positive effect of the mechanism on learning enhancement. Questions 6 and 7 investigated the mechanism from the viewpoints of competition and cooperation, and the responses suggested that PD_PL tended to increase cooperation rather than competition. The answers to question 8 indicated that the mechanism was attractive to students. Finally, the outcome of question 9 indicated that a facility like the educational network had a positive influence on the mechanism’s performance.
7 Discussion
Implementation of the mechanism faced several challenges. For example, as Table 8 shows, the average posttest scores were higher than the average pretest scores in all sessions except the first session of “Fundamentals of Programming”. This session was the first one during which PD_PL was run, and the instructor asked the students to write about two relatively dense concepts. This tired the students and consequently reduced their performance. In the other sessions, in consultation with the instructor, educational concepts that took between 5 and 10 minutes to write about were selected. As another example, we can mention the preparation of the implementation requirements. Preparing the pretest and posttest questions, correcting them, inserting the scores into the educational network, monitoring the implementation, and directing the students were very difficult and time-consuming tasks for the researcher and the instructor of the course.
8 Conclusion and future works
In this study, similarities between the situations of peer learning and the prisoner’s dilemma were used to propose PD_PL as a game theoretical approach to peer learning.
The proposed mechanism was implemented during several sessions with 142 students. The results of the pretest and posttest exams for all the sessions were compared using R software through Paired Hotelling’s T-Square analysis to investigate the impacts of PD_PL and the proposed instructional design on students’ personal learning. As Paired Hotelling’s T-Square cannot handle missing data, we applied four different missing data imputation methods: mean, median, K-Nearest Neighbour (KNN), and Fuzzy K-Means Clustering (FKM). We also used the Bonferroni correction to control the Type I error in multivariate situations.
The preliminary evaluation indicated that the mechanism had a positive effect on learning enhancement. It may be interpreted that PD_PL offers an acceptable mapping between a prisoner’s dilemma atmosphere and peer learning, since in PD_PL the efforts of teammates resulted in a higher payoff for both of them and consequently increased learning outcomes. Since PD_PL lets students see their teammates’ efforts, they might escape the free rider problem by changing their teammate or persuading them to be more active. In addition, the results of a subjective evaluation revealed that the majority of students found PD_PL to be an attractive and efficient tool for learning enhancement. The most important findings are:

Learning Improvement: Putting the students together and hoping for the best is not an appropriate implementation of peer learning. PD_PL prepared a peer learning environment using the prisoner’s dilemma. It passed the preliminary verification process and had a positive effect on learning enhancement.

Free rider prevention: At its most poorly designed, peer learning may, for instance, result in one person making all the effort. The ability to see the teammate’s sheet and final score in the educational network, together with the ability to change peers in different sessions, is a way to prevent the free rider problem.

Subjective evaluation: The results of the subjective evaluation showed the mechanism to be attractive. They also indicated that PD_PL tended towards cooperation in the learning process.
We encountered some limitations with the PD_PL implementation. As mentioned, the mechanism was run during several sessions, and the absence of some students from some sessions produced missing data in the pretest and posttest data. However, as described in the methodology section, we minimized the effect of the missing data by using missing data imputation algorithms. To increase accuracy, we used four imputation methods in parallel. Another problem arose when the number of selected students in a session was odd. To solve it, we told the odd-one-out student, who did not have a teammate, to participate alone. We ignored this student at the analysis stage.
The proposed mechanism is a general method that can be implemented in any blended environment that supports interaction and scoring. The instructional design can also use this mechanism, even in custom classrooms.
More work is yet to be done to determine the teammate-changing pattern across sessions. The research has raised many unanswered questions; for example, whether a teammate’s behavior in previous sessions affected the amount of knowledge written on the sheet in the next session. There are different methods for choosing a strategy in the iterated prisoner’s dilemma RN34. As another interesting point, we aim to investigate students’ behavior during sessions of PD_PL implementation with regard to the methods mentioned in reference RN34.