“Gamification” is a relatively new word in the English lexicon. The Merriam-Webster Dictionary defines it as “the process of adding games or gamelike elements to something (such as a task) so as to encourage participation”. In the context of our study, we use the word to refer to the process of adding game elements to a pedagogical setting. It is important to note the difference between gamification and educational or serious games. As pointed out in, the latter refers to fully fledged games designed for non-entertainment purposes, while gamification merely adds elements of games to an existing process.
Early this millennium,  mooted the idea of using games to “lighten up” medical education. They observed that participation rates for their lunchtime medical quizzes and debates were higher than those for the regular professional seminars in the hospital. Theirs was a short report, published alongside a cartoon. Since then, the concept of gamification in education has been taken up much more seriously, and there has been an increasing number of peer-reviewed papers on the topic. The authors in  selected and discussed 15 peer-reviewed papers on various implementations of gamification in education. They found that these papers only reported on the game design elements used; little emphasis was given to evaluating the impact on students’ learning. Another review paper, , provides further evidence of this growing interest in gamification. The authors describe an increase in the number of peer-reviewed papers containing the keywords “gamification”, “gamif*”, “gameful” and “motivational affordance” in various databases, and go on to review 24 of those papers.
The effects of gamification have been studied from several angles. One approach is to measure the success of gamification through the grades obtained at the end of the semester. However, as suggested in , this might not always be appropriate: even if we can establish that the games have led to an increased desire to learn, we would still have to show that this desire causes the improvement in grades.
Another approach is to directly correlate game performance with the specific skills the game is meant to train. In such cases, it is important to validate the game tool. For instance, by 2012 there had been 25 peer-reviewed papers on the use of actual video games for training medical and surgical skills. The meta-analysis paper  on this topic focused on whether the games were valid tools for enhancing participants’ learning; it found that 15 of the 25 papers had tested the validity of their models.
Yet another approach is to measure the impact of the games on behavioural and/or psychological outcomes in students. One paper that performs an excellent analysis in this manner is . The authors administered a questionnaire to the same group of students before and after a course that contained game elements. They found that the games had not only improved students’ skills and mastery of the course content, but had also improved their perception of, and attitude towards, the course subject.
This paper is a follow-up to . It contains a quantitative assessment of the effect of gamification on the learning values of medical students. The study in the parent paper was centred on a validated survey instrument that was used to measure the degree of self-directed learning in students before and after a learning journey. In this paper, we re-analyse the same study, but with additional data and more sophisticated statistical tools: we fit a PLS path model to the data instead of using ANOVA and t-tests. In addition, while the original paper addressed the impact of the games on students’ grades, we focus here on the impact of the games on students’ learning behaviour. To do so, we capitalise on the before-and-after design of the study, which measures intangible constructs related to learning both before and after the gamification. Finally, we emphasise that the analysis in this paper is much more exploratory: it aids in understanding the students’ point of view and in generating hypotheses for future studies on gamification.
The remainder of this paper is organised as follows. Section 2 details the study protocol and briefly outlines the statistical methods used. Section 3 summarises the data and describes the output of the statistical models. Finally, Section 4 discusses the results and raises some interesting observations that could be studied further in more focused experiments.
2.1 Study Design
The study was conducted within the School of Medicine at the National University of Singapore (NUS). At NUS, classes are taught through a lecture-tutorial system: lectures are 1.5 hours long and are delivered to large groups of students, whereas tutorials are typically 45 minutes long and consist of small groups of students. Four tutorial groups of first-year medical students were identified for this study. The four groups experienced different levels of game elements in their Anatomy tutorials. All groups were taught by Dr Ang Eng Tat, but they differed in the Anatomy topics they covered.
Group 0 consisted of 16 students from the first semester of the academic year 2017/2018. Their tutorial classes contained no game elements.
Group 1 consisted of 15 students from the first semester of the academic year 2016/2017. Their tutorials contained games meant to promote healthy competition among individuals and teams. One example of a team game was Taboo, in which participants had to act out Anatomy terms to their teammates.
Group 2 consisted of 23 students from the same semester as Group 1. In addition to the games that Group 1 played, this tutorial group also had to complete quiz questions on a mobile phone application, Teach Me Anatomy.
Group 3 consisted of 22 students from the same semester as Groups 1 and 2. In addition to the activities that Group 2 participated in, this tutorial group was also assigned Script Concordance Test (SCT) questions. SCT questions are designed to enhance clinical reasoning rather than rote memory. The SCT questions used in this study were designed by Dr Ng Li Shia, based on the guidelines in  and .
In total, there were 76 students in the study. The level of gamification increased monotonically, from no games in Group 0 to the most gamification in Group 3. To provide a tangible incentive to participate in the activities, an SGD 100 prize was offered to the overall game winner in each of Groups 1, 2 and 3. Another difference from Group 0 was that the tutorials for these three groups were run as a flipped classroom. This particular implementation required students to prepare and present an assigned Anatomy topic to their classmates.
Students were recruited for the study only after their informed consent had been obtained, and there were no penalties for withdrawing from the IRB-approved project (NUS IRB: B-16-205). All students in the four groups were asked to complete a survey at the beginning and at the end of the semester. The survey questions, described further in Section 2.2, were exactly the same at both points in time.
2.2 Self-Directed Learning Questionnaire
A self-directed student is defined as one who takes primary responsibility for, or initiative in, the learning experience. Medical practitioners are expected to practise self-directed learning because the requisite clinical knowledge changes rapidly. However,  noted that the level of self-directed learning readiness among medical students is low, and suggested that changes to medical school curricula may help to address this.
The survey utilised in this study was the Personal Responsibility Orientation to Self-Direction in Learning (PRO-SDLS) survey. It is described in full detail in , where the authors demonstrate the validity of the instrument. There were 25 questions in the survey. Each question was tied to one of four learning-related constructs: Motivation (7 questions), Initiative (6), Control (6) and Self-Efficacy (6). The full set of questions can be viewed in Supplementary Materials I.
The type of Motivation considered here is the type that supports self-direction: it arises from identifying with the value of the activity (learning Anatomy) and arises intrinsically, out of interest in or enjoyment of the activity. Initiative assesses how proactive a student is with regard to learning. Control indicates how strongly students feel they can change or influence their environment in order to learn better. Finally, Self-Efficacy relates to how confident students are in their own ability to do what needs to be done to learn well. This instrument was deemed applicable to adult learners by psychometric experts in.
Each question in the survey elicited a Likert-scale response from the student, with five possible responses ranging from Strongly Disagree to Strongly Agree. Depending on the wording of the question, the response was translated into a score from 1 to 5 for the corresponding construct, oriented so that a larger score always indicated a higher value for that learning behaviour.
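As a minimal illustration of how a response might be turned into a construct score, the Python sketch below assumes that negatively worded items are reverse-coded with the common rule 6 − r on a 1 to 5 scale. The middle label "Neutral", the dictionary `LIKERT`, the function `item_score` and the decision of which items are reverse-coded are all assumptions for illustration; the actual scoring key is the one published with the PRO-SDLS instrument.

```python
# Hypothetical mapping from response labels to raw scores on a 5-point scale.
# The middle label "Neutral" is an assumption; only the end points are given in the text.
LIKERT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def item_score(response: str, reverse_coded: bool) -> int:
    """Return a 1-5 score in which larger always means more of the construct.

    Items flagged as negatively worded (an assumption supplied by the caller)
    are reverse-coded with 6 - raw, so "Strongly Agree" becomes 1.
    """
    raw = LIKERT[response]
    return 6 - raw if reverse_coded else raw


# Example: agreeing strongly with a negatively worded item lowers the construct score.
example = item_score("Strongly Agree", reverse_coded=True)  # 1
```

A statement such as question 6 (“I often have a problem motivating myself to learn”) is the kind of item that would plausibly be reverse-coded under this scheme.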
In our study, the survey was used to assess whether or not the games increased the level of self-directedness in learning. By administering it twice, once before the games and once after, we are able to pair up observations for a more powerful analysis and then assess whether there has been a change in students’ attitudes.
2.3 Path Model Analysis
To analyse the data, we employ a Partial Least Squares Path Model (PLS-PM). This framework allows us to model relationships between blocks of multiple variables, where each block represents a theoretical construct that is directly unobservable. In this section, we provide an overview of the main terms and concepts in path modeling. For further details on path models, the reader is referred to  and .
A path model consists of an inner and an outer model. The inner model represents the relationships between the latent constructs in our experiment. In our study, the inner model posits that the amount of gamification in a classroom has an impact on the change in the Motivation, Initiative, Control and Self-Efficacy of students. In this paper, we focus on these four latent variables to understand the impact of the games. A simple visualisation of the inner model in our analysis can be seen on the left in Figure 2.
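One convenient way to write such an inner model down is as a square path matrix in which the entry in row i, column j is 1 when there is an arrow from construct j to construct i; the R package plspm, for example, takes its inner model as a lower-triangular matrix of this kind. The Python sketch below encodes only the star-shaped model just described; the construct order, the variable names and the helper `star_path_matrix` are our own illustrative choices.

```python
import numpy as np

# The exogenous gamification level comes first, so every arrow points "down"
# and the matrix is lower triangular.
CONSTRUCTS = ["Gamification", "Motivation", "Initiative", "Control", "SelfEfficacy"]


def star_path_matrix(names):
    """Path matrix where row i, column j = 1 means names[j] -> names[i].

    The first construct is the single exogenous driver of all the others
    (a star-shaped inner model with no arrows among the endogenous constructs).
    """
    k = len(names)
    paths = np.zeros((k, k), dtype=int)
    paths[1:, 0] = 1  # Gamification -> each learning construct
    return paths


P = star_path_matrix(CONSTRUCTS)
```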
Having discussed the inner model, we now turn to the outer model, which defines how the latent variables are measured. Latent variables cannot be directly observed, but they can be assessed indirectly using instruments such as surveys, or other indicators. These indicators are called measured variables, and they can relate to latent variables in one of two ways: as formative or as reflective measurements. In the formative case, the measurements define the latent variable values. In our set-up, the tutorial groups are formative, since they determine the level of games that each student encounters. In the reflective case, by contrast, the measurements reflect the value of the latent variable; here, the change in score for each question reflects the value of its construct. Hence the changes in score that we compute from the PRO-SDLS survey are treated as reflective measurements.
When a PLS-PM is fitted to a dataset, the values of the latent variables are estimated as linear combinations of the measured variables. The weights in these linear combinations are an important part of the output of a path model. Another important output is the path coefficient on each arrow of the inner model, which summarises the strength of the hypothesised relationship between the two latent variables it connects. Finally, the goodness of fit of a PLS-PM is assessed using an R² value, similar to that in linear regression.
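Because every construct in our inner model is linked only to the gamification level, and that level has a single indicator, the four reflective blocks do not interact during estimation and can be fitted one at a time. The Python sketch below shows the iteration for one such arrow, assuming Mode A outer weights and the centroid inner-weighting scheme, which are standard choices in the PLS-PM literature. It is a simplified teaching sketch, not the implementation used by the plspm package, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def standardize(x):
    """Centre each column and scale it to unit (population) variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def pls_single_path(group, block, tol=1e-10, max_iter=100):
    """Simplified PLS-PM fit for one arrow: gamification level -> one reflective block.

    group : (n,) gamification level, used as a single-indicator latent variable
    block : (n, p) change scores for the questions of one construct
    Returns (outer weights, latent scores, path coefficient).
    """
    g = standardize(group)  # a single indicator, so the exogenous LV is the indicator itself
    X = standardize(block)
    n, p = X.shape
    w = np.ones(p) / np.std(X @ np.ones(p))  # equal starting weights, LV with unit variance
    for _ in range(max_iter):
        y = X @ w  # outer estimate of the block's LV
        sign = np.sign(np.corrcoef(y, g)[0, 1]) or 1.0
        z = sign * g  # inner estimate under the centroid scheme
        w_new = X.T @ z / n  # Mode A: covariance of each indicator with z
        w_new /= np.std(X @ w_new)  # rescale so the LV has unit variance
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    scores = X @ w
    # With standardised LVs and a single predictor, the OLS path coefficient
    # equals the correlation of the two LVs, and R^2 is its square.
    path = np.corrcoef(g, scores)[0, 1]
    return w, scores, path


# Synthetic example (hypothetical data): 4 groups of 19 students, 5 questions
# whose change scores drift downwards as the group number increases.
rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2, 3], 19)
block = -0.8 * group[:, None] + rng.normal(size=(76, 5))
weights, lv_scores, path_coef = pls_single_path(group, block)
r_squared = path_coef**2
```

With one predictor per block, the R² of each endogenous construct is simply the squared path coefficient, which is why the sketch computes it that way.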
The estimation procedure for path models requires the measured variables to be represented numerically. Our experimental set-up contains four groups, which would typically be represented as dummy variables. In path model analysis, there are two main techniques for handling group variables: one is to use a permutation test to assess the importance of the grouping variable; the other is to study the grouping variable as a mediating or moderating effect. These two approaches are described further in  and . In our study, we are exploring the effects of the games, which are employed at different levels of intensity in the groups. Hence, instead of using dummy variables or treating group as a mediating variable, we represent groups 0, 1, 2 and 3 by their integer labels, reflecting the increase in gamification from group to group.
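To make the contrast concrete, the Python sketch below builds both encodings for hypothetical labels of eight students: three dummy columns with Group 0 as the reference category, and the single ordinal score used here. The ordinal version rests on the assumption that each step from one group to the next adds a comparable amount of gamification.

```python
import numpy as np

groups = np.array([0, 1, 2, 3, 3, 2, 1, 0])  # hypothetical group labels for eight students

# Option 1: dummy (one-hot) coding with Group 0 as the reference category.
# Each column is a separate indicator, so no ordering between groups is assumed.
dummies = (groups[:, None] == np.arange(1, 4)).astype(int)

# Option 2 (the approach taken in this study): a single ordinal score, which assumes
# each step from one group to the next adds a comparable amount of gamification.
ordinal = groups.astype(float)
```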
2.4 Principal Component Analysis
Besides the change in construct levels, another factor of interest is the spread (variability) of responses by students. A small spread would indicate consensus among students regarding the effect of gamification on the constructs measured, while a large spread would mean little or no consensus.
However, it would be hard to analyze this by looking at the individual questions from each construct, as there are too many questions to consider collectively.
One way to overcome this is by performing Principal Component Analysis (PCA) on the questions belonging to a particular construct.
PCA reduces the number of variables to be analyzed by constructing a smaller number of new variables, known as Principal Components, as linear combinations of the existing questions. Taken together, the Principal Components account for the total variance of the data.
The Principal Components are ranked by the amount of variation they explain. The first Principal Component (PC 1) explains the most variation. In general, the k-th Principal Component (PC k) is the component, uncorrelated with all preceding components, that explains the most of the remaining variation.
The first few Principal Components should ideally explain most of the variation in the dataset. If this is the case, differences or similarities within the data can be observed by studying these principal components. This is a much easier task to accomplish, as there are fewer Principal Components compared to the total number of variables in the original data.
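The ranking described above can be sketched in a few lines of Python via the singular value decomposition of the centred data. The sketch assumes the change scores are analysed on their original scale (not standardised), which is a choice we have not confirmed for this study; the data below are hypothetical.

```python
import numpy as np


def pca(X):
    """PCA of an (n, p) data matrix with n >= p, via the SVD of the centred data.

    Returns the component scores (n, p), the loadings (p, p; one column per PC)
    and the proportion of total variance explained by each PC, largest first.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s  # identical to Xc @ Vt.T
    explained = s**2 / np.sum(s**2)  # eigenvalues of the covariance matrix, normalised
    return scores, Vt.T, explained


# Hypothetical change scores for a 6-question construct answered by 76 students.
rng = np.random.default_rng(1)
X = rng.integers(-2, 3, size=(76, 6)).astype(float)
scores, loadings, explained = pca(X)
cumulative = np.cumsum(explained)  # cumulative[1]: share explained by PC 1 and PC 2 together
```

Because the singular values come out in decreasing order, `explained` is already ranked from PC 1 downwards, and `cumulative[1]` is the quantity reported per construct when the first two components are examined.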
3.1 Exploratory Findings
To get a grasp of the main patterns in the data, we begin by subtracting the pre-feedback score for each question from the post-feedback score. With these 25 differences for each individual, we estimate a probability mass function for each group within each question. The results are presented in Figure 1. Most of the panels display a symmetric tent shape centred at 0, with little difference between the groups. This indicates that the games had little effect on the students’ responses to these questions.
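A minimal Python sketch of this step, assuming the probability mass function is estimated by simple relative frequencies (the text does not say whether any smoothing was applied), is shown below. The function name `change_pmf` and the toy data are hypothetical.

```python
import numpy as np


def change_pmf(pre, post, group, col):
    """Empirical pmf of post - pre changes for one question, per group.

    pre, post : (n, q) arrays of 1-5 item scores (same students, same row order)
    group     : (n,) group labels 0-3
    col       : 0-based column index, i.e. question number minus 1
    Returns {group label: probabilities for changes -4, -3, ..., 4}.
    """
    diff = post[:, col] - pre[:, col]
    support = np.arange(-4, 5)  # all possible differences of two 1-5 scores
    return {
        int(g): np.array([np.mean(diff[group == g] == k) for k in support])
        for g in np.unique(group)
    }


# Tiny hypothetical example: four students, one question, two groups.
pre = np.array([[3], [3], [2], [4]])
post = np.array([[3], [4], [2], [2]])
group = np.array([0, 0, 1, 1])
pmf = change_pmf(pre, post, group, col=0)
```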
However, we invite the reader to take a close look at questions 14 and 18 (under Motivation), 6 (under Control), and 12 and 22 (under Self-Efficacy). For each of these panels, the skewness of the distributions changes from left to right as we consider the red, green, blue and then purple lines. This corresponds to decreases in the construct scores for these questions as we traverse the groups from 0 to 3, i.e. in order of increasing gamification. On the other hand, the panels for Initiative suggest a trend in the opposite direction.
Since we are interested in whether a student’s score decreased, remained the same or increased, we work out these proportions for each construct and display them in Table 1.
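These proportions can be computed as in the Python sketch below. Whether the changes are pooled over the questions of a construct or first aggregated per student is not stated here, so the hypothetical function simply takes a flat array of changes together with a matching array of group labels.

```python
import numpy as np


def change_direction_proportions(change, group):
    """Proportion of negative, zero and positive changes per group.

    change : (n,) change scores (post - pre) for one construct
    group  : (n,) group labels aligned with `change`
    Returns {group label: (p_decrease, p_no_change, p_increase)}.
    """
    out = {}
    for g in np.unique(group):
        s = np.sign(change[group == g])
        out[int(g)] = tuple(float(np.mean(s == v)) for v in (-1, 0, 1))
    return out


# Hypothetical example: six changes from two groups.
change = np.array([-1, 0, 0, 2, 0, 1])
group = np.array([0, 0, 0, 1, 1, 1])
props = change_direction_proportions(change, group)
```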
Once again, the no-change category always has the highest proportion of cases. In addition, the proportion of positive changes appears to decrease as we go down the table (increasing gamification) for Motivation, Control and Self-Efficacy. Among these three, the trend appears strongest for Motivation and weakest for Control. It must be noted, however, that there is no corresponding dramatic uptick in the proportion of negative changes for these groups. The Initiative construct once again bucks this trend, with Group 3 having the highest proportion of positive changes.
3.2 Path Model Results
The PRO-SDLS survey had already been validated by its original authors, Stockdale and Brockett (2011), who found that the individual questions for each construct were internally consistent. In the output of our model, too, we find that the individual blocks of questions are unidimensional: in line with the general guidelines for path models outlined in , the Dillon-Goldstein rho values are all above 0.7. The goodness-of-fit of the model was 0.17. This indicates poor predictive value, but, as we shall see in the next section, there is value in interpreting the coefficients and output in the context of the study. We obtain comprehensible and reasonable findings, in addition to a host of unanswered questions!
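For reference, Dillon-Goldstein’s rho for a block with standardised loadings λ₁, …, λ_p is (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The Python sketch below computes it, taking the loadings to be the correlations of each standardised indicator with the first principal component of the block; that choice of loadings is our assumption about how the unidimensionality check is carried out, and the data are hypothetical.

```python
import numpy as np


def first_pc_loadings(X):
    """Correlation of each standardised indicator with the block's first PC."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    pc1 = Z @ eigvec[:, -1]  # eigh returns eigenvalues in ascending order
    return np.array([np.corrcoef(Z[:, j], pc1)[0, 1] for j in range(Z.shape[1])])


def dillon_goldstein_rho(loadings):
    """Dillon-Goldstein's rho = (sum l)^2 / ((sum l)^2 + sum(1 - l^2))."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return num / (num + np.sum(1.0 - lam**2))


example_rho = dillon_goldstein_rho([0.8, 0.8, 0.8])  # about 0.842

# Hypothetical block of four indicators driven by one common factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(200, 1))
block = factor + 0.5 * rng.normal(size=(200, 4))
rho = dillon_goldstein_rho(first_pc_loadings(block))
```

Since the loadings enter only through their sum squared and their squares, the arbitrary sign of the eigenvector does not affect the result.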
When we inspect the output of the path model, we find that only the path coefficient for Self-Efficacy is significantly different from 0: its 95% bootstrap confidence interval is (-0.623, -0.218). This indicates that as the level of games increased, the Self-Efficacy of students decreased significantly. The direction of the relationship is the same for the Control and Motivation constructs; only for Initiative is the associated change positive, so the inclusion of more games seemed to steer students towards becoming more proactive. However, the path coefficients for Control, Motivation and Initiative are not significantly different from 0.
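The idea behind a percentile bootstrap interval can be sketched in Python as follows: students are resampled with replacement, the path coefficient is recomputed on each resample, and the interval is read off the empirical quantiles. The sketch relies on the fact that, in this star-shaped model with standardised latent variables, each path coefficient equals the correlation between the group level and the block’s latent score. As a simplification, it holds the latent scores fixed rather than re-estimating the outer weights on each resample, as a full PLS-PM bootstrap would; the function name and data are hypothetical.

```python
import numpy as np


def bootstrap_path_ci(group, scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for a single-predictor path coefficient.

    Simplification: the latent scores are taken as given, so the outer weights
    are not re-estimated on each resample as a full PLS-PM bootstrap would do.
    """
    rng = np.random.default_rng(seed)
    n = len(group)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample students with replacement
        stats[b] = np.corrcoef(group[idx], scores[idx])[0, 1]
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(stats, [tail, 1.0 - tail])
    return float(lower), float(upper)


# Hypothetical data with a negative association between group and latent score.
rng = np.random.default_rng(42)
group = np.repeat([0, 1, 2, 3], 19).astype(float)
scores = -0.8 * group + rng.normal(size=76)
lower, upper = bootstrap_path_ci(group, scores)
significant = not (lower <= 0.0 <= upper)  # interval excludes 0
```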
Recall that the weights are the coefficients that, when applied to the score changes for individual questions, produce the latent variable values for each construct. The weights for the measured variables in the Motivation block are all positive except for one question (14). The same pattern holds for the other two blocks whose path coefficients are also negative: Control and Self-Efficacy. For the Initiative block, half of the measured variables have positive weights and the rest have negative weights. In the next section, we zoom in on the questions with large weights and on the questions within each block whose signs are in the minority.
3.3 Principal Component Analysis Results
Principal Component Analysis (PCA) was then carried out on the questions, grouped by the construct they were intended to measure.
Table 2 shows that, for every construct, Principal Component 1 (PC 1) and Principal Component 2 (PC 2) together explain at least  of the total variance; in other words, the first two Principal Components for each construct explain around half of the total variation in the original data. Analyzing both concurrently should therefore give us a good idea of the spread of our data.
For each construct, PC 2 was plotted against PC 1, and a confidence ellipse was then drawn for each of the groups. A confidence ellipse here refers to the region into which a new observation (one not used to compute the Principal Components) will fall with a probability of . Figure 3 shows these plots.
The size of the confidence ellipse reflects the variability of the data. If the data have low variance, they are closely clustered, and previously unseen observations have a high probability of falling close to this cluster; as a result, the confidence ellipse will be smaller.
On the other hand, if the data is spread out (high variance), it would be harder to predict the position of previously unseen observations. Therefore, the confidence ellipse will be bigger to compensate for this uncertainty.
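The link between spread and ellipse size can be made explicit under a bivariate normal assumption: the ellipse {x : (x − m)ᵀS⁻¹(x − m) ≤ c} has semi-axes √(c·eigenvalue), so its area is π·c·√det S, which grows with the covariance S. The Python sketch below computes this area with c taken as the chi-square quantile with 2 degrees of freedom, −2·log(1 − level). The default level of 0.95 is an assumption (the probability used for Figure 3 is not stated here), the sketch ignores the extra uncertainty from estimating m and S, and it need not match whatever plotting routine produced the figure.

```python
import numpy as np


def normal_ellipse_area(points, level=0.95):
    """Area of the ellipse {x : (x - m)' S^{-1} (x - m) <= c} for one group.

    points : (n, 2) PC 1 / PC 2 scores
    m, S   : sample mean and covariance of the points
    c      : chi-square quantile with 2 degrees of freedom, -2 * log(1 - level)
    The semi-axes are sqrt(c * eigenvalue), so the area is pi * c * sqrt(det S).
    """
    S = np.cov(points, rowvar=False)
    c = -2.0 * np.log(1.0 - level)
    return float(np.pi * c * np.sqrt(np.linalg.det(S)))


# Hypothetical PC scores: doubling the spread quadruples the ellipse area.
pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
area_small = normal_ellipse_area(pts)
area_large = normal_ellipse_area(2.0 * pts)
```

Comparing such areas across groups is one way to quantify statements like “more than twice the area”.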
We shall detail our findings based on the PCA plots in the next section.
4.1 Interpreting the Path Model Output
In this section, we discuss the interpretation of the path model coefficients and weights. We summarise the findings in the concluding remarks at the end of Section 4.
4.1.1 Motivation Block
The path coefficient suggests that Motivation levels decreased as gamification increased. In line with the exploratory plots, the questions with the largest weights are 14 and 18:
14: Most of the work that I do in my courses is personally enjoyable or seems relevant to my reasons for attending university. Increased gamification was associated with stronger levels of disagreement with this statement.
18: The main reason that I do the course activities is to avoid feeling guilty or getting a bad grade. Increased gamification was associated with stronger agreement with this statement.
We can only hypothesise, but this suggests that when there were too many game elements, the students felt they were a distraction. Taking the phrasing of these two statements into consideration, we are loath to conclude that Motivation decreased. In fact, we are almost certain that our students continued to work hard; perhaps they found it difficult to believe that the game-centric sessions could help them as much as routine or traditional learning.
4.1.2 Initiative Block
Unlike those of the other blocks, the path coefficient for this block was estimated to be positive, meaning that, on average, increased gamification was associated with an increase in this construct. In particular, a relatively large negative weight was estimated for the following question:
17: I often collect additional information about interesting topics even after the course has ended. Increased gamification was associated with stronger agreement with this statement.
When we match this output with the estimated mass functions, however, we observe that this “trend” could be due to Group 1 alone: their answers to this question changed the most, while the answers of the remaining groups changed little.
The largest positive weight was attributed to question 25:
25: I always rely on the professor/lecturer to tell me what I need to do in the course to succeed. Increased gamification was associated with stronger disagreement with this statement.
On the whole, the picture suggests that students spurred themselves into action when they perceived a greater intensity of non-serious elements in the classroom.
4.1.3 Control Block
The negative path coefficient for this block was not significant. There was only one question that had a large weight here:
6: I often have a problem motivating myself to learn. Increased gamification was associated with stronger agreement with this statement.
Again, it could be that the number of games led to reduced interest in the topic.
4.1.4 Self-Efficacy Block
This was the only block whose path coefficient was significant. Being negative, it implied that as gamification increased, the self-confidence of students fell.
12: I am very convinced I have the ability to take personal control of my learning. Increased gamification was associated with stronger disagreement with this statement.
22: I am unsure about my ability to independently find needed outside materials for my courses. Increased gamification was associated with stronger agreement with this statement.
The indication seems to be that students are worried by the game elements; perhaps they feel there is less rigour in the class, and hence fear that they will be ill-prepared for their jobs later on.
4.2 Interpreting the Principal Component Analysis Plots
First, we note that for the Self-Efficacy, Control and Motivation constructs, the confidence ellipse for Group 0 (without gamification) is the largest. For the Initiative construct, the confidence ellipses for Groups 0 and 3 are roughly the same size, and are noticeably larger than those for Groups 1 and 2. This indicates that, generally, there is a larger spread (variance) among the responses in Group 0 (and, for Initiative, Group 3) than in the other groups.
For the Self-Efficacy and Initiative constructs, Group 2 has the smallest confidence ellipse, whereas Group 1 has the smallest confidence ellipse for the Motivation and Control constructs. This means that there is less variability in the differences between pre- and post-responses for students in these groups.
Overall, the groups with gamification (Groups 1, 2 and 3) typically have smaller confidence ellipses than the group without gamification (Group 0). In other words, there is less variation among the responses of students in the former groups. This is particularly the case for the Motivation construct, where the ellipse for Group 0 has more than twice the area of those for the other groups.
This suggests that there is some form of consensus among groups with gamification, resulting in more similar changes in construct intensity pre and post intervention compared to the group without gamification.
Among the groups with gamification, Group 3 has the biggest confidence ellipse for all the constructs. This suggests that students’ opinions on the effectiveness of the games employed in Group 3 are more varied than those in Groups 1 and 2.
An alternative explanation for the difference in spread across the groups is that the students formed their opinions about the effectiveness of the games collectively, as a group. Since this was not a controlled experiment, the students were free to discuss the games with their classmates, and even to compare their experiences with students from other cohorts (where different or no games were employed).
For example, if a student felt that the games employed were not effective or not to his or her liking, the student may have voiced this to classmates. The classmates’ opinions may then have changed, and by word of mouth this view may have spread further, resulting in a consensus among the students about the games.
4.3 Feedback from Gamified Groups
At the end of the semester, each student in Groups 1, 2 and 3 was polled to obtain qualitative feedback on the game-enhanced tutorials they had participated in. The numbers of respondents from these three groups were 14, 22 and 19 respectively. The full feedback form is available for perusal as Supplemental Material II.
All three groups experienced a flipped classroom, meaning that students were assigned to present topics to their peers during the tutorial sessions. This was a success with the students: the proportions of students who answered yes to the following question were 1.00, 0.77 and 0.97 for the three groups.
Did the ‘flipped classroom’ motivate you to do your own self-directed learning?
One of the free-text questions in the form asked students which part of the tutorial sessions they found most useful for learning. The word cloud of the responses, below, indicates that most of them found the flipped classroom aspect the most useful: the words “presentation”, “peer/classmates learning” and “flipped” appear repeatedly.
It is interesting to note that as the amount of games increased, a decreasing proportion of students felt that the in-class games should continue. The proportion of such students fell from 1.00 to 0.82 to 0.76 for Groups 1, 2 and 3 respectively. The precise wording of the question was:
Should this type of tutorial-games sessions continue in future in place of conventional tutorials?
Perhaps the groups with more gamification found the more serious game elements, namely the mobile review app and the Script Concordance Test, more beneficial than the in-class games. However, based on the numbers alone, the serious game elements were not unanimously acknowledged as useful either. The proportion of Group 3 students who found the SCT useful was 0.71, although two students explicitly mentioned it as the most motivating part of the sessions. The proportion of students who found the mobile app useful was only 0.36 in both Group 2 and Group 3. Overall, the Final Prize awarded to an individual did not appear to be a motivating factor: in each group, the proportion of students who found it motivating was 0.50 or less (0.50, 0.23 and 0.47 respectively).
Although the path model indicates a drop in Motivation to go with an increase in gamification, one of the free-text questions seems to contradict it.
Overall, did you feel more motivated to do self-directed learning after this series of tutorial-games sessions? How so?
Overall, only 12 of the 55 respondents included “no” or “not really” in their answer, while slightly more than half (30 of 55) responded with a “yes”.
To summarise, we have found that introducing game elements into tutorials needs to be done with care. Although the models we have fitted do not yield highly significant results, they have provided valuable insights into the impact of gamification on students. We should bear in mind that we do not claim the above results are reproducible. For instance, they could be specific to Asian students, who are typically already very well motivated to study on their own, and to study hard.
One finding was that Self-Efficacy fell significantly as the level of games increased. If students truly are concerned about material being sacrificed for the purpose of gamification, then we need to consider how gamification should be introduced in future studies. For example, a study could compare two groups (one gamified and one not) via their performance on an oral exam. Alternatively, video lectures from previous semesters could be released to students to alleviate their concerns about the quantity of material they cover.
Regarding Motivation, the path model suggests that the students did not find the games appealing. Although a slight majority in the free-text question answered that the games motivated them, the feedback on the Final Prize agrees with the path model. This is an undesirable situation, as outlined in . There, the authors point out that the rewards in gamification should resonate with the values of the users. One possibility might be to include the games as a small component of the overall grade for the course.
The qualitative feedback also revealed the value of the flipped classroom. Students were able to appreciate the value of the presentations and the process of preparing for them.
Finally, we highlight two features of this work: the design of the study, and the use of a path model to analyse the survey data. The before-after set-up allowed powerful comparisons to be made, and the path model provided an easily interpretable model for discussion.
-  Dicheva D, Dichev C, Agre G, Angelova G. Gamification in education: A systematic mapping study. Journal of Educational Technology & Society. 2015;18(3).
-  Howarth-Hockey G, Stride P. Can medical education be fun as well as educational? BMJ: British Medical Journal. 2002;325(7378):1453.
-  Nah FFH, Zeng Q, Telaprolu VR, Ayyappa AP, Eschenbrenner B. Gamification of education: a review of literature. In: International conference on HCI in business. Springer; 2014. p. 401–409.
-  Hamari J, Koivisto J, Sarsa H. Does gamification work?–a literature review of empirical studies on gamification. In: 2014 47th Hawaii international conference on system sciences (HICSS). IEEE; 2014. p. 3025–3034.
-  Muntasir M, Franka M, Atalla B, Siddiqui S, Mughal U, Hossain IT. The gamification of medical education: a broader perspective. Medical education online. 2015;20.
-  Graafland M, Schraagen JM, Schijven MP. Systematic review of serious games for medical education and surgical skills training. British journal of surgery. 2012;99(10):1322–1330.
-  Beylefeld AA, Struwig MC. A gaming approach to learning medical microbiology: students’ experiences of flow. Medical teacher. 2007;29(9-10):933–940.
-  Ang ET, Min CJ, Gopal V, Li Shia N. Gamifying Anatomy Education. Clinical Anatomy. 2018;.
-  Fournier JP, Demeester A, Charlin B. Script concordance tests: guidelines for construction. BMC medical informatics and decision making. 2008;8(1):18.
-  Lubarsky S, Dory V, Duggan P, Gagnon R, Charlin B. Script concordance testing: From theory to practice: AMEE Guide No. 75. Medical teacher. 2013;35(3):184–193.
-  Tagawa M. Physician self-directed learning and education. The Kaohsiung journal of medical sciences. 2008;24(7):380–385.
-  Stockdale SL, Brockett RG. Development of the PRO-SDLS: A measure of self-direction in learning based on the personal responsibility orientation model. Adult Education Quarterly. 2011;61(2):161–180.
-  Sanchez G. PLS path modeling with R. Berkeley: Trowchez Editions. 2013;.
-  Hair Jr JF, Hult GTM, Ringle C, Sarstedt M. A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications; 2016.
-  Chin WW. Multi-group analysis with PLS. Frequently asked questions-partial least squares & PLS-graph. 2004;.
-  Dibbern J, Chin W. An introduction to a permutation based procedure for multi-group PLS analysis: results of tests of differences on simulated data and a cross cultural analysis of the sourcing of information system services between Germany and the USA. Handbook of Partial Least Squares: Concepts, Methods and Applications. 2010;p. 171–193.
-  Jolliffe IT. Principal Component Analysis. Springer; 2002.
-  James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. vol. 112. Springer; 2013.
-  R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2018. Available from: https://www.R-project.org.
-  Sanchez G, Trinchera L, Russolillo G. plspm: Tools for Partial Least Squares Path Modeling (PLS-PM); 2017. R package version 0.4.9. Available from: https://CRAN.R-project.org/package=plspm.