Predicting Student Performance in an Educational Game Using a Hidden Markov Model

04/24/2019 ∙ by Manie Tadayon, et al. ∙ 0

Contributions: Prior studies on education have mostly followed the model of the cross sectional study, namely, examining the pretest and the posttest scores. This paper shows that students' knowledge throughout the intervention can be estimated by time series analysis using a hidden Markov model. Background: Analyzing time series and the interaction between the students and the game data can result in valuable information that cannot be gained by only cross sectional studies of the exams. Research Questions: Can a hidden Markov model be used to analyze the educational games? Can a hidden Markov model be used to make a prediction of the students' performance? Methodology: The study was conducted on (N=854) students who played the Save Patch game. Students were divided into class 1 and class 2. Class 1 students are those who scored lower in the test than class 2 students. The analysis is done by choosing various features of the game as the observations. Findings: The state trajectories can predict the students' performance accurately for both class 1 and class 2.




I Introduction

Educational video games have received much attention in recent years due to their positive impacts on students’ learning and their cognitive skills [1]. However, just because a game has educational content and is engaging does not mean it will be effective [2]. To prove its effectiveness, it needs to be further tested and analyzed. Fortunately every action, time click, and interaction in the game can be recorded. This provides a good opportunity for researchers to design a more sophisticated model and build more intelligent platforms.

Time series prediction has a rich history in domains such as speech processing, the stock market, and weather forecasting. Methods have been developed to perform robust and reliable forecasting using various machine learning and optimization algorithms [3], [4].

The hidden Markov model (HMM) is a popular method for modeling time series data because of its rich mathematical structure and the availability of many practical algorithms for computing model components [5]. Numerous papers, such as [5], [6], [7], [8], [9], have been published about HMM applications in speech, the stock market, and biology; however, there is a limited amount of work on predicting a player’s strategies or actions in a game using the HMM. For example, in [10] the authors incorporated a two state HMM along with dynamic programming to classify and segment a soccer video game. In [11] the authors used a five state HMM to analyze the individual differences in game behavior and used logistic regression for the prediction. They showed that the HMM based prediction using sequential data gives better accuracy than a prediction using the aggregated data. Some work has been done on modeling video games using dynamic Bayesian networks (DBN), such as [12] and [13], which focus on semantic analysis of sport video games. Considerable research, e.g. [14], [15], [16], [17], [18], and [19], has been conducted on student modeling and designing intelligent tutoring systems (ITS) using Bayesian and belief networks. In [18] and [19] the authors used Bayesian knowledge tracing (BKT) to model and evaluate student performance. BKT is a two state HMM where the probability of forgetting a skill is set to zero. However, to the best of our knowledge this is the first work that analyzes student performance in educational video games using an HMM.

The contribution of this paper is to present a novel approach to predict student performance using a video game as opposed to the exam.

The rest of this paper is organized as follows. Section II reviews the HMM algorithm. Section III describes the game dataset used in this paper. Section IV describes the problem formulation as well as the prediction methods. Section V presents and discusses the results. Section VI concludes the paper and suggests future work.

II HMM Algorithm

In this section, HMM algorithms are briefly reviewed. Both the discrete hidden Markov model (DHMM) as well as the continuous hidden Markov model (CHMM) are discussed. The HMM is the extension of the Markov process in which the observations are a probabilistic function of the states. In an HMM, states are considered as hidden and should be inferred by the sequence of observations.

The HMM is characterized by the following:

N: Number of the hidden states. Although this is unknown since the states are hidden, it can usually be initialized to a reasonable number depending on the problem and the dataset, and can later be learned using the statistical model selection tools discussed in Section IV.

M: Number of the observation symbols per state.

S: State sequence $S = (s_1, s_2, \dots, s_T)$, where T is the length of the sequence and each $s_t \in \{1, 2, \dots, N\}$.

O: Sequence of the observation symbols $O = (o_1, o_2, \dots, o_T)$, where each $o_t \in \{v_1, v_2, \dots, v_M\}$.

A: State transition probability. It defines the probability of going from state i to state j and is denoted by

$$a_{ij} = P(s_{t+1} = j \mid s_t = i), \quad 1 \le i, j \le N \tag{1}$$

B: Observation distribution for each state, which is denoted as follows:

$$b_j(k) = P(o_t = v_k \mid s_t = j), \quad 1 \le j \le N, \; 1 \le k \le M \tag{2}$$

$\pi$: Initial state distribution, which is defined as follows:

$$\pi_i = P(s_1 = i), \quad 1 \le i \le N \tag{3}$$

$\lambda$: The HMM parameters together are usually denoted by the following:

$$\lambda = (A, B, \pi) \tag{4}$$

The above equations together can be used to fully define any HMM with discrete observations.

Forward and backward algorithms [5] are used to calculate $P(O \mid \lambda)$, the probability of observing a sequence given $\lambda$. If the time series is not labeled and the mapping between the observations and the states is not available, then the HMM parameters should be estimated using the Baum-Welch or EM algorithm [20]. If the observations are continuous (CHMM) as opposed to discrete, the emission probability distribution should be adjusted to account for this change. Continuous observations are modeled by fitting probability density functions (pdf) to the data. A Gaussian distribution or a mixture of Gaussian distributions is typically used for modeling the data.
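
As a rough illustration of the forward recursion mentioned above, the following sketch computes $P(O \mid \lambda)$ for a small discrete HMM. The two-state, three-symbol parameter values are made up for the example and are not taken from the paper; states and symbols are indexed from 0, and no scaling is applied, so the sketch is only suitable for short sequences.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Return P(O | lambda) for a discrete HMM using the forward algorithm."""
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = pi * B[:, obs[0]]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    # Termination: sum over the final states
    return float(alpha.sum())

# Hypothetical toy parameters (illustrative only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])            # transition probabilities a_ij
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])       # emission probabilities b_j(k)
pi = np.array([0.6, 0.4])             # initial state distribution

obs = [0, 1, 2]                       # observation symbol indices
likelihood = forward_likelihood(A, B, pi, obs)
print(f"P(O | lambda) = {likelihood:.6f}")
```

For long sequences the unscaled recursion underflows, so a practical implementation would normally rescale `alpha` at each step or work in log space.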

If the observations for each state can be modeled using a single Gaussian distribution, then equation (2) will be changed to the following:

$$b_i(o_t) = \mathcal{N}(o_t;\, \mu_i, \Sigma_i) \tag{5}$$

In equation (5), $\mu_i$ and $\Sigma_i$ are the mean and the covariance matrix of the Gaussian distribution for state i, respectively.

If a single distribution is not a reasonable fit to the data, then a mixture of Gaussian distributions can be used to model the observations. In this case, equation (6) can be used to model the observations for each state.

$$b_i(o_t) = \sum_{m=1}^{K} c_{im}\, \mathcal{N}(o_t;\, \mu_{im}, \Sigma_{im}) \tag{6}$$

$c_{im}$ is the mixture coefficient and determines the weight each component has in modeling the data, and $K$ is the number of mixture components. $\mu_{im}$ and $\Sigma_{im}$ are the mean and covariance matrices of each mixture component corresponding to the state i.
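
The continuous emission densities described above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian density per component and mixture weights that sum to one; the function names and the numeric values are hypothetical and are not part of the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density N(x; mean, cov); scalars are treated as 1-D."""
    x = np.atleast_1d(x).astype(float)
    mean = np.atleast_1d(mean).astype(float)
    cov = np.atleast_2d(cov).astype(float)
    d = x.size
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    # Quadratic form diff^T cov^{-1} diff, computed without an explicit inverse
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) / norm)

def mixture_emission(x, weights, means, covs):
    """Emission density of one state as a weighted sum of Gaussian components."""
    return sum(c * gaussian_pdf(x, m, s) for c, m, s in zip(weights, means, covs))

# Hypothetical one-dimensional state with two mixture components.
density = mixture_emission(1.0, weights=[0.3, 0.7], means=[0.0, 2.0], covs=[1.0, 0.5])
print(f"b_i(o_t) = {density:.4f}")
```

With a single component of weight 1 the mixture reduces to the single-Gaussian case.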

Decoding the optimal state sequence given the observations can be done using the Viterbi algorithm [21]. It finds the sequence of states that best explains the observed data:

$$S^{*} = \arg\max_{S} P(S \mid O, \lambda) \tag{7}$$
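
A compact sketch of Viterbi decoding for a discrete HMM is given below. It works in log space and indexes states and symbols from 0; the toy parameters are hypothetical and only serve to make the example runnable.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most likely state sequence for a discrete HMM (0-indexed states)."""
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    with np.errstate(divide="ignore"):       # log(0) = -inf is acceptable here
        log_A, log_B, log_pi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = log_pi + log_B[:, obs[0]]        # best log score of paths ending in each state
    back = np.zeros((T, N), dtype=int)       # back[t, j]: best predecessor of state j at time t
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical toy parameters (illustrative only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])

obs = [0, 0, 2, 2, 1]
path = viterbi(A, B, pi, obs)
print("decoded states:", path)
```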


III Dataset

The dataset used in this paper belongs to the Save Patch (SP) game designed by the National Center for Research on Evaluation, Standards, and Student Testing (CRESST). This game is one out of four fraction games designed to teach the concept of a unit in rational numbers. It is intended to teach the following two concepts: 1- Rational numbers are defined relative to a whole unit; 2- Rational numbers can be added only if they have a common denominator [22].

Along with the four games, a pretest and a posttest were designed to test the students’ understanding of the concepts before and after each game. A set of the questions targeted by each game in the posttest and pretest is carefully identified. This is very beneficial since it permits each game to be analyzed and verified independently of all the other games.

IV Problem Formulation

In this section, the problem formulation and the prediction algorithm using the HMM are discussed. Prediction begins by dividing the students into two classes according to their scores on the test questions targeted by the SP game: Class 1 contains the students who score low on those questions and Class 2 contains those who score high on them. A separate HMM is trained for each class, and the optimal parameters are determined by the model selection algorithms, which are discussed later in this section. Testing or decoding is done by running the Viterbi algorithm on the observation sequences.

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are model selection criteria that are used to combat overfitting by introducing penalty terms. AIC is defined by the following formula:

$$\mathrm{AIC} = 2k - 2\ln(\hat{L}) \tag{8}$$

where $\ln(\hat{L})$ is the log likelihood function and k is the number of parameters in the model; therefore 2k is the penalty term.

BIC is another well known model selection criterion that measures the trade off between the model fit and the complexity. The formula for the BIC is given below:

$$\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L}) \tag{9}$$

$\hat{L}$ and k are the same quantities as in the AIC, and $n$ is the number of observations. Comparing equations (8) and (9) shows that BIC has a larger penalty term whenever $\ln(n) > 2$. Therefore, for all but very small datasets, it penalizes a complex model more than AIC does.
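
The two criteria are simple to evaluate once a model's log likelihood is known. The sketch below assumes one common way of counting the free parameters of a discrete HMM (each row of A, each row of B, and the initial distribution lose one degree of freedom because they sum to one); the paper does not state how k was counted, and the log likelihood values are invented for illustration.

```python
import math

def dhmm_num_params(n_states, n_symbols):
    """Free parameters of a discrete HMM (assumed counting convention)."""
    transitions = n_states * (n_states - 1)   # each row of A sums to 1
    emissions = n_states * (n_symbols - 1)    # each row of B sums to 1
    initial = n_states - 1                    # pi sums to 1
    return transitions + emissions + initial

def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n_obs):
    return k * math.log(n_obs) - 2 * log_likelihood

# Hypothetical log likelihoods for candidate models with 2, 3 and 4 states.
candidates = {2: -10950.0, 3: -10780.0, 4: -10770.0}
n_obs, n_symbols = 5000, 4

scores = {n: bic(ll, dhmm_num_params(n, n_symbols), n_obs) for n, ll in candidates.items()}
best_n = min(scores, key=scores.get)          # lower BIC is better
print("BIC per number of states:", {n: round(s, 1) for n, s in scores.items()})
print("selected number of states:", best_n)
```

In this made-up example the four-state model has the highest likelihood, but its extra parameters cost more than the gain in fit, so the three-state model is selected.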

The prediction problem using the HMM is solved as follows: Hidden states are defined to be the students’ mastery levels, and the goal is to predict their final mastery level as they go through different levels of the game. This can be formulated as predicting the final mastery level $\hat{s}_{T+1}$ given all the past mastery levels $s_1, s_2, \dots, s_T$.

The following techniques can be used to perform the prediction.

  1. Naive: This is the most basic method, in which the predicted value is simply equal to the last observed value of the time series.

    $$\hat{s}_{T+1} = s_T \tag{10}$$

  2. Linear averaging: The final predicted value is the average of all the past mastery levels.

    $$\hat{s}_{T+1} = \frac{1}{T} \sum_{t=1}^{T} s_t \tag{11}$$

    One extension to this method is to perform averaging over a window of time length p, which means to consider only the most recent p values.

    $$\hat{s}_{T+1} = \frac{1}{p} \sum_{t=T-p+1}^{T} s_t \tag{12}$$

    Another extension of this method is exponential smoothing. The idea is to perform the linear averaging by choosing larger weights for the most recent values and smaller weights for the distant values. This is described by the following formula:

    $$\hat{s}_{T+1} = \alpha s_T + \alpha(1-\alpha) s_{T-1} + \alpha(1-\alpha)^2 s_{T-2} + \dots \tag{13}$$

    Equation (13) can also be written recursively as follows:

    $$\hat{s}_{T+1} = \alpha s_T + (1-\alpha)\,\hat{s}_T \tag{14}$$

  3. Mode: The final mastery level is the mastery level that appears most often in the sequence. Mathematically this can be represented as follows:

    $$\hat{s}_{T+1} = \arg\max_{i \in \{1, \dots, N\}} \sum_{t=1}^{T} \mathbb{1}[s_t = i] \tag{15}$$
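
The prediction rules listed above can be sketched as small functions operating on a decoded state trajectory. The trajectory used here is hypothetical. Two details are assumptions made for the sketch rather than choices stated in the paper: the exponential smoothing recursion is initialised with the first state, and ties in the mode are broken in favour of the lowest state.

```python
from collections import Counter

def predict_naive(states):
    """Last observed mastery level."""
    return states[-1]

def predict_average(states, window=None):
    """Mean of all states, or of the most recent `window` states."""
    recent = states if window is None else states[-window:]
    return sum(recent) / len(recent)

def predict_exp_smoothing(states, alpha):
    """Recursive exponential smoothing; initial estimate = first state (assumption)."""
    s_hat = states[0]
    for s in states[1:]:
        s_hat = alpha * s + (1 - alpha) * s_hat
    return s_hat

def predict_mode(states):
    """Most frequent state; ties go to the lowest state (assumption)."""
    counts = Counter(states)
    top = max(counts.values())
    return min(s for s, c in counts.items() if c == top)

trajectory = [1, 2, 2, 2, 1]   # hypothetical decoded mastery levels

print("naive:        ", predict_naive(trajectory))
print("average:      ", predict_average(trajectory))
print("window (p=2): ", predict_average(trajectory, window=2))
print("exp. smooth:  ", predict_exp_smoothing(trajectory, alpha=0.5))
print("mode:         ", predict_mode(trajectory))
```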


$\alpha$ in equations (13) and (14) is referred to as the smoothing constant. It is a parameter that is selected according to how important the past values are compared to the more recent values in a given time series. For example, Table I shows the single exponential smoothing coefficients for the five most recent values in a time series.

Weight        α = 0.05   α = 0.1   α = 0.5   α = 0.9
α             0.05       0.1       0.5       0.9
α(1−α)        0.0475     0.09      0.25      0.09
α(1−α)^2      0.0451     0.081     0.125     0.009
α(1−α)^3      0.0429     0.0729    0.0625    0.0009
α(1−α)^4      0.0407     0.06561   0.03125   0.00009
TABLE I: Smoothing Coefficients for various α

As Table I shows, a smaller value of $\alpha$ puts relatively more weight on the more distant values, and a larger $\alpha$ puts more weight on the more recent values of a time series.
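
The smoothing coefficients can be regenerated directly from the weight $\alpha(1-\alpha)^j$ given to the value that lies j steps in the past. The short sketch below prints the weights of the five most recent values for the four smoothing constants discussed here; the function name is only illustrative.

```python
def smoothing_weights(alpha, n):
    """Weights alpha * (1 - alpha)**j for the n most recent values, j = 0..n-1."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

for alpha in (0.05, 0.1, 0.5, 0.9):
    weights = smoothing_weights(alpha, 5)
    print(alpha, [round(w, 5) for w in weights])
```

For a large smoothing constant such as 0.9 the weights decay by a factor of ten per step, so the prediction is close to the naive method; for 0.05 the five weights are nearly equal, which is closer to linear averaging.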

V Results and Discussions

In this section, the results for the prediction task described in the last section are presented and discussed. The prediction is done with the DHMM by discretizing the observations into a certain number of bins using domain knowledge or the k-means algorithm [23]. Since the HMM training is done using expectation maximization (EM), and EM might converge to a local optimum instead of the global optimum, multiple initial conditions are used and the best model is selected using the AIC or BIC. The prediction is performed under the following cases:

  1. The total number of attempts per level is used as the observations.

  2. The total number of moves per level is used as the observations.

Case I: Total number of attempts per level is used as the observation.

The observations are discretized (DHMM) to four levels according to the following rules: 1 or 2 attempts per level is label 1; 3 or 4 attempts is label 2; 5, 6 or 7 attempts is label 3; and anything above is label 4. According to Tables II and III, which provide the results of the model selection using the BIC, training is done with the number of states (Q) set to 3 and initial condition 35 for class 1, and Q=2 with initial condition 26 for class 2. The goal is to train two separate HMMs with the above parameters for class 1 and class 2, make the prediction of the final mastery level, and compare it to the class label.
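
The discretization rule for the number of attempts can be written as a small lookup. This is a minimal sketch of the rule as stated in the text; the function name and the example attempt counts are hypothetical.

```python
def attempts_to_label(attempts):
    """Map the number of attempts on a level to a discrete observation label 1..4."""
    if attempts < 1:
        raise ValueError("a completed level needs at least one attempt")
    if attempts <= 2:
        return 1
    if attempts <= 4:
        return 2
    if attempts <= 7:
        return 3
    return 4

# Hypothetical attempts per level for one student.
attempts_per_level = [1, 3, 8, 5, 2]
observations = [attempts_to_label(a) for a in attempts_per_level]
print(observations)
```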

For instance, consider the following examples in Tables IV, V and VI.

Class 1: Table IV shows an example of a game trajectory for a student who finished four levels of the game and obtained 1.5 out of 8 in the posttest. For class 1 students, 1 in the “State Sequence” column indicates the lowest mastery level and 3 indicates the highest mastery level. Since there are 3 states, scores in [2,3] are mapped to label 2 and scores in [1,2) are mapped to label 1. The following cases illustrate how the final mastery level is calculated using the methods discussed in the last section.

  1. Naive: This method ignores all the past states and takes the most recent state as the predicted value.

  2. Average: This method assigns equal weight to each state and could be the best prediction method for the game, since the levels of the game are independent of each other and should be treated separately.

  3. Mode: This method predicts the final value to be the state that is repeated the most.

Table V shows another example of comparison between the posttest and the state trajectories for the class 1 students. The final prediction for this student is done as follows:

  1. Naive: This method predicts the class label correctly, but it ignores all the past states and does not take any past performance into account.

  2. Average: Since the average is less than 2, the predicted label is 1.

  3. Mode: The predicted label using this method is also 1, since 1 is repeated more than any other state.

Initial Condition   BIC     Number of States
1                   21885   2
26                  21720   3
35                  21636   3
55                  21709   3
64                  21779   4
100                 21880   2
TABLE II: BIC for Class 1 in Case I

Initial Condition   BIC     Number of States
1                   17716   2
26                  17538   2
35                  17541   4
55                  17557   3
64                  17631   4
100                 17726   2
TABLE III: BIC for Class 2 in Case I
ID Posttest State Sequence
TABLE IV: HMM Trajectory for class 1 student in Case I
ID Posttest State Sequence
TABLE V: HMM Trajectory for class 1 student in Case I

Class 2: Table VI shows an example of a student with ID 1627 in class 2 who scored 6.17 out of 8 in the posttest and completed 49 levels of the SP game. Scores in the [1.5,2] are mapped to label 2 and scores in [1,1.5) are mapped to label 1. The final prediction for this student is as follows:

  1. Naive: Since the most recent state is 1, the predicted label is 1. This is an example of a forecasting error, since only the last state is used for the prediction.

  2. Average: Since the average is greater than 1.5, the predicted label is 2.

  3. Mode: Since state 2 is repeated the most, the predicted label is 2.

ID Posttest State Sequence
TABLE VI: HMM Trajectory for class 2 student in Case I

Table VII summarizes the accuracy for the various methods discussed in the last section by comparing the class label to the predicted value from the state trajectory. According to Table VII, the best prediction accuracy for class 1 is obtained by the naive method, and the best prediction accuracy for class 2 is obtained by the average and the mode methods. Among all the prediction methods described in the last section, the average method is the most reliable one; this is because the naive method only accounts for the most recent mastery level and ignores all the past values. This cannot be a reliable method for prediction using a game, since different levels of the game have different game mechanics and difficulties. Therefore, every level should make a contribution to the final prediction. The average method is also more informative and provides more detail than the mode method. To better understand this, consider the following cases for two different students who each finish five levels of the game:

  • Student A trajectory is .

  • Student B trajectory is

For both students A and B the predicted label is 2 using both the mode and the average methods. However, for student A the score using the average method is 1.6, while for student B the score using the average method is 2. This shows that student B has a better performance than student A in the game. This information cannot be gained using the mode method, since in both cases state 2 is repeated the most.
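
The comparison between the two students can be reproduced with a few lines of code. The two trajectories below are hypothetical: they were chosen only so that their averages equal 1.6 and 2, as stated in the text, and are not the trajectories from the paper. The threshold of 1.5 follows the class 2 label mapping described earlier in this section.

```python
from statistics import mean, mode

def label_from_score(score, threshold=1.5):
    """Map an averaged mastery score to a class label (2 if score >= threshold, else 1)."""
    return 2 if score >= threshold else 1

# Hypothetical five-level trajectories consistent with the averages quoted in the text.
student_a = [1, 1, 2, 2, 2]
student_b = [2, 2, 2, 2, 2]

score_a, score_b = mean(student_a), mean(student_b)
label_a, label_b = label_from_score(score_a), label_from_score(score_b)

print("A: average", score_a, "label", label_a, "mode", mode(student_a))
print("B: average", score_b, "label", label_b, "mode", mode(student_b))
```

Both students receive the same label and the same mode, but only the averaged score separates the student who was consistently at the higher mastery level from the one who reached it later.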

Method    Class I Accuracy   Class II Accuracy
Naive     97.48%             86.09%
Average   86.55%             100%
Mode      86.55%             100%
TABLE VII: Prediction Accuracies for various Methods in Case I

Case II: Total number of moves per level is used as the observations.

The observations are discretized to four levels according to the following rule: An expert plays all the levels of the game and the total number of moves needed to finish each level is recorded. A compensation factor is then introduced to account for the game mechanics and the game difficulties. The compensation factor is multiplied by the expert's total moves per level, and the observations are discretized according to this rule per level, since different levels might have different difficulties. According to Tables VIII and IX, the training is done with the number of states set to 3 for both class 1 and class 2, with initial condition 35 for class 1 and 26 for class 2. An analysis similar to Case I is done here to predict the final mastery level.

For instance, consider the following examples in Tables X and XI.

Class 1: Table X shows an example of a class 1 student with ID 1768 who scored 3.88 out of 8 in the posttest. The predicted mastery level using all three methods is 1. This is because 1 is repeated more than the other states, the most recent state is 1, and the average value of the state trajectory is 1.85, which is less than 2. Although all three methods predict the final mastery level correctly, the average method is more informative: because the score is close to the boundary between class 1 and class 2, it shows that the student has done well on some levels.

Class 2: Table XI shows another example where the naive method can make an incorrect prediction. The mode and average methods predict the final mastery level to be 2, while the naive method predicts the final value to be 1 since the most recent state is 1.

Initial Condition   BIC     Number of States
1                   25895   2
26                  25850   7
35                  25596   3
55                  25613   3
64                  25682   4
100                 25872   2
TABLE VIII: BIC for Class 1 in Case II

Initial Condition   BIC     Number of States
1                   24462   2
26                  24206   3
35                  24239   4
55                  24231   3
64                  24315   4
100                 24456   5
TABLE IX: BIC for Class 2 in Case II
ID Posttest State Sequence
TABLE X: HMM Trajectory for class 1 student in Case II
ID Posttest State Sequence
TABLE XI: HMM Trajectory for class 2 student in Case II
Method    Class I Accuracy   Class II Accuracy
Naive     89.92%             92.17%
Average   75.63%             100%
Mode      76.47%             100%
TABLE XII: Prediction Accuracies for Various Methods in Case II

Table XII presents the prediction accuracies of the various methods for the class 1 and the class 2 students. According to Table XII, the highest prediction accuracy for class 1 is obtained by the naive method, and for class 2 by the average and mode methods. Similar to Case I, it can be argued that the average method is better than the naive method and is more informative than the mode method.

Metric      Case I             Case II
Accuracy    93.16%             87.61%
Recall      Class 1: 86.55%    Class 1: 75.63%
            Class 2: 100.0%    Class 2: 100.0%
Precision   Class 1: 100.0%    Class 1: 100.0%
            Class 2: 87.79%    Class 2: 79.86%
F1 score    Class 1: 92.79%    Class 1: 86.12%
            Class 2: 93.50%    Class 2: 88.80%
AUC         0.8644             0.8818
TABLE XIII: Statistical Results for Average Method

Other metrics that are widely used in evaluating a model are recall, precision, accuracy, and AUC scores. Table XIII summarizes the results for the average method for both class 1 and class 2 students under both case I and case II. High values of accuracy, recall, precision, and AUC scores under both case I and II suggest that the proposed method can perform strong prediction of student mastery levels.
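
For readers who want to recompute such metrics, the sketch below derives accuracy, recall, precision, and the F1 score from a two-class confusion matrix. The counts are hypothetical: the paper does not report a confusion matrix, and these values were chosen only because they are consistent with the Case I percentages for the average method. AUC is not computed here, since it requires continuous scores rather than counts.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of students of true class i+1 predicted as class j+1.

    Assumes every class is predicted at least once (no zero denominators).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    report = {"accuracy": correct / total}
    for c in range(n_classes):
        tp = confusion[c][c]
        actual = sum(confusion[c])                                  # true members of the class
        predicted = sum(confusion[r][c] for r in range(n_classes))  # predicted members
        recall = tp / actual
        precision = tp / predicted
        f1 = 2 * precision * recall / (precision + recall)
        report[c + 1] = {"recall": recall, "precision": precision, "f1": f1}
    return report

# Hypothetical counts (rows: true class 1, 2; columns: predicted class 1, 2).
confusion = [[103, 16],
             [0, 115]]

report = per_class_metrics(confusion)
print(f"accuracy: {report['accuracy']:.2%}")
for c in (1, 2):
    m = report[c]
    print(f"class {c}: recall {m['recall']:.2%}, precision {m['precision']:.2%}, F1 {m['f1']:.2%}")
```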

One important topic that needs more attention is the confounding variables. They are defined as the variables that affect both the independent and dependent variables and if not controlled properly they might change the results of experiments. For example, transfer of knowledge between the game and the posttest is a confounding variable. Transfer of knowledge is the application of the previously learned skills in a new domain. This can be the main reason why accuracy for class 1 is lower than class 2 students since class 1 students could have a harder time connecting the game concepts to the posttest. Game mechanics and the difficulty of the different levels are other confounding variables in the SP game. However there might be other confounding variables which are hard to control and might cause the change in the exam score such as students’ interest, family situation, health and many more. Since in this study a retrospective analysis of the data was conducted, it was not possible to query such factors.

VI Conclusion

In this paper, the HMM algorithm is used to predict the students’ final mastery level given their performance in various levels of the game. It was shown that despite various confounding variables affecting the students the HMM can be used as a promising solution in educational environments to model students’ actions and make the prediction throughout the game.

The results indicate that examining time series data from the game can lead to dynamic evaluation of student mastery levels throughout the time which cannot be obtained by examining only the posttest. This can be useful to make a timely intervention and provide efficient feedback. In particular, such dynamic analysis can enable students to be guided through a sequence of concepts that build on each other, thus enabling learning at their own pace. It can also be used to target human intervention for students who are struggling.

While for this study there was no ground truth to quantify student attainment throughout the game, the strong prediction of final mastery level shows that there is considerable promise in applying the HMM to this purpose. A focus of the future research will be on how to design the interactive game experience to enable such inferences to be of high quality.


  • [1] T.-Y. Chuang and W.-F. Chen, “Effect of computer-based video games on children: An experimental study,” in 2007 First IEEE International Workshop on Digital Game and Intelligent Toy Enhanced Learning (DIGITEL’07).   IEEE, 2007, pp. 114–118.
  • [2] S. M. Fisch, “Making educational computer games educational,” in Proceedings of the 2005 conference on Interaction design and children.   ACM, 2005, pp. 56–61.
  • [3] J. G. De Gooijer and R. J. Hyndman, “25 years of time series forecasting,” International journal of forecasting, vol. 22, no. 3, pp. 443–473, 2006.
  • [4] P. J. Brockwell, R. A. Davis, and M. V. Calder, Introduction to time series and forecasting.   Springer, 2002, vol. 2.
  • [5] L. R. Rabiner, “A tutorial on hidden markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
  • [6] A. Varga and R. Moore, “Hidden markov model decomposition of speech and noise,” in Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on.   IEEE, 1990, pp. 845–848.
  • [7] B. Schuller, G. Rigoll, and M. Lang, “Hidden markov model-based speech emotion recognition,” in Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP’03). 2003 IEEE International Conference on, vol. 2.   IEEE, 2003, pp. II–1.
  • [8] M. R. Hassan and B. Nath, “Stock market forecasting using hidden markov model: a new approach,” in Intelligent Systems Design and Applications, 2005. ISDA’05. Proceedings. 5th International Conference on.   IEEE, 2005, pp. 192–196.
  • [9] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids.   Cambridge university press, 1998.
  • [10] L. Xie, S.-F. Chang, A. Divakaran, and H. Sun, “Structure analysis of soccer video with hidden markov models,” in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 4.   IEEE, 2002, pp. IV–4096.
  • [11] S. Bunian, A. Canossa, R. Colvin, and M. S. El-Nasr, “Modeling individual differences in game behavior using hmm,” arXiv preprint arXiv:1804.00245, 2018.
  • [12] C.-L. Huang, H.-C. Shih, and C.-Y. Chao, “Semantic analysis of soccer video using dynamic bayesian network,” IEEE Transactions on Multimedia, vol. 8, no. 4, pp. 749–760, 2006.
  • [13] F. Wang, Y.-F. Ma, H.-J. Zhang, and J.-T. Li, “A generic framework for semantic sports video analysis using dynamic bayesian networks,” in Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of the 11th International.   IEEE, 2005, pp. 115–122.
  • [14] C. Carmona, G. Castillo, and E. Millán, “Designing a dynamic bayesian network for modeling students’ learning styles,” in 2008 Eighth IEEE International Conference on Advanced Learning Technologies.   IEEE, 2008, pp. 346–350.
  • [15] H. Gamboa and A. Fred, “Designing intelligent tutoring systems: a bayesian approach,” Enterprise Information Systems III. Edited by J. Filipe, B. Sharp, and P. Miranda. Springer Verlag: New York, pp. 146–152, 2002.
  • [16] C. Conati, A. S. Gertner, K. VanLehn, and M. J. Druzdzel, “On-line student modeling for coached problem solving using bayesian networks,” in User Modeling.   Springer, 1997, pp. 231–242.
  • [17] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein, “Deep knowledge tracing,” in Advances in neural information processing systems, 2015, pp. 505–513.
  • [18] A. T. Corbett and J. R. Anderson, “Knowledge tracing: Modeling the acquisition of procedural knowledge,” User modeling and user-adapted interaction, vol. 4, no. 4, pp. 253–278, 1994.
  • [19] M. V. Yudelson, K. R. Koedinger, and G. J. Gordon, “Individualized bayesian knowledge tracing models,” in International conference on artificial intelligence in education.   Springer, 2013, pp. 171–180.
  • [20] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the em algorithm,” Journal of the royal statistical society. Series B (methodological), pp. 1–38, 1977.
  • [21] G. D. Forney, “The viterbi algorithm,” Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
  • [22] G. Chung and D. Kerr, “A primer on data logging to support extraction of meaningful information from educational games: an example from save patch,” CRESST Report, vol. 814, 2012.
  • [23] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14.   Oakland, CA, USA, 1967, pp. 281–297.