Problem-solving competency has been recognized as a central skill that today's students need to thrive and shape their world (Griffin & Care, 2014; OECD, 2018). As a result, the measurement of problem-solving competency has received much attention in education in recent years (e.g., OECD, 2012a, 2012b, 2017; Mullis & Martin, 2017; US Department of Education, 2013). Computer-based simulated interactive tasks have become a popular tool for measuring problem-solving competency. They have been used in many national and international large-scale assessments, including the Programme for International Student Assessment (PISA), the Programme for the International Assessment of Adult Competencies (PIAAC), and the National Assessment of Educational Progress (NAEP). Compared with static problems, interactive tasks better reflect the nature of problem solving in real life: they require students to uncover some of the information needed to solve the problem through interactions with a computer-simulated environment, whereas static problems disclose all information at the outset.
For simulated tasks, data are available not only on the final outcome of problem solving (success/failure), but also on the entire problem-solving process, as recorded in computer log files. A computer log file contains the events during a student's problem-solving process (i.e., actions taken by the student) and the time stamps of these events, and the final outcome is completely determined by the problem-solving process. Therefore, problem-solving process data should contain more information about one's problem-solving competency than the final outcome alone. However, due to the complex structure of log file process data, it is unclear how meaningful information can be extracted. Compared with the traditional multivariate data commonly encountered in the social and behavioral sciences, such as testing data and survey data, computer log file data are highly unstructured. Different students can have completely different computer log files, with different events occurring at different time points.
In this paper, we propose a probabilistic measurement model, called the Continuous-Time Dynamic Choice (CTDC) model, for extracting meaningful information from log file process data. We first provide a review of marked point processes (Cox & Isham, 1980), a class of stochastic processes whose realizations take the same form as log file process data. We then propose a parametrization of the marked point process, in which the occurrence of a future action and its time stamp depend on (1) the entire event history of problem solving, (2) person-specific characteristics, including the latent traits of problem-solving competency and action speed, and (3) task structure. In particular, we assume the choice of the next action is driven by a competency trait, while the time of action depends on a speed trait. This model can be applied to data from one or multiple tasks.
The analysis of problem-solving process data has received much attention in recent years. A standard strategy for analyzing such data is based on summary statistics defined by expert knowledge. These summary statistics are used for group comparison (e.g., comparing the success and failure groups) and/or multivariate analysis (e.g., factor analysis). Research taking this approach includes Greiff et al. (2015), Scherer et al. (2015), Greiff et al. (2016), and Kroehne & Goldhammer (2018), among others. Another type of analysis focuses on extracting important features/latent features from process data. Along this direction, He & von Davier (2015, 2016) took an n-gram approach to extract sequential features in the data and screened out the important ones based on their predictive power for the problem-solving outcome. Xu et al. (2018) proposed a latent class model for finding latent groups among students based on log file data. Tang et al. (2019) proposed a multidimensional scaling approach to extracting latent features and showed empirically that the extracted latent features tend to contain more information than the binary problem-solving outcome, in terms of out-of-sample prediction of related variables. Besides these directions, Chen, Li, Liu, & Ying (2019) proposed an event history analysis approach from a prediction perspective, studying how problem-solving process data can be used to predict the problem-solving outcome and duration. However, none of these approaches provides a probabilistic measurement model that directly links together interpretable person-specific latent traits, the structure of the problem-solving task, and log file process data.
The proposed CTDC model is closely related to the Markov decision process (MDP) measurement model proposed by LaMar (2018), which is also used to measure student competency based on within-task actions. In particular, both the CTDC model and the MDP measurement model assume a dynamic choice model to characterize how the next action depends on the current status of the student (as a result of previous actions) and a person-specific competency latent trait. In both models, a person with a higher latent trait level is more likely to choose a better action. However, there are several major differences between the two models. First, the MDP measurement model is only for the action sequences, without taking into account the time information of the actions, which may also be informative. By modeling log file data as a marked point process, the proposed framework is able to make use of information from both the actions and their time stamps. Second, the two models quantify the effectiveness of an action differently. The MDP measurement model follows a Markov decision theory framework. It measures the effectiveness of an action given the student's current state by the value of a Q-function (i.e., state-action value function), which is obtained by solving an MDP optimization problem (see Puterman, 2014, for the details of Markov decision processes). This approach is possibly more useful for complex tasks where the value of actions is hard to evaluate. In contrast, we focus on tasks for which there exists a direct measure of action effectiveness based on their design. In fact, for relatively simple tasks, such as those in large-scale assessments, it is often clear whether or not an action should be taken at each stage, which provides a measure of action effectiveness.
In particular, we demonstrate how a reasonable measure of action effectiveness can be constructed using a motivating example from PISA 2012, in which case the proposed approach is much easier to use. Finally, the proposed model is developed under a general structural equation modeling framework that can simultaneously analyze multiple tasks, while the MDP measurement model focuses on data from a single task.
The rest of the paper is organized as follows. In Section 2, we describe log file data and view them as realizations of a marked point process, using a motivating example from PISA 2012. In Section 3, we propose a continuous-time dynamic choice (CTDC) measurement model under the marked point process framework, and discuss the estimation of model parameters. In Section 4, the proposed model is applied to real data from PISA 2012, followed by a simulation study in Section 5. We end with discussions in Section 6.
2 Log File Data as a Marked Point Process
2.1 A Motivating Example
To introduce the structure of log file process data, we start with a motivating example, which is the second task from a released unit of PISA 2012 that contains three tasks. (The task is available online at http://www.oecd.org/pisa/test-2012/testquestions/question5/.) This released unit is called TICKETS. In this task, students were asked to use a simulated automated ticketing machine to buy train tickets under certain constraints on the type of tickets. Figure 1 provides a screenshot of the user interface for this unit of tasks. The instructions for the ticketing machine are given below.
“A train station has an automated ticketing machine. You use the touch screen on the right to buy a ticket. You must make three choices.
Choose the train network you want (subway or country).
Choose the type of fare (full or concession).
Choose a daily ticket or a ticket for a specified number of trips. Daily tickets give you unlimited travel on the day of purchase. If you buy a ticket with a specified number of trips, you can use the trips on different days.
The BUY button appears when you have made these three choices. There is a CANCEL button that can be used at any time BEFORE you press the BUY button.”
In this task, the students were asked to find and buy the cheapest ticket that allows them to take four trips around the city on the subway, within a single day. As students, they can use concession fares. Accomplishing the task requires multiple interactions between the student and the task interface. In particular, the student needs to find out the concession fare of a daily subway ticket and the concession fare of four individual subway tickets, by visiting the corresponding screens. Then the student needs to verify which of these is cheaper and make the purchase. We say the task is successfully solved if a student purchases four individual subway tickets at the concession fare after comparing their price to that of a concession daily subway ticket.
This task is designed under the finite-state automata framework (Buchner & Funke, 1993; Funke, 2001), one of the most commonly used designs for problem-solving tasks. In fact, it is one of the two design frameworks for all problem-solving tasks in PISA 2012. Tasks following the finite-state automata design share a similar structure, and the proposed CTDC model can be applied to all such tasks.
The log file of a student solving a task is recorded in a long data format, with each row describing an action and its time stamp. For an automata task, a student's action can be represented by the resulting new state of the system. Figure 2 visualizes the problem-solving process of a student in PISA 2012 and Table 1 shows the corresponding log file record. (The raw data are available from the OECD website: http://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm. Note that the data presented in Table 1 have been preprocessed from the raw PISA 2012 log file data and the variable names have been simplified.) In this example, the student was only aware of the fare of a concession daily ticket for the city subway and purchased it. He/she did not check the fare of four concession individual tickets. Thus, although the ticket the student bought is a concession one and can be used for four trips by city subway in a day, it is not the cheapest one and thus does not completely satisfy the task requirement.
2.2 A Marked Point Process View
We now provide a mathematical treatment of log file data, taking a marked point process framework. Consider a continuous-time domain [0, ∞), with the task starting at time t = 0. Let M be the number of event types, where each event type corresponds to a state of the system that can repeatedly occur. For the above TICKETS example, each state corresponds to a different screen of the task interface that can be represented by the last five columns of Table 1. We define 21 states for the TICKETS task as given in the appendix. With well-defined event types, log file data can be recorded by a double sequence (t_1, m_1), (t_2, m_2), …, where t_l is the time stamp of an event satisfying 0 < t_1 < t_2 < ⋯, and m_l ∈ {1, …, M} denotes the event type. Such a double sequence can be modeled by a marked point process (Cox & Isham, 1980), a stochastic process model commonly used in event history analysis (Cook & Lawless, 2007).
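To make the double-sequence representation concrete, the sketch below stores a log file as a list of (time stamp, event type) pairs and extracts the event history up to a given time. The state names are illustrative stand-ins, not the actual PISA coding scheme.

```python
# A log file as a "double sequence": time stamps paired with event types.
# State names below are illustrative, not the actual PISA coding scheme.
log = [
    (3.2, "START"),
    (5.8, "CITY_SUBWAY"),
    (7.1, "CONCESSION"),
    (9.4, "DAILY"),
    (12.0, "BUY"),
]

times = [t for t, _ in log]

# Time stamps must be strictly increasing: 0 < t_1 < t_2 < ...
assert all(t0 < t1 for t0, t1 in zip(times, times[1:]))

def history(log, t):
    """The event history at time t: all events observed up to time t."""
    return [(s, m) for s, m in log if s <= t]
```

Different students simply produce lists of different lengths and contents, which is why such data do not fit into a fixed-width multivariate layout.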
A marked point process can be used to describe how future events depend on the event history at any time t, where the event history is described by an information filtration H_t. For log file data, H_t = {(t_l, m_l) : t_l ≤ t}, which contains all available information up to time t. A marked point process model can be characterized by a ground intensity function λ(t | H_t) and conditional density functions f(m | t, H_t), m = 1, …, M; see Rasmussen (2018) for a review. In particular, the ground intensity function describes the instantaneous probability of event occurrence, i.e.,

λ(t | H_t) = lim_{Δt → 0} P(an event occurs in [t, t + Δt) | H_t) / Δt.
A task typically has a terminal state. Once the terminal state is reached, the task is completed and no event happens afterwards, i.e., λ(t | H_t) = 0 for t greater than the time of reaching the terminal state. For the TICKETS example, the terminal state is reached once a student clicks the "BUY" button.
In addition, the conditional density function f(m | t, H_t) describes the instantaneous conditional probability of the mth type of event occurring, given that one event will occur, i.e.,

f(m | t, H_t) = lim_{Δt → 0} P(the event in [t, t + Δt) is of type m | one event occurs in [t, t + Δt), H_t).

In our application, the conditional density functions often satisfy some zero constraints, because some types of events cannot happen immediately after some others. For the TICKETS task, such constraints are brought about by the design of the system interface. For example, one cannot immediately reach the state (CITY SUBWAY, CONCESSION, NULL, NULL, 0) from the state (NULL, NULL, NULL, NULL, 0), where the five elements of a state correspond to the last five columns of Table 1. We use A(t) to denote the set of all reachable states at time t given event history H_t. Then f(m | t, H_t) = 0 for any m ∉ A(t). For m ∈ A(t), the law of total probability needs to be satisfied by the definition of conditional density functions, i.e.,

∑_{m ∈ A(t)} f(m | t, H_t) = 1.
For each event type m, there exists a measure of its effectiveness given by the structure of the problem-solving task, denoted by W_m(t). A larger value of W_m(t) indicates higher effectiveness of event type m as the next action. For the above TICKETS example, the effectiveness of an action can be measured by whether it contributes to the final success of solving the task. If an action contributes to the final success, then we set W_m(t) = 1, and otherwise W_m(t) = 0. For example, at the starting screen (see Figure 1), the action of clicking "CITY SUBWAY" is always an effective action given the requirement of the task, while clicking "COUNTRY TRAIN" or "CANCEL" is not. It is worth pointing out that whether or not an action is effective depends on the event history. Suppose that a student is currently at state (CITY SUBWAY, CONCESSION, NULL, NULL, 0), the screen of which is shown in Figure 3. If neither the concession fare of a daily subway ticket nor that of four individual subway tickets is known, then clicking either "DAILY" or "INDIVIDUAL" is effective but clicking "CANCEL" is not. However, if according to the event history the fare of a concession daily subway ticket is known while that of four concession individual subway tickets is unknown, then only clicking "INDIVIDUAL" is effective at the current stage. A complete list of W_m(t) is given in the appendix.
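The history dependence of effectiveness can be sketched as a lookup that combines a reachability table with what the student already knows. The transition table and rules below are simplified stand-ins for the full 21-state specification in the appendix, not the actual PISA coding.

```python
# Simplified sketch: from the fare-type screen, which next states are reachable,
# and which of them count as effective given which fares are already known.
def reachable(state):
    # Illustrative transition table, not the full TICKETS interface.
    table = {
        "FARE_TYPE_SCREEN": ["DAILY", "INDIVIDUAL", "CANCEL"],
    }
    return table[state]

def effectiveness(action, known_fares):
    """W = 1 if the action contributes to final success, else W = 0."""
    if action == "CANCEL":
        return 0
    # Checking a fare again is not effective once it is already known.
    return 0 if action in known_fares else 1

acts = reachable("FARE_TYPE_SCREEN")
# The daily fare has already been checked: only INDIVIDUAL remains effective.
w = {a: effectiveness(a, known_fares={"DAILY"}) for a in acts}
```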
Data from J tasks can be viewed as J marked point processes. Thus, all the above quantities are task-specific and will be indexed by j = 1, …, J. Table 2 summarizes the key elements for describing and modeling log file data from the jth task. In what follows, we discuss the parametrization of the ground intensity and the conditional density functions, which links together person-specific latent traits, the structure of tasks, and log file process data.
|m_{j,l}|The type (mark) of the lth event in the process of task j.|
|t_{j,l}|The time stamp of the lth event in the process of task j.|
|m_j = (m_{j,1}, m_{j,2}, …)|The event sequence in the process of task j.|
|t_j = (t_{j,1}, t_{j,2}, …)|The sequence of time stamps in the process of task j.|
|H_{j,t}|The event history at time t for task j, where H_{j,t} = {(t_{j,l}, m_{j,l}) : t_{j,l} ≤ t}.|
|A_j(t)|The set of event types that can immediately occur at time t for task j.|
|W_{j,m}(t)|The measure of effectiveness for event type m of task j at time t.|
|λ_j(t | H_{j,t})|The ground intensity function of the marked point process for task j. It describes the instantaneous probability of event occurrence.|
|f_j(m | t, H_{j,t})|The conditional density functions of the marked point process. They describe the instantaneous conditional probability of the mth type of event occurring for task j.|
3 Proposed Model
3.1 Specification of CTDC Model
We introduce two person-specific latent variables, θ_i and τ_i, to denote student i's problem-solving competency and action speed traits, respectively. In particular, we assume (θ_i, τ_i) to be bivariate normal, (θ_i, τ_i) ~ N(μ, Σ), where μ is the mean vector and Σ is the 2 × 2 covariance matrix.
We consider log file process data from J tasks that can be viewed as J marked point processes. We first assume local independence across tasks; that is, the J marked point processes are assumed to be conditionally independent given the two latent traits. Figure 4 provides the path diagram for the proposed model, where the details of the model will be introduced in the sequel.
Under the local independence assumption, it suffices to model data from one task. Specifically, we propose a model to describe how the conditional density functions and the ground intensity function depend on the two latent traits. Figure 5 provides the path diagram for the proposed within-task model. In this model, the next action, as modeled by the conditional density function, depends only on the problem-solving competency trait and the event history; it does not directly depend on the action speed trait. In addition, the time stamp of the next action, as modeled by the ground intensity function, depends only on the action speed trait and the event history; it does not directly depend on the competency trait. The specifications of the submodels for actions and for time stamps are described below.
Conditional density functions.
A conditional density function describes the conditional probability of a student choosing state m, given that he/she will take an action in the next moment. It can be viewed as a discrete choice model. Consider the conditional density function for event type m of task j at time t. We adopt a multinomial logit model, taking the form

f_j(m | t, H_{j,t}) = exp((a_j + θ_i) W_{j,m}(t)) / ∑_{m′ ∈ A_j(t)} exp((a_j + θ_i) W_{j,m′}(t)),  m ∈ A_j(t),   (1)

where a_j is a task-specific easiness parameter and the rest of the notation is introduced in Table 2. This choice model takes the form of a Boltzmann machine, which is similar to the within-task choice model in LaMar (2018). It is a divide-by-total type model that is commonly used in the item response theory (IRT) literature (e.g., Thissen & Steinberg, 1986).
By the definition of W_{j,m}(t) and given a_j, the larger the value of θ_i, the more likely the effective actions will be taken. In particular, when a_j + θ_i → +∞, the probability of choosing a most effective action, i.e., one maximizing W_{j,m}(t) over m ∈ A_j(t), tends to one. Similarly, when a_j + θ_i → −∞, the probability of choosing a least effective action tends to one. Moreover, when θ_i = −a_j, f_j(m | t, H_{j,t}) = 1/|A_j(t)| for all m ∈ A_j(t). In that case, the student performs in a purely random manner.
In this action choice submodel (1), the parameter a_j reflects the overall easiness of the task. Controlling for the value of θ_i, tasks with a larger value of a_j tend to be easier, as the effective actions are more likely to be chosen.
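The divide-by-total choice probabilities can be sketched numerically. The snippet below (our variable names: `a` for the easiness parameter, `theta` for the competency trait, `W` for the effectiveness values over the reachable set) normalizes exp((a + θ)W) over the reachable actions and illustrates the purely random case θ = −a.

```python
import math

def choice_probs(effectiveness, a, theta):
    """P(next action = m) proportional to exp((a + theta) * W_m),
    normalized over the reachable actions."""
    scores = {m: math.exp((a + theta) * w) for m, w in effectiveness.items()}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}

# Three reachable actions at the starting screen, one of them effective (W = 1).
W = {"CITY_SUBWAY": 1, "COUNTRY_TRAIN": 0, "CANCEL": 0}

p = choice_probs(W, a=0.5, theta=1.0)        # a + theta > 0: favors the effective action
p_rand = choice_probs(W, a=0.5, theta=-0.5)  # theta = -a: purely random, each prob = 1/3
```

Increasing `theta` with `a` fixed pushes the probability mass toward the effective action, which is the monotonicity property described above.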
Ground intensity function.
The ground intensity function essentially describes the speed of a student taking actions. For simplicity, we assume a student keeps a constant speed within a task once he/she has started working on the problem. That is,

λ_j(t | H_{j,t}) = exp(ν_j + τ_i),

for t satisfying t > t_{j,1}, up to the time of reaching the terminal state. An exponential form is assumed, as an intensity function has to be non-negative. Here, ν_j gives the baseline intensity of taking actions in solving task j. The larger the ν_j, the faster the students proceed in general. Given ν_j, the larger the value of τ_i, the sooner the next action will be taken. In fact, it is easy to show that the expected time to the next action is exp(−(ν_j + τ_i)).
We point out that the first action needs to be treated differently, as the time to the first action involves not only taking an action, but also reading and understanding the requirements of the task. In the proposed method, we do not specify a model for the time of the first action. Instead, all the inference will be based on a conditional likelihood in which the first action time is conditioned upon.
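Under a constant intensity exp(ν + τ), the gaps between consecutive actions are exponentially distributed with mean exp(−(ν + τ)). A quick simulation (our variable names) confirms the expected-waiting-time claim:

```python
import math
import random

def simulate_gaps(nu, tau, n, seed=0):
    """Draw n inter-action times from an Exponential distribution
    with rate exp(nu + tau), i.e., mean exp(-(nu + tau))."""
    rng = random.Random(seed)
    rate = math.exp(nu + tau)
    return [rng.expovariate(rate) for _ in range(n)]

gaps = simulate_gaps(nu=1.0, tau=0.5, n=100_000)
mean_gap = sum(gaps) / len(gaps)
expected = math.exp(-(1.0 + 0.5))  # expected time to the next action
```

A student with a larger τ (or a task with a larger ν) produces shorter gaps on average, matching the interpretation of τ as an action speed trait.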
3.2 Parameter Estimation
We set the means of the latent traits to zero to ensure the identifiability of the task-specific parameters. Thus, the fixed parameters of the model include a_j and ν_j, j = 1, …, J, and the covariance matrix Σ. These parameters are estimated by a maximum marginal likelihood (MML) estimator. Consider N students taking the J tasks. We denote (t_{ij}, m_{ij}) as the observed process data from student i on task j, i = 1, …, N, j = 1, …, J, where m_{ij} = (m_{ij,1}, …, m_{ij,L_{ij}}) and t_{ij} = (t_{ij,1}, …, t_{ij,L_{ij}}), and L_{ij} is the total number of actions taken by student i on task j. Recall that θ_i and τ_i are the latent traits of student i.
We derive the likelihood function based on the conditional distribution of the data given t_{ij,1}, θ_i, and τ_i. Since the ground intensity is constant between actions, the gaps between consecutive actions are exponentially distributed given the latent traits, and the conditional likelihood function for student i on task j takes the form

L_{ij}(a_j, ν_j | θ_i, τ_i) = [∏_{l=1}^{L_{ij}} f_j(m_{ij,l} | t_{ij,l}, H_{j,t_{ij,l}})] × [∏_{l=2}^{L_{ij}} exp(ν_j + τ_i) exp(−exp(ν_j + τ_i)(t_{ij,l} − t_{ij,l−1}))].

Making use of the across-task local independence assumption, the marginal likelihood function takes the form

L(β) = ∏_{i=1}^{N} ∫∫ [∏_{j=1}^{J} L_{ij}(a_j, ν_j | θ, τ)] φ_Σ(θ, τ) dθ dτ,

where φ_Σ denotes the density of the bivariate normal distribution with mean zero and covariance matrix Σ, and β = (a_1, …, a_J, ν_1, …, ν_J, Σ) collects all the fixed parameters. Our MML estimator of β is then

β̂ = arg max_{β: Σ ⪰ 0} log L(β),

where Σ ⪰ 0 denotes the positive semi-definiteness of Σ. The computation of this estimator is carried out using an expectation-maximization (EM) algorithm (Dempster et al., 1977). Given the estimated fixed parameters, the latent traits can be estimated using either the expected a posteriori (EAP) estimator or the maximum a posteriori (MAP) estimator. In the subsequent analysis, the EAP estimator is adopted.
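The EAP estimate is the posterior mean of a latent trait given the data and the estimated fixed parameters. A one-dimensional grid approximation (a simplification of the bivariate integral above; the toy log-likelihood here is supplied by the caller and is purely illustrative) can be sketched as:

```python
import math

def eap(log_lik, grid, sigma=1.0):
    """Posterior mean of theta on a grid, under a N(0, sigma^2) prior.

    log_lik(theta) is the conditional log-likelihood of the observed
    process data given theta; it is supplied by the caller.
    """
    log_post = [log_lik(t) - t * t / (2 * sigma**2) for t in grid]
    m = max(log_post)                        # stabilize the exponentiation
    w = [math.exp(lp - m) for lp in log_post]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)

# Toy likelihood peaked at theta = 1; the N(0, 1) prior shrinks the EAP toward 0.
grid = [i / 100 for i in range(-400, 401)]
est = eap(lambda t: -(t - 1.0) ** 2, grid)
```

For this toy example the posterior is normal with mean 2/3, so the prior visibly pulls the estimate below the likelihood's peak at 1.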
3.3 Connections with Related Models
In what follows, we make connections between the proposed model and related models in the psychometric literature.
Connection with MDP measurement model.
We first compare the proposed model with the MDP measurement model of LaMar (2018), which models the action sequence of a student solving a single task. Specifically, in LaMar (2018), each student's action sequence is described by a discrete-time MDP which also depends on a person-specific latent trait. In this MDP, the next action follows a choice model of a similar form to (1), but the effectiveness measure is replaced by the Q-function value of the process. Given the MDP, the Q-function value can be obtained by solving an optimization problem. As a result, there is no need to specify a measure of effectiveness for each possible action at any time point. This feature makes the MDP measurement model very suitable for complex tasks that can be solved using many different strategies (e.g., board games), where the effectiveness of each potential action can be hard to specify.
However, the power of the MDP measurement model comes with a high computational cost, as its estimation requires iteratively alternating between updating person parameters and solving MDPs by dynamic programming. For relatively simple tasks like the above TICKETS example and many other tasks used in large-scale assessments, the action effectiveness can be reasonably specified. For such tasks, the proposed model is more suitable, given its clear computational advantage.
Moreover, the proposed model makes use of information from both the action sequence and time stamps, while the MDP measurement model only focuses on the action sequence. In particular, time stamps are incorporated into the proposed model through a continuous-time marked point process view of the log file data.
Connection with IRT models.
We make several connections between the proposed model and IRT models. First, the action choice submodel (1) can be viewed as a nominal response model of a divide-by-total type (Thissen & Steinberg, 1986). Each action here is similar to an item in IRT. The key difference is that the actions in the current model are not conditionally independent given the latent trait level, while such a conditional independence assumption is typically adopted for items in IRT models. In the proposed model, conditional dependence is introduced in a sequential manner, where the choice of an action can depend on the previous actions. In addition, nominal response models in IRT typically have choice-specific parameters, while the proposed model does not contain event-type-specific or event-history-specific parameters. This is because the number of event types can be large, and the number of possible states of the event history can be even larger. Introducing such parameters could result in poor model performance due to high variance in parameter estimation.
Second, the proposed model is related to the hierarchical model of van der Linden (2007), which models the joint distribution of item-level responses and response times with two latent traits, one for competency (i.e., ability) and the other for speed. The item responses and response times in van der Linden (2007) are analogous to the actions and the time gaps between actions in our setting, respectively. Similarly, in van der Linden (2007), the item responses depend only on the competency trait and the response times depend only on the speed trait, and a correlation is allowed between the two latent traits. In some sense, the proposed model can be viewed as an extension of van der Linden (2007) to process data, where the major difference is the introduction of the event history in the current model to account for temporal dependence.
Third, the proposed model for data from multiple tasks induces an IRT model for the task outcomes. More precisely, we denote Y_j as the final outcome for task j, where Y_j = 1 if the task is successfully solved and Y_j = 0 otherwise. Note that Y_j is a deterministic function of the action sequence m_j. As a result, based on the across-task local independence assumption and the specification of the within-task model as given in Section 3.1, Y_1, …, Y_J are conditionally independent given the competency trait θ_i. Moreover, the probability P(Y_j = 1 | θ_i) will be a monotone increasing function of θ_i under very mild regularity conditions on the task structure; i.e., a higher competency level leads to a higher chance of solving the task. In that case, the final outcomes Y_1, …, Y_J given θ_i essentially follow a nonparametric monotone IRT model (Ramsay & Abrahamowicz, 1989).
4 Case Study
To demonstrate the proposed CTDC model, we apply it to log file data from the first two tasks of the TICKETS unit in PISA 2012. The TICKETS unit contains three tasks, of which the second was introduced in Section 2.1 as a motivating example. In the first task, the students were asked to buy a full fare, country train ticket with two individual trips. This task is relatively simple. To solve it, one first needs to select the network "COUNTRY TRAINS", then choose the fare type "FULL FARE", choose the ticket type "INDIVIDUAL", select the number of trips "2", and finally click the "BUY" button.
We analyze log file process data from the first two tasks of the unit. (The first task of this unit is available at http://www.oecd.org/pisa/test-2012/testquestions/question4/. The third task is not included in the analysis because its user interface is not publicly available, though its description can be found in OECD (2014a) and its data have been released.) These data are from 392 United States students who completed both tasks. For simplicity, students who gave up on one of the two tasks during the problem-solving process are excluded from this analysis. The list of states and the effectiveness of event types for the first task are given in the appendix. Among the 392 students, 266 successfully solved the first task, 115 successfully solved the second, and 97 solved both. Figure 6 shows the histograms of three summary statistics for the process data: students' total number of actions, total duration, and average time per action. Note that the time to the first action is included in calculating total duration, but is excluded when calculating average time per action.
The latent traits extracted by the proposed model will be validated by comparing them with the students' overall performance on the PISA 2012 problem-solving tasks. More precisely, PISA 2012 has in total 16 units of problem-solving tasks. These 16 units were grouped into four clusters, each of which was designed to be completed in 20 minutes. Each student was given either one or two clusters. Students' problem-solving performance was scaled using an IRT model based on the outcomes of the tasks they received (OECD, 2014b). For each student, five plausible values were generated from the corresponding posterior distribution of a proficiency trait (OECD, 2014b). Following Greiff et al. (2015), we use the first plausible value as the continuous overall problem-solving performance score of the students.
We apply the proposed model to data from the two tasks. The MML estimates of the fixed parameters are given in Table 3. The estimated correlation between the two latent traits is weakly negative, with a confidence interval covering zero, suggesting that the problem-solving competency trait and the action speed trait have a very weak negative correlation that is not significantly different from zero. Panel (a) of Figure 7 provides the scatter plot of the EAP estimates of the two latent traits, where no clear association can be found between the estimated traits.
Real data analysis: MML estimates of the fixed parameters and their standard errors.
According to the estimated easiness parameters â_1 and â_2 shown in Table 3, the second task is slightly easier in terms of the choice of effective actions within a task, though the second task seems more difficult according to its design and has a lower success rate according to the task outcome data. There are two possible explanations. First, the difficulty of the first task may be inflated because it was the students' first encounter with the ticketing machine, while in the second task the students already had a good understanding of the system. This difference in familiarity with the task interface can be reflected by the task-specific easiness parameters. Second, although it is difficult for students to completely solve the second task, it is not very difficult to partially fulfill its requirements. That is, a student may purchase a daily subway ticket or four individual subway tickets at the concession fare without comparing their prices. In this process, many effective actions are taken, which reduces the overall difficulty of the task. Based on the estimated baseline intensities ν̂_1 and ν̂_2, the students tend to act slightly faster in the second task than in the first. This is possibly due to the students' increased familiarity with the task interface when solving the second task.
Validating the latent traits.
We now investigate the relationship between the EAP estimates of the latent traits and the students' overall performance score given by the OECD. Panels (b) and (c) of Figure 7 show the scatter plots of the EAP estimates of the two latent traits versus the overall performance score, respectively. From these plots, a moderate positive association seems to exist between the estimated competency trait and the overall performance, while there seems to be no clear association between the estimated speed trait and the overall performance.
We further regress the overall performance score on the estimated traits to investigate their relationship. Specifically, three regression models are fitted: the overall performance score is regressed on the estimated competency trait alone, on the estimated speed trait alone, and on both traits, respectively. The parameter estimates of these three models are given in Table 4 and the R² values are given in Table 5. According to the results of the competency-only and two-trait models, the competency trait extracted from the process data is a significant predictor of the overall performance score. In particular, its slope parameter is positive in both models, meaning that students with a higher competency score tend to have better overall performance in problem solving. In addition, based on the R² of the competency-only model, the competency trait alone explains 32.34% of the variation in the overall performance score.
According to the speed-only model, the speed trait alone has almost no explanatory power for the overall performance, with an insignificant slope parameter and an R² value close to zero. Interestingly, however, the speed trait becomes significant in the two-trait model when both traits are included as covariates. Compared with the competency-only model, the increase in the R² value is 1.09%, with a 95% bootstrap confidence interval of (0.03%, 3.44%). The slope estimate for the speed trait is positive, meaning that students with higher speed tend to have better overall performance when controlling for their competency trait level.
Fitting CTDC model to single tasks.
We further investigate the explanatory power of the latent traits extracted from each single task. That is, we fit the proposed model to data from each single task, obtain the EAP estimates of the two traits, and then regress the overall performance score on the estimated traits. This results in two single-task regression models, one for each task. As given in Table 5, the R² values of these models are 24.18% and 23.78%, respectively. Comparing the two-trait model based on both tasks with the single-task model for the first task, the improvement in the R² value is 9.25%, with a 95% bootstrap confidence interval of (4.46%, 13.85%). Similarly, comparing it with the single-task model for the second task, the improvement in R² is 9.65%, with a 95% bootstrap confidence interval of (3.56%, 15.30%). These results imply that the joint analysis of the two tasks extracts more meaningful information than the analysis of either single task. The information gain from adding a task to the analysis reflects its unique information that is not shared with the other task.
Process data versus final outcome.
We compare the explanatory power of the latent traits extracted from the fitted models with that of the final outcomes. Specifically, in Models 6 and 7, we regress the overall performance score on the binary final outcome (success/failure) of each single task, respectively. In Model 8, we regress the overall performance score on the outcomes of both tasks. The R² values of the fitted models are given in Table 5.
First, we compare the R² values of Models 4 and 6. Their difference is -0.19%, with a 95% bootstrap confidence interval of (-4.76%, 3.78%). This result implies that the process data of the first task may not provide more information than its final outcome. This is not surprising, given that the requirement of the task is straightforward and the task can be solved in a small number of steps.
Second, we compare the R² values of Models 5 and 7. Their difference is 6.31%, with a 95% bootstrap confidence interval of (0.11%, 13.60%). This result suggests that the process data from the second task contain more information about the students' overall performance than the corresponding binary outcome, likely because the second task is more complex.
Finally, the difference in the R² values of Models 3 and 8 is -0.63%, with a 95% bootstrap confidence interval of (-6.18%, 5.32%). This suggests that the process data of the two tasks do not contain significantly more information about the students' overall problem-solving performance than their final outcomes. This is likely because the information gain from the process data of the second task is almost completely explained by the unique information in the first task.
We end this section with some discussion. First, the comparison based on the overall performance score may not be completely fair. The information in the task final outcomes may be overestimated, because the overall performance score, which serves as the validation standard, is itself constructed from these outcomes. A fairer validation of the extracted latent traits would use the students' overall performance on the remaining tasks, excluding the current ones.
Second, we point out that the amount of additional information contained in process data is largely determined by the design of the tasks. We believe that tasks that are more complex and require more steps to solve carry more additional information in their process data. For such tasks, a small number of tasks may suffice to evaluate students' performance accurately, by extracting the information in the process data.
Finally, the proposed model allows us to investigate task-specific characteristics of problem-solving processes, including the difficulty level and the baseline intensity. Such information can provide useful feedback for the design of the tasks.
We now present a simulation study to further investigate the proposed model and its estimation.
Following the setting of the real data example, we simulate data from two tasks. Two sample sizes are considered, together with three settings for the correlation between the two latent traits, including 0.25. The structure of the two tasks is set to be the same as in the case study, and the model parameters, except for the covariance between the two traits, are set to the estimates in Table 3. The covariance is determined by the correlation and the variances of the two traits, i.e., $\sigma_{12} = \rho\,\sigma_1\sigma_2$. This leads to six different settings, as listed in Table 6. For each setting, we generate 50 independent replications from the proposed CTDC model.
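The construction of the settings grid can be sketched as follows. The variance, sample-size, and correlation values below are placeholders (only the correlation 0.25 appears in the text; the rest are assumptions), but the covariance is derived from the correlation exactly as described.

```python
import numpy as np
from itertools import product

# Hypothetical values standing in for the Table 3 estimates:
sigma1_sq, sigma2_sq = 1.0, 0.5          # trait variances (assumed)
sample_sizes = [500, 2000]               # assumed; the paper's values are not shown here
correlations = [0.0, 0.1, 0.25]          # assumed, except for 0.25

settings = []
for n, rho in product(sample_sizes, correlations):
    # covariance implied by the correlation: sigma12 = rho * sigma1 * sigma2
    sigma12 = rho * np.sqrt(sigma1_sq) * np.sqrt(sigma2_sq)
    cov = np.array([[sigma1_sq, sigma12], [sigma12, sigma2_sq]])
    settings.append((n, rho, cov))

def draw_traits(n, cov, seed=0):
    """Draw (competency, speed) trait pairs from a mean-zero bivariate normal."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(2), cov, size=n)

theta = draw_traits(2000, settings[-1][2], seed=2)   # largest n, rho = 0.25
```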
The estimation results for the fixed parameters are shown in Figure 8, where each panel corresponds to one fixed parameter. In each panel, six boxplots are shown, corresponding to the six simulation settings. Each boxplot shows the estimation error of the corresponding parameter over the 50 replications. As we can see, the MML estimates of the fixed parameters are reasonably accurate under all the simulation settings, and the estimation accuracy improves as the sample size increases. Moreover, the different settings for the correlation do not substantially affect the estimation accuracy of the task-specific parameters, but they do seem to affect the estimation accuracy of the covariance parameter.
We further examine the estimation of the latent traits. Specifically, we measure estimation accuracy by the mean squared error (MSE) of the EAP estimates of the two traits. The results are given in Figures 9 and 10, which present the results for the competency and speed traits, respectively. In each figure, the six panels correspond to the six simulation settings. In each panel, three boxplots are shown, where the EAP estimate of the corresponding latent trait is based on (1) the joint analysis of the two tasks, (2) the first task only, and (3) the second task only. Comparing the first boxplot with the other two, we see that the joint analysis of the two tasks leads to higher accuracy in the estimation of the latent traits. Comparing the second boxplot with the third, we see that data from the second task lead to more accurate estimation, suggesting that the second task tends to be more informative. Furthermore, the between-replication variability tends to shrink as the sample size grows.
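The pattern that pooling tasks improves EAP accuracy can be illustrated on a toy normal model, where the EAP is computed by grid integration as a stand-in for the numerical integration used for the proposed model; all quantities below are illustrative assumptions.

```python
import numpy as np

def eap_normal(x_obs, obs_sd, grid=np.linspace(-4, 4, 401)):
    """EAP of a N(0,1) trait given observations x_j ~ N(theta, obs_sd^2),
    computed by numerical integration over a grid."""
    log_post = -0.5 * grid**2                      # log prior
    for x in np.atleast_1d(x_obs):
        log_post = log_post - 0.5 * ((x - grid) / obs_sd) ** 2
    w = np.exp(log_post - log_post.max())          # unnormalized posterior
    return np.sum(grid * w) / np.sum(w)

rng = np.random.default_rng(0)
n = 300
theta = rng.normal(size=n)                         # true traits
x = theta[:, None] + rng.normal(size=(n, 2))       # two noisy "tasks" per student

eap_joint = np.array([eap_normal(x[i], 1.0) for i in range(n)])      # both tasks
eap_single = np.array([eap_normal(x[i, 0], 1.0) for i in range(n)])  # first task only
mse_joint = np.mean((eap_joint - theta) ** 2)
mse_single = np.mean((eap_single - theta) ** 2)
```

In this conjugate toy case the posterior variance with m observations is 1/(1+m), so the joint MSE (near 1/3) beats the single-task MSE (near 1/2), mirroring the boxplot comparison in the figures.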
In this paper, we propose a latent variable model for measuring problem-solving-related traits based on log file process data. We adopt an event history analysis framework, under which the data within a task are modeled as a marked point process and multiple tasks are then linked together through a local independence assumption. In the proposed model, a marked point process is characterized by two components: (1) conditional density functions for the sequential actions and (2) a ground intensity function for the time stamps. A parametrization of these two components is given that links together the person-specific latent traits, the structure of the problem-solving task, and the log file process data. In particular, we model the conditional density functions using a Boltzmann machine choice model, where the chance of an action being chosen depends on the event history, the level of the problem-solving competency trait, and a task-specific easiness parameter. In addition, the ground intensity is assumed to depend on an action speed trait and a task-specific baseline intensity parameter. The proposed model is applied to process data from two problem-solving tasks in PISA 2012. The estimated model parameters provide sensible characterizations of the tasks and of the distribution of the two latent traits. The extracted latent traits are validated by comparing them with the students' overall problem-solving performance score reported by PISA 2012. The main findings are: (1) both latent traits are significant predictors of students' overall performance, with the predictive power coming mainly from the competency trait; (2) the joint analysis of the two tasks provides more information than the analysis of each single task; and (3) the process data of the second task provide more information than its final outcome, while the process data of the first task do not seem to contain additional information.
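In standard marked point process notation (the symbols below are assumed here, as the paper's own notation is not reproduced in this excerpt; see, e.g., Cox & Isham, 1980; Rasmussen, 2018), the two components combine into a within-task likelihood of the form

```latex
L(\theta)
  = \Bigg[\prod_{j=1}^{J} f\!\left(a_j \mid \mathcal{H}_{t_j^-}\right)
    \lambda_g\!\left(t_j \mid \mathcal{H}_{t_j^-}\right)\Bigg]
    \exp\!\left(-\int_{0}^{T} \lambda_g\!\left(s \mid \mathcal{H}_{s}\right)\, ds\right),
```

where $a_1, \dots, a_J$ are the actions taken at times $t_1 < \dots < t_J$ within the observation window $[0, T]$, $f$ is the conditional density of actions, $\lambda_g$ is the ground intensity, and $\mathcal{H}_{t^-}$ denotes the event history up to (but not including) time $t$.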
We point out that the proposed method is flexible in analyzing log file process data with different types of missingness. First, thanks to the across-task local independence assumption, the method still applies when some students' data are missing completely at random (MCAR) on a subset of tasks (e.g., due to a planned missing data design). This is similar to the treatment of MCAR item responses in an IRT model with local independence. Second, the method can also handle data that are right censored in time within a task. More precisely, a process is said to be right censored at time $c$ when data after time $c$ are not observed; for example, right censoring can happen when a student does not have enough time to complete the task. Thanks to the statistical properties of marked point processes, the proposed inference procedures can be easily extended to process data with an independent censoring time (e.g., Andersen et al., 1988). Thus, we can make statistical inference for students who do not complete one or more tasks.
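Under an independent censoring time $c$, the within-task likelihood takes the standard truncated form for counting processes (a sketch, with notation assumed; see, e.g., Andersen et al., 1988):

```latex
L_c(\theta)
  = \Bigg[\prod_{j:\, t_j \le c} f\!\left(a_j \mid \mathcal{H}_{t_j^-}\right)
    \lambda_g\!\left(t_j \mid \mathcal{H}_{t_j^-}\right)\Bigg]
    \exp\!\left(-\int_{0}^{c} \lambda_g\!\left(s \mid \mathcal{H}_{s}\right)\, ds\right),
```

i.e., only the events up to time $c$ contribute, and the survival term integrates the ground intensity over the observed window $[0, c]$ only.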
We further discuss several future directions. First, computationally more efficient methods may be developed for the estimation of the proposed model. Due to the complexity and size of process data and the numerical integrations involved, the EM algorithm adopted here may not be sufficiently fast. Computationally more efficient algorithms can be developed for the proposed MML estimator, such as the Metropolis-Hastings Robbins-Monro algorithm (Cai, 2010) and stochastic EM algorithms (Celeux, 1985; Diebolt & Ip, 1996; Zhang et al., 2019). In addition, the joint maximum likelihood estimator, which treats the person-specific latent traits as fixed parameters, may be a good alternative (Haberman, 1977; Chen, Li, & Zhang, 2019a, 2019b). Its computation is much faster than that of the MML estimator, as it avoids the computationally intensive numerical or Monte Carlo integrations. Given the large amount of information that process data provide for each student, consistent estimation of both the fixed parameters and the latent variables may still be attainable.
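To illustrate the kind of stochastic EM algorithm mentioned here (Celeux, 1985), below is a toy version for a simple normal random-effects model, not the proposed CTDC model: the E-step expectation is replaced by a single posterior draw of each latent trait, followed by a complete-data M-step. All model choices below are illustrative assumptions.

```python
import numpy as np

# Toy model: theta_i ~ N(mu, 1), y_ij | theta_i ~ N(theta_i, sigma^2), j = 1..m.
rng = np.random.default_rng(0)
n, m = 1000, 5
mu_true, sigma_true = 1.5, 0.8
theta = rng.normal(mu_true, 1.0, size=n)
y = theta[:, None] + rng.normal(scale=sigma_true, size=(n, m))

mu, sigma = 0.0, 1.0                          # starting values
for it in range(200):
    # Stochastic E-step: draw each theta_i from its (conjugate normal) posterior
    post_var = 1.0 / (1.0 + m / sigma**2)
    post_mean = post_var * (mu + y.sum(axis=1) / sigma**2)
    theta_draw = rng.normal(post_mean, np.sqrt(post_var))
    # M-step: maximize the complete-data likelihood given the draws
    mu = theta_draw.mean()
    sigma = np.sqrt(np.mean((y - theta_draw[:, None]) ** 2))
```

Because the conditional posterior is available in closed form here, no Metropolis step is needed; for the CTDC model the draw would itself require a Markov chain Monte Carlo step, as in the Metropolis-Hastings Robbins-Monro algorithm (Cai, 2010).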
Second, similar to other latent-variable-based measurement models, the proposed model can be combined with structural models to study the relationship between the problem-solving traits and other variables under a structural equation modeling framework. For PISA data, for example, it is often of interest to understand the relationship between students' problem-solving traits and other variables, including cognitive abilities and background variables from the student, parent, and school questionnaires of the PISA survey.
Finally, the model can be extended to measure multiple latent traits, provided that design information is available about the traits required in each step, which may depend on the problem-solving event history. In fact, problem-solving behavior is likely driven by multiple latent traits. For example, the PISA 2012 framework decomposes problem solving into four dimensions based on the corresponding cognitive processes: "exploring and understanding", "representing and formulating", "planning and executing", and "monitoring and reflecting" (OECD, 2014a). The current model can be extended to measure these finer-grained dimensions when design information on the dimensional structure of each problem-solving step is available.
- Andersen, P. K., Borgan, Ø., Gill, R. D., & Keiding, N. (1988). Censoring, truncation and filtering in statistical models based on counting processes. Contemporary Mathematics, 80, 19–59.
- Buchner, A., & Funke, J. (1993). Finite-state automata: Dynamic task environments in problem-solving research. The Quarterly Journal of Experimental Psychology, 46, 83–118.
- Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75, 33–57.
- Celeux, G. (1985). The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2, 73–82.
- Chen, Y., Li, X., Liu, J., & Ying, Z. (2019). Statistical analysis of complex problem-solving process data: An event history analysis approach. Frontiers in Psychology, 10, 1–10.
- Chen, Y., Li, X., & Zhang, S. (2019a). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84, 124–146.
- Chen, Y., Li, X., & Zhang, S. (2019b). Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. Journal of the American Statistical Association. In press.
- Cook, R. J., & Lawless, J. (2007). The statistical analysis of recurrent events. New York, NY: Springer.
- Cox, D. R., & Isham, V. (1980). Point processes. Boca Raton, FL: CRC Press.
- Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39, 1–22.
- Diebolt, J., & Ip, E. H. (1996). Stochastic EM: Method and application. In W. Gilks, S. Richardson, & D. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 259–273). New York, NY: CRC Press.
- Funke, J. (2001). Dynamic systems as tools for analysing human judgement. Thinking & Reasoning, 7, 69–89.
- Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students' performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36–46.
- Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91, 92–105.
- Griffin, P., & Care, E. (2014). Assessment and teaching of 21st century skills: Methods and approach. New York, NY: Springer.
- Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5, 815–841.
- He, Q., & von Davier, M. (2015). Identifying feature sequences from process data in problem-solving items with n-grams. In L. A. van der Ark, D. M. Bolt, W. C. Wang, J. A. Douglas, & S. M. Chow (Eds.), Quantitative psychology research (pp. 173–190). New York, NY: Springer.
- He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 750–777). Hershey, PA: IGI Global.
- Kroehne, U., & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika, 45, 527–563.
- LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83, 67–88.
- Mullis, I. V., & Martin, M. O. (2017). TIMSS 2019 assessment frameworks. Boston, MA: TIMSS & PIRLS International Study Center.
- OECD. (2012a). Literacy, numeracy and problem solving in technology-rich environments: Framework for the OECD survey of adult skills. Paris, France: OECD Publishing. http://dx.doi.org/10.1787/9789264128859-en
- OECD. (2012b). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris, France: OECD Publishing. https://doi.org/10.1787/19963777
- OECD. (2014a). PISA 2012 results: Creative problem solving: Students' skills in tackling real-life problems (Volume V). Paris, France: OECD Publishing.
- OECD. (2014b). PISA 2012 technical report. Paris, France: OECD Publishing. http://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf
- OECD. (2017). PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy. Paris, France: OECD Publishing. https://doi.org/10.1787/9789264281820-en
- OECD. (2018). The future of education and skills: Education 2030. Paris, France: OECD Publishing. https://www.oecd.org/education/2030/E2030%20Position%20Paper%20(05.04.2018).pdf
- Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. New York, NY: John Wiley & Sons.
- Ramsay, J. O., & Abrahamowicz, M. (1989). Binomial regression with monotone splines: A psychometric application. Journal of the American Statistical Association, 84, 906–915.
- Rasmussen, J. G. (2018). Lecture notes: Temporal point processes and the conditional intensity function. arXiv preprint arXiv:1806.00221.
- Scherer, R., Greiff, S., & Hautamäki, J. (2015). Exploring the relation between time on task and ability in complex problem solving. Intelligence, 48, 37–50.
- Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2019). Latent feature extraction for process data via multidimensional scaling. arXiv preprint arXiv:1904.09699.
- Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577.
- US Department of Education. (2013). Technology and engineering literacy framework for the 2014 National Assessment of Educational Progress. https://nagb.gov/content/nagb/assets/documents/publications/frameworks/technology/2014-technology-framework.pdf
- van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287–308.
- Xu, H., Fang, G., Chen, Y., Liu, J., & Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items. Applied Psychological Measurement, 42, 478–498.
- Zhang, S., Chen, Y., & Liu, Y. (2019). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology. In press.