An IRT-based Model for Omitted and Not-reached Items

04/07/2019 ∙ by Jinxin Guo, et al.

Missingness is a common occurrence in educational assessment and psychological measurement. It cannot be casually ignored, as it may threaten the validity of a test if not handled properly. Considering the difference between omitted and not-reached items, we develop an IRT-based model to handle both kinds of missingness. In the proposed method, not-reached responses are captured by the cumulative missingness, and nonignorability is attributed to the correlation between ability and a person missing trait. We prove that the item parameter estimates under maximum marginal likelihood (MML) estimation are consistent. We further propose a Bayesian estimation procedure using MCMC methods to estimate all the parameters. The simulation results indicate that the model parameters are better recovered under the proposed method than under listwise deletion, and that the nonignorable model fits the simulated nonignorable nonresponses better than the ignorable model in terms of Bayesian model selection. Finally, a Program for International Student Assessment (PISA) data set is analyzed to illustrate the use of the proposed method.




1 Introduction

Missing data are unavoidable in many studies, including educational assessment and psychological measurement (Rose et al., 2017; Yuan et al., 2018). Modeling the missing data mechanism has recently gained prominence as a way to obtain more reliable evaluations. Missingness can occur under many conditions. For example, test takers may fail to reach some items because of time limits, or they may omit some items for individual reasons, such as their ability or item preference. If these missing responses are not dealt with properly, they lead to biased parameter estimates and further threaten the validity of tests (Pohl et al., 2014; Rose et al., 2015).

To better tackle the problem of missing data, Rubin (1976) and Little and Rubin (2014) defined three kinds of missingness: “missing completely at random” (MCAR), “missing at random” (MAR), and “not missing at random” (NMAR).

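Suppose that there are $N$ examinees and $J$ items in the test. Let $Y_{ij}$ denote the dichotomous response variable of examinee $i$ to item $j$, where $Y_{ij}=1$ means that examinee $i$ answers item $j$ correctly and $Y_{ij}=0$ otherwise. The complete response matrix $\mathbf{Y}$ can be decomposed into an observed part $\mathbf{Y}_{obs}$ and a missing part $\mathbf{Y}_{mis}$, that is, $\mathbf{Y}=(\mathbf{Y}_{obs},\mathbf{Y}_{mis})$. Denote by $\mathbf{R}$ the missing indicator matrix, where the missing variable $R_{ij}$ can be defined as

$$R_{ij}=\begin{cases}1, & \text{if } Y_{ij} \text{ is missing},\\ 0, & \text{if } Y_{ij} \text{ is observed}.\end{cases}$$

The three categories of missingness can be characterized by the conditional distribution of $\mathbf{R}$ given $\mathbf{Y}$, denoted by $f(\mathbf{R}\mid\mathbf{Y},\boldsymbol{\phi})$, where $\boldsymbol{\phi}$ is the set of unknown parameters. The missingness is MCAR if the distribution of $\mathbf{R}$ does not depend on the response data $\mathbf{Y}$, either observed or unobserved, which can be formulated as

$$f(\mathbf{R}\mid\mathbf{Y},\boldsymbol{\phi}) = f(\mathbf{R}\mid\boldsymbol{\phi}).$$

MAR is the situation in which missingness is independent of the missing responses given the observed ones, which can be written as

$$f(\mathbf{R}\mid\mathbf{Y},\boldsymbol{\phi}) = f(\mathbf{R}\mid\mathbf{Y}_{obs},\boldsymbol{\phi}).$$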

Missingness that is MCAR or MAR is also called ignorable or uninformative (Schafer and Graham, 2002; Rose et al., 2015). NMAR is distinct from these two kinds of missingness: the missingness is not independent of the missing responses given the observed responses, and it is therefore also called nonignorable or informative. For example, in the context of IRT, test takers may fail to answer some items because their abilities are too low to answer correctly. Such missing responses can be viewed as NMAR.

Several methods exist to handle missing item responses. One of the simplest and most direct approaches is listwise deletion, which is also the default method for dealing with missing data in some statistical software, such as SPSS and SAS. In this method, all cases with missing responses are deleted. The method is direct and effective when the proportion of missing data is small. However, if the proportion of missing data is high, especially when the missingness is nonignorable, this method causes bias and thus leads to errors in statistical inference (Rose et al., 2015, 2017; Wu et al., 2017).

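We therefore focus on modeling missing responses under nonignorable missingness. The general methods are typically based on the joint distribution of the response matrix $\mathbf{Y}$ and the missing indicator matrix $\mathbf{R}$. Two commonly used joint models are selection models (SLM; Heckman, 1976) and pattern mixture models (PMM; Little, 1993). SLM is based on the following factorization:

$$f(\mathbf{Y},\mathbf{R}) = f(\mathbf{Y})\, f(\mathbf{R}\mid\mathbf{Y}). \tag{1}$$

And PMM can be written as:

$$f(\mathbf{Y},\mathbf{R}) = f(\mathbf{Y}\mid\mathbf{R})\, f(\mathbf{R}). \tag{2}$$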

Based on these joint models, several methods have been proposed by researchers. For example, Holman and Glas (2005) and Glas et al. (2015) introduced an IRT model for omitted items based on PMM that could simultaneously estimate IRT item parameters and the parameter about the propensity of missing data. Rose et al. (2010) derived multidimensional IRT (MIRT) to handle nonignorable item nonresponses, which was believed to have originated from general SLM (Rose, 2013). However, these methods could only be applied to omitted responses. To differentiate the omitted and not-reached items, latent regression models (LRMs) were proposed to model omitted and not-reached items (Rose et al., 2017).

Specifically, not-reached items (also called “dropout”) occur when test takers fail to reach some items at the end of the test, whereas omitted items (also called “intermittent” missingness) refer to the situation where test takers skip one or more items and then answer a later one.
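The distinction between the two kinds of missingness can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: missing indicators are coded 1 = missing, the trailing run of missing responses is treated as not-reached, and the function name `classify_missing` is hypothetical rather than part of the proposed method.

```python
import numpy as np

def classify_missing(r):
    """Label each position of a 0/1 missing-indicator vector.

    r[j] == 1 means item j has no response (assumed coding).
    Returns 'observed', 'omitted', or 'not_reached' for every item.
    """
    r = np.asarray(r, dtype=int)
    # Start by calling every missing response an omission.
    labels = np.where(r == 1, "omitted", "observed").astype(object)
    answered = np.flatnonzero(r == 0)
    # Everything after the last answered item is a trailing run of
    # missing responses, which is relabelled as not reached.
    last_answered = answered[-1] if answered.size else -1
    labels[last_answered + 1:] = "not_reached"
    return labels

print(list(classify_missing([0, 1, 0, 0, 1, 1])))
```

Under this rule a single missing response on the final item is always labelled not-reached, although in practice it could equally be an omission.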

Motivated by the previous methods, this paper proposes an approach to model omitted and not-reached items on the basis of SLM. In detail, the effect of previous nonresponses on the current item is modeled to capture not-reached items, and the correlation between the ability parameter and the latent person missing parameter determines whether the missingness is nonignorable or ignorable.

The remainder of this paper is organized as follows. Section 2 presents the proposed method for handling binary missing item responses. Section 3 gives the MML estimation of the item parameters and the related consistency results, followed by Bayesian estimation using the MCMC method. Section 4 reports two simulation studies that evaluate parameter recovery and model selection. Section 5 presents a detailed analysis of a PISA data set to illustrate the use of the proposed method. Finally, Section 6 addresses unresolved issues and discusses directions for further research.

2 Handling omitted and not-reached items with IRT-based model

2.1 Two parameter IRT models

In the proposed method, two-parameter IRT models are employed to model the item response data. The probability of examinee $i$ correctly answering item $j$ can be expressed as


where $\theta_i$ denotes the ability parameter for the $i$th individual, and $a_j$ and $b_j$ are the discrimination and difficulty parameters of item $j$, respectively. Here $F$ is the cumulative distribution function (CDF) of the standard normal or the standard logistic distribution. In detail, the probit link $F=\Phi$, where $\Phi$ is the CDF of the standard normal distribution, yields the two-parameter normal ogive (2PNO) IRT model. Similarly, when $F$ is the CDF of the standard logistic distribution, $F(x)=e^{x}/(1+e^{x})$, that is, when the logit link is employed, the model is called the two-parameter logistic (2PL) IRT model.

2.2 Modeling missing data mechanism

Considering the difference between omitted and not-reached items, different effect indexes are applied in the proposed method. In particular, not-reached responses are characterized by the cumulative missingness. Motivated by the idea of IRT models, missingness is captured by a latent missing trait from two perspectives: item and person. On this basis, the missing data process can be modeled as


where is similar to in Equation (3), specified by probit or logit link, is the intercept parameter with constraint of that could influence the baseline probability of missingness, is the person missing parameter that measures the individual latent trait to nonresponses, and is the item missing parameter that represents the inclination of missingness caused by item. Moreover, for the th examinee,

is the previous missing vector before

th item and is responses vector of the first items. Moreover, is a function of the responses vector that characterizes the effect from response variables. And we simply took , where . In addition, is a function of the missing indicators, which denotes the impact of previous nonresponses on current item. We set when since there are no previous missing responses. Specially, in the proposed method, with was chosen to model the effect of previous missingness on the current item. Actually, the statistic exactly captures the missingness of not-reached items, and reduces the number of nuisance parameters for modeling the missing data mechanism.
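As an illustration of the cumulative-missingness statistic described above, the following Python sketch computes, for each examinee and item, the number of missing responses on earlier items. It assumes that the statistic is the plain running count and that missing indicators are coded 1 = missing; the function name `previous_missing_count` is hypothetical.

```python
import numpy as np

def previous_missing_count(R):
    """Count earlier missing responses for every examinee and item.

    R is an N x J matrix of 0/1 missing indicators (1 = missing, assumed).
    Entry (i, j) of the result is the number of items before item j
    that examinee i left unanswered; the first column is always 0.
    """
    R = np.asarray(R, dtype=int)
    # The running total includes the current item, so subtract it out.
    return np.cumsum(R, axis=1) - R

R = np.array([[0, 1, 0, 1, 1],
              [0, 0, 0, 0, 1]])
print(previous_missing_count(R))
```

For a not-reached run at the end of the test the count keeps growing from item to item, which is why a single coefficient on this statistic can stand in for many item-specific nuisance parameters.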

In the proposed method, the ability parameter in the IRT model and the person missing parameter in the missing data model are assumed to follow a bivariate normal distribution with mean vector and covariance matrix . A graphical representation of the proposed method is presented in Figure 1.

Figure 1: The graphical representation of the proposed method.

To guarantee the model identification, and are set to 0, and was fixed to 1 (Browne, 2006). Note that the generation of the nonignorable mechanism is attributed to the correlation between and . To be more specific, if the missingness is ignorable, is independent of , therefore, . If the missing data depends on both the latent ability and the person missing parameter , the missingness is nonignorable. So the proposed method could handle both ignorable and nonignorable missingness.
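The role of the covariance can be illustrated with a small simulation. The sketch below assumes zero means and a unit variance for the ability parameter, as the identification constraints suggest; the names `theta` (ability), `tau` (person missing parameter), and both function names are hypothetical labels introduced only for this example.

```python
import numpy as np

def sample_person_parameters(n, cov_theta_tau, var_tau=1.0, seed=0):
    """Draw (theta, tau) pairs from a zero-mean bivariate normal
    with Var(theta) fixed at 1 (assumed identification constraint)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, cov_theta_tau],
                    [cov_theta_tau, var_tau]])
    draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return draws[:, 0], draws[:, 1]

def conditional_tau_given_theta(theta, cov_theta_tau, var_tau=1.0):
    """Mean and variance of tau given theta when Var(theta) = 1."""
    return cov_theta_tau * theta, var_tau - cov_theta_tau ** 2

theta, tau = sample_person_parameters(5000, cov_theta_tau=0.8)
print(np.corrcoef(theta, tau)[0, 1])
print(conditional_tau_given_theta(1.0, 0.8))
```

With a zero covariance the conditional mean of `tau` no longer depends on `theta`, which corresponds to the ignorable case.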

2.3 The likelihood function

Let , , , , , and . And the parameter set can be written as . Denote by and the observation vectors of and .

A sequence of one-dimensional conditional distributions modeling method proposed by Ibrahim et al. (1999) was employed to construct the conditional joint distribution of given , which can be written as


where is the indicator function and is given by Equation (4).

Based on Equation (1), the likelihood function of complete data could be given by


where is given by Equation (3). Note that if in Equation (6) is missing, it is imputed from a Bernoulli distribution whose success probability can be computed as




Integrating over all the missing item responses finally yields


where is the observation of and is the imputation of .

3 Estimation of model parameters and consistency results

3.1 MML estimation and consistency results

We first present the MML estimation of the IRT item parameters and the item missing parameter. By integrating over the ability and person missing parameters, the marginal likelihood function can be given by

where in is assumed to be known. The MML estimation of should satisfy:

where is the log likelihood. In practice, the log likelihood equations for the item parameters can be solved using the Newton–Raphson algorithm, the EM algorithm, or a combination of the two (Bock and Aitkin, 1981), so the MML estimates can be obtained readily. Moreover, the MML estimation is consistent under the following assumptions.
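Integrating over a normally distributed ability is usually done numerically. The Python sketch below shows the simplest instance, the marginal probability of a correct response under a probit link, using Gauss-Hermite quadrature. The linear predictor `a * theta - b` is an assumption made only for this illustration, since Equation (3) is not reproduced here, and the function names are hypothetical. The closed form used as a check is a standard identity for the normal distribution.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def marginal_prob_correct(a, b, n_nodes=41):
    """Approximate the integral of Phi(a*theta - b) against a
    standard normal density for theta (assumed linear predictor)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / math.sqrt(2.0 * math.pi)  # weights now sum to 1
    probs = np.array([std_normal_cdf(a * t - b) for t in nodes])
    return float(np.sum(weights * probs))

a, b = 1.2, 0.5
approx = marginal_prob_correct(a, b)
exact = std_normal_cdf(-b / math.sqrt(1.0 + a * a))  # closed-form check
print(approx, exact)
```

A full marginal likelihood would integrate the product of response and missingness probabilities over both person parameters, which requires a two-dimensional grid or an EM scheme in the spirit of Bock and Aitkin (1981).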

Assumption 1

For sufficiently small and for sufficiently large , , , the following integrals are finite:

where is a known finite constant.

Assumption 2

Given known , are identifiable.

Theorem 1

The MML estimates are consistent under Assumptions 1 and 2.

Theorem 2

If the estimates are consistent, then they are asymptotically normal, with mean equal to the true parameters and covariance matrix equal to the inverse of the Fisher information matrix.

The proofs of Theorems 1 and 2 are presented in Appendix A.

3.2 Bayesian estimation using MCMC method

Though the MML estimates of the item parameters are consistent, the other parameters cannot be estimated by MML estimation. One natural idea is to estimate all the parameters in the proposed model by the Markov chain Monte Carlo (MCMC) method. MCMC and MML estimation have already been compared in the context of IRT (Kieftenbeld and Natesan, 2012; Hendrick, 2014), and little difference in item parameter recovery was found between the two methods with samples of 300 or more (Kieftenbeld and Natesan, 2012). The MCMC method is therefore employed to estimate all the parameters in the proposed method.

We take only the proposed model based on the probit link as a demonstration, because the 2PL IRT model can be made very close to the 2PNO IRT model by multiplying the logistic item discrimination parameter by the scaling constant 1.702 (Baker and Kim, 2004). In detail, Gibbs sampling is employed to estimate the unknown parameters in Equation (3) and Equation (4), and the Metropolis–Hastings algorithm is adopted to estimate and .
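The closeness of the two links under the 1.702 scaling can be checked numerically. The Python sketch below compares the standard normal CDF with the rescaled logistic CDF on a grid of linear-predictor values; the grid range and the function names are arbitrary choices for this illustration.

```python
import math

def probit_cdf(eta):
    """Standard normal CDF evaluated at the linear predictor eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def logistic_cdf(eta):
    """Standard logistic CDF evaluated at the linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def max_gap(scale=1.702, lo=-6.0, hi=6.0, steps=2401):
    """Largest absolute gap between Phi(eta) and logistic(scale * eta)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(abs(probit_cdf(x) - logistic_cdf(scale * x)) for x in grid)

print(max_gap())           # gap with the 1.702 scaling
print(max_gap(scale=1.0))  # gap without any rescaling
```

The rescaled curves differ by less than 0.01 everywhere on the grid, whereas the unscaled logistic curve differs from the normal ogive by more than 0.1 at some points.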

In order to realize the Gibbs sampling for the 2PNO IRT model, the augmented was introduced for response variable , where (Albert, 1992; Albert and Chib, 1993). It was assumed that


Similarly, the independent random variables

were augmented for the missing data indicator , which were assumed to follow the normal distribution . So Equation (4) could be reformulated as


The detailed sampling process is presented in Appendix B.
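The data augmentation step of Albert (1992) can be sketched in a few lines of Python. The function below draws a latent normal variable with unit variance, truncated to be positive when the binary outcome is 1 and nonpositive when it is 0. The argument `mu` stands for whatever linear predictor the model specifies, and the function name is hypothetical; this is a generic inverse-CDF sampler, not the authors' implementation.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def sample_augmented(mu, y, rng=random):
    """Draw Z ~ N(mu, 1) truncated to Z > 0 if y == 1, Z <= 0 if y == 0."""
    p_nonpositive = _STD_NORMAL.cdf(-mu)   # P(Z <= 0) before truncation
    if y == 1:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    # Keep u strictly inside (0, 1), the domain of inv_cdf.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mu + _STD_NORMAL.inv_cdf(u)

rng = random.Random(1)
print([round(sample_augmented(0.3, y, rng), 3) for y in (1, 0, 1)])
```

The same sampler applies to the augmented variables for the missing indicators, with the binary outcome replaced by the missing indicator and `mu` by the linear predictor of the missing data model.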

4 Simulation Studies

Two simulation studies were conducted to investigate the empirical performance of the proposed model, including parameter recovery and Bayesian model assessment.

4.1 Simulation I

Simulation I is used to compare the parameter recovery of the proposed method with that of listwise deletion.

4.1.1 Design

In the data generation, the number of examinees and the number of items were set to and , respectively. The true values of model parameters were set as follows:

The covariance , also regarded as the correlation between and (denoted as ), was set to 0, 0.4, and 0.8 to represent no, weak, and strong nonignorability, respectively.

Equation (3) was used to simulate the dichotomous response data, and Equation (4) was then employed to generate the missing responses. The proportion of missing responses is adjusted by . The true values of and the corresponding average missing proportions (denoted by ) were set as follows:

For each of the 5 (missing proportion) × 3 (correlation) = 15 conditions, 100 data sets were simulated. The MCMC sampling procedure was run for 20,000 iterations, and the first 15,000 iterations were discarded as burn-in. The expected a posteriori (EAP) estimate of each parameter was then obtained from its Markov chain.
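Computing an EAP estimate from a chain amounts to averaging the draws retained after burn-in. The Python sketch below assumes a chain stored as an array with one row per iteration and reads the burn-in as 15,000 out of 20,000 iterations; the stand-in chain is simulated noise used only to make the example runnable.

```python
import numpy as np

def eap_estimate(chain, burn_in):
    """Posterior mean of each parameter after discarding burn-in.

    chain has shape (iterations, n_parameters).
    """
    chain = np.asarray(chain, dtype=float)
    if not 0 <= burn_in < chain.shape[0]:
        raise ValueError("burn_in must be smaller than the chain length")
    return chain[burn_in:].mean(axis=0)

rng = np.random.default_rng(1)
# Stand-in draws for two parameters, not output of the proposed sampler.
chain = rng.normal(loc=[0.5, -1.0], scale=0.1, size=(20_000, 2))
print(eap_estimate(chain, burn_in=15_000))
```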

4.1.2 Criteria

To assess the performance of parameter recovery, two criteria were applied: mean bias (Bias) and mean absolute error (MAE).

Let be a vector of true parameter values, and denote by its EAP estimate, , where is the number of replications. The mean Bias and MAE are estimated by
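A plausible reading of the two criteria is sketched below in Python: the error of each EAP estimate is averaged over replications and over the components of the parameter vector, with and without taking absolute values. The exact formulas of the source are not reproduced here, so this averaging scheme and the function name are assumptions.

```python
import numpy as np

def mean_bias_and_mae(estimates, truth):
    """Mean bias and mean absolute error of replicated estimates.

    estimates: array of shape (replications, n_parameters)
    truth:     array of shape (n_parameters,)
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(err.mean()), float(np.abs(err).mean())

estimates = [[1.1, 1.9],
             [0.9, 2.1]]
print(mean_bias_and_mae(estimates, truth=[1.0, 2.0]))
```

Errors of opposite sign cancel in the bias but not in the MAE, which is why a parameter can show a bias near 0 together with a sizeable MAE.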

4.1.3 Results

Parameter The Proposed Method Listwise Deletion Method
Bias MAE Bias MAE
(-2.2,0.02,-0.2) 0.093 0.005 0.107 0.005 0.108
-0.002 0.094 -0.016 0.095
0.000 0.292 0.001 0.292
-0.003 0.439
0.012 0.198
-0.059 0.166
0.024 0.025
0.064 0.070
0.056 0.056
-0.087 0.110
(-1.6,0.04,-0.2) 0.173 0.015 0.109 0.016 0.111
-0.010 0.101 -0.028 0.105
0.004 0.309 0.005 0.310
0.001 0.370
0.036 0.208
-0.095 0.192
0.010 0.017
0.103 0.103
0.059 0.059
-0.033 0.081
(-1.1,0.04,-0.2) 0.264 0.009 0.116 0.009 0.116
-0.017 0.112 -0.041 0.117
0.010 0.331 0.011 0.333
0.002 0.326
0.008 0.195
-0.079 0.190
0.008 0.015
0.127 0.127
0.053 0.053
-0.040 0.089
(-0.65,0.05,-0.25) 0.370 0.017 0.130 0.017 0.130
-0.063 0.133 -0.091 0.145
0.003 0.361 0.005 0.362
0.004 0.297
-0.001 0.191
-0.104 0.193
0.006 0.015
0.188 0.188
0.050 0.050
-0.046 0.091
(-0.2,0.05,-0.25) 0.463 0.018 0.146 0.020 0.146
-0.074 0.148 -0.109 0.164
0.007 0.398 0.009 0.399
-0.003 0.287
0.047 0.201
-0.180 0.220
0.006 0.012
0.198 0.198
0.056 0.056
-0.047 0.084

Note. Bias and MAE refer to the mean Bias and MAE for each parameter in the third column, respectively.

Table 1: Results of parameter recovery for ignorable nonresponses.

Parameter The Proposed Method Listwise Deletion Method
Bias MAE Bias MAE
(-2.2,0.02,-0.2) 0.098 0.013 0.106 0.004 0.106
0.003 0.098 -0.023 0.103
0.008 0.289 0.007 0.295
0.003 0.421
0.009 0.228
-0.041 0.189
0.020 0.021
0.057 0.070
-0.014 0.048
-0.051 0.096
(-1.6,0.04,-0.2) 0.178 0.006 0.112 -0.005 0.114
-0.016 0.106 -0.055 0.118
0.005 0.306 0.005 0.316
0.008 0.357
0.024 0.204
-0.067 0.199
0.004 0.015
0.103 0.103
0.010 0.044
0.012 0.100
(-1.1,0.04,-0.2) 0.266 0.010 0.127 -0.010 0.124
-0.034 0.117 -0.091 0.141
0.004 0.325 0.006 0.339
-0.007 0.316
-0.005 0.165
-0.075 0.159
0.005 0.014
0.130 0.130
0.014 0.051
0.004 0.088
(-0.65,0.05,-0.25) 0.371 0.008 0.133 -0.008 0.135
-0.069 0.139 -0.151 0.189
0.006 0.355 0.009 0.377
0.002 0.296
-0.020 0.187
-0.075 0.193
0.002 0.014
0.190 0.190
0.009 0.053
0.041 0.103
(-0.2,0.05,-0.25) 0.480 -0.003 0.153 -0.018 0.153
-0.123 0.177 -0.231 0.255
0.003 0.386 0.009 0.419
-0.002 0.286
-0.021 0.189
-0.110 0.171
0.006 0.014
0.203 0.203
-0.011 0.048
0.004 0.093

Note. Bias and MAE refer to the mean Bias and MAE for each parameter in the third column, respectively.

Table 2: Results of parameter recovery for nonignorable nonresponses (correlation 0.4).

Parameter The Proposed Method Listwise Deletion Method
Bias MAE Bias MAE
(-2.2,0.02,-0.2) 0.097 0.011 0.105 -0.015 0.110
-0.009 0.104 -0.051 0.120
0.001 0.278 -0.002 0.303
0.001 0.373
0.026 0.206
-0.053 0.163
0.024 0.024
0.038 0.079
-0.020 0.062
-0.050 0.122
(-1.6,0.04,-0.2) 0.180 0.006 0.116 -0.032 0.126
-0.018 0.110 -0.086 0.140
0.004 0.283 0.005 0.323
0.002 0.328
0.008 0.205
-0.058 0.196
0.008 0.017
0.095 0.111
0.000 0.062
0.021 0.130
(-1.1,0.04,-0.2) 0.267 -0.002 0.122 -0.057 0.137
-0.033 0.124 -0.139 0.185
0.003 0.298 0.006 0.361
0.003 0.304
0.017 0.200
-0.094 0.196
0.008 0.017
0.119 0.136
-0.003 0.063
0.019 0.120
(-0.65,0.05,-0.25) 0.365 -0.013 0.142 -0.078 0.161
-0.085 0.161 -0.247 0.276
0.007 0.314 0.011 0.407
0.005 0.284
0.022 0.195
-0.117 0.204
0.002 0.014
0.192 0.193
0.025 0.053
0.077 0.128
(-0.2,0.05,-0.25) 0.464 -0.018 0.158 -0.082 0.181
-0.110 0.183 -0.324 0.347
0.185 0.335 0.011 0.455
0.004 0.281
0.052 0.207
-0.180 0.216
0.009 0.017
0.202 0.202
0.003 0.064
0.043 0.132

Note. Bias and MAE refer to the mean Bias and MAE for each parameter in the third column, respectively.

Table 3: Results of parameter recovery for nonignorable nonresponses (correlation 0.8).

Table 1 presents the results of parameter recovery under both the proposed method and listwise deletion for ignorable missingness (correlation 0). The results show that parameter recovery under the two methods is nearly identical for the item discrimination, item difficulty, and person ability parameters. Under both methods, the item parameters are well recovered: their biases are close to 0 and their MAEs are around 0.1. The MAE for the person ability parameter is relatively higher, although its bias is very close to 0. The other parameters in the proposed method are mostly well recovered, except the person missing parameter. One explanation might simply be that the dimensions of the two person parameters are much higher than those of the item parameters.

Table 2 and Table 3 show the results of parameter recovery under weak (correlation 0.4) and strong (correlation 0.8) nonignorable missingness, respectively. Generally speaking, the recovery of parameters under the proposed method is better than under listwise deletion, and the advantage becomes more obvious as the missing proportion increases and the nonignorability becomes stronger. In detail, for the proposed method, the item discrimination and difficulty parameters are well recovered across all 5 (missing proportion) × 2 (correlation) = 10 conditions, as the bias is close to 0 and the MAE is no more than 0.2. Note that, for a fixed correlation, the MAE of the person missing parameter decreases as the missing proportion increases. That is, the person missing parameter is recovered better when there are more missing responses, which also supports attributing the missingness to a latent person missing trait.

4.2 Simulation II

Besides evaluating parameter recovery, we are also interested in assessing whether the missingness is ignorable or nonignorable. Simulation II was conducted for this purpose.

4.2.1 Design

The settings of the true model parameters are the same as in Simulation I, except parameter . More specifically, is used only to determine whether the missingness is nonignorable or ignorable in this simulation. That is, the nonignorable () and ignorable () models were fitted to data generated from the nonignorable model ().

4.2.2 Criteria

Within Bayesian framework, there are some commonly used criteria for model selection. In this section, these criteria were only applied to the distribution of , because we only focus on the missing data mechanism.

One criterion to evaluate the model fit is deviance information criterion (DIC; Spiegelhalter et al., 2002). This criterion takes into account the trade-off relationship between the adequacy of model fitting and the number of model parameters. The model with a smaller DIC value fits the data better.

The other criterion to compare the two models in terms of fitting is the logarithm of the pseudo marginal likelihood (LPML; Geisser and Eddy, 1979; Ibrahim et al., 2001). The model with a larger LPML has a better fit of the data.

The detailed computation of the two criteria is presented in Appendix C.
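Both criteria can be computed from pointwise log-likelihood values saved during sampling. The Python sketch below uses the standard definitions: DIC as the posterior mean deviance plus the effective number of parameters, and LPML as the sum of log conditional predictive ordinates, each estimated by a harmonic mean over the draws. The input layout and function names are assumptions for this illustration and need not match the computation in Appendix C.

```python
import numpy as np

def dic(loglik_draws, loglik_at_posterior_mean):
    """Deviance information criterion.

    loglik_draws: (T, n) pointwise log-likelihoods over T MCMC draws.
    loglik_at_posterior_mean: (n,) log-likelihoods at the posterior mean.
    """
    loglik_draws = np.asarray(loglik_draws, dtype=float)
    d_bar = -2.0 * loglik_draws.sum(axis=1).mean()      # mean deviance
    d_hat = -2.0 * float(np.sum(loglik_at_posterior_mean))
    p_d = d_bar - d_hat                                 # effective parameters
    return d_bar + p_d

def lpml(loglik_draws):
    """Log pseudo marginal likelihood: sum over observations of log CPO."""
    neg = -np.asarray(loglik_draws, dtype=float)        # log(1 / likelihood)
    shift = neg.max(axis=0)                             # for numerical stability
    log_mean_inverse = shift + np.log(np.exp(neg - shift).mean(axis=0))
    return float(-log_mean_inverse.sum())

draws = np.log(np.array([[0.6, 0.3], [0.5, 0.4], [0.7, 0.2]]))
print(dic(draws, np.log([0.6, 0.3])), lpml(draws))
```

A smaller DIC and a larger LPML both point to the better-fitting model, so the two differences reported in the results are expected to have opposite signs.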

4.2.3 Results

The DIC and LPML differences between the nonignorable and ignorable models were calculated and are presented as boxplots in Figure 2.

The boxes of the DIC differences between the nonignorable and ignorable models are always below 0, indicating that the nonignorable model fits the data better. Meanwhile, the LPML differences between the nonignorable and ignorable models are always above 0, which leads to the same conclusion as DIC. Furthermore, the magnitudes of the DIC and LPML differences increase with the proportion of missing responses, so the advantage of the nonignorable model in terms of model selection is more distinct when the missing proportion is higher.

Figure 2: Boxplots of DIC and LPML difference between nonignorable and ignorable model.

5 Analysis of the PISA data

The PISA is an international education assessment that measures students’ skills and knowledge in science, mathematics, reading, and so on. Its data set is available and free on

In this section, PISA data set was employed to illustrate the detailed use of the proposed method and further interpret the parameters.

5.1 Data set

In this study, the data set is taken from a science subtest in the 2015 computer-based PISA in the Dominican Republic. Invalid and not-applicable samples are excluded from the data set. The valid sample size is 493, of which 173 individuals answered all 17 items. The overall missing proportion of the data set is ( omitted items and not-reached items). Only DS465Q01C is scored polytomously with 0 (no credit), 1 (partial credit), and 2 (full credit) scores. As the proposed method is focused on dichotomous responses, only full credit is treated as a correct response, while the other two score categories are treated as incorrect responses.
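The recoding rule for the polytomous item is simple enough to state as code. The Python sketch below maps full credit to 1 and the other score categories to 0 while leaving missing responses untouched. Coding missing responses as NaN is an assumption of this example, and the function name is hypothetical.

```python
import numpy as np

def dichotomize_full_credit(scores, full_credit=2):
    """Recode polytomous scores: full credit -> 1, other scores -> 0.

    Missing responses, assumed to be coded as NaN, stay missing.
    """
    scores = np.asarray(scores, dtype=float)
    recoded = np.where(scores == full_credit, 1.0, 0.0)
    recoded[np.isnan(scores)] = np.nan
    return recoded

print(dichotomize_full_credit([0, 1, 2, np.nan]))
```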

5.2 Analysis

The nonignorable and ignorable models under the proposed method were used to fit the data. DIC and LPML for the missing data mechanism under both models were computed for the purpose of model selection. The DICs under the nonignorable and ignorable models are 5504 and 5857, respectively, and the LPMLs are -2895 and -3016, respectively. Judging by both criteria, the nonignorable model was selected, so the following analysis based on the proposed method refers to the nonignorable model.

Three Markov chains started at overdispersed starting values were used, and each chain had 20,000 iterations. The Gelman-Rubin convergence statistic (Brooks and Gelman, 1998), the potential scale reduction factor $\hat{R}$, was computed to assess the convergence of all parameters in the proposed model. A parameter is regarded as having converged if its $\hat{R}$ is less than 1.1. The values of $\hat{R}$ can be obtained with the “coda” R package (Plummer et al., 2006). The trace plots of $\hat{R}$ for all parameters in the proposed method are presented in Figure 3. They suggest that $\hat{R}$ is generally less than 1.1 after 5,000 iterations for each parameter, indicating convergence of all the model parameters.
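For readers without access to R, the basic form of the Gelman-Rubin statistic can be sketched in Python. The function below compares the between-chain and within-chain variances for one parameter. It omits the degrees-of-freedom correction that the coda package applies, so its values may differ slightly from those reported here; the simulated chains are stand-ins, not output of the proposed sampler.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: array of shape (m, n), m chains with n draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    pooled = (n - 1) / n * within + between / n       # pooled variance estimate
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 2000))                    # chains with a common target
stuck = mixed + np.array([[0.0], [5.0], [10.0]])      # chains stuck at different levels
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Chains that explore the same distribution give a value close to 1, whereas chains centered at different levels give a value well above the 1.1 threshold.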

Figure 3: The trace plots of the Gelman-Rubin statistic. Note. The dashed line is 1.1.

Similar to Simulation I, listwise deletion was also employed for a comparative analysis of the parameter estimates. That is, the 173 examinees with complete responses were used to obtain parameter estimates based on the traditional IRT model.

5.3 Results

Table 4 summarizes the missing proportion of each item and the item parameter estimates based on the proposed method. For DS465Q01C and DS438Q03C, the EAP of the item missing parameter is much higher than for the other items, which means that these two items are more likely to be omitted. For the last three items, the missing proportions are relatively high even though their item missing parameters are estimated to be negative. One possible explanation is that the missingness for these three items is mainly driven by the effect of previous missing items on the current item, so these responses may be largely not-reached. In general, items with a higher item missing parameter or located at the end of the test tend to be missing.

Item        Missing   Discrimination    Difficulty        Item missing parameter
                      EAP      SE       EAP      SE       EAP      SE
DS465Q01C   0.296     1.078    0.003    2.302    0.004    0.784    0.001
CS465Q02S   0.142     0.329    0.001    0.743    0.004    -0.023   0.001
CS465Q04S   0.123     0.177    0.000    3.150    0.008    -0.225   0.001
DS131Q02C   0.203     0.730    0.002    2.615    0.005    0.215    0.001
DS131Q04C   0.237     1.099    0.003    1.720    0.003    0.333    0.001
CS428Q01S   0.097     0.867    0.002    1.182    0.002    -0.748   0.002
CS428Q03S   0.102     0.827    0.002    1.102    0.002    -0.741   0.002
DS428Q05C   0.275     1.218    0.004    2.243    0.004    0.418    0.001
DS514Q02C   0.247     0.781    0.002    0.019    0.002    0.191    0.001
DS514Q03C   0.169     0.784    0.002    0.584    0.002    -0.355   0.001
DS514Q04C   0.298     1.535    0.003    1.689    0.002    0.342    0.001
CS438Q01S   0.197     0.711    0.002    0.584    0.002    -0.355   0.001
CS438Q02S   0.239     0.558    0.001    0.872    0.003    -0.125   0.001
DS438Q03C   0.459     1.045    0.003    2.347    0.004    0.965    0.001
CS415Q07S   0.247     0.503    0.001    0.087    0.002    -0.277   0.001
CS415Q02S   0.279     0.709    0.002    0.243    0.002    -0.123   0.001
CS415Q08S   0.275     0.546    0.002    1.344    0.004    -0.228   0.005

Note. Missing = missing proportion for each item; EAP = expected a posteriori; SE = standard error.

Table 4: Item parameter estimates for the PISA subtest data based on the proposed method.

Item        Discrimination    Difficulty
            EAP      SE       EAP      SE
DS465Q01C   1.624    0.006    1.757    0.003
CS465Q02S   0.500    0.002    0.368    0.004
CS465Q04S   0.259    0.001    2.165    0.008
DS131Q02C   0.904    0.003    1.964    0.005
DS131Q04C   1.477    0.005    1.186    0.002
CS428Q01S   1.118    0.003    0.840    0.002
CS428Q03S   0.834    0.003    0.739    0.003
DS428Q05C   1.298    0.005    1.755    0.004
DS514Q02C   0.627    0.002    -0.570   0.003
DS514Q03C   0.801    0.003    1.763    0.005
DS514Q04C   1.629    0.005    1.203    0.002
CS438Q01S   0.729    0.002    -0.246   0.002
CS438Q02S   0.771    0.002    0.469    0.002
DS438Q03C   1.224    0.004    1.950    0.005
CS415Q07S   0.427    0.002    -0.526   0.004
CS415Q02S   0.694    0.002    -0.116   0.002
CS415Q08S   0.465    0.002    0.962    0.005

Note. EAP = expected a posteriori; SE = standard error.

Table 5: Item parameter estimates for the PISA subtest data based on listwise deletion.

The item parameter estimates based on listwise deletion are presented in Table 5. For the same parameter, there is a large difference between the two methods. Specifically, the EAPs of the item difficulty parameters in Table 4 are always higher than those in Table 5. A likely cause is that missing responses often occur for examinees with low ability, whose responses are removed by listwise deletion, leading to underestimated item difficulty parameters.

The posterior histograms of the person parameters in the proposed method are shown in Figure 4, and the posterior histogram of the ability parameters based on listwise deletion is presented in Figure 5. For the posterior histograms of the ability parameters, there also exists a large difference between the two figures. The posterior histograms of ability parameters in Figure 5 is sharper than that in Figure 5, indicating the variance of the former is smaller.

Figure 4: The posterior histograms of person parameters based on the proposed method.
Figure 5: The posterior histograms of person parameters based on listwise deletion.

The estimates of the other parameters in the proposed model are presented in Table 6. and are estimated at 0.405 and 1.000, respectively, so the correlation between and is about 0.405, which is significantly greater than 0. This also confirms that the missingness is nonignorable, which is consistent with the result of Bayesian model selection. In addition, the EAP of is 0.204, which is clearly greater than 0, showing that the term in Equation (4) is reasonable and necessary.

EAP   0.405   1.000   -1.53   0.204   -0.034
SE    0.001   0.002   0.001   0.000   0.000

Note. EAP = expected a posteriori; SE = standard error.

Table 6: The EAP and SE of the other parameters based on the proposed method.

6 Discussion

This paper proposes an IRT-based method to model omitted and not-reached items based on SLM. The item parameter estimates based on MML estimation are proved to be consistent. Furthermore, MCMC methods are given to estimate all the model parameters based on the probit link. Bayesian model selection between the nonignorable and ignorable models is explored using the DIC and LPML criteria. The use and performance of the proposed method are demonstrated with the 2015 PISA computer-based science subtest data. The real data analysis indicates that the missingness is nonignorable, as the correlation between and is significantly greater than 0, and both DIC and LPML give the same conclusion.

The proposed method can handle both ignorable and nonignorable missingness through the correlation between the person missing parameter and ability. If the correlation is greater than 0, in Equation (4) depends on the ability , so that the missingness is nonignorable (NMAR). If the correlation is 0, only depends on the observed response and , as well as the cumulative number of missing responses. Therefore, the missingness is ignorable.

Despite such promising results, other issues should be investigated in the future. The first is whether the proposed method can be extended to more complicated IRT models, such as the partial credit model (Masters, 1982) and the generalized partial credit model (Muraki, 1992). Second, if the last item of the test is missing and the penultimate one is observed, the last nonresponse may be due either to skipping or to the time limit, and it is still unknown how to determine whether it is omitted or not-reached. In this case, response times might be taken into account to obtain further information. Finally, for computer-based tests, missingness occurs not only in item response data but also in response time data. Extending the proposed method to the mixture model for response times and response accuracy (Wang and Xu, 2015) is a promising direction.


This work is supported by the National Natural Science Foundation of China (grant number 11571069).

Appendix A: Proof of theorems

Proof of Theorem 1

Since the items are assumed to be independent in the context of IRT, we only prove the consistency results for a fixed item. For notational convenience, the item subscript is omitted in the following proof. Moreover, define as the item parameters vector for the item and as its true value. Actually, the parameters space can be written as . It is easy to verify that the MML estimation of is not at the bound of . So we only consider the consistency on its closed subset , where is sufficiently small and are sufficiently large.

Let and be its item response for examinee , for , where is the imputed value in the proposed method. Define

where .

By Jensen’s inequality,

Using the notations in Assumption 1, therefore,

Further, by Assumption 1,

Accordingly, the following uniform law of large numbers holds:

where denotes convergence in probability. Moreover, by Assumption 2 and Gibbs’ inequality, has a unique maximum at the true parameter . Further, by the continuity of with respect to , we have .

Proof of Theorem 2

Since the derivative of with respect to at is and then by Taylor’s expansion, we have that

where means the remainder converges to in probability and

Then, by the multidimensional central limit theorem, we have

where denote convergence in distribution and

is the Fisher information matrix of at . At the same time, by the weak law of large numbers,

Further, by the Continuous Mapping theorem and Slutsky’s theorem, we have

Appendix B: The detailed sampling procedure

The detailed sampling procedure using MCMC can be broken down into the following steps:

Step 1: Sample the augmented variable from the truncated normal distribution

Step 2: Similarly, sample an additional augmented variable as follows:

Step 3: Sample the latent ability parameter for examinee . As previously mentioned, . So the prior distribution of is the conditional normal distribution

where and And the full conditional posterior distribution of is


Step 4: Sample the discrimination parameter for item . A prior for is a truncated normal distribution with mean and variance That is, Therefore, the full conditional posterior distribution of is

where .

Step 5: Sample the difficulty parameter for item . A prior for is a normal distribution with mean and variance That is, Therefore, the full conditional posterior distribution of is

where .

Step 6: Sample the missing response from Bernoulli, where was defined in Equation (7).

Step 7: Sample the person missing parameter . Similar to , the conditional prior distribution of is

where and The full conditional posterior distribution of is a normal distribution

where .

Step 8: Sample the item missing parameter in the missing mechanism model. A prior for is a normal distribution Therefore, the full conditional posterior distribution of is a normal distribution


Step 9: Sample the intercept parameter in the missing mechanism model. A prior for is a truncated normal distribution with mean and variance That is,