1 Introduction
In this article, we investigate a pseudo-marginal Metropolis-Hastings algorithm for handling missing data in the context of logistic regression. Missing data is a common problem in many applications, and while it is common to exclude cases with missing covariates, this approach can significantly bias the resulting analyses. Missing data has received extensive attention in the statistical literature, and the EM algorithm and multiple imputation are common approaches in practice
[15]. In this paper, we explore an alternative Metropolis-Hastings algorithm, which has the following advantages over the aforementioned competitors. First, far fewer assumptions are required to implement this algorithm. For example, there is no assumption that the data must be missing at random (MAR) or missing completely at random (MCAR), no assumption that the data must possess a monotone missingness pattern, the distribution of the covariates does not have to possess a particular functional form, and the choice of the prior is unrestricted.
Second, it provides joint inferences on all parameters. We group the parameters of these models into three sets: the parameters of the conditional likelihood (the regression coefficients, which are usually of primary interest), the parameters of the distribution of the covariates that are missing, and the parameters that govern the missingness mechanism. This procedure yields samples for all three groups.
Third, there is no logical inconsistency of assuming two different posterior distributions. Multiple imputation procedures aim to draw from the posterior predictive distribution of the model, but often do this in a way that does not take into account the specific structure of a given problem. This is often practical, because the imputation task and the complete-data modeling task can be done separately. With our approach, imputing and modeling can still be done separately in some sense; however, a single algorithm brings to bear the effort of these two parties.
There is a substantial computational cost to gaining these benefits, however. For non-trivial data sets, every MCMC iteration might require a very large number of simulations. This difficulty necessitates careful programming, and it puts limits on the kinds of data sets to which this methodology can feasibly be applied. We address this issue in later sections and detail some strategies that helped with our specific modeling tasks. While this method is developed fully in the context of logistic regression, extension to other models is immediate. Our work was motivated by an analysis of the field-data risks and benefits associated with knee airbag deployment for occupants involved in frontal car crashes; a sub-analysis of these data is developed as an application.
2 Logistic Regression and Missing Data
2.1 Our Model’s Assumptions
Let y = (y_1, …, y_n)′ be the vector of completely observed binary responses for units i = 1, …, n, let X be the n × p matrix of covariate data, where each x_i is the vector of covariate data for unit i, and let β be the vector of regression coefficients. Again, we assume that all response variables are observed, but that there are missing data in the covariates. Denote by X_obs the observed elements of X and by X_mis the missing elements. Also denote by x_{i,obs} and x_{i,mis} the observed and missing covariates of unit i. Having access to all of the covariate data would allow one to evaluate the complete conditional likelihood
(1)   L(β | y, X) = ∏_{i=1}^{n} IL(x_i′β)^{y_i} [1 − IL(x_i′β)]^{1 − y_i},

where IL(t) = 1/(1 + e^{−t}) denotes the inverse logit (logistic sigmoid) function.
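In code, the log of Equation 1 can be evaluated stably. The following is a minimal NumPy sketch, with variable names of our own choosing:

```python
import numpy as np

def log_sigmoid(t):
    # Numerically stable log(IL(t)) = -log(1 + exp(-t)).
    return -np.logaddexp(0.0, -t)

def complete_data_loglik(beta, y, X):
    """Log of Equation 1: sum of y_i*log IL(x_i'beta) + (1-y_i)*log[1-IL(x_i'beta)].
    Uses log[1 - IL(t)] = log IL(-t) to avoid catastrophic cancellation."""
    eta = X @ beta
    return np.sum(y * log_sigmoid(eta) + (1.0 - y) * log_sigmoid(-eta))
```

Working on the log scale matters here because the likelihood itself underflows quickly as n grows.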
We also specify the distribution of the covariate data that are missing. We assume that all rows are independent, and that for each row i, the distribution of the unobserved datum in column j, parameterized by a vector α_j, depends on columns 1 through j − 1:
(2)   f(x_{i,mis} | x_{i,obs}; α) = ∏_{j : x_{ij} missing} f_j(x_{ij} | x_{i1}, …, x_{i,j−1}; α_j).
To simplify notation, we follow the convention that when j = 1, the conditioning set is empty, so f_1(x_{i1}; α_1) is an unconditional distribution. This structure follows along the lines of [12]. Note that the distribution of a missing observation may depend on both missing and observed data.
Finally, we will make use of the inclusion indicator, an n × p matrix R whose (i, j) element is 1 if x_{ij} is observed, and 0 otherwise. The missing-data mechanism is the conditional probability distribution of this indicator matrix:
(3)   p(R | y, X; φ),

where φ is the vector of parameters governing the mechanism.
It is common to assume that this distribution is free of the missing data, or free of both the missing and the observed data. These assumptions are termed missing at random and missing completely at random, respectively, and they are indispensable in showing that the missing-data mechanism is ignorable, or in other words, that samples from the posterior can be obtained while ignoring which values are missing and where they are missing [7]. No such assumption is made in this paper, however, because none is required by the described methodology.
3 Markov Chain Monte Carlo Approaches
After we have chosen a prior distribution for our parameters, call it π(β, α, φ), we aim to draw samples from the marginal posterior p(β, α, φ | y, X_obs, R). This is the distribution of everything we are interested in, conditioned on everything we know. Because we want to sample from this distribution without restricting the choice of prior distribution or missing-data mechanism, we will use a variant of the Metropolis-Hastings algorithm [9], [7]. This article accomplishes this by sampling from a joint posterior defined on a space of much larger dimension and integrating out nuisance quantities.
3.1 Initial Approaches
First, one may look at the high-dimensional posterior of everything that is unknown, p(β, α, φ, X_mis | y, X_obs, R), which is proportional to

L(β | y, X) f(X_mis | X_obs; α) p(R | y, X; φ) π(β, α, φ).
It is theoretically possible to use the Metropolis-Hastings algorithm to sample the parameters as well as the missing data all at once, because the function above is proportional to the target density; however, practically speaking, this approach might be infeasible if there are too many missing data points.
A second approach would avoid sampling on this unnecessarily large space by using the “marginal” MH algorithm, which targets
(4)   p(β, α, φ | y, X_obs, R) ∝ π(β, α, φ) L(β, α, φ | y, X_obs, R),

where L(β, α, φ | y, X_obs, R) = ∫ L(β | y, X) f(X_mis | X_obs; α) p(R | y, X; φ) dX_mis.
This approach would solve the problem of dimensionality if it were possible to evaluate the "observed-data likelihood" on the right-hand side of Equation 4. Indeed, if one used the Metropolis-Hastings algorithm [9], one would simulate a Markov chain that leaves the normalized version of that expression invariant. Given that the chain is at θ = (β, α, φ), we first propose θ′ ∼ q(· | θ), and accept this proposal with probability
(5)   min{ 1, [ L(θ′ | y, X_obs, R) π(θ′) q(θ | θ′) ] / [ L(θ | y, X_obs, R) π(θ) q(θ′ | θ) ] }.
Typically the dimensions of β, α, and φ are small or moderate, so this would be a relatively ideal situation. However, it is rarely possible to evaluate the observed-data likelihood, because it is often a high-dimensional integral, and one would rather not make overly restrictive assumptions about the missing-data mechanism [12].
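For intuition, the marginal algorithm sketched above is an ordinary random-walk Metropolis sampler; it only requires an evaluable log-target. A minimal Python sketch (our own interface; the Gaussian proposal is symmetric, so the q ratio in Equation 5 cancels) is:

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_iters, step, rng):
    """Random-walk Metropolis: propose theta' ~ N(theta, step^2 I) and
    accept with probability min(1, target(theta') / target(theta)),
    i.e. Equation 5 with the symmetric proposal ratio cancelling."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_target(theta)
    chain = np.empty((n_iters, theta.size))
    for i in range(n_iters):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_target(prop)
        # Accept or reject on the log scale.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain
```

The pseudo-marginal variant in the next subsection keeps exactly this loop but swaps the exact `log_target` for an estimated one.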
3.2 A Pseudo-Marginal Metropolis-Hastings Algorithm
The pseudo-marginal algorithm [3], [1] resembles the above algorithm; however, it replaces the intractable quantity with an estimate that is nonnegative and unbiased. The algorithm described above remains the same, except that the acceptance probability at a given iteration becomes
(6)   min{ 1, [ L̂(θ′ | y, X_obs, R) π(θ′) q(θ | θ′) ] / [ L̂(θ | y, X_obs, R) π(θ) q(θ′ | θ) ] },
where L̂(· | y, X_obs, R) is an unbiased estimate of L(· | y, X_obs, R). Despite (6) not being equal to (5), the sampler targets the left-hand side of (4) exactly. With this particular model, we estimate the likelihood at each iteration using importance sampling [13]. First, choose an importance distribution for the missing data, call it q_IS(X_mis). This distribution must be positive whenever f(X_mis | X_obs; α) is positive. Then choose a number of samples N, and for k = 1, …, N, draw X_mis^{(k)} ∼ q_IS and evaluate
(7)   L̂(β, α, φ | y, X_obs, R) = (1/N) ∑_{k=1}^{N} L(β | y, X_obs, X_mis^{(k)}) f(X_mis^{(k)} | X_obs; α) p(R | y, X_obs, X_mis^{(k)}; φ) / q_IS(X_mis^{(k)}).
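A generic sketch of this estimator in Python (the interface is ours; the callables evaluate, elementwise, the three log-densities appearing in Equation 7):

```python
import numpy as np
from scipy.special import logsumexp

def is_log_estimate(loglik_complete, log_f_mis, log_q, sample_q, N, rng):
    """Importance-sampling estimate of the observed-data likelihood
    (Equation 7): draw missing values x^(k) ~ q_IS and average the weights
    L(x^(k)) * f(x^(k)) / q_IS(x^(k)).  The likelihood estimate itself is
    unbiased; its log is returned for numerical stability."""
    x = sample_q(N, rng)                        # N draws of the missing values
    log_w = loglik_complete(x) + log_f_mis(x) - log_q(x)
    return logsumexp(log_w) - np.log(N)         # log of the average weight
```

Computing the average through `logsumexp` avoids the underflow that a naive `np.mean(np.exp(log_w))` would suffer on realistic data sets.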
By the strong law of large numbers, the estimator (7) converges to the true marginal likelihood as N → ∞. More importantly, it is unbiased, which is the primary requirement of the pseudo-marginal approach. The entire algorithm is described in Algorithm 1. The question of how to choose the three tuning elements of this algorithm (the Metropolis-Hastings proposal q, the importance distribution q_IS, and the number of samples N) is still an open one. When choosing q for use in a more classical algorithm, wherein likelihood evaluations are available, there are recommendations for target acceptance rates, as well as recommendations for how to choose the proposal's parameters [7]. Regarding the importance distribution
q_IS, the asymptotic variance of its estimator, when it exists, will be large when the ratio between the importance distribution's density and the density of the target is far from unity for likely values. In particular, it is problematic when the importance density has thinner tails than the target, when the target is relatively peaked, or when the target is defined on a high-dimensional space [16]. The last difficulty is especially problematic for this technique: for a fixed percentage of missing data, there will always be a data set large enough to render this approach infeasible. The dispersion of the estimates of the log-likelihood can often be the deciding factor in whether or not a chain mixes well. [2] show that increasing N not only decreases the variance of the importance sampling estimate, but also lowers the asymptotic variance of parameter estimates taken from the pseudo-marginal chain. However, there are computational limits on how far N can be increased. Recently, much work has provided guidelines for choosing N. [5] and [22] describe guidelines for choosing N in order to minimize computing-time criteria. These papers both suggest choosing a value of N that keeps the variance of the log-likelihood estimator roughly between 1 and 3.
Increasing the precision of the likelihood estimates without resorting to increasing N is obviously extremely useful as well; here, knowledge of classical variance-reduction techniques is indispensable. [4] describe how to correlate the estimates of the log-likelihood at consecutive iterations in order to minimize the variance of their difference.
In practice, tuning the estimation algorithm starts with choosing an importance sampling strategy first, and N second. The importance sampling strategy will often be close to the best a user can come up with. Afterwards, several values of N are tried. One might compute many log-likelihood estimates, and increase N until the sample variance falls below a chosen threshold. This is far more convenient than running increasingly expensive "full-length" simulations; however, for non-trivial problems, this decision rule might suggest values of N that are too high to be practically useful. For example, if this strategy suggests an N so large that one could only obtain a few hundred MCMC samples, it is not clear that one should follow the rule, as the chain would be unlikely to have enough time to burn in or converge to stationarity.
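This decision rule can be sketched as follows; `loglik_hat` stands in for any estimator of the form of Equation 7, and the grid, repetition count, and threshold are illustrative choices of ours:

```python
import numpy as np

def choose_N(loglik_hat, N_grid, reps, threshold, rng):
    """Return the smallest N in N_grid for which the sample variance of
    repeated log-likelihood estimates falls below `threshold`, or None
    if no candidate qualifies.  loglik_hat(N, rng) must return one
    log-likelihood estimate computed with N importance samples."""
    for N in N_grid:
        ests = np.array([loglik_hat(N, rng) for _ in range(reps)])
        if ests.var(ddof=1) < threshold:
            return N
    return None
```

The variance is assessed at a fixed parameter value (e.g., a pilot estimate), which is far cheaper than running full-length chains for each candidate N.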
Because theory in this area is far from settled, and because the specifics of our problem do not satisfy all of the assumptions used in the above-mentioned work, we take a more applied approach. We focus on choosing N first, and then select the other tuning parameters based on several "pilot runs" of the algorithm. We also parallelize the computation of each iteration's importance sampling estimator.
3.3 A Comparison with Multiple Imputation
Multiple Imputation (MI) generates multiple complete data sets by sampling several sets of plausible values for each missing data point from the posterior predictive distribution [19], [20], [7]. The same analysis is performed separately on each data set, and the results are then combined. For example, in the context of regression analysis, the model parameters derived from each imputed data set are combined by a simple average. The parameter variances are calculated by averaging the individual variances from each imputation, and the formula includes an additional term to capture the between-imputation variance.
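The combining rules just described (often called Rubin's rules) can be written compactly; this is a generic Python sketch with names of our own choosing:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's rules: pool m imputation-specific estimates and variances.
    Pooled estimate = mean of the m estimates.
    Total variance  = W + (1 + 1/m) * B, where W is the average
    within-imputation variance and B is the between-imputation
    variance of the estimates."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    qbar = q.mean(axis=0)
    W = u.mean(axis=0)
    B = q.var(axis=0, ddof=1)
    T = W + (1.0 + 1.0 / m) * B
    return qbar, T
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.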
[14] compare the two implementations most commonly used in practice: Multivariate Normal Imputation (MVNI) and Multivariate Imputation by Chained Equations (MICE). MVNI was proposed by [21] and is implemented in software packages such as [11]. It assumes the data are normally distributed and that the missing values are missing at random. MICE, on the other hand, also known as Fully Conditional Specification, is more flexible. It does not assume that the data are normally distributed, and so it is more capable of handling binary and ordinal data. Several statistical software packages implement this procedure [23], [24]. First, initial estimates for the missing values are drawn from the existing data, and then columns of data are sampled sequentially. If the missingness pattern is monotone, one sweep through all of the columns is required. Otherwise, Gibbs sampling is performed, which alternates between draws from the conditional distributions until a convergence criterion is reached [18]. MI is popular because it divides the labor of data analysis between the data collector and the data analyst. The data collector may impute multiple data sets on his own, taking into account his knowledge of the data collection process, and then send these on to the data analyst, who performs the same analysis on each imputed data set [15].
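A bare-bones illustration of the chained-equations sweep, for continuous columns only (a toy Python sketch of the idea, not of the mice package's actual internals):

```python
import numpy as np

def mice_sketch(X, n_sweeps, rng):
    """Toy chained-equations pass: initialize missing entries with column
    means, then repeatedly regress each partially observed column on the
    others and redraw its missing entries from the fitted normal model."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])      # initial fill
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            Z = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), Z])    # design w/ intercept
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            sigma = resid.std(ddof=A.shape[1])
            # Redraw the missing entries from the fitted regression.
            X[miss[:, j], j] = (A[miss[:, j]] @ beta
                                + rng.normal(0.0, sigma, miss[:, j].sum()))
    return X
```

A real implementation would also draw the regression parameters from their posterior rather than conditioning on point estimates, which is what makes the imputations "proper."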
However, despite its convenience, MI has some notable drawbacks. First, it is logically inconsistent to assume that there are two posterior distributions, one used for imputing data and one used for performing the final analysis, and it is hard to say how much this degrades the accuracy of the resulting estimates. Second, the algorithm is potentially inefficient: if one is already drawing parameters along with missing data values, why throw away those parameters before drawing them again? At present, though, this second criticism is unlikely to "stick," as the pseudo-marginal method is itself notoriously computationally expensive, as we will see in the applied sections that follow.
4 A Simulation Study
We perform a simulation study to demonstrate the algorithm’s ability to recover the true parameters of the model specified by Equations 1, 2, and 3. We show the performance of the algorithm for different tuning parameters, and discuss some common difficulties of implementation. All of the code is provided by the authors; more details can be found in the appendix.
4.1 The Data
Our simulated data set has two covariate columns, x_1 and x_2. The first is fully observed, and the second has 34% of its rows missing. To generate the data, we specify a marginal distribution for x_1 and a conditional distribution for x_2 given x_1, following the structure of Equation 2.
We keep the conditional likelihood of Equation 1, and we use a similar logistic form to specialize Equation 3: the probability that x_2 is missing for a given row is modeled as IL of a linear function of that row's data, with coefficients φ. Here we are using IL to refer to the "inverse logit" or "logistic sigmoid" function.
Familiar distributions are chosen for the priors on β, α, and φ. These priors were selected from the standpoint of a hypothetical analyst, and they were not used to generate the data. Rather, the "true" data-generating parameter values are those listed in the truth column of Table 1.
4.2 The PMMH Approach
There are two proposal distributions that we must choose: the importance sampling proposal distribution q_IS, and the Metropolis-Hastings proposal distribution q. For the latter, we transform the parameters to be real-valued, and then propose new values using a random walk on this transformed space. The regression coefficients in the conditional likelihood and the missing-data mechanism are already real-valued, so only the positive scale parameter in the covariate model needs to be transformed; it is mapped to its natural (base-e) logarithm.
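Sampling a positive parameter on the log scale changes the target density: if η = log σ, the transformed density picks up the Jacobian factor e^η, i.e. an additive η on the log scale. A small sketch of the required adjustment, with names of our choosing:

```python
import numpy as np

def log_target_transformed(log_target_sigma, eta):
    """Log-density of eta = log(sigma) when sigma has log-density
    log_target_sigma: p(eta) = p_sigma(exp(eta)) * exp(eta), so on the
    log scale we add the log-Jacobian, which is simply eta."""
    sigma = np.exp(eta)
    return log_target_sigma(sigma) + eta
```

Omitting the Jacobian term is a classic bug: the chain then targets the wrong distribution for every transformed parameter.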
Regarding the importance sampling proposal distribution, we choose a scaled Student's t distribution for the missing values of x_2.
A long MCMC run was performed, and at each iteration Equation 6 was evaluated using the importance sampling estimator of Equation 7. After discarding samples for burn-in, posterior estimates, 95% credible intervals, and R-hat convergence diagnostics [7] were calculated for every element of (β, α, φ) (this was done by splitting the single chain into two parts after discarding the burn-in iterations). This information is provided in Table 1. Histograms of each parameter's samples are provided in the appendix in Figure 2. Do note that the samples for one of the parameters are not completely satisfactory; this is reflected in a spike in its histogram and in its R-hat value of 1.29, which is slightly high.

truth  estimate  95% cred.       R-hat
1      1.43      (0.31, 3.78)    1.02
1      1.35      (0.91, 1.78)    1.01
-2     -2.44     (-3.94, -1.35)  1.02
3      3.88      (2.35, 5.58)    1.08
1      1.06      (0.51, 1.80)    1.15
1      1.08      (0.40, 1.74)    1.01
1      0.76      (0.15, 1.38)    1.29
4.3 Using MICE
The same data were analyzed using MICE. We made use of the mice package in R [24]. Ten data sets were imputed using the method of Bayesian linear regression, using both the fully observed covariate and the response as predictors. Only estimates for the regression coefficients are available. Improvised confidence intervals are created by adding and subtracting two standard errors from each parameter estimate. This information is given below in Table 2.

truth  estimate  95% interval
1      0.95      (0.11, 2.01)
-2     -2.00     (-3.58, -0.42)
3      3.40      (0.96, 5.83)
It is problematic to compare credible intervals with confidence intervals, but a few things are worth pointing out. The PMMH method yields narrower intervals, and those intervals were constructed in a less ambiguous way: they are calculated by taking empirical quantiles of the samples, so they reflect the shape of each marginal posterior better than adding and subtracting a single quantity from a point estimate. MICE produced point estimates that happen to be closer to the data-generating values. This could be a coincidence; after all, MICE did not take into account our known missingness mechanism.
Regarding the cost of computation, it is unfair to compare packaged code with a pure R implementation, so we do not provide any runtime measurements. Two things are certain, though. First, PMMH is much more computationally demanding than MICE, and it is unlikely that further refining the PMMH code would close this gap completely. This particular script takes hours to run, whereas MICE finishes in a brief moment. Second, speeding up PMMH in R would be a very fruitful computational undertaking. Compiled languages would be useful for their raw speed; in particular, it would be useful to expose a framework that uses pass-by-reference semantics to facilitate the use of common random numbers. After this has been completed, more involved simulation experiments would be very informative.
The critical aspect of the pseudo-marginal algorithm is the dispersion of the quantity in Equation 7. We demonstrate this with the following plots, which are something akin to a "profile" surface. The calculation of L̂ used in Algorithm 1 takes as inputs all of the parameters, as well as the N replications of the missing data (despite the notation not clearly reflecting that). In an effort to plot L̂, we set all other parameters equal to the data-generating values, and we simulate the missing data sets. For each sample size N, two surface plots are provided, each using different replications of the simulated missing data.
Most noticeably, a lower number of importance samples produces a rougher surface. This means that, even if a "likely" parameter vector is proposed by q, it could still fall on a spike and thus be rejected with virtual certainty. Moreover, it is hard to control the location of these spikes. Comparing the plots in a given column shows that the locations of the spikes change, and thus depend only on the missing data that are simulated after parameters have been proposed.
It is unsurprising that some parameters are less well identified than others, but it is interesting to note that the spikes are less severe in certain regions of the parameter space, where the importance weights found in (7) are nearly constant across most sampled data values. In short, approximating the acceptance probability is much easier in certain areas of the parameter space. Taken together, these plots suggest that raising the sample size N will increase smoothness more in some areas and less in others, and that the acceptance rate of this algorithm will not necessarily be improved by tuning the MH proposal q.
5 Analysis of a Real Data Set
5.1 Background
This work was originally motivated by an analysis of data collected by the National Automotive Sampling System Crashworthiness Data System (NASS-CDS) to understand the potential benefits and risks associated with knee airbag (KAB) deployment for belted occupants in real-world frontal car crashes. NASS-CDS was a survey sample of approximately 5,000 tow-away crashes in the United States each year, conducted by the National Highway Traffic Safety Administration (NHTSA) [10]. Each sampled crash received a thorough investigation, which included crash reconstruction, demographics of all involved occupants, vehicle characteristics (year, make, model, etc.) of all involved vehicles, thorough documentation of vehicle damage including photos, and documentation of all occupant injuries using the Abbreviated Injury Scale (AIS) [8].
Injury risk in frontal crashes depends significantly on the vehicle's change in velocity (delta-V), which is normalized to be "barrier impact equivalent" and is estimated as part of the crash reconstruction. Additional significant predictors are occupant age and sex. Vehicles tend to get modestly safer with each additional model year. Factors such as direction of force (measured in 10-degree increments), occupant height and weight (or body mass index), and vehicle type are considered to be important covariates, but they are not consistently associated with increases or decreases in risk in field data analyses of frontal crashes [6].
Missing data is a significant problem for all analyses using NASS-CDS. Approximately 30% of all frontal crashes do not have an estimated delta-V, and a different 30% are missing occupant height and weight. The data also contain auxiliary information, such as vehicle crush measurements, which in some cases can inform imputation. The data do not appear to be missing completely at random; for example, crashes with missing delta-V seem to have a higher proportion of uninjured occupants. Nonetheless, it is desirable to include all cases, both to minimize bias from excluded cases and to retain all occupants who experienced KAB deployment; KABs have recently become more common in the vehicle fleet, but at this point only relatively few crashes of KAB-equipped vehicles have been investigated.
For exposition, in this article we investigate the probability of the occupant receiving a serious or worse injury (AIS 3 or higher) in any body region. The primary predictor of interest is knee airbag deployment, and we wish to adjust for the covariates indicated above while including well-established prior information about effect sizes, as well as providing realistic models for the missing covariates.
5.2 Results
The following variables are used in our model as predictors: age (in years), sex, body mass index (BMI), the sum of two delta-V measurements (dvtotal), whether the knee airbag was deployed, whether the vehicle was a sport utility vehicle (SUV), whether the vehicle was a truck, whether the vehicle was a van, model year (centered at 2003), and a categorical direction-of-force variable. Among these, the variables that exhibit some missingness are sex, BMI, dvtotal, and model year (see Table 3). To help with sampling missing data, we make use of a variable dvc, which is a sum of several measurements of how much a vehicle was crushed during impact. Interestingly, some of this column is missing as well, so its distribution must be specified too.
variable  % missing
sex       0.030
bmi       13.050
dvtotal   29.023
modelyr   0.004
dvc       20.241
To specify the covariates' distributions and the parameter priors, we make use of considerable subject-matter expertise. These 23 distributions are listed in Appendix A, and they are briefly described here. All of the regression coefficients were given normal priors, and the parameters of these distributions were carefully chosen to cohere with knowledge obtained from similar studies in the past. The missing-data distributions were chosen so that they would generate data that look similar to what is observed. In particular, the empirical support of each variable was first noted. For example, the dummy variable for sex was given a Bernoulli distribution, and the missing crush measurements, because their observed values only took on integers, were given a Negative Binomial distribution. The data for BMI were positively skewed, so we gave its missing data a Skew Normal distribution. The priors were informative wherever appropriate. Location-scale families were elicited directly, while Beta and Inverse-Gamma distributions were elicited from their prior means and quantiles.
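Eliciting a two-parameter prior from a stated mean and quantile reduces to a one-dimensional root-finding problem. The following is a hypothetical sketch for the Beta case (the solver, the bracketing interval, and all names are our choices, not the authors'):

```python
import numpy as np
from scipy import stats, optimize

def beta_from_mean_quantile(mean, q, p):
    """Find (a, b) for a Beta prior with the given mean whose p-th
    quantile equals q.  The constraint mean = a / (a + b) reduces the
    search to one unknown, the concentration k = a + b."""
    def gap(log_k):
        k = np.exp(log_k)
        a, b = mean * k, (1.0 - mean) * k
        return stats.beta.ppf(p, a, b) - q
    # Small k gives a U-shaped Beta (quantile near 1), large k a
    # concentrated one (quantile near the mean), so the bracket works.
    log_k = optimize.brentq(gap, -5.0, 15.0)
    k = np.exp(log_k)
    return mean * k, (1.0 - mean) * k
```

The same one-unknown reduction applies to the Inverse-Gamma case by fixing the prior mean and solving for the shape parameter.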
We also chose five distributions to help propose missing covariates for use in calculating our importance sampling estimates. We generally tried to choose these distributions so that they would be visually similar to the empirical distributions of each covariate. Wherever possible, we give these distributions tails that are fatter than those of the missing-data distributions, in accordance with a general principle of importance sampling. For example, BMI and model year were given scaled Student's t distributions. For more details, see Appendix A.
Only a limited number of iterations could be performed, each using an importance sampler with a large number of samples. The program took slightly less than six days on a machine with an Intel® Core™ i7-4770 CPU @ 3.40GHz × 8 processor. The program was written entirely in R, and, using the parallel package [17], parallelization was employed to speed up the evaluation of the importance sampling estimator at each iteration.
Table 4 shows the results for estimating β after discarding a thousand samples for burn-in. For comparison, the table provides the parameter estimates from a standard maximum likelihood estimation routine under the complete-case analysis. The two procedures yield estimates that are similar for the most part; however, there are some interesting differences. For example, our method's coefficient estimate for the dummy variable indicating whether the knee airbag was deployed has the opposite sign. Second, the magnitude of the effect size for sex seems to have increased.
These results should be taken with a grain of salt, however, because the R-hat diagnostics are relatively high. The interpretation of this diagnostic is the factor by which the scale of the current distribution for a particular univariate parameter "might be reduced if the simulations were continued in the limit" [7]. A common rule of thumb is to continue simulations until all of these values are less than 1.1. More iterations could have been obtained were it not for the expense of the computations; clearly, there is more computational work to be done here. Table 5 in the appendix shows the parameter estimates for α and φ, along with the same uncertainty quantification used before.
            Est.    MCSE   95.cred.lower  95.cred.upper  RHat   MLE
intercept   -9.175  0.026  -9.716         -8.651         1.768  -7.592
age         0.054   0.002  -0.364         0.161          1.069  0.040
sex         -2.963  0.023  -3.263         -1.372         1.209  -0.366
bmi         0.054   0.004  -0.233         0.201          1.077  0.019
dvtotal     0.102   0.004  -0.166         0.351          1.193  0.082
kabdeply    0.891   0.048  0.342          2.230          1.538  -0.692
suv         0.291   0.014  -0.495         1.087          1.013  0.033
truck       -1.685  0.013  -2.049         -1.342         1.127  -0.274
van         -0.180  0.012  -1.214         0.173          1.003  0.136
modelyr     -0.129  0.027  -1.753         0.147          1.115  -0.051
pdofgrFar   -2.147  0.023  -2.611         -1.603         1.627  -0.098
pdofgrNear  1.758   0.039  0.493          2.301          1.969  0.277
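The split-chain diagnostic reported in these tables can be computed as follows (a sketch of the standard split-R-hat formula for a single chain, along the lines of [7]):

```python
import numpy as np

def split_rhat(chain):
    """Split-R-hat for one chain: split it into two halves and compute the
    potential scale reduction factor sqrt(((n-1)/n * W + B/n) / W), where
    W is the within-half variance and B the between-half variance."""
    x = np.asarray(chain, dtype=float)
    half = x.size // 2
    parts = np.stack([x[:half], x[half:2 * half]])   # shape (2, n)
    n = half
    means = parts.mean(axis=1)
    W = parts.var(axis=1, ddof=1).mean()             # within-chain variance
    B = n * means.var(ddof=1)                        # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return np.sqrt(var_plus / W)
```

A well-mixed chain gives values near 1; a chain whose halves disagree, as with several parameters in Table 4, gives values well above the 1.1 rule of thumb.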
6 Conclusion
This paper has demonstrated the use of the pseudo-marginal Metropolis-Hastings algorithm to estimate generalized linear models in the presence of missing data in a fully Bayesian way, and it has described some of the benefits this procedure possesses compared with other commonly used methods. However, this paper has also demonstrated that the feasibility of this approach diminishes on data sets with many missing values, and so there is much work to be done to reduce the computational cost of this class of techniques. In our opinion, computational complexity is the primary impediment; if future work reduces the required computation time, this approach could prove to be a promising tool that would enjoy more widespread adoption.
Appendix A Appendix
The appendix is divided into two sections: one for the results of the simulation study, and one for the results of the realdata analysis.
A.1 Simulation Study
A.2 Real-Data Analysis
Recall that the columns with missing values are sex, body mass index, dvtotal, and model year. DVC has missing data as well; even though it is not a predictor for crash outcome, it must be imputed because the imputations of other variables rely on it. Referring to Equation 2, we specify the model for the missing covariates as follows. Note that we suppress the dependence on row number in the notation.

,




So, connecting this new notation back with our old, the parameters used in the distributions of the covariates would be written out as .
To specify the priors, we assume all elements of and are independent a priori. This requires us only to specify marginal distributions for each parameter. The priors for the elements of are as follows:











.
The priors for the elements of are as follows:












.
Regarding the importance sampling proposal distribution, we choose
where
Note we are using the Negative Binomial parameterization of base R's dnbinom function:

f(x; n, p) = Γ(x + n) / (Γ(n) x!) · p^n (1 − p)^x,   x = 0, 1, 2, …
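For reference, that parameterization can be coded directly; the helper below mirrors base R's dnbinom(x, size, prob), and scipy.stats.nbinom.pmf(x, n, p) follows the same (size, prob) convention:

```python
from math import lgamma, log, exp

def dnbinom(x, size, prob):
    """Density used by base R's dnbinom(x, size, prob):
    Gamma(x + size) / (Gamma(size) * x!) * prob^size * (1 - prob)^x,
    for x = 0, 1, 2, ...  (size may be non-integer)."""
    logp = (lgamma(x + size) - lgamma(size) - lgamma(x + 1)
            + size * log(prob) + x * log(1.0 - prob))
    return exp(logp)
```

Working through lgamma keeps the computation stable for the large counts seen in the crush-measurement data.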
Est.     MCSE   95.cred.lower  95.cred.upper  RHat
0.925    0.001  0.901          0.957          1.468
24.172   0.015  23.589         24.697         1.001
6.770    0.130  5.532          11.652         1.443
0.851    0.027  0.540          2.767          1.325
12.734   0.027  11.233         13.287         1.001
0.048    0.002  0.006          0.084          1.007
126.924  2.717  0.002          168.090        1.764
0.169    0.024  -0.549         0.523          1.956
198.008  4.291  72.836         304.664        1.344
0.394    0.011  0.149          0.554          1.070
0.002    0.000  0.001          0.009          1.364
-2.819   0.007  -3.129         -0.352         1.035
References
 Andrieu and Roberts [2009] Andrieu, C. and G. O. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 37(2), 697–725.
 Andrieu and Vihola [2016] Andrieu, C. and M. Vihola (2016). Establishing some order amongst exact approximations of MCMCs. Ann. Appl. Probab. 26(5), 2661–2696.
 Beaumont [2003] Beaumont, M. A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164(3), 1139–1160.
 Deligiannidis et al. [2015] Deligiannidis, G., A. Doucet, and M. K. Pitt (2015). The correlated pseudo-marginal method.
 Doucet et al. [2015] Doucet, A., M. K. Pitt, G. Deligiannidis, and R. Kohn (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102(2), 295–313.
 Forman and McMurry [2018] Forman, J. L. and T. L. McMurry (2018). Nonlinear models of injury risk and implications in intervention targeting for thoracic injury mitigation. Traffic Injury Prevention 19(sup2), S103–S108. PMID: 30624079.
 Gelman et al. [2013] Gelman, A., J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin (2013). Bayesian Data Analysis. Chapman & Hall/CRC Texts in Statistical Science. CRC Press.
 Gennarelli and Wodzin [2006] Gennarelli, T. A. and E. Wodzin (2006). AIS 2005: A contemporary injury scale. Injury 37(12), 1083–1091.
 Hastings [1970] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109.
 National Highway Traffic Safety Administration [2016] National Highway Traffic Safety Administration (2016). Overview of the Crashworthiness Data System. [Online; accessed 16-May-2019].
 Honaker et al. [2011] Honaker, J., G. King, and M. Blackwell (2011). Amelia II: A program for missing data. Journal of Statistical Software 45(7), 1–47.
 Ibrahim et al. [2005] Ibrahim, J. G., M.-H. Chen, S. R. Lipsitz, and A. H. Herring (2005). Missing-data methods for generalized linear models. Journal of the American Statistical Association 100(469), 332–346.
 Kahn and Harris [1951] Kahn, H. and T. E. Harris (1951). Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematical Series 12, 27–30.
 Lee and Carlin [2010] Lee, K. J. and J. B. Carlin (2010). Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation. American Journal of Epidemiology 171(5), 624–632.
 Little and Rubin [2019] Little, R. J. and D. B. Rubin (2019). Statistical analysis with missing data, Volume 793. Wiley.
 Owen [2013] Owen, A. B. (2013). Monte Carlo theory, methods and examples.
 R Core Team [2019] R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
 Raghunathan et al. [2001] Raghunathan, T. E., J. M. Lepkowski, J. V. Hoewyk, and P. Solenberger (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology 27.
 Rubin [1978] Rubin, D. B. (1978). Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. In Proceedings of the Survey Research Methods Section of the American Statistical Association, Volume 1, pp. 20–34. American Statistical Association.
 Rubin [1987] Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.
 Schafer [1997] Schafer, J. (1997). Analysis of Incomplete Multivariate Data. London: Chapman and Hall.
 Sherlock et al. [2015] Sherlock, C., A. H. Thiery, G. O. Roberts, and J. S. Rosenthal (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Statist. 43(1), 238–275.
 Su et al. [2011] Su, Y.-S., A. Gelman, J. Hill, and M. Yajima (2011). Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software 45(2), 1–31.
 van Buuren and Groothuis-Oudshoorn [2011] van Buuren, S. and K. Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software 45(3), 1–67.