1 Introduction
Multilevel ordinal outcomes commonly appear in observational studies. In a study of dental disease, maximum clinical attachment loss (CAL) – recorded on each tooth using an ordinal scoring system – was used to assess periodontal health. In this context, each subject contributes multiple outcomes of interest (CAL) and they are clustered within a subject. Therefore, the correlation between the outcomes within the same subject need to be accounted for in the analysis. Generalized linear mixed effect model (GLMM) and generalized estimating equation (GEE), both extensions of the generalized linear model (GLM), are the two most popular methods to model such multilevel data. GLMM gives clusterspecific inference while marginal models using GEE give populationaverage inference.
Both GLMM and GEE assume that cluster size and the outcome of interest are independent, given the covariates. However, this assumption can be violated when cluster size varies and changes with the degree of the outcome. For example, in periodontitis, the probability of losing a tooth increases with the severity of the disease. Hence, patients with advanced periodontitis tend to have fewer number of teeth, or smaller cluster size, compared to patients with good oral health. This type of situation, when the outcome is dependent on cluster size conditional on observed covariates, is referred to as informative cluster size (ICS). To handle ICS, Seaman et al. summarized a number of methods based on GLMM and GEE
[seaman2014methods]. Including the cluster size as a covariate in the model is one of the solutions, but it changes the interpretations of coefficients of other covariates. When the interest is in making marginal inference, Williamson et al. and Benhin et al. both proposed cluster weighted GEE (CWGEE), which provides unbiased estimator when ICS exists
[williamson2003marginal, benhin2005cwgee] and can model ordinal outcomes [mitani2019marginal].While CAL cannot be measured on missing teeth (which leads to the issue of ICS), in the dental study, some CAL measurements were missing also on existing teeth for unknown reasons. Removing the teeth with missing CAL values from the analysis will produce inconsistency between cluster size and the number of observed outcomes for some of the subjects. Furthermore, marginal models using GEE assume that missing data are unrelated to observed and unobserved variables. Therefore, when the interest is in making marginal inference, missing data problems are often dealt by multiple imputation (MI) [schafer1997analysis, little2019statistical]
. Briefly, MI involves three phases: imputation phase, analysis phase, and pooling phase. In the imputation phase, the missing values are filled in with plausible values estimated from the posterior predictive model. This process is repeated
times, creating different complete datasets. Then in the analysis phase, statistical analysis is performed on each of the complete datasets, leading toestimates and variancecovariance matrices of the parameters. Finally, in the pooling phase, the
parameter estimates and variances are pooled to create one set of parameter and variance estimates [rubin2004multiple].There are two main strategies for MI: joint modeling (JM) and fully conditional specification (FCS), also known as MI with chained equation (MICE). Schafer proposed a joint multivariate normal model, which assumes that partially observed data follow a multivariate normal distribution
[schafer1997analysis]. FCS is known for its flexibility in handling data with mixed variable types by imputing the variables containing missing values one at a time. Hence, FCS requires a unique specification of the imputation model for each variable with missing values [vanBuuren2018book]. Both strategies have been implemented in mainstream statistical programming languages, including R, and standalone software. R packages that perform MI based on JM include norm [novo2002analysis], cat [harding6cat], mix [schafermix], pan [pan] and jomo [quartagno2019jomo]. There is also a standalone software REALCOM [carpenter2011realcom] that performs JM imputation. However, only REALCOM and jomo are able to impute multilevel categorical data. The R package mice is the most commonly used R package to implement FCS, which provides many options for model specification [van2011mice]. However, mice has limited options to impute multilevel data. Other packages, such as micemd [audigier2019micemd] and miceadds [robitzsch2017package], as extensions for mice, provide more options for different types of variables in multilevel data. Nevertheless, micemd does not deal with ordinal data and miceadds uses predictive mean matching to impute ordinal data. Recently, Enders et al. developed a standalone software blimp that uses a latent probit approach to impute multilevel ordinal data [enders2018fully].Currently, the R package jomo and the standalone software blimp are two optimal software for imputing missing multilevel ordinal data. Although their performances on imputing multilevel continuous and binary data have been compared in many different aspects [enders2016multilevel, audigier2018multiple, wijesuriya2022multiple], their performances on imputing multilevel ordinal outcomes, especially when ICS exists, have not been studied yet. Kombo et al. compared FCS and JM for ordinal longitudinal data with monotone missing data patterns [kombo2017multiple], but many multilevel data, such as clustered dental data, do not follow a monotone missing data pattern.
The objective of this paper is to compare the performances between JM and FCS when imputing clustered ordinal outcomes that are subject to ICS. Two available software packages will be used: jomo in R for JM and blimp
for FCS. For each of the JM and FCS approaches, we will additionally assess whether the inclusion of cluster size in the imputation model will improve parameter estimation of the analysis model. Since we are interested in the populationaverage inference, parameters in the analysis model will be estimated by CWGEE with proportional odds logit. The rest of the paper is organized as follows. In section
2, we describe the methods used for the analysis model and imputation models. Extensive simulation studies as well as simulation results are shown in section 3. In section 4, we apply the imputation methods to a real dental data. Finally, conclusions and discussion are given in section 5.2 Methods
2.1 Motivating example: the VADLS study
Our study is motivated by the Veterans Affairs Dental Longitudinal Study (VADLS), which was a closedpanel longitudinal study that monitored oral health and diseases of male subjects from the greater Boston metropolitan area
[kapur1972veterans]. The health status of the subjects were measured approximately every three years. For illustration, we focused on one cycle of the longitudinal data, which includes 241 subjects with a total of 5100 teeth. We were interested in assessing the association between metabolic syndrome (MetS) and increasing CAL scores [kaye2016metabolic]. CAL score is a level1 (tooth/member level) ordinal variable of four categories (0:
2mm, 1: 22.9mm, 2: 34.9mm, 3: 5mm). The higher score indicates a worse prognosis of periodontal disease. We modelled the association between MetS (yes/no) and ordinal CAL scores using the proportional odds logistic regression adjusted for level2 (subject/cluster level) variables: age, smoking status (eversmoker/neversmoker), and education levels (high school, some college, college graduate). The summary statistics and missing percentages of all variable are shown in Table S1. These variables have been shown to be associated with CAL from previous studies [kaye2016metabolic, gamonal2010clinical]. The marginal analysis model had the following form:(1)  
where is CAL recorded on the th tooth of the th subject, , . Two issues existed in producing valid inference from Equation ((1)). First, the number of teeth per subject ranged from 1 to 28 and the Spearman correlation coefficient of mean CAL score per subject and the number of teeth per subject was 0.41 (95% CI: 0.44, 0.38), indicating the presence of ICS in this data. Therefore, CWGEE was applied for estimation. Second, CAL was missing in 19% of all existing teeth. Hence, MI was applied to make use of all available data before model fitting. In addition to CAL, three other level1 variables that measure the prognosis of periodontitis were available: probing pocket depth (PPD), alveolar bone loss (ABL) and mobility (Mobil). PPD, ABL, and Mobil were also recorded using an ordinal scoring system and were correlated with each other and with CAL. Although PPD, ABL, and Mobil were not included in the analysis model, they were included in the imputation model to help impute missing CAL.
2.2 Ordinal regression with CWGEE
In the dental study, our goal was to obtain the marginal inference of association between MetS and the periodontal health of a typical tooth from a randomly selected person. GEE is appealing not only because the estimator of coefficient is almost as efficient as the maximum likelihood estimator but also because it provides a consistent estimator of coefficient even under a misspecified withinsubject association among the repeated measurements when we have sufficiently large samples [fitzmaurice2008longitudinal]. Due to ICS, estimation using GEE will have oversampled healthy teeth and resulted in biased coefficients [seaman2014review]. To overcome this challenge, Williamson et al. and Benhin et al. both proposed CWGEE [williamson2003marginal, benhin2005cwgee], which weighs the GEE by the inverse of cluster size. Let denote the ordinal outcome categories for the th member of the th subject, , , where is the cluster size (number of teeth) for subject . Let denote the sets of fixed covariates for the th member of the th subject. The model for ordinal outcome using proportional odds logistic regression is expressed as
For estimation, it is common to reexpress the category outcome as a set of binary outcomes, such that
and write the model as a set of logistic regression for each of the binary outcomes [kenward1994ordinal]:
Let , where . Then the estimation of is obtained by solving the following CWGEE,
(2) 
where , and is the diagonal matrix of variance and includes the pairwise correlation between and for . Note that with this estimating equation, we assume a “working independence” structure between the teeth within a subject which is conventional in CWGEE [Williamson07]
. A robust standard error (se) is estimated using the sandwich variance estimator, which has the form
where
and
2.3 MI with multilevel ordinal data
MI is a Monte Carlo technique in which missing values are replaced by a set of simulated values drawn from the posterior predictive distribution . JM and FCS are two main strategies for MI: JM assumes a multivariate normal model for all variables, while FCS specifies a single model for each variable and imputes each of them sequentially. For either strategy, the imputation phase can be summarized into two steps:

Draw , the parameters of the imputation model, from the posterior distribution using the compete data;

Update the imputation by drawing data from the posterior predictive model .
The first step utilizes Gibbs sampler, which is iterated many times to yield an empirical estimate of each parameter’s marginal posterior distribution [enders2018fully]. Each approach has been extended to handle imputation of multilevel data by including a random effects term in the imputation model to account for the correlation between members within the same cluster [enders2016multilevel, enders2018fully].
Following the variables in the dental study, we denote (CAL) as the level1 ordinal outcome with missing data with categories. We also have three “auxiliary” level1 ordinal variables, (PPD), (ABL), and (Mobil) with , , and categories respectively. The “auxiliary” variables are not of direct interest in the analysis but can improve imputation accuracy if included in the imputation model. In addition, we have a level2 continuous covariate and a level2 binary covariate , which are both fully observed. These covariates are in both the analysis and imputation models.
2.3.1 Joint Modeling with R package jomo
JM models the data through a multivariate normal distribution. In the R package jomo
, categorical variables are substituted with latent normal variables during Gibbs sampling and then converted back to discrete values using thresholds
[quartagno2019jomo]. For categorical variables with levels, we need latent normal variables, each of which has a fixed variance of 1, and the covariance with the other latent normal variables of 0.5 [quartagno2019multiple]. To deal with multilevel data, multivariate version of LME is used. In this paper, we consider the random intercept model since only the level1 outcomes contain missing data. Let , , and be the latent normal variables for in level , respectively. Then, we construct a multivariate random intercept model as the multilevel imputation model:Then, the imputed latent variables were converted back to discrete values as follows:
and similarly for .
2.3.2 Fully Conditional Specification with blimp software
Rather than assuming the variables with missing data to follow a multivariate normal distribution, FCS assumes a unique model for each variable with missing values. Based on the distribution of the variable, a unique type of imputation model can be specified (e.g. linear regression for continuous variable, logistic regression for binary variable). For multilevel data, an additional step is required for the Gibbs sampling steps, which is generating imputations based on the current model parameters and level2 residual terms. Thus, each Gibbs cycle uses the current imputations to execute a completedata Bayesian analysis
[enders2018fully]. The procedure of multilevel FCS can be summarized as follows for iteration :
is drawn from the following distribution:
where is the predicted value from the multilevel model and is the withincluster residual vairance.

After updating , is now treated as the outcome and and other covariates are treated as predictors. is drawn from the following distribution:

After updating , repeat the above steps to update and .
The described steps are standard algorithm for multilevel FCS, which are implemented in the R package mice [van2011mice]. This approach is flexible and useful in many applications. However, the imputation model options in mice are limited to incomplete level1 variables that are either continuous or binary. Enders et al. extended mice to handle incomplete nominal and ordinal variables, and provided the software program blimp [enders2018fully].
In blimp, for ordinal data, the cumulative probit model is used to link the latent variable distribution to the discrete responses using a threshold parameter. The imputation model for in step 1 then becomes:
(3) 
Then the imputed latent variable is converted to ordinal variable using the following function:
which finalize the imputation of in iteration . We then repeat step 2 and step 3 similarly to other variables , , and .
2.3.3 Imputation model for data with ICS
In MI, the imputation model needs to include all the features of the analytical model [sterne2009multiple]. The existence of ICS indicates that the outcome is dependent on the cluster size and ignoring this relationship in the imputation model may lead to inefficient and biased estimation of the posterior distribution. Hence, we will include in the imputation model to account for the dependence between the missing values and . Taking FCS as an example, Equation (2.3.2) is then rewritten as
Note that will only be included in the imputation phase to account for additional variance of , thus potentially improving the imputation accuracy. In the analysis model, ICS is accounted for by the CWGEE as in Equation (2). In a simulation study, we will compare the effect of including versus omitting in the imputation model for both JM and FCS.
3 Simulation Studies
3.1 Simulation Setting
We assessed the performance of five missing data approaches for multilevel ordinal outcomes with ICS through a comprehensive simulation study: 1) Completecase analysis (CCA); 2) FCS without cluster size as a predictor (FCS); 3) FCS with cluster size as a predictor (FCS+CS); 4) Joint modeling without cluster size as a predictor (JOMO); 5) Joint modeling with cluster size as a predictor (JOMO+CS).
We mimicked the real dental data to generate the multilevel ordinal outcomes of categories. To generate clustered ordinal data that is subject to ICS, we simulated the outcome of the th tooth of the th subject from a proportional odds logit model with a random intercept that follows a bridge distribution:
(4) 
where was a level2 continuous variable generated from , was a level2 binary variable generated from . The true values for the parameters were . We also simulated three other auxiliary level1 ordinal variables , and following the same procedure with different values of intercepts . , and were used in the imputation of .
To simulate the ordinal outcome with ICS, we used the bridge distribution [mitani2019marginal, parzen2011generalized] to obtain the marginal probability of success when fitting a random intercept logistic regression model of the form:
(5) 
where means the th tooth for th subject is in category , and acts as a random intercept that follows a bridge distribution with density , , . The maximum cluster size in each subject is 28. We used the exchangeable correlation structure with parameter to simulate the correlation between teeth. For each subject , we generated the baseline hazard as a function of the subject specific set of random effects , , where , and controls the degree of ICS. The number of tooth in the subjects was generated from . A detailed description of the simulation is shown in the Supplementary Materials.
We generated missing values through three different missing data mechanisms, MCAR, missing at random (MAR), missing not at random (MNAR) [rubin1976inference] and considered different levels of missingness on the outcome . To generate the missing data mechanism, the missingness indicators for the th tooth of the th subject were generated from the model:
(6) 
When , data were MCAR. When only , data were MAR. Otherwise, data were MNAR. is used to control the missing rate. For the outcome , we generated missing rates of 20% and 50%, representing low and high missing rates, respectively. For the auxiliary outcomes , and , the missing rates were 30%, 30%, 10%, respectively.
Table 1 shows the combination of the various parameters when data were MAR. We considered two different sample sizes , 50 and 250. The degrees of correlation varied from 0, 0.3, to 0.6, representing null, moderate, and strong correlation between teeth. We varied the degrees of ICS from 0, 0.1, to 0.4, representing null, moderate, and high correlation between the outcome and cluster size. When data were MCAR and MNAR, we considered the scenario where and the missing rate was 20%. We obtained the parameter estimates from each simulated data in each scenario through 1000 replications. The following metrics were computed to compare the performance of each imputation approach: (1) the mean parameter estimates (Mean Est); (2)the mean robust standard error (Mean SE); (3) the empirical standard error (Empirical SE); (4) the mean relative bias * 100% (Rel Bias); (5) the 95% coverage probability (Cov Prob); and (6) the mean square error (MSE).
Parameter  Notation  Values  

Sample size  N  50  250  
Degree of correlation (ICC)  0  0.3  0.6  
Degree of ICS  0  0.1  0.4  
Missing rate  r  20%  50% 
3.2 Results
Figure 1 shows the mean relative biases of each imputation method and each parameter under different scenarios when the missing mechanism is MAR. CCA had the largest relative bias, followed by JOMO and JOMO+CS across all parameters. FCS+CS had the smallest mean relative bias in most scenarios, followed by FCS. The differences in mean relative bias between FCS+CS and FCS or between JOMO+CS and JOMO were negligible when the the degree of ICS was null or moderate. However, when the degree of ICS was large, FCS+CS had smaller mean relative bias than FCS, and JOMO+CS had smaller mean relative bias than FCS, indicating that including cluster size as a covariate in the imputation model for both FCS and JOMO improved the estimation accuracy. The effect of ICC on imputation was similar with ICS. The imputation methods were not affected by ICC when ICS was null. When ICS was moderate or large, CCA, JOMO and JOMO+CS performed worse as ICC increased, especially for the intercepts. It is noteworthy that this change was not very observable for FCS, given its good performance under most scenarios. The missing rate also had a large impact on mean relative biases. The mean relative biases for scenarios where the missing rate was 50% were much larger than those where the missing rate was 20%. The difference was more considerable for JOMO+CS, JOMO, and CCA, suggesting FCS was more reliable then JOMO when the missing rate was high. The mean relative bias was slightly lower when sample size was larger. Figure 2 shows the MSE of each imputation method and each parameter under different scenarios when the missing mechanism is MAR. We can see that CCA produced the largest MSE over all scenarios, followed by JOMO. The performance of FCS and FCS+CS were similar to the results with full data, indicating the good performance of FCS. The MSEs of were close to 0 for all methods, which may be related to the scale of the true value. We also compared the empirical SEs and mean SEs of each method in Figures S1 and S2. In general, the empirical SEs and mean SEs were similar. Both empirical SEs and mean SEs increased when the degree of ICS and ICC increased.
Simulation results of intercept and slope for scenario where ICC=0.3, ICS=0.1, missing rate= 20%, and N=50 is shown in Table 2. The mean estimate of FCS+CS or FCS for the intercept was 0.37, which was closer to the true value compared to the JOMO+CS or JOMO. The mean relative biases were 7.06% and 7.98% for FCS and FCS+CS, respectively. The coverage probabilities of FCS and FCS+CS were both close to 95%. Including cluster size in the imputation did not significantly improve the estimation accuracy, as the ICS was moderate. When looking into the details of more extreme cases when ICS was 0.4 and ICC was 0.6, we observed that including cluster size as a covariate in the imputation model shifted the mean closer to the true value by 0.03 (Table S2). When the missing data mechanism was MNAR, all imputation methods had larger biases compared to when data were MAR (Table 3). The mean estimates from CWGEE based on FCS and FCS+CS were 0.29, resulting in 26% relative bias. The coverage probabilities of FCS and FCS+CS were around 93%. The performance of FCS was still better than JOMO and CCA. When the missing data mechanism was MCAR, all imputation methods yielded unbiased estimates (Table S3).
Parameter  Method  Mean Est  Mean SE  Empirical SE  Rel Bias (%)  Cov Prob (%)  MSE 

Full  0.40  0.20  0.20  0.97  95.50  0.04  
CCA  0.19  0.21  0.22  53.07  79.70  0.09  
FCS+CS  0.37  0.21  0.20  7.06  95.80  0.04  
FCS  0.37  0.21  0.20  7.98  95.90  0.04  
JOMO+CS  0.34  0.23  0.22  15.09  94.18  0.05  
JOMO  0.33  0.23  0.22  16.31  95.08  0.05  
Full  0.21  0.19  0.20  4.48  92.40  0.04  
CCA  0.10  0.21  0.23  51.09  89.90  0.06  
FCS+CS  0.17  0.21  0.21  15.09  95.10  0.04  
FCS  0.17  0.21  0.21  16.22  94.70  0.04  
JOMO+CS  0.13  0.23  0.24  32.69  93.99  0.06  
JOMO  0.13  0.24  0.24  33.50  93.57  0.06 
Parameter  Method  Mean Est  Mean SE  Empirical SE  Rel Bias (%)  Cov Prob (%)  MSE 

Full  0.40  0.20  0.20  0.97  95.50  0.04  
CCA  0.10  0.22  0.24  74.61  71.21  0.15  
FCS+CS  0.29  0.23  0.22  26.52  93.38  0.06  
FCS  0.29  0.23  0.22  27.51  92.89  0.06  
JOMO+CS  0.26  0.25  0.25  35.41  91.44  0.08  
JOMO  0.26  0.25  0.25  36.02  91.40  0.09  
Full  0.21  0.19  0.20  4.48  92.40  0.04  
CCA  0.05  0.24  0.26  76.77  87.86  0.09  
FCS+CS  0.13  0.24  0.22  36.58  96.09  0.05  
FCS  0.12  0.24  0.22  40.06  95.40  0.05  
JOMO+CS  0.09  0.28  0.25  56.94  94.47  0.08  
JOMO  0.08  0.28  0.26  59.03  94.03  0.08 
4 Real Data Application
Based on the analysis model in Equation (1), we incorporated the level1 variables described above, in additional with alveolar bone loss (ABL), mobility, and probing pocket depth (PPD). ABL, mobility and PPD are commonly used to quantify the severity of periodontitis, and are correlated with each other. Hence, ABL, mobility, and PPD were included as auxiliary variables to improve the imputation accuracy for CAL. In the imputation phase, we implement FCS+CS, FCS, JOMO+CS, JOMO and CCA.
Table 4 summarizes the results from the VADLS data. The effect of MetS based on the imputation method FCS+CS suggested that the odds of having a lower/healthier CAL score was approximately 20% lower for patients free of MetS compared to patients with MetS, indicating that the presence of MetS is associated with a worse prognosis of periodontitis. Similar to the simulation studies, imputation with FCS+CS followed by the CWGEE model provided the smallest standard error. Recent studies have also reported the association between MetS and periodontal disease [lamster2017periodontal]. However, such an association would not have been found if CCA was used onour data (OR=0.01, se=0.22).
Imputation  Intercept 1  Intercept 2  Intercept 3  age  smoking  Edu (level2)  Edu (level3)  MetS 

CCA  2.68 (1.10)  3.82 (1.10)  4.81 (1.11)  0.04 (0.01)  0.26 (0.26)  0.04 (0.26)  0.4 (0.28)  0.01 (0.22) 
FCS+CS  2.68 (1.10)  3.82 (1.11)  4.79 (1.12)  0.04 (0.01)  0.33 (0.29)  0.02 (0.24)  0.47 (0.26)  0.20 (0.20) 
FCS  2.95 (1.11)  4.09 (1.12)  5.06 (1.13)  0.04 (0.01)  0.43 (0.27)  0.06 (0.24)  0.4 (0.26)  0.14 (0.21) 
JOMO+CS  2.45 (1.01)  3.56 (1.02)  4.52 (1.02)  0.04 (0.01)  0.26 (0.26)  0.04 (0.24)  0.42 (0.26)  0.19 (0.2) 
JOMO  2.83 (1.18)  3.96 (1.18)  4.93 (1.19)  0.04 (0.01)  0.29 (0.27)  0.07 (0.25)  0.51 (0.26)  0.11 (0.21) 
5 Conclusions
In this study, we compared FCS, JM and CCA for imputing missing multilevel ordinal outcomes under different scenarios. Comprehensive simulation studies showed that FCS performed better than JM and CCA. FCS also provided a more reliable and stable performance with varying missing rates and ICC. We also saw that including cluster size as a covariate in the imputation model improved the estimation when the degree of ICS was large. MI methods are valid only when the missing data mechanism is MCAR and MAR. Nevertheless, both JM and FCS are better options for multilevel imputation than CCA.
Although both FCS and JM were based on Monte Carlo techniques, they are fundamentally different. FCS implemented by blimp imputes the ordinal data using a thresholdbased latent probit approach. Since it imputes one variable at a time, it is more flexible and easier to adjust to different data types. On the other hand, the R package jomo uses a nominal probit model, even for imputing ordinal data, which tends to be less appropriate than FCS. Simulation studies conducted by Quartagno et al. showed that if the variables are truly ordinal, it gives good results with only a marginal loss in efficiency [quartagno2019multiple]. However, we observed that the bias from jomo is not negligible compared to FCS when ICS exists. There is another software REALCOM that implements JM that could deal with incomplete multilevel ordinal outcomes, but unfortunately, it is only available for Windows users [carpenter2011realcom].
This study has several limitations. First, we only compared the level1 outcome missing in our data. Level1 and level2 covariates missing could make the imputation more complicated. Comparing the performance of JM and FCS when both covariates and outcomes are missing when ICS exists could be one of the future works. Second, although this study found that the bias introduced by JM using the R package jomo was not negligible, it is not fair to conclude that JM is worse than FCS when imputing multilevel ordinal outcomes. It is of interest to expand the package to deal with ordinal data. Last, our study focused on twolevel clustered data, and did not assess the time effect typically observed in longitudinal studies. Wijesuriya et al. compared the methods for threelevel data with timevarying cluster size for continuous exposures and outcomes [wijesuriya2022multiple]. It would be of interest to extend the comparison to include ICS and ordinal outcomes.
To conclude, this study compared JM and FCS for imputing multilevel ordinal outcomes when data is subject to ICS. We found that FCS is currently the optimal choice, and we should include the cluster size in the imputation model if there is potential for ICS. Our study provides as a further guide on choosing the imputation methods when imputing the multilevel ordinal outcomes.
Acknowledgments
We acknowledge Professor Raul Garcia who is the Principal Investigator and examiner for the Dental Longitudinal Study. The Dental Longitudinal Study and Normative Aging Study are components of the Massachusetts Veterans Epidemiology Research and Information Center which is supported by the VA Cooperative Studies Program. Views expressed in this paper are those of the authors and do not necessarily represent the views of the US Department of Veterans Affairs.
Financial disclosure
This work was supported by the Dalla Lana School of Public Health Data Science Cluster and the Natural Sciences and Engineering Research Council of Canada  Discovery Grants Program (RGPIN202205356).
Conflict of interest
The authors declare no potential conflict of interests.
References
Supporting information
Appendix A Details on Simulating Multilevel Clustered Ordinal Data with Informative Cluster Size
For every cluster ,

Sample
from a multivariate normal distribution, with mean vector
and variance matrix , whereis the correlation between each pair of units within a cluster. We used to exchangeable correlation structure to generate correlation between teeth.

Compute , where is the CDF of standard normal distribution.

Compute , where . has marginal distribution and and are correlated due to the correlation imposed by .

Compute the baseline level of risk for each cluster such that , where .

Sample cluster size from a truncated binomial .

Generate the outcome , which takes values from 1, 2, 3, and 4 from a multinomial distribution with a set of probability () such that
where , .

Repeat for subjects.

Repeat the whole process for each auxiliary outcome with different values of .
Appendix B Supplementary Tables and Figures
Variables  Type  Categories  Summary Stats  Missing Rate 
Age  subjectlevel  Median (range)  76 (60, 98)  0% 
Smoking status  subjectlevel  Eversmoker  40 (17%)  0% 
Education  subjectlevel  High school  62 (26%)  0% 
Some college  86 (36%)  
College graduate  93 (38%)  
Metabolic Syndrome  subjectlevel  Yes  95 (39%)  0% 
nteeth  subjectlevel  Median (range)  22 (1, 28)  0% 
CAL  Toothlevel  levels  4  19% 
PPD  Toothlevel  levels  4  10% 
ABL  Toothlevel  levels  6  25% 
Mobil  Toothlevel  levels  4  0.2% 
Parameter  Method  Mean Est  Mean SE  Empirical SE  Rel Bias (%)  Cov Prob (%)  MSE 

Full  0.38  0.30  0.29  4.58  95.20  0.08  
CCA  0.10  0.29  0.32  76.03  76.68  0.19  
FCS+CS  0.35  0.31  0.28  11.53  96.59  0.08  
FCS  0.32  0.32  0.29  19.78  94.46  0.09  
JOMO+CS  0.32  0.34  0.30  19.62  95.61  0.10  
JOMO  0.29  0.34  0.31  26.71  93.45  0.11  
Full  0.22  0.25  0.29  8.30  90.70  0.08  
CCA  0.10  0.28  0.34  50.82  87.34  0.13  
FCS+CS  0.20  0.29  0.28  1.95  95.58  0.08  
FCS  0.17  0.30  0.29  17.16  95.87  0.08  
JOMO+CS  0.15  0.34  0.33  24.51  95.61  0.11  
JOMO  0.12  0.34  0.31  38.12  96.12  0.10 
Parameter  Method  Mean Est  Mean SE  Empirical SE  Rel Bias (%)  Cov Prob (%)  MSE 

Full  0.40  0.20  0.20  0.97  95.50  0.04  
CCA  0.40  0.21  0.21  1.03  94.90  0.05  
FCS+CS  0.40  0.21  0.20  0.05  95.30  0.04  
FCS  0.40  0.21  0.20  0.48  95.50  0.04  
JOMO+CS  0.40  0.21  0.21  0.74  95.80  0.04  
JOMO  0.40  0.21  0.21  0.91  95.70  0.04  
Full  0.21  0.19  0.20  4.48  92.40  0.04  
CCA  0.21  0.20  0.22  6.43  92.30  0.05  
FCS+CS  0.21  0.19  0.20  6.42  95.10  0.04  
FCS  0.21  0.19  0.20  5.89  94.20  0.04  
JOMO+CS  0.21  0.20  0.21  5.19  94.00  0.04  
JOMO  0.21  0.20  0.21  3.96  94.20  0.04 