“Intermediate outcomes” in the context of clinical trials refer to the outcomes measured after treatment assignment but before the time point of interest. For example, for a trial studying the effect of a 12-month dietary plan on weight, the intermediate outcomes can be the weights at 3, 6 and 9 months after randomization. In randomized clinical trials, intermediate outcomes are routinely collected and have been used in various ways, including trial monitoring (Shih and Quan, 1999), principal stratification (Seuc et al., 2013), decision making in the interim analysis (Kunz et al., 2015), and mediation analysis (Landau et al., 2018). We focus on a lesser-known purpose of intermediate outcomes, which is to robustly improve the precision of statistical inference at the time point of interest, e.g. the primary endpoint.
For the purpose of increasing precision and power in clinical trials, covariate adjustment and stratified randomization are two commonly used methods. Covariate adjustment refers to adjustment for chance imbalances in baseline variables, called covariates, among treatment groups by a regression model, and has been extensively studied and applied as a robust method to reduce variance. See, for example, Yang and Tsiatis (2001); Rubin and van der Laan (2008); Moore and van der Laan (2009a, b); Zhang (2015); Jiang et al. (2018), to name a few. Stratified randomization (Zelen, 1974), also known as stratified permuted block randomization, is a popular randomization scheme that minimizes treatment imbalance within each prespecified randomization stratum, and can also increase power (Bugni et al., 2018, 2019; Wang et al., 2019b; Ye et al., 2020). According to a survey by Lin et al. (2015), stratified randomization is used by 70% of trials published in top medical journals. The recent guidance from the U.S. Food and Drug Administration (FDA, 2021) advocates using covariate adjustment and stratified randomization for improving precision. The guidance, however, points out that one of its limitations is that it does not address the use of covariate adjustment for analyzing longitudinal repeated measures data. Our goal is to adjust for intermediate outcomes on top of covariate adjustment and stratified randomization to maximize the precision gain in the analysis of randomized clinical trials with continuous outcomes.
Marschner and Becker (2001) first showed that jointly modeling short-term and long-term binary outcomes can lead to precision gain compared to modeling long-term outcomes only. Galbraith and Marschner (2003); Stallard (2010); Hampson and Jennison (2013); Zhou et al. (2018) focused on interim analyses and showed that adjustment for prognostic short-term outcomes can improve interim decisions, assuming the outcomes follow a bivariate normal distribution. All of these methods, however, rely on correctly specified parametric models. Building on general approaches developed by Lu and Tsiatis (2011) and van der Laan and Gruber (2012), Qian et al. (2019) derived the non-parametric formulas for the precision gain from adjusting for baseline variables and the short-term outcomes; and Van Lancker et al. (2020) proposed a model-robust estimator that achieves such precision gain for the interim analysis. However, their asymptotic results are limited to simple randomization, a single intermediate outcome, and monotone censoring, and thus fail to take into account the precision gain from stratified randomization and the complexity of non-monotone censoring at multiple visits.
In this paper, we propose a new working model, the “Improved Mixed Model for Repeated Measures” (IMMRM), which can combine the precision gain from covariate adjustment, stratified randomization and adjustment for intermediate outcomes. The IMMRM working model is an extension of the mixed model for repeated measures (MMRM, Mallinckrodt et al., 2008) to handle multiple treatment groups, treatment heterogeneity and heteroscedasticity. To the best of our knowledge, this is the first method that fully utilizes pre-randomization variables, the randomization procedure, and post-randomization information to improve precision.
Assuming mild regularity conditions and missing completely at random, the IMMRM estimator for the average treatment effect at the time point of interest is model-robust, i.e., consistent and asymptotically normal under arbitrary misspecification of its working model. Furthermore, under simple or stratified randomization, the IMMRM estimator is asymptotically equally or more precise than the following commonly-used estimators: the analysis of covariance (ANCOVA, Yang and Tsiatis, 2001) estimator, and the MMRM estimator with or without visit-by-covariates interactions in the working model.
Our result implies that appropriately leveraging post-randomization information can lead to precision gain beyond what comes from covariate adjustment and stratified randomization. In contrast, although MMRM involves intermediate outcomes, it can be less precise than ANCOVA, even when visit-by-covariates interactions are included in its working model. This result generalizes the finding of Schuler (2021), which showed ANCOVA outperforms MMRM under two-arm, equal, simple randomization with no missing data. In addition, we provide the necessary and sufficient condition for when intermediate outcomes provide precision gain, which is, essentially, that the intermediate outcomes provide additional information beyond covariates to the missing outcomes at the time point of interest.
In the next section, we introduce three randomized clinical trials on type 2 diabetes. In Section 3, we present our setup, notation and assumptions. In Section 4, we describe the ANCOVA estimator, the MMRM estimator (with or without visit-by-covariates interactions), and our proposed IMMRM estimator. Our main result is presented in Section 5, which consists of asymptotic theory, explanations of the precision gain from intermediate outcomes, and a discussion of the special case of two-arm equal randomization. A simulation study and a data application to three completed diabetes randomized clinical trials are given in Sections 6 and 7, respectively. We provide practical recommendations and discuss future directions in Section 8.
2 Three randomized clinical trials
2.1 Trial 1: the IMAGINE-2 study
The trial of “A Study in Patients With Type 2 Diabetes Mellitus (IMAGINE 2)” (NCT01435616) is a 52-week, two-arm, phase 3 randomized clinical study completed in 2014 (Davies et al., 2016). The goal of this trial was to evaluate the effect of basal insulin for treatment of type 2 diabetes in an insulin-naïve population.
Participants were randomly assigned to receive insulin peglispro (treatment, 1003 patients) or insulin glargine (control, 535 patients), targeting a 2:1 randomization ratio. Randomization was stratified by baseline HbA1c ( or ), low-density lipoprotein cholesterol (mg/dL or mg/dL) and baseline sulphonylurea or meglitinide use (yes or no). HbA1c is a continuous measure of average blood glucose values in the prior three months. The primary outcome was the change in HbA1c at week 52 from baseline (15% missing outcomes), while intermediate outcomes were measured at week 4 (3% missing outcomes), week 12 (6% missing outcomes), week 26 (10% missing outcomes) and week 39 (13% missing outcomes). We focused on estimating the average treatment effect of the primary outcome and adjusted for intermediate outcomes, baseline HbA1c value, and randomization strata across different estimators.
2.2 Trial 2: another insulin peglispro study
The trial of “A Study of Insulin Peglispro in Participants With Type 2 Diabetes Mellitus” (NCT02106364) is a 26-week, two-arm, phase 3 randomized clinical study comparing insulin peglispro with insulin glargine for treatment of type 2 diabetes mellitus in an Asian insulin-naïve population (Hirose et al., 2018). This trial studied the same product (insulin peglispro) as Trial 1, but focused on an Asian population.
Participants were randomized 1:1 to receive insulin peglispro (treatment, 192 patients) or insulin glargine (control, 198 patients). Randomization was stratified by baseline HbA1c ( or ), low-density lipoprotein cholesterol (mg/dL or mg/dL), sulfonylurea/meglitinide use at baseline (yes or no) and location (Japan, Korea or Taiwan). The primary outcome was the change in HbA1c at week 26 from baseline (6% missing outcomes). Intermediate outcomes at week 4 (2% missing outcomes) and week 12 (3% missing outcomes) were included in the estimation. We focused on estimating the average treatment effect of the primary outcome with adjustment for intermediate outcomes, baseline HbA1c value, and randomization strata across different estimators.
2.3 Trial 3: the tirzepatide study
The trial of “A Study of Tirzepatide (LY3298176) in Participants With Type 2 Diabetes Mellitus” (NCT03131687) is a 26-week, six-arm randomized phase 2 trial (Frias et al., 2018). The goal of this trial was to evaluate the dose-response in efficacy and safety of four doses of tirzepatide (1mg, 5mg, 10mg and 15mg), a novel dual GIP and GLP-1 receptor agonist, in patients with type 2 diabetes compared to the placebo and dulaglutide 1.5mg (active comparator). In this study, 318 patients were equally randomized to one of the six parallel treatment groups with each group containing 51-55 patients. For the purpose of demonstration, we dropped the active comparator group from our analysis and only considered comparing four doses of tirzepatide to the placebo.
Randomization was stratified by baseline HbA1c ( or ), metformin use (yes or no), and BMI ( kg/m² or kg/m²). We focused on the body weight change from baseline, which was a secondary outcome of the study. The weight change was measured at eight post-randomization visits (weeks 1, 2, 4, 8, 12, 16, 20 and 26), with 1%, 3%, 6%, 11%, 15%, 18%, 17% and 19% missing outcomes at each visit, respectively. Our data application focused on estimating the average treatment effect of the weight change at week 26 from baseline and adjusted for all intermediate outcomes, the baseline body weight, and randomization strata across different estimators.
3 Definition and assumptions
3.1 Data generating distributions
Consider a trial where the outcome is continuous and repeatedly measured at visits, where is a positive integer. For each participant , the outcome vector at visits is and the non-missing status at visits is , where if is observed at visit , and 0 otherwise. Let be a categorical variable taking values in , with representing that participant is assigned to the -th treatment group. By convention, we use to denote being assigned to the control group. Let be a vector of baseline variables with length . Throughout, we refer to as the final outcome, and as the intermediate outcomes (if ), for conciseness.
We use the Neyman-Rubin potential outcomes framework (Neyman et al., 1990), which assumes , where is the indicator function and is the potential outcome vector for treatment group . Analogously to the consistency assumption above, we assume , where is the indicator vector of whether would be observed at each of the visits, if participant was assigned to treatment group for .
For each participant , we define the complete data vector as and the observed data vector as , where is the vector of observed outcomes, whose dimension may vary across participants. For example, if participant shows up only at visits 1 and , then . For the special case that a participant misses all post-randomization visits but still has baseline information recorded, the observed data vector is with being a zero vector. All estimators defined in Section 4 are functions of the observed data .
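The construction of the observed data from the potential outcomes and the non-missing indicators can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the tuple layout of the observed data are assumptions made here, and the outcome values are hypothetical.

```python
from typing import List, Sequence, Tuple


def realized_outcome(assignment: int,
                     potential: Sequence[Sequence[float]]) -> List[float]:
    """Consistency assumption: the realized outcome vector equals the sum over
    arms of (indicator of being assigned to the arm) * (potential outcome)."""
    n_visits = len(potential[0])
    return [
        sum((1.0 if assignment == j else 0.0) * potential[j][t]
            for j in range(len(potential)))
        for t in range(n_visits)
    ]


def observed_outcomes(outcomes: Sequence[float],
                      non_missing: Sequence[int]) -> List[float]:
    """Keep only the outcomes at visits where the non-missing indicator is 1.
    The result can have a different length for each participant."""
    return [y for y, m in zip(outcomes, non_missing) if m == 1]


def observed_data(assignment: int, covariates: Sequence[float],
                  outcomes: Sequence[float],
                  non_missing: Sequence[int]) -> Tuple:
    """Bundle what the analyst sees. The ordering of the tuple is an
    assumption of this sketch, not taken from the source."""
    return (assignment, list(covariates),
            observed_outcomes(outcomes, non_missing), list(non_missing))


# Hypothetical participant: two arms (0 = control), three visits.
potential = [[0.1, 0.2, 0.3], [-0.4, -0.5, -0.6]]
y = realized_outcome(1, potential)
print(observed_data(1, [8.2], y, [1, 0, 1]))   # visit 2 is missing
print(observed_data(1, [8.2], y, [0, 0, 0]))   # all visits missed
```

A participant who misses every post-randomization visit still contributes the treatment assignment, the covariates and an all-zero indicator vector, which matches the special case described above.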
We make the following assumptions on :
Assumption 1.
(i) are independent, identically distributed samples from the joint distribution on .
(ii) (Missing completely at random, MCAR) is independent of for , and are identically distributed.
In addition to Assumption 1, we also assume regularity conditions for the estimators defined in Section 4. As we show in the Supplementary Material, all estimators we consider (including our proposed estimators) are M-estimators (Section 5 of van der Vaart, 1998), each of which is defined as a zero of prespecified estimating functions. The regularity conditions are imposed on these estimating functions, the complete data distribution , and the parameters involved in the estimating equations. These conditions are similar to the classical conditions given in Section 5.3 of van der Vaart (1998) for proving consistency and asymptotic normality for M-estimators under simple randomization. We provide these regularity conditions in the Supplementary Material.
The parameters of interest are the average treatment effects of the final outcome comparing each treatment group to the control group, i.e.,
where is the expectation with respect to the distribution . Our results in Section 5 also apply to estimating any linear transformation of , e.g., the average treatment effect comparing any two treatment groups.
3.2 Simple and stratified randomization
We consider two types of treatment assignment procedures: simple randomization and stratified randomization. For , let be the target proportion of participants that are assigned to the treatment group . We assume that and for all treatment groups. For example, equal randomization refers to the setting where .
Simple randomization allocates treatment by independent draws from a categorical distribution on , with for . Then are independent, identically distributed samples from this categorical distribution, and also independent of .
Under stratified randomization, treatment assignment depends on a set of categorical baseline variables, which are called stratification variables. We use a categorical random variable with support to denote the joint levels created by all stratification variables. For example, if randomization is stratified by sex (female or male) and weight (normal, overweight, or obese), then can take 2 × 3 = 6 possible values. Each element in is referred to as a “randomization stratum”. Within each randomization stratum, permuted blocks are used for sequential treatment allocation. Each permuted block contains fraction ’s for , with representing treatment group . At the onset of treatment allocation, a permuted block is randomly chosen and used to sequentially assign treatment. After a block is exhausted, a new block is used.
Compared with simple randomization, stratified randomization is able to achieve balance within each randomization stratum, i.e., exact fraction of participants are assigned to group . Under stratified randomization, the treatment assignments are not independent of each other; and are conditionally independent of given . For each participant , we assume that is encoded as dummy variables (dropping one level to avoid collinearity) in the baseline vector .
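The two assignment procedures can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in any of the trials: the function names are hypothetical, the block composition is passed as integer slot counts per arm, and arm 0 denotes control.

```python
import random
from collections import Counter, defaultdict


def simple_randomization(n, probs, seed=0):
    """Independent draws from a categorical distribution over the arms."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)


def make_block(slots_per_arm, rng):
    """One permuted block: slots_per_arm[j] copies of arm j, in random order."""
    block = [arm for arm, c in enumerate(slots_per_arm) for _ in range(c)]
    rng.shuffle(block)
    return block


def stratified_block_randomization(strata, slots_per_arm, seed=0):
    """Assign participants sequentially; each stratum consumes its own
    permuted blocks and opens a new block once the current one is exhausted."""
    rng = random.Random(seed)
    open_blocks = defaultdict(list)
    assignments = []
    for s in strata:
        if not open_blocks[s]:
            open_blocks[s] = make_block(slots_per_arm, rng)
        assignments.append(open_blocks[s].pop())
    return assignments


# Hypothetical enrollment order: 12 participants in each of two strata,
# three arms with 1:1:1 allocation and block size 6.
strata = ["low", "high"] * 12
arms = stratified_block_randomization(strata, [2, 2, 2], seed=1)
for s in ("low", "high"):
    print(s, Counter(a for a, st in zip(arms, strata) if st == s))
print("simple", Counter(simple_randomization(24, [1 / 3, 1 / 3, 1 / 3], seed=1)))
```

When a stratum's size is a multiple of the block size, the stratified scheme reproduces the target allocation exactly within that stratum, whereas simple randomization only does so on average. The assignments within a stratum are dependent because they share blocks, which is why the asymptotic theory needs a separate treatment.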
We finish this section by introducing a few additional definitions. For any two symmetric matrices and with the same dimension, we denote if is positive semi-definite. For any estimator of , we call the asymptotic covariance matrix of if weakly converges to a multivariate normal distribution with mean and covariance matrix . If two estimators and of have asymptotic covariance matrices and respectively, we say that is (asymptotically) equally or more precise than if . Such an expression is commonly used for scalar estimators, and we extend it to vector estimators.
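The matrix ordering used to compare vector estimators can be checked numerically. The sketch below is a pure-Python illustration for small matrices; the function names are hypothetical, and positive semi-definiteness is tested through the classical criterion that every principal minor is non-negative, which is only practical for low dimensions.

```python
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def is_psd(m, tol=1e-10):
    """A symmetric matrix is positive semi-definite if and only if
    all of its principal minors are non-negative."""
    n = len(m)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


def equally_or_more_precise(v1, v2):
    """True if the estimator with asymptotic covariance v1 is equally or more
    precise than the one with v2, i.e. v2 - v1 is positive semi-definite."""
    diff = [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(v1, v2)]
    return is_psd(diff)


# Hypothetical asymptotic covariance matrices for two treatment contrasts.
v_a = [[1.0, 0.2], [0.2, 1.0]]
v_b = [[1.5, 0.2], [0.2, 2.0]]
v_c = [[2.0, 0.2], [0.2, 0.5]]
print(equally_or_more_precise(v_a, v_b), equally_or_more_precise(v_b, v_a))
print(equally_or_more_precise(v_a, v_c), equally_or_more_precise(v_c, v_a))
```

The last line illustrates that the ordering is only partial: two estimators can each be more precise for a different contrast, in which case neither dominates the other.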
For estimating the average treatment effects , we first introduce three commonly-used estimators: the ANCOVA estimator (Section 4.1), the MMRM estimator (Section 4.2), and the MMRM estimator with visit-by-covariates interactions (MMRM-VCI, Section 4.2). Next, we propose a new estimator, the IMMRM estimator (Section 4.3), which models treatment heterogeneity and heteroscedasticity based on a variant of the MMRM model. Our main result in Section 5 is that, given Assumption 1 and mild regularity conditions, the IMMRM estimator is equally or more precise than the ANCOVA estimator, MMRM estimator and MMRM-VCI estimator under simple or stratified randomization, without assuming the IMMRM working model is correctly specified.
4.1 The Analysis of Covariance (ANCOVA) estimator
The ANCOVA estimator for is defined as the maximum likelihood estimator (MLE) for parameters in the working model below:
where are parameters, and is independent of and follows a normal distribution with mean 0 and unknown variance . We denote the ANCOVA estimator as .
Since it was first proposed by Fisher et al. (1937), ANCOVA has been extensively studied and applied. Under simple randomization, Yang and Tsiatis (2001) showed that the ANCOVA estimator for two arms (i.e., in our setting) is consistent and asymptotically normal given arbitrary misspecification of its working model. Ye et al. (2021) later generalized this result to accommodate multiple arms and designs with covariate-adaptive randomization schemes. If there are missing outcomes and the ANCOVA estimator is calculated using data vectors with observed outcomes only (i.e. participants with ), then the above results in this paragraph hold under the MCAR assumption.
When outcomes are repeatedly measured, the ANCOVA estimator wastes the information from intermediate outcomes. Although ignoring such information does not affect the consistency and asymptotic normality of ANCOVA, the intermediate outcomes can provide information for the missing final outcomes. As we show in Section 5 below, using the proposed IMMRM estimators (defined in 4.3 below) to adjust for intermediate outcomes can improve precision.
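As a concrete illustration of the ANCOVA estimator, the sketch below fits a linear working model with an intercept, treatment indicators and baseline covariates to participants whose final outcome is observed. Because the working model is linear with normal errors, the MLE of the regression coefficients coincides with ordinary least squares, which is what the code computes. The function names, the use of `None` for a missing final outcome, and the toy data are assumptions of this sketch.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ancova(arms, covariates, final_outcomes, n_treatments):
    """Least-squares fit of: final outcome ~ intercept + arm indicators + covariates.
    Participants with a missing final outcome (None) are dropped.
    Returns the coefficients of the indicators for arms 1..n_treatments,
    i.e. the estimated effects versus control (arm 0)."""
    rows, ys = [], []
    for a, x, y in zip(arms, covariates, final_outcomes):
        if y is None:
            continue
        indicators = [1.0 if a == j else 0.0 for j in range(1, n_treatments + 1)]
        rows.append([1.0] + indicators + list(x))
        ys.append(y)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta[1:n_treatments + 1]


# Toy data generated exactly from y = 1 + 2 * I(arm = 1) + 0.5 * x.
arms = [0, 1, 0, 1, 0, 1, 1]
x = [[1.0], [2.0], [3.0], [4.0], [5.0], [7.0], [6.0]]
y = [1.5, 4.0, 2.5, 5.0, 3.5, 6.5, None]   # last final outcome is missing
print(ancova(arms, x, y, n_treatments=1))
```

The participant with the missing final outcome contributes nothing here, even if intermediate outcomes were recorded; this is the information that the joint models in the following subsections try to recover.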
4.2 The mixed-effects model for repeatedly measured outcomes (MMRM)
The MMRM working model is defined as, for each
where are parameters that are specific for visit , are parameters that are invariant across , is independent of and has a multivariate normal distribution with mean and unknown covariance . The covariance matrix is assumed to be unstructured; that is, no other assumption is made on except that it is positive definite. The MMRM estimator for is defined as the MLE for parameters in the working model (2) using observed data .
The MMRM model (2) is derived from a linear mixed-effects model with fixed effects and random effects, where parameters and are fixed effects. The random effects are marginalized and implicitly represented in the covariance matrix .
In some clinical trials, e.g. Sorli et al. (2017), Lane et al. (2017) among others, the MMRM model (2) is augmented by including visit-by-covariates interactions, which we refer to as the MMRM-VCI model. Specifically, the MMRM-VCI working model is, for each ,
which differs from the MMRM working model only on the regression coefficients of , where is substituted for . We define the MMRM-VCI estimator as the MLE for in model (3). Compared with the MMRM model, the MMRM-VCI model can capture the time-varying correlation of covariates and outcomes. In addition, MMRM-VCI is also a generalization of the constrained longitudinal data analysis by Liang and Zeger (2000), which focused on being the baseline value of the outcome and following a normal distribution. As we will show in Section 5.3, under two-arm equal randomization, the MMRM-VCI estimator is equally or more precise than the MMRM estimator and the ANCOVA estimator.
In the MMRM model (2) and the MMRM-VCI model (3), the correlation between covariates and the outcome vector is assumed not to vary across treatment groups, which we call the homogeneity assumption. In addition, the covariance matrix of is also assumed to be the same across treatment groups, which we refer to as the homoscedasticity assumption. Due to these assumptions, although MMRM (or MMRM-VCI) utilizes information from intermediate outcomes, it does not provide guaranteed precision gain compared to ANCOVA if the MMRM (or MMRM-VCI) working model is misspecified. Schuler (2021) showed that, under two-arm equal randomization, ANCOVA is asymptotically more powerful than MMRM when no outcomes are missing. In two of our data applications below (Section 7), the MMRM estimator is less precise than ANCOVA.
4.3 Improved MMRM: modeling heterogeneity and heteroscedasticity among groups and visits
We propose a working model, called “IMMRM”, that handles both heterogeneity and heteroscedasticity as follows: for each
where has a multivariate normal distribution with mean and covariance for each and are independent of and each other. The fixed effects in model (4) are for . Each is assumed to be positive definite and unstructured.
The IMMRM estimator for is defined as
where and, for , are MLE for in model (4) respectively.
Compared with MMRM, the IMMRM working model has two improvements. First, the inclusion of treatment-covariates-visits three-way interaction terms allows the relationship between the outcomes and baseline variables to vary across treatment groups and visits. Such interaction terms model heterogeneity, which has been shown by Tsiatis (2007) and Ye et al. (2021) to be an effective method to improve precision for scalar outcomes. We extend this idea to longitudinal repeated measures data. Second, the covariance matrix of is modeled separately for each treatment group, which accounts for heteroscedasticity. Gosho and Maruo (2018) first proposed the idea of modeling heteroscedasticity in MMRM; however, they provided only empirical results to show its benefits. We show, in Section 5.2, that modeling heteroscedasticity is necessary for achieving asymptotic precision gain when repeated measure outcomes are jointly modeled. We also give an example (in the Supplementary Material) in which MMRM-VCI, which does not account for heteroscedasticity, is 5% less precise than ANCOVA, even though MMRM-VCI uses information from intermediate outcomes.
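To make the differences among the four working models easier to compare side by side, one way to write them that is consistent with the verbal descriptions in Sections 4.1 to 4.3 is sketched below. The symbols are assumptions of this sketch and are not taken from the displayed equations: $Y_{it}$ is the outcome of participant $i$ at visit $t$, $K$ the final visit, $A_i \in \{0, 1, \dots, J\}$ the treatment group with $0$ as control, $X_i$ the baseline vector, and $\bar{X}$ its sample mean.

```latex
\begin{align*}
\text{(1) ANCOVA:} \quad
  & Y_{iK} = \beta_0 + \sum_{j=1}^{J} \beta_{Aj} I\{A_i = j\}
    + \beta_X^\top X_i + \varepsilon_i,
  && \varepsilon_i \sim N(0, \sigma^2), \\
\text{(2) MMRM:} \quad
  & Y_{it} = \beta_{0t} + \sum_{j=1}^{J} \beta_{Ajt} I\{A_i = j\}
    + \beta_X^\top X_i + \varepsilon_{it},
  && (\varepsilon_{i1}, \dots, \varepsilon_{iK}) \sim N(0, \Sigma), \\
\text{(3) MMRM-VCI:} \quad
  & Y_{it} = \beta_{0t} + \sum_{j=1}^{J} \beta_{Ajt} I\{A_i = j\}
    + \beta_{Xt}^\top X_i + \varepsilon_{it},
  && (\varepsilon_{i1}, \dots, \varepsilon_{iK}) \sim N(0, \Sigma), \\
\text{(4) IMMRM:} \quad
  & Y_{it} = \beta_{0t} + \sum_{j=1}^{J} \beta_{Ajt} I\{A_i = j\}
    + \beta_{Xt}^\top X_i
    + \sum_{j=1}^{J} \beta_{AXjt}^\top X_i \, I\{A_i = j\} + \varepsilon_{it},
  && (\varepsilon_{i1}, \dots, \varepsilon_{iK}) \mid A_i = j \sim N(0, \Sigma_j).
\end{align*}
```

Under this notation, models (1) to (3) estimate the effect of group $j$ versus control by the fitted treatment coefficient at the final visit, while model (4) needs the additional term $\hat\beta_{AXjK}^\top \bar{X}$ because the covariate slope differs by group. Moving from (2) to (3) lets the covariate effect vary by visit; moving from (3) to (4) lets both the covariate effect and the residual covariance vary by treatment group.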
5 Main results
5.1 Asymptotic theory
Theorem 1. Assume Assumption 1 and regularity conditions. Consider the ANCOVA estimator, MMRM estimator, MMRM-VCI estimator and the IMMRM estimator, which we denote as for .
For each of the four estimators, under simple or stratified randomization, we have consistency, i.e., in probability, and asymptotic normality, i.e., weakly converges to a mean-zero multivariate normal distribution, under arbitrary misspecification of its working model.
Denote and as the asymptotic covariance matrices of under simple and stratified randomization, respectively. We have, for each
In addition, we provide the conditions for in the Supplementary Material.
Theorem 1 has the following implications. First, under simple or stratified randomization, each of the ANCOVA, MMRM, MMRM-VCI and IMMRM estimators is model-robust, i.e., consistent and asymptotically normal under arbitrary misspecification of its working model. This robustness property allows the statistical inference to be based on normal approximation without relying on working model assumptions. Second, the IMMRM estimator has the highest precision among the four estimators. By jointly modeling heterogeneity and heteroscedasticity, the IMMRM estimator combines the precision gain from adjusting for intermediate outcomes, covariate adjustment and stratified randomization. Such precision gain can be translated into a reduction in the sample size needed to achieve the desired power. Third, the IMMRM estimator has the same asymptotic covariance matrix under simple or stratified randomization. As a consequence, under stratified randomization, the confidence interval for the IMMRM estimator can be constructed as if simple randomization were used, without being statistically conservative.
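The link between precision gain and sample size can be made concrete with the standard normal-approximation formula for a two-sided Wald test: the required sample size is proportional to the asymptotic variance of the estimator. The sketch below assumes a scalar treatment effect and takes the asymptotic variance on the scale of the square root of the total sample size; the function name and all numerical values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(asym_var, effect, alpha=0.05, power=0.9):
    """Smallest n such that a two-sided level-alpha Wald test has the target
    power, using n >= (z_{1-alpha/2} + z_{power})^2 * asym_var / effect^2."""
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * asym_var / effect ** 2)


# Hypothetical numbers: a 20% reduction in asymptotic variance.
n_baseline = required_sample_size(asym_var=4.0, effect=0.5)
n_improved = required_sample_size(asym_var=3.2, effect=0.5)
print(n_baseline, n_improved, n_improved / n_baseline)
```

Up to rounding, the ratio of the two sample sizes equals the ratio of the asymptotic variances, so a given percentage of variance reduction translates into roughly the same percentage of sample size reduction.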
For performing hypothesis testing and constructing confidence intervals, we provide consistent estimators for the asymptotic covariance matrices and in the Supplementary Material. The sandwich variance estimator (Tsiatis, 2007) is used to estimate ; and the expression of is derived in the Supplementary Material and approximated by substituting , the expectation with respect to the empirical distribution, for .
is model-robust and asymptotically linear under simple or stratified randomization. Next, we prove the asymptotic normality by a central limit theorem for sums of random vectors under stratified randomization, which is a generalization of Lemma B.2 of Bugni et al. (2019). Finally, the asymptotic covariance matrices are calculated and compared. In the proof of Theorem 1, the consistency, asymptotic linearity, and asymptotic normality can be seen as applications of the semiparametric theory for M-estimators (for simple randomization) and generalizations of asymptotic results by Wang et al. (2019b) and Bugni et al. (2019) (for stratified randomization). In addition, the asymptotic result for the ANCOVA estimator is not new; it has been proved by Yang and Tsiatis (2001); Tsiatis (2007); Ye et al. (2021), and we include it in Theorem 1 to present the comparison of asymptotic covariance matrices conveniently. The major innovation, and also the major challenge, of the proof is to derive the partial order (5), where the multivariate non-missing indicator adds substantial algebraic difficulty. We overcome this challenge by developing a series of inequalities related to functions of and positive definite matrices.
5.2 How adjustment for intermediate outcomes improves precision
Consider the IMMRM working model (4) with a modification that intermediate outcomes are excluded, i.e.,
Then the IMMRM estimator for under the above working model (6) is equivalent to the “ANHECOVA” estimator, which is proposed by Ye et al. (2021). As a special case of the IMMRM model (4), the working model (6) differs from IMMRM only on whether intermediate outcomes are adjusted. By comparing the asymptotic covariance matrices of the IMMRM estimator with the ANHECOVA estimator, we examine the contribution of intermediate outcomes in improving precision beyond what comes from covariate adjustment and stratified randomization.
If , a case with no intermediate outcomes, the IMMRM estimator and ANHECOVA estimators are equivalent. For , the following corollary shows that adjusting for intermediate outcomes as in the IMMRM working model (4) can improve precision, compared to no adjustment for intermediate outcomes as in the working model (6).
Corollary 1. Assume , Assumption 1 and regularity conditions in the Supplementary Material. Let be the asymptotic covariance matrix of the ANHECOVA estimator based on the working model (6). Then .
Furthermore, if and only if, for each and ,
where , is the covariance between any random vectors with finite second moments, and is the covariance matrix of .
Corollary 1 specifies the conditions for when adjusting for intermediate outcomes brings precision gain. For an intermediate visit , and the MCAR assumption imply that a participant has a positive probability to both appear in visit and miss the last visit ; and means that, for the treatment group , is correlated with the residual of after regressing on baseline variables. If an intermediate outcome satisfies the above two conditions for some treatment group , then adding it to the IMMRM working model will lead to precision gain. On the contrary, adjusting for an intermediate outcome makes no change on the asymptotic covariance matrix if an intermediate outcome is missing whenever is missing, or if it is not prognostic to the final outcome after controlling for in any treatment group.
Corollary 1 implies that leveraging intermediate outcomes can bring precision gain only when there are missing final outcomes and the intermediate outcomes are prognostic to the last outcome beyond what is explained by covariates. This finding generalizes the results of Qian et al. (2019), which considers a special case of our setting with , simple randomization and monotone censoring.
Unlike the IMMRM estimator, adjusting for the intermediate outcomes as in the MMRM working model (2) or the MMRM-VCI working model (3) may increase the variance, compared to the ANCOVA estimator. For the MMRM estimator, the effect of on is modeled as a constant vector across visits and treatment groups, while the ANCOVA estimator focuses on the last visit and models the effect of on as a constant vector across treatment groups. If ANCOVA happens to capture the true effect of on , then it is more precise than MMRM, and vice versa. Our data application in Section 7 also shows that MMRM can be either more or less precise than ANCOVA on different data sets. For the MMRM-VCI estimator, although it models on in the same way as in ANCOVA, it fails to capture the heteroscedasticity. In the supplementary material, we give a simple example showing that the MMRM-VCI estimator has a 5% larger asymptotic variance than the ANCOVA estimator in the presence of heteroscedasticity. An exception, which we discuss in detail in Section 5.3, is the two-arm equal randomization, where MMRM-VCI is equally or more precise than ANCOVA.
5.3 Special case: two-arm equal randomization
In a general setting, e.g. multi-arm trials or unequal randomization, none of the ANCOVA, MMRM or the MMRM-VCI estimator is asymptotically more precise than the others. Under the two-arm equal randomization (i.e. and ), however, the following corollary implies that the MMRM-VCI estimator has equal or smaller asymptotic variance than ANCOVA and MMRM; in addition, the MMRM-VCI estimator has the same asymptotic variance under simple or stratified randomization.
Corollary 2. Assume , , Assumption 1, and regularity conditions in the Supplementary Material. Then , and .
Corollary 2 extends the results of Schuler (2021), which assumes no missing data and shows under two-arm, simple, and equal randomization. Despite the advantage of MMRM-VCI over ANCOVA and MMRM, the IMMRM estimator remains equally or more precise than MMRM-VCI, as we demonstrate in the data application.
6 Simulation study
6.1 Simulation settings
We conducted a simulation study assessing the performance of the ANCOVA (1), MMRM (2), MMRM-VCI (3) and IMMRM (4) estimators in varied simulation settings. This simulation study was based on the IMAGINE-2 study introduced in Section 2.1.
In the simulation, we considered five post-randomization visits (), three treatment groups () and four randomization strata (), trying to duplicate the setting of the IMAGINE-2 study while considering a multi-arm study. The simulated data were generated by the following steps.
First, we defined the potential outcome for a reference group. We took the completers at the end of week 52 from the insulin peglispro arm to serve as the super-population for a reference group, called AC (active comparator). By doing so, the underlying data generating mechanism remained unknown and also mimicked the real data distribution. We use to represent the outcome vector for patient had this patient taken treatment AC.
We then defined the potential outcome for two other treatment groups, named TRT1 and TRT2, as two hypothetical basal insulin treatments. For each , we used and to denote the outcome vector for patient if he/she had been assigned treatment TRT1 and TRT2, respectively. Let be the baseline HbA1c and be the baseline indicator of LDL cholesterol mmol/L (100mg/dL) for patient . We generated the potential outcome for and by:
where, for and , is a constant specifying the average treatment effect comparing TRT to AC at time , and are coefficients that determine the degree of heterogeneity among treatment arms, and and are averages of and across in the AC group. The quadratic terms add another layer of model misspecification. We set , . This indicates that the true average treatment effect is . We let , , and . These values were set to explore a range of mild treatment heterogeneity. The negative signs of and indicate that a higher baseline HbA1c is associated with a larger HbA1c change.
After we defined the potential outcomes for all treatment groups, we simulated the randomized clinical trial by resampling with replacement from the empirical distribution of . We considered two settings for sample size: or per arm. These numbers represent typical phase 2 or 3 diabetes clinical trials. We defined a strata variable which covers all the joint levels of the strata defined by baseline HbA1c () and LDL cholesterol (). We applied stratified permuted block randomization with a block size of 6 to randomly assign the resampled patients to three treatment arms with a 1:1:1 randomization ratio. Then the treatment variable took values in and the outcome vector was .
In the next step, we introduced missing outcomes into the simulated data. We considered two missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR), both under the assumption of monotone censoring. We mimicked the missing data percentages across the 5 post-baseline visits in the IMAGINE-2 study such that , and are expected to be missing at visits 1-5 respectively. For MCAR, the censoring time was generated by a logistic regression (with an intercept only) to achieve the missing data percentages above. For MAR, the censoring time was determined by a logistic regression model on the treatment arms and the HbA1c values at the previous visit and at baseline. Details of the missing data mechanism under MAR are given in the Supplementary Material. The dropout rate was made higher in arms AC and TRT1 than in arm TRT2, and patients were more likely to drop out given a higher HbA1c observed at the previous visit.
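A monotone dropout mechanism of this kind can be sketched as below. This is only an illustration under stated assumptions: the function `monotone_dropout`, its arguments, and all coefficient values are hypothetical, and the actual MAR model is the one given in the Supplementary Material. At each visit, a patient still in the study drops out with a probability given by a logistic model; setting both slopes and the arm shifts to zero leaves an intercept-only (MCAR) model.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def monotone_dropout(y, baseline, arm_shift, intercepts,
                     slope_prev=0.0, slope_base=0.0, seed=0):
    """Return a copy of y with monotone missingness coded as None.

    y          : list of per-patient outcome lists (one value per visit)
    baseline   : baseline value for each patient
    arm_shift  : per-patient shift on the logit scale (arm effect)
    intercepts : one intercept per visit (controls the dropout rate)
    Coefficients are illustrative assumptions, not the paper's values.
    """
    rng = random.Random(seed)
    out = []
    for outcomes, base, shift in zip(y, baseline, arm_shift):
        row, dropped = [], False
        for t, value in enumerate(outcomes):
            if not dropped:
                # The previous value is always observed here, so dropout
                # depends only on observed data (MAR).
                prev = base if t == 0 else outcomes[t - 1]
                logit = (intercepts[t] + shift
                         + slope_prev * prev + slope_base * base)
                dropped = rng.random() < expit(logit)
            # Once a patient drops out, all later visits are missing.
            row.append(None if dropped else value)
        out.append(row)
    return out
```

In practice the per-visit intercepts would be tuned, for example by a simple search, until the simulated missing data percentages match the targets at each visit.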
The four estimators (ANCOVA, MMRM, MMRM-VCI and IMMRM) were then used to compute the estimate of the average treatment effect in change from baseline of HbA1c comparing arms TRT1 and TRT2 to arm AC at the last visit, together with their standard errors and 95% confidence intervals. For all estimators, the standard error was calculated by the sandwich variance estimator (using the “empirical” option in SAS, with an adjustment for the variability in the covariate means when estimating the average treatment effect, as pointed out by Qu and Luo, 2015). These standard error estimators do not account for stratified randomization.
The above procedure was repeated 10,000 times for each simulation setting, i.e. MCAR versus MAR and n=50 versus n=200. For each estimator and each setting, we computed the bias (average bias of the estimates), empirical standard error (ESE, standard deviation of the estimates), averaged standard error (ASE, average of the estimated standard errors), coverage probability (CP, the percentage of simulations where the confidence interval covers the truth), power (the percentage of simulations that reject the null hypothesis of no treatment effect), and the relative mean squared error (RMSE, mean squared error of the estimates divided by the mean squared error of the IMMRM estimates).
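The performance measures listed above can be computed from the simulation replicates as in the following sketch. The function `summarize` and its argument names are assumptions for illustration, and a two-sided Wald test at the 5% level (critical value 1.96) is assumed for both coverage and power. The relative MSE is computed as defined in the text; the sketch also returns the ratio of root mean squared errors, since the tabulated RMSE values appear closer to that scale (for example, 0.229/0.210 is about 1.09).

```python
import math
from statistics import mean, stdev


def summarize(estimates, std_errors, truth, reference_estimates=None, z=1.96):
    """Summarize simulation replicates for one estimator and one setting."""
    n_sim = len(estimates)
    pairs = list(zip(estimates, std_errors))
    # Wald interval est +/- z * se covers the truth.
    covered = sum(abs(e - truth) <= z * s for e, s in pairs)
    # Two-sided Wald test of a null treatment effect.
    rejected = sum(abs(e) > z * s for e, s in pairs)
    result = {
        "bias": mean(estimates) - truth,
        "ESE": stdev(estimates),       # spread of the estimates
        "ASE": mean(std_errors),       # average model-based standard error
        "CP": 100.0 * covered / n_sim,
        "power": 100.0 * rejected / n_sim,
    }
    if reference_estimates is not None:
        mse = mean((e - truth) ** 2 for e in estimates)
        mse_ref = mean((e - truth) ** 2 for e in reference_estimates)
        result["relative_MSE"] = mse / mse_ref
        result["relative_root_MSE"] = math.sqrt(mse / mse_ref)
    return result
```

When the true effect is zero, as for TRT1 versus AC, the "power" entry is the empirical type I error, and CP and power sum to 100%.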
6.2 Simulation results
The simulation results are summarized in Table 1 for n=50 and Table 2 for n=200. Across the simulation settings, all candidate estimators provided small bias, coverage probability close to 95% and type I error (i.e., the power comparing TRT1 vs AC) close to 5%. Under MAR, all estimators tended to have slightly larger bias than under MCAR, since none of the estimators is unbiased under MAR and model misspecification. Such bias is related to the magnitude of the treatment effect and the missing data mechanism.
Among all estimators, the IMMRM estimator had the smallest mean squared error, as reflected by the RMSE measure. This is consistent with our asymptotic results, and the simulation also demonstrated the advantage of IMMRM when the sample size is small. The MMRM estimator, however, had the largest mean squared error among the four estimators. The MMRM-VCI estimator showed a precision gain compared to ANCOVA and MMRM, although such a gain is not asymptotically guaranteed, as discussed in Section 5.2.
Compared with the empirical standard error and RMSE, the average standard error, which reflects current practice, failed to account for the precision gain from stratified randomization. For ANCOVA, MMRM and MMRM-VCI, the average standard error was in general greater than the empirical standard error, while the IMMRM estimator had similar average and empirical standard errors, which further supports Theorem 1. For the MMRM estimator, ignoring stratified randomization led to an overestimation of the standard error by up to 25%.
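Table 1: Simulation results for n=50 per arm.

| Missingness | Estimator | Comparison | Bias | ESE | ASE | CP (%) | Power (%) | RMSE |
|---|---|---|---|---|---|---|---|---|
| MCAR | ANCOVA | TRT1 vs AC | 0.004 | 0.229 | 0.238 | 95.7 | 4.3 | 1.091 |
| MCAR | ANCOVA | TRT2 vs AC | 0.007 | 0.269 | 0.275 | 95.1 | 96.0 | 1.088 |
| MCAR | MMRM | TRT1 vs AC | 0.006 | 0.232 | 0.264 | 97.2 | 2.8 | 1.105 |
| MCAR | MMRM | TRT2 vs AC | 0.004 | 0.301 | 0.370 | 98.5 | 82.1 | 1.219 |
| MCAR | MMRM-VCI | TRT1 vs AC | 0.005 | 0.216 | 0.212 | 94.6 | 5.4 | 1.028 |
| MCAR | MMRM-VCI | TRT2 vs AC | 0.005 | 0.268 | 0.284 | 96.0 | 95.6 | 1.086 |
| MCAR | IMMRM | TRT1 vs AC | 0.002 | 0.210 | 0.199 | 93.1 | 6.9 | - |
| MCAR | IMMRM | TRT2 vs AC | 0.003 | 0.247 | 0.239 | 93.6 | 98.7 | - |
| MAR | ANCOVA | TRT1 vs AC | 0.018 | 0.228 | 0.239 | 96.1 | 3.9 | 1.082 |
| MAR | ANCOVA | TRT2 vs AC | 0.037 | 0.274 | 0.276 | 94.9 | 94.3 | 1.091 |
| MAR | MMRM | TRT1 vs AC | 0.005 | 0.235 | 0.266 | 97.6 | 2.4 | 1.112 |
| MAR | MMRM | TRT2 vs AC | 0.002 | 0.306 | 0.370 | 97.8 | 82.5 | 1.211 |
| MAR | MMRM-VCI | TRT1 vs AC | 0.012 | 0.217 | 0.213 | 94.4 | 5.6 | 1.028 |
| MAR | MMRM-VCI | TRT2 vs AC | 0.017 | 0.274 | 0.285 | 95.5 | 94.7 | 1.086 |
| MAR | IMMRM | TRT1 vs AC | -0.001 | 0.211 | 0.200 | 93.5 | 6.5 | - |
| MAR | IMMRM | TRT2 vs AC | 0.003 | 0.252 | 0.239 | 93.6 | 98.5 | - |

Table 2: Simulation results for n=200 per arm.

| Missingness | Estimator | Comparison | Bias | ESE | ASE | CP (%) | Power (%) | RMSE |
|---|---|---|---|---|---|---|---|---|
| MCAR | ANCOVA | TRT1 vs AC | 0.002 | 0.112 | 0.120 | 96.9 | 3.1 | 1.082 |
| MCAR | ANCOVA | TRT2 vs AC | 0.005 | 0.133 | 0.140 | 96.0 | 100.0 | 1.091 |
| MCAR | MMRM | TRT1 vs AC | 0.000 | 0.116 | 0.133 | 97.8 | 2.2 | 1.115 |
| MCAR | MMRM | TRT2 vs AC | 0.004 | 0.149 | 0.186 | 98.5 | 100.0 | 1.224 |
| MCAR | MMRM-VCI | TRT1 vs AC | 0.002 | 0.105 | 0.107 | 95.7 | 4.4 | 1.015 |
| MCAR | MMRM-VCI | TRT2 vs AC | 0.005 | 0.132 | 0.143 | 96.7 | 100.0 | 1.082 |
| MCAR | IMMRM | TRT1 vs AC | 0.001 | 0.104 | 0.102 | 94.9 | 5.1 | - |
| MCAR | IMMRM | TRT2 vs AC | 0.003 | 0.122 | 0.122 | 95.2 | 100.0 | - |
| MAR | ANCOVA | TRT1 vs AC | 0.016 | 0.115 | 0.122 | 96.0 | 4.0 | 1.090 |
| MAR | ANCOVA | TRT2 vs AC | 0.026 | 0.135 | 0.141 | 95.5 | 100.0 | 1.089 |
| MAR | MMRM | TRT1 vs AC | 0.004 | 0.117 | 0.134 | 97.3 | 2.7 | 1.106 |
| MAR | MMRM | TRT2 vs AC | 0.003 | 0.152 | 0.187 | 98.4 | 100.0 | 1.221 |
| MAR | MMRM-VCI | TRT1 vs AC | 0.010 | 0.108 | 0.108 | 94.7 | 5.3 | 1.021 |
| MAR | MMRM-VCI | TRT2 vs AC | 0.013 | 0.134 | 0.143 | 96.0 | 100.0 | 1.082 |
| MAR | IMMRM | TRT1 vs AC | 0.000 | 0.106 | 0.103 | 94.4 | 5.6 | - |
| MAR | IMMRM | TRT2 vs AC | 0.000 | 0.124 | 0.122 | 94.4 | 100.0 | - |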
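The claim that ignoring stratified randomization inflates the MMRM standard error by up to 25% can be checked directly from the tabulated values. The short sketch below is an illustration only: it copies the ESE and ASE entries for the TRT2 vs AC comparison from the simulation tables above and computes ASE/ESE minus one; the helper name `se_overestimation` is an assumption.

```python
# (ESE, ASE) for the "TRT2 vs AC" comparison, copied from the simulation tables.
rows = {
    ("n=50", "MCAR", "MMRM"): (0.301, 0.370),
    ("n=50", "MAR", "MMRM"): (0.306, 0.370),
    ("n=200", "MCAR", "MMRM"): (0.149, 0.186),
    ("n=200", "MAR", "MMRM"): (0.152, 0.187),
    ("n=50", "MCAR", "IMMRM"): (0.247, 0.239),
    ("n=50", "MAR", "IMMRM"): (0.252, 0.239),
    ("n=200", "MCAR", "IMMRM"): (0.122, 0.122),
    ("n=200", "MAR", "IMMRM"): (0.124, 0.122),
}


def se_overestimation(ese, ase):
    """Relative amount by which the average SE exceeds the empirical SE."""
    return ase / ese - 1.0


overestimation = {key: se_overestimation(*v) for key, v in rows.items()}

# Largest overestimation among the MMRM rows (about 0.25, i.e. 25%).
worst_mmrm = max(v for k, v in overestimation.items() if k[2] == "MMRM")

for key, value in sorted(overestimation.items()):
    print(key, f"{100 * value:+.1f}%")
```

For IMMRM the same ratio stays within a few percent of zero, in line with the observation that its average and empirical standard errors are similar.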
7 Data application
We applied the ANCOVA, MMRM, MMRM-VCI and IMMRM estimators to Trials 1-3. All three trials used stratified randomization, which we accounted for in the computation of the standard error by using the consistent variance estimator given in the Supplementary Material. For each estimator and each treatment comparison, we computed the estimate, the standard error (SE) and the proportional variance reduction (PVR), defined as one minus the ratio of the variance of the estimator to the variance of the MMRM estimator. A positive PVR indicates variance reduction, while a negative PVR indicates that the compared estimator has larger variance than the MMRM estimator. The PVR measures the change in precision from using an estimator in place of the MMRM estimator, and translates into the same proportional change in the sample size needed to achieve a desired power.
For Trial 1, due to very limited data for participants with no sulfonylurea/meglitinide use at baseline, we dropped this stratification variable in the analysis, resulting in 4 randomization strata in total. Combining small strata, as we did here, is a method that achieves better finite sample estimation and controls the type I error at the same time, as discussed by Wang et al. (2019b). Similar to Trial 1, we dropped the stratification variable sulfonylurea/meglitinide use from Trial 2 due to its data limitation, resulting in 12 total strata. For Trial 3, we dropped the metformin use variable due to limited data for participants with no metformin use, resulting in 4 randomization strata in total.
Table 3 summarizes our data applications, which consist of six treatment-control comparisons. Among the six comparisons, the IMMRM estimator had the smallest standard error in five. All the estimators provided similar estimates. In the one comparison (tzp 1mg vs pbo) where IMMRM was less precise than MMRM-VCI, it still outperformed ANCOVA and MMRM; such results may occur in practice when the sample size is small, the intermediate outcomes are prognostic, and the homogeneity and homoscedasticity assumptions hold. Overall, the IMMRM estimator was 2-24% more precise than ANCOVA, 5-16% more precise than MMRM, and up to better than the MMRM-VCI. Besides IMMRM, MMRM-VCI also showed a precision gain compared to standard practice that uses ANCOVA or MMRM. Although such a gain is not guaranteed, MMRM-VCI may have a standard error comparable to that of IMMRM in practice.
Table 3 also shows that MMRM may have a lower or higher variance than ANCOVA, which indicates that adjustment for intermediate outcomes using MMRM may harm the precision. In Trial 2, the precision loss was 14% compared to ANCOVA. We hence recommend using IMMRM or MMRM-VCI, instead of MMRM, to adjust for intermediate outcomes.
While Trials 1, 2 and 3 are different in many aspects, the IMMRM estimator had a smaller standard error than the ANCOVA and MMRM estimators in all three trials. Trial 1 had a large sample size, unequal two-arm randomization and five post-randomization outcomes. Trial 2 used two-arm equal randomization with three post-randomization outcomes. In addition, Trial 2 had a small percentage of missing data in the primary outcome (less than 6%) across all visits. Our analyses for Trial 3 involved five treatment groups, eight post-randomization outcomes and 19% missing outcomes at the last visit, which is a typical setting of phase 2 trials. Through the data applications to the three clinical trials, we were able to evaluate the performance of the estimators in three distinct live scenarios with different outcome and missing data distributions, phases of the study, sample sizes, and numbers of treatment groups and clinical visits.
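Table 3: Standard errors (SE) of the four estimators in Trials 1-3.

| Trial | Comparison | Measure | ANCOVA (1) | MMRM (2) | MMRM-VCI (3) | IMMRM (4) |
|---|---|---|---|---|---|---|
| Trial 1 | peglispro vs glargine | SE | 0.050 | 0.051 | 0.049 | 0.049 |
| Trial 2 | peglispro vs glargine | SE | 0.072 | 0.077 | 0.072 | 0.071 |
| Trial 3 | tzp 1mg vs pbo | SE | 0.760 | 0.725 | 0.664 | 0.713 |
| Trial 3 | tzp 5mg vs pbo | SE | 0.802 | 0.775 | 0.761 | 0.742 |
| Trial 3 | tzp 10mg vs pbo | SE | 1.173 | 1.103 | 1.081 | 1.021 |
| Trial 3 | tzp 15mg vs pbo | SE | 1.196 | 1.100 | 1.073 | 1.047 |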
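As a worked illustration of the PVR defined in this section, the sketch below recomputes it from the standard errors in the second "peglispro vs glargine" row of Table 3, which appears to correspond to Trial 2. The function names `pvr` and `scaled_sample_size` are assumptions for the example. Because the tabulated standard errors are rounded to three decimals, the results (about 15% for IMMRM in this row) can differ slightly from the percentages quoted in the text.

```python
def pvr(se, se_mmrm):
    """Proportional variance reduction relative to the MMRM estimator."""
    return 1.0 - (se / se_mmrm) ** 2


def scaled_sample_size(n_mmrm, pvr_value):
    """Sample size giving the same precision as MMRM with n_mmrm patients.

    Uses the approximation that variance scales as 1/n; round up in practice.
    """
    return n_mmrm * (1.0 - pvr_value)


# Standard errors copied from Table 3 (second peglispro vs glargine row).
se_row = {"ANCOVA": 0.072, "MMRM": 0.077, "MMRM-VCI": 0.072, "IMMRM": 0.071}
pvr_row = {name: pvr(se, se_row["MMRM"]) for name, se in se_row.items()}

for name, value in pvr_row.items():
    print(f"{name}: PVR = {100 * value:.1f}%")
```

A negative value from `pvr` means the estimator is less precise than MMRM, and `scaled_sample_size` shows how a given PVR would shrink (or inflate) a sample size that was planned for an MMRM analysis.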
8 Discussion
For the analysis of longitudinal repeated measures data, we propose the IMMRM estimator that can improve precision, compared to standard practice that uses ANCOVA or MMRM. In some cases, the variance reduction can be substantial, e.g., 16% in our data application of Trial 2 compared to MMRM. Such precision gain comes from modeling treatment heterogeneity and appropriately adjusting for intermediate outcomes, and can be translated into sample size reduction when planning a trial.
We recommend applying the IMMRM estimator for large trials, which can lead to more sample size reduction (Wang et al., 2019a). In addition, since the IMMRM working model involves substantially more parameters than MMRM ( parameters in IMMRM compared with parameters in MMRM), large trials, compared to small trials, have more degrees of freedom for the variance estimation. Under the two-arm equal randomization, the MMRM-VCI estimator is a good alternative to the IMMRM estimator for small trials, since MMRM-VCI involves fewer parameters and still outperforms ANCOVA and MMRM.
Leveraging post-randomization information to improve precision is not limited to intermediate outcomes. Our main results also apply to adjustment for other post-randomization continuous-valued random variables measured before the final outcomes, such as the body mass index measured at each visit. When these additional variables provide new prognostic information beyond intermediate outcomes and covariates, adding them to the IMMRM model can bring further asymptotic precision gain. A trade-off in finite sample variance estimation, however, is the improved precision versus the reduced degrees of freedom for variance estimation due to estimating additional parameters.
We assumed missing completely at random, which is a typical assumption for proving model-robustness. In contrast, under the missing at random assumption (MAR), i.e. is conditionally independent of given for each and , the estimators defined in Section 4 may no longer be consistent if their working models are misspecified. To deal with MAR, a convenient approach is to assume a correctly specified working model. In this case, IMMRM remains a better choice than MMRM and MMRM-VCI, since they are more restrictive than IMMRM. If MMRM or MMRM-VCI is assumed to be correct, then the IMMRM estimator remains consistent and is asymptotically equivalent to the MMRM or MMRM-VCI estimator.
Under the MAR assumption, an alternative approach for estimating the average treatment effect is the targeted minimum loss based estimation (TMLE, van der Laan and Gruber, 2012). The method involves recursively fitting regression models for outcomes and propensity scores for non-missingness, and is consistent as long as one of the two sets of models is correctly specified. It is an open question, to the best of our knowledge, whether adjusting for the intermediate outcomes provides precision gain under the MAR assumption using TMLE.
Among all variants of MMRM discussed in this paper, the random effects are assumed to be independent of covariates. If, instead, the random effects are modeled to involve covariates, our main results may not apply. Under simple randomization and a correctly specified MMRM model, Cnaan et al. (1997) stated that the MMRM estimator is consistent and asymptotically normal, and their results can be directly generalized to the MMRM-VCI and IMMRM estimators. However, whether their statement extends to stratified randomization or misspecified working models remains an open question.
- Bugni et al. (2018) Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2018). Inference under covariate-adaptive randomization. Journal of the American Statistical Association, 113(524):1784–1796.
- Bugni et al. (2019) Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2019). Inference under covariate-adaptive randomization with multiple treatments. Quantitative Economics, 10(4):1747–1785.
- Cnaan et al. (1997) Cnaan, A., Laird, N. M., and Slasor, P. (1997). Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Statistics in Medicine, 16(20):2349–2380.
- Davies et al. (2016) Davies, M., Russell-Jones, D., Selam, J.-L., Bailey, T., Kerényi, Z., Luo, J., Bue-Valleskey, J., Iványi, T., Hartman, M., Jacobson, J., et al. (2016). Basal insulin peglispro versus insulin glargine in insulin-naïve type 2 diabetes: Imagine 2 randomized trial. Diabetes, Obesity and Metabolism, 18(11):1055–1064.
- FDA (2021) FDA (2021). Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products Guidance for Industry. https://www.fda.gov/media/148910/download.
- Fisher et al. (1937) Fisher, R. A. et al. (1937). The design of experiments.
- Frias et al. (2018) Frias, J. P., Nauck, M. A., Van, J., Kutner, M. E., Cui, X., Benson, C., Urva, S., Gimeno, R. E., Milicevic, Z., Robins, D., et al. (2018). Efficacy and safety of ly3298176, a novel dual gip and glp-1 receptor agonist, in patients with type 2 diabetes: a randomised, placebo-controlled and active comparator-controlled phase 2 trial. The Lancet, 392(10160):2180–2193.
- Galbraith and Marschner (2003) Galbraith, S. and Marschner, I. C. (2003). Interim analysis of continuous long-term endpoints in clinical trials with longitudinal outcomes. Statistics in Medicine, 22(11):1787–1805.
- Gosho and Maruo (2018) Gosho, M. and Maruo, K. (2018). Effect of heteroscedasticity between treatment groups on mixed-effects models for repeated measures. Pharmaceutical statistics, 17(5):578–592.
- Hampson and Jennison (2013) Hampson, L. V. and Jennison, C. (2013). Group sequential tests for delayed responses (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1):3–54.
- Hirose et al. (2018) Hirose, T., Cai, Z., Yeo, K. P., Imori, M., Ohwaki, K., and Imaoka, T. (2018). Open-label, randomized study comparing basal insulin peglispro and insulin glargine, in combination with oral antihyperglycemic medications, in insulin-naïve asian patients with type 2 diabetes. Journal of diabetes investigation, 9(1):100–107.
- Jiang et al. (2018) Jiang, F., Tian, L., Fu, H., Hasegawa, T., and Wei, L. J. (2018). Robust alternatives to ANCOVA for estimating the treatment effect via a randomized comparative study. Journal of the American Statistical Association, 0:1–37.
- Kunz et al. (2015) Kunz, C. U., Friede, T., Parsons, N., Todd, S., and Stallard, N. (2015). A comparison of methods for treatment selection in seamless phase II/III clinical trials incorporating information on short-term endpoints. Journal of biopharmaceutical statistics, 25(1):170–189.
- Landau et al. (2018) Landau, S., Emsley, R., and Dunn, G. (2018). Beyond total treatment effects in randomised controlled trials: Baseline measurement of intermediate outcomes needed to reduce confounding in mediation investigations. Clinical Trials, 15(3):247–256.
- Lane et al. (2017) Lane, W., Bailey, T. S., Gerety, G., Gumprecht, J., Philis-Tsimikas, A., Hansen, C. T., Nielsen, T. S., Warren, M., et al. (2017). Effect of insulin degludec vs insulin glargine u100 on hypoglycemia in patients with type 1 diabetes: the SWITCH 1 randomized clinical trial. Journal of the American Medical Association, 318(1):33–44.
- Liang and Zeger (2000) Liang, K.-Y. and Zeger, S. L. (2000). Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhyā: The Indian Journal of Statistics, Series B, pages 134–148.
- Lin et al. (2015) Lin, Y., Zhu, M., and Su, Z. (2015). The pursuit of balance: An overview of covariate-adaptive randomization techniques in clinical trials. Contemporary Clinical Trials, 45:21–25. 10th Anniversary Special Issue.
- Lu and Tsiatis (2011) Lu, X. and Tsiatis, A. A. (2011). Semiparametric estimation of treatment effect with time-lagged response in the presence of informative censoring. Lifetime Data Analysis, 17(4):566–593.
- Mallinckrod et al. (2008) Mallinckrod, C. H., Lane, P. W., Schnell, D., Peng, Y., and Mancuso, J. P. (2008). Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal, 42(4):303–319.
- Marschner and Becker (2001) Marschner, I. C. and Becker, S. L. (2001). Interim monitoring of clinical trials based on long-term binary endpoints. Statistics in Medicine, 20(2):177–192.
- Moore and van der Laan (2009a) Moore, K. and van der Laan, M. (2009a). Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation. Statistics in Medicine, 28(1):39–64.
- Moore and van der Laan (2009b) Moore, K. L. and van der Laan, M. J. (2009b). Increasing power in randomized trials with right censored outcomes through covariate adjustment. Journal of Biopharmaceutical Statistics, 19(6):1099–1131. PMID: 20183467.
- Neyman et al. (1990) Neyman, J. S., Dabrowska, D. M., and Speed, T. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472.
- Qian et al. (2019) Qian, T., Rosenblum, M., and Qiu, H. (2019). Improving precision through adjustment for prognostic variables in group sequential trial designs: Impact of baseline variables, short-term outcomes, and treatment effect heterogeneity. arXiv preprint arXiv:1910.05800.
- Qu and Luo (2015) Qu, Y. and Luo, J. (2015). Estimation of group means when adjusting for covariates in generalized linear models. Pharmaceutical statistics, 14(1):56–62.
- Rubin and van der Laan (2008) Rubin, D. and van der Laan, M. (2008). Covariate adjustment for the intention-to-treat parameter with empirical efficiency maximization. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 229. https://biostats.bepress.com/ucbbiostat/paper229.
- Schuler (2021) Schuler, A. (2021). Mixed models for repeated measures should include time-by-covariate interactions to assure power gains and robustness against dropout bias relative to complete-case ANCOVA. arXiv preprint arXiv:2108.06621.
- Seuc et al. (2013) Seuc, A. H., Peregoudov, A., Betran, A. P., and Gulmezoglu, A. M. (2013). Intermediate outcomes in randomized clinical trials: an introduction. Trials, 14(1):78.
- Shih and Quan (1999) Shih, W. J. and Quan, H. (1999). Planning and analysis of repeated measures at key time-points in clinical trials sponsored by pharmaceutical companies. Statistics in Medicine, 18(8):961–973.
- Sorli et al. (2017) Sorli, C., Harashima, S.-i., Tsoukas, G. M., Unger, J., Karsbøl, J. D., Hansen, T., and Bain, S. C. (2017). Efficacy and safety of once-weekly semaglutide monotherapy versus placebo in patients with type 2 diabetes (sustain 1): a double-blind, randomised, placebo-controlled, parallel-group, multinational, multicentre phase 3a trial. The Lancet Diabetes & endocrinology, 5(4):251–260.
- Stallard (2010) Stallard, N. (2010). A confirmatory seamless phase ii/iii clinical trial design incorporating short-term endpoint information. Statistics in Medicine, 29(9):959–971.
- Tsiatis (2007) Tsiatis, A. (2007). Semiparametric theory and missing data. Springer Science & Business Media.
- van der Laan and Gruber (2012) van der Laan, M. J. and Gruber, S. (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. The international journal of biostatistics, 8(1).
- van der Vaart (1998) van der Vaart, A. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
- Van Lancker et al. (2020) Van Lancker, K., Vandebosch, A., and Vansteelandt, S. (2020). Improving interim decisions in randomized trials by exploiting information on short-term endpoints and prognostic baseline covariates. Pharmaceutical statistics, 19(5):583–601.
- Wang et al. (2019a) Wang, B., Ogburn, E. L., and Rosenblum, M. (2019a). Analysis of covariance in randomized trials: More precision and valid confidence intervals, without model assumptions. Biometrics, 75(4):1391–1400.
- Wang et al. (2019b) Wang, B., Susukida, R., Mojtabai, R., Amin-Esmaeili, M., and Rosenblum, M. (2019b). Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. arXiv preprint arXiv:1910.13954.
- Yang and Tsiatis (2001) Yang, L. and Tsiatis, A. (2001). Efficiency study of estimators for a treatment effect in a pretest-posttest trial. The American Statistician, 55(4):314–321.
- Ye et al. (2021) Ye, T., Shao, J., Yao, Y., and Zhao, Q. (2021). Toward better practice of covariate adjustment in analyzing randomized clinical trials. arXiv preprint arXiv:2009.11828v2.
- Ye et al. (2020) Ye, T., Yi, Y., and Shao, J. (2020). Inference on average treatment effect under minimization and other covariate-adaptive randomization methods. arXiv preprint arXiv:2007.09576.
- Zelen (1974) Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27(7):365–375.
- Zhang (2015) Zhang, M. (2015). Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. Lifetime Data Analysis, 21(1):119–137.
- Zhou et al. (2018) Zhou, M., Tang, Q., Lang, L., Xing, J., and Tatsuoka, K. (2018). Predictive probability methods for interim monitoring in clinical trials with longitudinal outcomes. Statistics in Medicine, 37(14):2187–2207.