## 1 Introduction

Adaptive designs have been developed and have been available for use in clinical trials for decades. The U.S. Food and Drug Administration defines an adaptive design as “…a clinical study design that allows for prospectively planned modifications based on accumulating study data without undermining the study’s integrity and validity” (FDA, 2016).

The existing literature on adaptive designs has thus far considered several types of prospectively planned design modifications, including blinded sample size reassessment, group sequential testing, interim analysis for benefit or futility, successive re-randomization, changing subgroup proportions or eligibility criteria of the trial (Rosenblum and van der Laan, 2011) and dropping treatment arms. Prominent among the techniques developed to preserve the validity of statistical inference when design adaption has occurred is the conditional error function (Proschan and Hunsberger, 1995; Müller and Schäfer, 2001, 2004), and combination functions have been used to aggregate p-values from multiple stages (Bauer and Köhne, 1994; Brannath et al., 2002). See Kairalla et al. (2012) and Bauer et al. (2016) for recent comprehensive reviews of adaptive designs in clinical trials. In addition to valid testing, methods have been developed for estimation in an adaptive group sequential design (e.g. Gao et al., 2013).

The present work is motivated by large-scale public health intervention studies of complex multi-component intervention packages. In the newly proposed “learn-as-you-go” (LAGO) design, the intervention, which can e.g. be a treatment, a device, a new way to organize care, or, more likely, a combination thereof, is composed of several components. While subject matter experts have some knowledge with regard to the preferred intervention package, in LAGO, optimal development of the intervention package is an inherent part of the study goals. A LAGO study is conducted in stages. After each stage, the data collected so far are analyzed, the intervention package is reassessed, and a revised intervention package is rolled out in the next stage. Unlike previous adaptive designs, in the LAGO design, the composition of the intervention package in later stages depends on the outcomes from previous stages.

The Sequential Multiple Assignment Randomized Trial (SMART) design (Murphy, 2005; Murphy et al., 2007) randomizes study participants at more than one time point to pre-specified randomization options with probabilities that depend on participant’s past characteristics and outcomes. The aim of a SMART trial is to estimate the optimal sequence of treatments given patient’s covariate and response histories up to the present. It is a non-adaptive design method which optimizes a personalized and dynamic intervention, in part by restricting randomization options at each step. In contrast, LAGO identifies a complex static, possibly ‘cluster-personalized’, intervention package where, unlike in SMART, the options are unknown at the start of the trial and are estimated anew as a result of trial data up to the current stage. In addition, LAGO studies will add new centers, with new participants, entering at each stage, while in SMART the same individuals are repeatedly re-randomized. Both design strategies are well suited for complex multi-component interventions.

The multiphase optimization strategy (MOST, Collins et al., 2007, 2014) consists of three phases: preparation, optimization and evaluation. The optimal intervention package is developed during the optimization phase, followed by its formal statistical evaluation in a randomized controlled trial. The aim of MOST is similar to LAGO: to develop an optimal intervention package and estimate its impact. However, in MOST, the outcomes of the past are used at most in one stage, to determine the optimal package in the optimization phase. The resulting package is then independently studied through a controlled trial in the evaluation phase, using no prior data.

At face value, phase I dose-finding studies have perhaps the greatest similarity to the LAGO design paradigm. In dose-finding studies, the goal is to find the maximum tolerated dose, that is, the highest dose of a drug such that adverse effects of the drug are below a pre-determined threshold. Dose values are assigned to patients in a sequential manner, and in each step a decision is made to stop and declare that the maximum tolerated dose has been found, or to continue, and if so, with which dose. The more traditionally used methods include the “3 + 3” and “accelerated titration” designs (Simon et al., 1997; Wong et al., 2016). Another popular method is the continual reassessment method (O’Quigley et al., 1990; O’Quigley and Shen, 1996), which assigns each patient the current estimated maximum tolerated dose. Methods were also developed for the optimal dose of two drugs simultaneously (Thall et al., 2003; Wang and Ivanova, 2005). Rosenberger and Haines (2002) provide a review of the continual reassessment method and additional statistical methods for dose finding studies. Dose-finding studies are generally too small for the application of asymptotic statistical methods, and typically Bayesian approaches have been used. In contrast, in public health intervention studies, the magnitude of the per-stage sample size is typically much larger than the sample size in dose-finding studies, while the maximum number of stages will be limited. Additionally, unlike dose-finding studies, where methods are considered for a single or at most dual treatments, the complex public health interventions motivating the development of the LAGO design feature multiple components, some of which are continuous, while others are binary (Hallberg and Richards, 2015).

An ad hoc example of a precursor to a formal LAGO study is the “BetterBirth Study” (Hirschhorn et al., 2015; Semrau et al., 2017) of Ariadne Labs, led by Atul Gawande (Gawande, 2014), a joint center of the Brigham and Women’s Hospital and the Harvard T.H. Chan School of Public Health. The BetterBirth Study assessed the use of the World Health Organization’s (WHO) Safe ChildBirth checklist, a 31-item checklist of best labor and delivery practices believed to be feasible in resource-limited settings, to reduce maternal and neonatal mortality. The intervention was adapted and tested in a three-phase process in Uttar Pradesh, India, where neonatal mortality is 32 per 1000 live births and maternal mortality is 258 per 100,000 births (Semrau et al., 2017). During the first two phases, the intervention was adapted, and a final version was tested in a cluster randomized trial. This work was generously funded through a grant from the Bill & Melinda Gates Foundation, and included 157,689 mothers and newborns.

The first goal of a LAGO study is to identify the optimal intervention package such that the cost of the intervention is minimized and the probability of a desired binary outcome is above a given threshold. For example, in the BetterBirth Study, the outcome could be the use of the WHO Safe ChildBirth checklist, with the aim being, for example, that the checklist is used during at least 90% of the births. In the illustrative example included in this paper, we investigate a process outcome, oxytocin administration after delivery, with the aim being that 85% of mothers will receive oxytocin after delivery, as recommended by the WHO, as a proven intervention for preventing postpartum hemorrhage. We determine whether the use of a multiple component intervention package that includes on-site coaching visits and an intervention launch of a particular duration, increases the administration of oxytocin, compared to standard of care.

The second goal of a LAGO study is to assess the overall impact of the intervention strategy, as well as that of its individual components. We present methodology to achieve both goals.

In a LAGO study, the data are not an independent sample. Beginning with the second stage, the recommended intervention package is itself a random variable that depends on previous outcomes. In the final analysis, a LAGO study uses the data from all stages. When considering the asymptotic behavior of the estimators, we assume that the sample size in each stage increases at a similar rate. In addition, we assume that the intervention in each stage converges in probability to a constant as the number of observations in the previous stage goes to infinity. This would happen, under the usual regularity conditions, if the intervention in each stage is based on a maximum likelihood estimator obtained from the data collected in previous stages.

LAGO studies can be further characterized by a key design feature which determines the strength of the causal inferences that can be made. In an uncontrolled LAGO study, there are neither baseline data available to permit a quasi-experimental before-after comparison nor randomized or non-randomized planned variation in the implementation of the intervention package. Thus, unplanned variation, which is widespread in large-scale public health interventions, serves as the basis for making causal contrasts. Causal inference methods will thus be needed to adjust for possible confounding bias (Hernan and Robins, 2019; Spiegelman and Zhou, 2018). In a controlled LAGO study, baseline outcome data are collected before the intervention is implemented, or in additional centers in which no intervention was implemented. These additional centers may be randomized or not, to be included in the study as controls. In the design where baseline data serve as the control, the quasi-experimental before-after design serves as the basis for causal contrasts. If, instead or in addition, there are concurrent control centers, stronger causal inference is permitted by design, with the strongest design being a randomized controlled before-after set-up.

We propose estimators for a LAGO study allowing for several stages, multiple centers or sites, multiple component complex interventions, and center-specific baseline covariates that affect the outcome rate, or random center-specific deviations from the recommended intervention, or both. We show that even in this setup, the optimal intervention can be learned from the combined data from all stages. Even when the optimal intervention in the last stage does not achieve the pre-specified study goal, the optimal intervention is estimated. We prove consistency and asymptotic normality of the new estimators utilizing a novel coupling argument. We further establish the validity of tests for an overall intervention effect. In addition, we develop a confidence set for the optimal intervention package and confidence bands for the target outcome probability under various observed or hypothesized intervention packages.

## 2 LAGO design - theoretical development

### 2.1 Description of the learn-as-you-go design

The methods we develop in this paper cover an arbitrary number of stages, . At each stage , a version of the intervention package is implemented in each of centers. Let denote the sample size (e.g. the number of births) in the -th center at stage . We assume that each center is included in one stage only. In a randomized controlled trial, centers may be randomized to either intervention or control. Alternatively, data might be collected pre and post the implementation of the intervention package and then a center contributes data to both the intervention and the control.

Asymptotic theory is developed for the setting where the number of patients per center goes to infinity at the same rate in all stages, leading to reliable approximations when the number of patients in each center is relatively large. Let be the number of participants in stage and be the total number of participants. Our asymptotic inference assumes that the ratio between the number of patients in each center and the total sample size converges to a constant, and we write ; then, . Define also . Proofs are given in Sections 1 and 2 of the supplementary materials. For ease of presentation, we first develop methodology for a LAGO study consisting of two stages. Section 3 of the supplementary materials covers studies with more than two stages.

The multivariate intervention package consists of components. Let be the support of the intervention, that is, all possible intervention values. For example, if all intervention components are continuous and each is constrained to be within a given interval , then . Throughout this paper, as would ordinarily be the case in practice, we assume that is bounded.

For stage 1, an initial (or for each center ) is chosen by the investigators, based on their best judgment. We distinguish between the recommended intervention and the actual intervention. In large scale public health settings, the actual intervention, denoted by , may differ from the recommended intervention, due to local constraints or preferences. We denote for center-specific characteristics reflecting baseline heterogeneity between centers with respect to the outcome of interest and we consider them fixed, i.e., they are not part of the intervention package. For each center, could be, for example, the district of the health center or its monthly birth volume.

We assume that the probability of success for a single unit (e.g., participant or birth) in a center with characteristics $z$ under intervention $a$, $p_a(\beta; z)$, does not depend on the recommended intervention $x$, except through the actual intervention $a$, and follows a logistic regression model

$$\operatorname{logit}\{p_a(\beta; z)\} = \beta_0 + \beta_1^T a + \beta_2^T z, \qquad (1)$$

where $\beta^T = (\beta_0, \beta_1^T, \beta_2^T)$ is a vector of unknown parameters, such that $\beta_1$ describes the effects of the intervention package components. For centers in the control arm or for pre-intervention data, if available, $a = 0$. We assume that in each stage, conditionally on all actual interventions and center characteristics, outcomes are independent within and between centers. Learning the intervention, however, causes dependence between stages, which we consider below.

A main goal of the LAGO design is to identify the optimal intervention package. Let $\tilde{p}$ be a pre-specified outcome probability goal and $C(x)$ be a known cost function. For example, in the BetterBirth Study, one may want to find the minimal number of on-site coaching visits to ensure that oxytocin is administered to the mother right after delivery in at least 85% of births ($\tilde{p} = 0.85$). If $\beta$ were known, an optimal intervention for a center with covariates $z$ could be the solution to the center-specific optimization problem

$$\min_{x} C(x) \quad \text{subject to} \quad p_x(\beta; z) \ge \tilde{p}, \qquad (2)$$

where the minimization is over intervention values $x$ in the support of the intervention.

Computational issues regarding solving (2) will be discussed in Section 2.5. We assume that for the true parameter values, there is a unique solution to (2). For example, if the intervention has two components with unit costs and and a linear cost function, we assume that . Alternatively, other optimization criteria can be considered. For example, the optimal intervention could require that the intervention results in an outcome probability when calculating a weighted average over a group of centers , with sample sizes . That is,

where . In this paper we focus on (2).

We continue our description of the data and model. Let be the observed center characteristics in each of the stage centers. We start with stage 1. Let be the recommended (multivariate) intervention package for center in stage 1, which in the absence of , may be the same for all centers. We assume that the stage 1 recommended interventions , , are determined before the trial starts. The actual intervention in center of stage 1 is, however, , where is a deterministic center-specific continuous function from to that determines how center implements the actual intervention based on the recommendation . We do not require that the are known, but only that the are observed. Let be the binary outcome of interest for patient in center of stage 1, each following model (1), and let the outcome vector in center of stage 1 be . Let and be the stage 1 actual interventions and outcomes, respectively.

Following the stage 1 data collection, a stage 1 analysis is conducted to determine the recommended interventions for the new centers in stage 2, denoted by . If there are control centers, their recommended intervention and their actual intervention are zero. The value is chosen through a function, , that takes as input the stage 1 data, the goal of the intervention, and the center-specific covariates and returns a recommended intervention, which is usually the estimated optimal intervention . Then, can be obtained by solving the optimization problem given in (2) for each center, with replaced by an estimator based on the stage 1 data alone. The superscript, , in reminds us that is a random variable that is a function of the data from the participants in stage 1.

The actual intervention implemented in center of stage 2 is , where are the analogues of , but now for the stage 2 centers. Let be the recommended interventions at the stage 2 centers. Once are determined, stage 2 outcomes are collected under the actual interventions , which may be the same as . Let be the stage 2 outcomes in center , each following model (1), and be all the stage 2 outcomes. Our two main assumptions are

###### Assumption 1

Conditionally on , are independent of the stage 1 data .

###### Assumption 2

For each , the stage 2 recommended intervention converges in probability to a center-specific limit .

Assumption 1 assumes that learning takes place only through the determination of the recommended intervention. It ensures that the dependence between the stage 1 data and stage 2 outcomes is solely due to the dependence of the on the stage 1 data. It specifically means that, given , the actual intervention in a stage 2 center is conditionally independent of . Under Assumption 1, and the aforementioned assumption that conditionally on the actual interventions, the outcomes do not depend on the recommended interventions, we can conclude that in stage 2, , so the logistic regression model (1) holds for the stage 2 data. Assumption 2 implies that in the presence of more and more stage 1 data under , each of the estimated optimal intervention packages , converges in probability to a fixed value . For example, Assumption 2 will hold if are continuous functions of the stage 1 maximum likelihood estimator, , as is the case if solves (2) and . Under Assumption 2 and continuity of the ’s, the Continuous Mapping Theorem implies that converges in probability to .

In fact, the results we prove in this paper regarding the estimators obtained at the end of the study hold not only for , but under any choice of function for the recommended intervention, as long as Assumption 2 holds. Further details about and proofs of this claim are given in Section 2 of the supplementary materials.

### 2.2 and its asymptotic properties

We estimate after the stages are concluded. As in previous sections, for ease of development, we consider here . Section 3 of the supplementary materials covers the case of .

We propose to estimate by solving the estimating equations

(3) |

In Section 2.1 of the supplementary materials, we show that the estimator that solves (3) is also a maximum partial likelihood estimator, although it is not needed for the proofs below. These estimating equations (3) also arise if the interventions were determined a priori, so can be estimated using standard software.
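Because the estimating equations take the same form as if the interventions had been fixed in advance, the final estimate can be sketched as an ordinary logistic regression fit on the pooled data from both stages. The following is a minimal illustration only, assuming that model (1) has an intercept, one intervention component, and one center covariate. The function names, the parameter values, the 0.6 goal, and the interval [0, 4] are all hypothetical and are not taken from the paper.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def fit_pooled_logistic(design, y, n_iter=50, tol=1e-10):
    """Solve the logistic score equations by Newton-Raphson.

    Returns the estimate and the inverse observed information,
    which serves as the usual model-based covariance estimate.
    """
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        score = design.T @ (y - p)
        info = (design * (p * (1.0 - p))[:, None]).T @ design
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = expit(design @ beta)
    info = (design * (p * (1.0 - p))[:, None]).T @ design
    return beta, np.linalg.inv(info)

rng = np.random.default_rng(0)
beta_true = np.array([-1.0, 0.5, 0.3])  # hypothetical: intercept, intervention, covariate

def simulate_stage(a_centers, z_centers, n_per_center):
    """Simulate binary outcomes for centers with actual interventions a_centers."""
    a = np.repeat(a_centers, n_per_center)
    z = np.repeat(z_centers, n_per_center)
    design = np.column_stack([np.ones_like(a), a, z])
    y = rng.binomial(1, expit(design @ beta_true))
    return design, y

# Stage 1: interventions fixed before the trial (half control, half at a = 2).
X1, y1 = simulate_stage(np.array([0.0, 2.0] * 5), rng.normal(size=10), 200)
beta_stage1, _ = fit_pooled_logistic(X1, y1)

# Stage 2: the recommended intervention depends on the stage 1 estimate.
# Here: the smallest a in [0, 4] predicted to reach a 0.6 success probability at z = 0.
logit_goal = np.log(0.6 / 0.4)
if beta_stage1[1] <= 0:
    a_rec = 4.0
else:
    a_rec = float(np.clip((logit_goal - beta_stage1[0]) / beta_stage1[1], 0.0, 4.0))
X2, y2 = simulate_stage(np.array([0.0, a_rec] * 5), rng.normal(size=10), 200)

# Final analysis: pool both stages and fit as if interventions were fixed a priori.
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
beta_hat, cov_hat = fit_pooled_logistic(X, y)
```

In practice any standard logistic regression routine could replace the hand-written Newton-Raphson loop; it is written out here only to make the score equations explicit.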

Asymptotic theory for is complicated, however, by the fact that and are not independent. Thus, the score function, , is not a sum of independent random variables.

Let be the parameter space for . A conditional expectations argument (Equation (A.9) in the supplementary materials) shows that the score function has mean zero when evaluated at the true value, denoted by . Furthermore, we show in the supplementary materials (Equation (A.10)) that the two terms in (3), although dependent, are uncorrelated. These two properties are useful for proving that is consistent:

The proof is given in Section 2.2 of the supplementary materials.

Asymptotic normality also poses a challenge due to the dependence between the two summands in . It can be shown that converges in probability to , for all , with given in Section 2.3 of the supplementary materials. The following theorem establishes asymptotic normality of :

The full proof of Theorem 2 is given in Section 2.3 of the supplementary materials. Here we outline the main parts of the proof, which rests upon a novel coupling argument. First, by the mean value theorem and further arguments, it can be shown that the asymptotic distribution of is the same as the asymptotic distribution of

(5) |

We next show that the asymptotic distribution of the part of (5) that does not involve is multivariate normal. The following coupling argument deals with the fact that the two summands in (5) are not independent. For each , let , , be independent Bernoulli random variables, independent of all stage 1 data, with success probability , where, as defined before, . We construct variables which, given the stage 1 data and the , have the same distribution as the original , but coupled (see e.g. Lindvall (2002)) with the in the following way. Let be independent uniform random variables, independent of all other variables introduced so far. For the case , let

A similar expression is given in the supplementary materials for the case . The key property of the coupling argument is that given and the stage 1 data, the distribution of the coupled is identical to the distribution of the original . Therefore, when we replace with in (5), the distribution of (5) is unaffected. In the supplementary materials, we use the coupled outcomes to show that the part of (5) that does not involve has the same asymptotic distribution as
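To make the coupling idea concrete, the sketch below shows one standard way to couple two Bernoulli variables using an independent uniform variable, so that the coupled variable has the required success probability while differing from the reference variable as rarely as possible. This is an illustration under our own assumptions and may differ from the exact expression used in the paper. The function name and the probabilities 0.60 and 0.65 are hypothetical, and a single scalar probability is used in place of center-specific probabilities.

```python
import numpy as np

def couple_bernoulli(y_star, p, p_star, u):
    """Build Bernoulli(p) variables coupled with Bernoulli(p_star) variables y_star.

    u holds independent Uniform(0, 1) draws, independent of y_star.
    Assumes 0 < p_star < 1 so that the ratios below are well defined.
    """
    y_star = np.asarray(y_star)
    if p >= p_star:
        # Keep every success; turn a failure into a success with the
        # probability needed to lift the success rate from p_star to p.
        flip_up = u <= (p - p_star) / (1.0 - p_star)
        return np.where(y_star == 1, 1, flip_up.astype(int))
    # Keep every failure; retain a success with probability p / p_star.
    keep = u <= p / p_star
    return np.where(y_star == 1, keep.astype(int), 0)

rng = np.random.default_rng(1)
n = 200_000
p_star, p = 0.60, 0.65  # limiting and finite-sample success probabilities (made up)
y_star = rng.binomial(1, p_star, size=n)
u = rng.uniform(size=n)
y_tilde = couple_bernoulli(y_star, p, p_star, u)

marginal_rate = y_tilde.mean()                 # should be close to p
disagreement_rate = (y_tilde != y_star).mean() # should be close to |p - p_star|
```

Under this construction the two variables disagree with probability equal to the absolute difference of the two success probabilities, which is why sums of the coupled outcomes become close once the recommended intervention approaches its limit.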

(6) |

The outcomes and are independent, because the are the outcomes under the constant intervention . Therefore, by standard logistic regression theory, the expression in (6) converges in distribution to a normal random variable with mean zero and variance . Combining the asymptotic normality of (6) with (5) implies that Theorem 2 holds.

The asymptotic variance can be consistently estimated from the data by replacing , , and with , , and , respectively, in . The asymptotic variance and its approximation are the same as if the interventions were fixed in advance and and were independent.

### 2.3 Hypothesis testing

A major goal of a LAGO study is to test the null hypothesis of no overall intervention effect. One way to test this is to carry out a test for the subvector of characterizing the effect of the intervention. That is, to test in model (1) using the asymptotic normality result of Section 2.2. Because of this asymptotic normality result, the Wald or likelihood ratio tests for are asymptotically valid for any constant .

Alternatively, in a controlled LAGO design, let be a group indicator that equals one for the intervention group and zero for the control, and let and be the success probabilities under and , respectively. Then, an alternative test for an overall intervention effect, , can be carried out by testing . The latter test is valid despite the adaption of the intervention package. By Assumption 1, the dependence between the stage 2 and stage 1 data is solely due to the stage 1 data determining the stage 2 recommended intervention, which, in turn, affects the actual stage 2 intervention, and thus the stage 2 outcomes. However, under the null, there is no effect of the actual intervention on the stage 2 outcomes. Therefore, under the null, regardless of the way the intervention was adapted, the stage 1 and stage 2 outcomes are independent. Thus, a standard test for equal probabilities in the control and the intervention arms is valid. While not needed due to our asymptotic results, the same arguments could have been used for the standard tests of .

In a controlled LAGO design, an alternative, possibly more powerful, test for the overall effect of the intervention in the presence of center characteristics is to consider in the model . As before, in light of the between-stages independence under the null, in model (1) implies .

### 2.4 Confidence sets and confidence bands

After the conclusion of the study, the optimal intervention is estimated as the solution to (2) with replaced by . To obtain an asymptotic 95% confidence set for the optimal intervention , we first obtain a confidence interval for , for a given and for each . To do this, we calculate a 95% confidence interval for , i.e., for :

where is the estimated variance of , and is the estimated variance of . The 95% confidence interval for is . Then, we obtain the confidence set for the optimal intervention as . That is, includes intervention packages for which is inside the confidence interval for the success probability under those interventions.

We now show that the confidence set contains with the specified probability of 0.95. Recall that under the assumption that can be achieved, . Therefore,

Implementing this procedure is simple and its calculation is fast. Because calculating does not depend upon estimating , it does not involve the optimization algorithm.
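The confidence set procedure can be sketched as follows, assuming the logistic model (1) with an intercept, intervention components, and center covariates, and assuming a Wald-type interval on the logit scale that is transformed back to the probability scale. The function name, the grid of candidate interventions, and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def confidence_set(beta_hat, cov_hat, z, grid, p_goal, crit=1.96):
    """Return the candidate interventions whose 95% interval for the
    success probability contains the outcome goal p_goal.

    beta_hat is ordered as (intercept, intervention effects, covariate effects);
    grid is an iterable of candidate intervention vectors.
    """
    members = []
    for x in grid:
        v = np.concatenate(([1.0], x, z))   # design vector for this candidate
        eta = v @ beta_hat                  # estimated logit of success probability
        se = np.sqrt(v @ cov_hat @ v)       # its estimated standard error
        lower, upper = expit(eta - crit * se), expit(eta + crit * se)
        if lower <= p_goal <= upper:
            members.append(x)
    return np.array(members)

# Made-up estimate and covariance for one intervention component and one covariate.
beta_hat = np.array([-1.0, 0.5, 0.3])
cov_hat = 0.01 * np.eye(3)
z = np.array([0.0])
grid = [np.array([x]) for x in np.arange(0.0, 4.01, 0.5)]

cs = confidence_set(beta_hat, cov_hat, z, grid, p_goal=0.6)
```

As noted in the text, no optimization is involved: each candidate is checked independently, so the cost of the procedure grows only with the size of the grid.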

At the end of the study, researchers might be interested in a variety of potential intervention packages in that were not necessarily identified as of interest a priori. We propose a method to develop confidence bands for the outcome probabilities for a range of of interest, simultaneously. These confidence bands allow researchers to study the entire intervention space when comparing potential choices of the intervention package. We propose a procedure that is based on the asymptotic normality of and on Scheffé’s method (Scheffé, 1959). First, for all , construct to obtain 95% confidence bands for ,

with defined as before and the 95% quantile of a distribution. As before, we transform into confidence bands for by setting . These confidence bands guarantee asymptotic simultaneous 95% coverage for all possible intervention package compositions; the proof is given in Section 4 of the supplementary materials.

### 2.5 Computation of the optimal intervention

The algorithm used to solve (2) after stage , using , depends on the form of . Under a linear cost function with unit costs for the –th component of the intervention, the solution is achieved by 1. setting all components to their minimal value , 2. ordering the components by their estimated cost-efficiency , and 3. increasing the most cost-efficient component until either is achieved or until this component reaches its maximal value, and then moving to the next most cost-efficient component among the remaining components. For non-linear cost functions, standard non-linear optimization algorithms can be used.
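The three steps for a linear cost function can be sketched as a greedy fill. This is a minimal illustration under stated assumptions: model (1) is taken to be logistic with an intercept, the cost-efficiency of a component is taken to be its estimated coefficient divided by its unit cost, and components with non-positive estimated effects are left at their minimum. The function name and all numerical values are hypothetical.

```python
import numpy as np

def optimal_linear_cost(beta0, beta1, beta2, z, costs, lower, upper, p_goal):
    """Greedy solution of the cost minimization under a linear cost function.

    Returns the intervention vector and a flag indicating whether the
    outcome goal p_goal is reached within the bounds.
    """
    beta1 = np.asarray(beta1, dtype=float)
    costs = np.asarray(costs, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = np.array(lower, dtype=float)            # step 1: start at the minimal values
    logit_goal = np.log(p_goal / (1.0 - p_goal))
    # Amount still missing on the logit scale to reach the goal.
    remaining = logit_goal - beta0 - beta1 @ x - np.dot(beta2, z)
    order = np.argsort(-beta1 / costs)          # step 2: most cost-efficient first
    for r in order:                             # step 3: fill components in that order
        if remaining <= 0:
            break
        if beta1[r] <= 0:
            continue                            # this component cannot help
        step = min(remaining / beta1[r], upper[r] - x[r])
        x[r] += step
        remaining -= beta1[r] * step
    return x, bool(remaining <= 1e-12)

# Made-up estimates: two components, the second being the more cost-efficient.
x_opt, reached = optimal_linear_cost(
    beta0=-1.0, beta1=[0.5, 0.2], beta2=[0.3], z=[0.0],
    costs=[1.0, 0.2], lower=[0.0, 0.0], upper=[4.0, 5.0], p_goal=0.6,
)
p_at_opt = 1.0 / (1.0 + np.exp(-(-1.0 + np.dot([0.5, 0.2], x_opt))))
```

In this example the second component is raised to its maximum first, and the first component is then increased only as far as needed to reach the goal. When the goal cannot be reached within the bounds, the flag is returned as false.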

## 3 Simulations

We conducted simulation studies to investigate the finite sample properties of our methods. We simulated 1000 data sets per simulation scenario. We considered a two-stage controlled LAGO design with equal number of centers per stage , with half the centers in the intervention arm and half in the control arm. The total sample size available at the end of the study is . We considered the values , , and . The intervention had two components, , with unit costs and . The minimum and maximum values of and were and . We considered the following values for : (the null), , , , and . A single center covariate was normally distributed with mean 0 and variance 1 and its coefficient was taken to be . For simplicity, we did not include an intercept in model (1), although each center had its own baseline success probability due to . For , the probability of success in the control arm was 0.5. The stage 2 recommended intervention was based on solving the optimization problem (2) using the stage 1 estimates of . Section 5.1 of the supplementary materials provides the details on what was done when no solution existed for which was reached.

Selected results are presented in Tables 1 and 2. Table 1 presents results on the performance of , and shows that for , the finite sample bias was minimal, the mean estimated standard error was very close to the empirical standard deviation, and the empirical coverage rate of the confidence intervals for the effects of the individual package components was very close to 95%. Moreover, Section 5.2 of the supplementary materials shows that the type I error rate of the tests discussed in Section 2.3 was close to 0.05.

| | | | %RelBias | SE/EMP.SD | CP95 | %RelBias | SE/EMP.SD | CP95 |
|---|---|---|---|---|---|---|---|---|
| 50 | 100 | 6 | -1.1 | 92.0 | 95.2 | -2.1 | 83.3 | 94.2 |
| | | 10 | -3.0 | 100.1 | 95.6 | -0.8 | 93.4 | 94.9 |
| | | 20 | 0.1 | 103.5 | 95.5 | -0.6 | 104.9 | 96.1 |
| | 200 | 6 | -3.0 | 88.4 | 94.9 | -3.1 | 83.5 | 95.2 |
| | | 10 | -6.6 | 92.9 | 94.5 | -0.9 | 93.5 | 94.9 |
| | | 20 | 0.2 | 102.5 | 95.6 | -0.6 | 97.7 | 95.3 |
| 100 | 100 | 6 | -0.8 | 89.5 | 95.1 | -1.6 | 86.7 | 95.2 |
| | | 10 | 3.5 | 102.2 | 95.7 | -1.3 | 102.2 | 95.0 |
| | | 20 | 1.5 | 100.7 | 95.3 | -0.4 | 101.1 | 95.2 |
| | 200 | 6 | -2.2 | 90.4 | 94.6 | -1.4 | 89.7 | 96.0 |
| | | 10 | -0.8 | 102.7 | 96.7 | -0.7 | 95.9 | 95.5 |
| | | 20 | -0.3 | 97.4 | 94.7 | -0.4 | 96.7 | 94.1 |
| 50 | 100 | 6 | -11.4 | 89.0 | 94.8 | -0.4 | 82.2 | 96.1 |
| | | 10 | -7.3 | 103.7 | 95.7 | 0.4 | 104.4 | 96.5 |
| | | 20 | -3.1 | 99.0 | 94.7 | -0.1 | 100.8 | 95.0 |
| | 200 | 6 | -15.8 | 92.6 | 95.0 | 1.4 | 89.7 | 94.9 |
| | | 10 | -8.1 | 93.3 | 95.7 | 0.3 | 99.6 | 95.5 |
| | | 20 | -1.8 | 100.1 | 95.3 | -0.5 | 102.5 | 96.6 |
| 100 | 100 | 6 | -6.0 | 96.2 | 96.3 | 0.0 | 94.0 | 95.2 |
| | | 10 | -2.7 | 98.2 | 95.1 | -0.2 | 104.7 | 95.4 |
| | | 20 | -2.7 | 100.7 | 95.2 | 0.2 | 102.2 | 95.2 |
| | 200 | 6 | -8.9 | 95.4 | 95.4 | 0.3 | 83.8 | 96.5 |
| | | 10 | -5.0 | 95.6 | 94.6 | 0.0 | 97.3 | 95.3 |
| | | 20 | -3.2 | 98.9 | 94.4 | 0.1 | 104.7 | 95.5 |

%RelBias, percent relative bias ; SE, mean estimated standard error; EMP.SD, empirical standard deviation; CP95, empirical coverage rate of 95% confidence intervals.

Table 2 presents results for the estimated optimal intervention and success probabilities, for and calculated for a typical center with ; results for are presented in Section 5.2 of the supplementary materials. The finite sample bias and the root mean squared errors of the final were generally small and decreased as the number of centers per stage and the sample size increased. The empirical coverage rate of the confidence set for was approximately 95%, with the set typically including between 3 and 14 percent of , as a measure of precision in the scenarios studied. We also compared the cost of the estimated optimal intervention to the cost of the true optimal intervention and found it to be almost the same for the scenarios presented in Table 2; see Section 5.2 in the supplementary materials. Table 2 also shows that the empirical coverage rate of the confidence bands for was very close to 95%.

| | | Bias() | Bias() | RMSE() | SetCP95 | SetPerc% | BandsCP95 |
|---|---|---|---|---|---|---|---|
| 50 | 100 | 36.4 | -5.0 | 87.3 | 94.8 | 7.6 | 96.9 |
| | 500 | 18.6 | -2.4 | 62.0 | 95.2 | 4.1 | 96.8 |
| 100 | 100 | 22.6 | -2.8 | 69.0 | 94.5 | 6.1 | 96.7 |
| | 500 | 9.8 | -1.3 | 45.3 | 94.5 | 3.7 | 97.5 |
| 50 | 100 | -8.4 | 2.4 | 48.9 | 94.4 | 13.3 | 96.8 |
| | 500 | -1.9 | 0.9 | 25.0 | 94.9 | 7.7 | 95.9 |
| 100 | 100 | -4.4 | 1.3 | 38.4 | 94.6 | 12.3 | 95.5 |
| | 500 | -0.6 | 2.2 | 18.4 | 94.8 | 7.1 | 95.5 |
| 50 | 100 | -31.2 | 4.0 | 81.6 | 94.0 | 14.2 | 95.0 |
| | 500 | -15.2 | 3.3 | 57.1 | 94.9 | 8.0 | 94.8 |
| 100 | 100 | -21.8 | 2.7 | 68.3 | 95.1 | 12.4 | 95.4 |
| | 500 | -9.0 | 2.6 | 44.1 | 94.3 | 7.5 | 95.0 |

RMSE, root of mean squared errors , mean taken over simulation iterations; SetCP95, empirical coverage percentage of confidence set for optimal intervention; SetPerc%, mean percent of covered by the confidence set; BandsCP95, empirical coverage rate of 95% confidence bands for .

## 4 Illustrative example

The BetterBirth Study consisted of three stages. The first two stages were pilot stages used to develop the intervention package. Stage 3 was a randomized controlled trial. The development of the recommended intervention package was conducted qualitatively, as described in Hirschhorn et al. (2015), and the intervention package was adjusted after each pilot stage. The results of the randomized controlled trial were presented and discussed in Semrau et al. (2017). The number of centers with data on oxytocin administration in the first, second, and third stages was 2, 4 and 30, respectively. In the first two stages, data in each center were collected before and after the intervention was implemented. In stage 3, there were 15 centers in the control arm and 15 centers in the intervention arm. In 5 intervention arm centers, outcome data were also collected before the intervention was implemented.

Here, we focus on the binary outcome of oxytocin administration immediately after delivery, as recommended by the WHO (WHO, 2012) to prevent postpartum hemorrhage, a major cause of maternal mortality. The intervention package components were the duration of the on-site intervention launch (in days), the number of coaching visits after the intervention was launched, leadership engagement (non-standardized initial engagement, standardized initial engagement, and standardized initial engagement with follow-up visits) and data feedback (none; ongoing, paper-based; ongoing, app-based). The four components were adapted in a way that resulted in near multicollinearity. Therefore, for illustrative purposes, we considered the first two components only, launch duration and number of coaching visits. The launch duration was 3 days in stage 1 and 2 days in stages 2 and 3. Compared to stage 1, the intensity of coaching visits was increased in stage 2, and further increased in stage 3. For illustrative purposes, we truncated the data at 40 coaching visits or fewer. The baseline center characteristic we included was the approximate monthly birth volume, given that larger facilities might be more likely to follow WHO recommendations about oxytocin administration closely, regardless of the intervention package implemented. Other available center characteristics, e.g. number of staff nurses, were highly correlated with the monthly birth volume.

Table 3 provides the estimated effects of the intervention package components after each of the stages, using all available data at that point. The sample size in stage 1 was relatively small, explaining the wide confidence intervals for the odds ratios. The final results imply that both package components had an effect. Tests for the overall effect of the package yielded a highly significant p-value, regardless of the test we used.

| | Stage 1: OR (CI-OR) | Stages 1-2: OR (CI-OR) | Stages 1-3: OR (CI-OR) |
|---|---|---|---|
| Intercept | 1.07 (0.00, 280.80) | 0.10 (0.07, 0.15) | 0.10 (0.09, 0.11) |
| Coaching Visits (per 3 visits) | 7.95 (1.77, 73.95) | 1.11 (0.96, 1.28) | 1.08 (1.04, 1.12) |
| Launch Duration (days) | 1.41 (0.76, 2.64) | 2.65 (1.95, 3.77) | 2.79 (2.41, 3.23) |
| Birth Volume (monthly, per 100) | 0.37 (0.00, 32.33) | 2.11 (1.93, 2.33) | 1.94 (1.84, 2.06) |

OR, estimated odds ratio; CI-OR, 95% confidence interval for the odds ratio. In the estimated optimal interventions, the first component is the launch duration (in days) and the second component is the number of coaching visits.

After consulting with the study investigators, we assigned unit costs of $800 per launch day and $170 per coaching visit. In practice, implementation costs may also depend on center size and, if so, could be replaced with .
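As a small illustration of the linear cost used here, the sketch below multiplies each unit cost by its quantity and sums the two terms. The function name `package_cost` and its argument names are ours, chosen for illustration; only the two unit costs come from the text.

```python
# Unit costs reported in the text (US dollars).
COST_PER_LAUNCH_DAY = 800
COST_PER_COACHING_VISIT = 170


def package_cost(launch_days, coaching_visits):
    """Linear cost of a package: each unit cost times its quantity, summed.

    The function and argument names are hypothetical, for illustration only.
    """
    return (COST_PER_LAUNCH_DAY * launch_days
            + COST_PER_COACHING_VISIT * coaching_visits)


# Example: a 3-day launch followed by a single coaching visit.
print(package_cost(3, 1))  # 3 * 800 + 1 * 170 = 2570
```

A cost that also depends on center size would need an extra argument and an extra term; that extension is not sketched here.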

The estimation of the optimal intervention package with linear cost was conducted as in the simulation study. Assuming that at least 1 launch day and 1 coaching visit are needed, and that a launch duration of more than 5 days or more than 40 coaching visits is impractical, we estimated the optimal intervention for a center with average birth volume to be a launch duration of 2.78 days and 1 coaching visit. We also carried out the optimization over all possible combinations of discrete values of the number of coaching visits and the launch duration, and obtained an optimal intervention of a three-day launch with one coaching visit. The total cost of this estimated optimal intervention package was $2570 (3 × $800 + 1 × $170).
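The discrete search can be sketched as an exhaustive scan over a grid of packages. The sketch below rests on several assumptions that are ours and are not confirmed by this section: the optimum is taken to be the cheapest package whose predicted success probability reaches a prespecified target, the launch duration grid uses half-day steps from 1 to 5 days, and `toy_success_prob` is a made-up logistic model whose coefficients are not the study estimates. Only the unit costs and the bounds on launch days and coaching visits come from the text.

```python
import math
from itertools import product

COST_PER_LAUNCH_DAY = 800      # unit costs from the text (US dollars)
COST_PER_COACHING_VISIT = 170


def package_cost(launch_days, coaching_visits):
    """Linear cost of an intervention package."""
    return (COST_PER_LAUNCH_DAY * launch_days
            + COST_PER_COACHING_VISIT * coaching_visits)


def toy_success_prob(launch_days, coaching_visits):
    """Hypothetical logistic model; coefficients are made up for illustration."""
    logit = -2.0 + 1.0 * launch_days + 0.03 * coaching_visits
    return 1.0 / (1.0 + math.exp(-logit))


def cheapest_feasible_package(success_prob, target, launch_grid, visit_grid):
    """Return the lowest-cost (launch_days, visits) pair whose predicted
    success probability is at least `target`, or None if no pair qualifies."""
    feasible = [
        (days, visits)
        for days, visits in product(launch_grid, visit_grid)
        if success_prob(days, visits) >= target
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda pkg: package_cost(*pkg))


launch_grid = [1 + 0.5 * k for k in range(9)]  # 1, 1.5, ..., 5 days (assumed grid)
visit_grid = range(1, 41)                      # 1, 2, ..., 40 coaching visits
best = cheapest_feasible_package(toy_success_prob, 0.70, launch_grid, visit_grid)
print(best, package_cost(*best))
```

The toy coefficients and the 0.70 target were picked only so that the result is easy to check by hand; in an actual analysis `success_prob` would be the fitted outcome model evaluated at the center characteristics of interest.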

We calculated a 95% confidence set for the optimal intervention over a grid of intervention packages, taking all possible numbers of coaching visits, 1, 2, ..., 40, and launch durations of 1, 1.5, ..., 5 days. Out of 360 potential intervention packages, 38 (10.5%) were included in the 95% confidence set. The set included the following combinations: 1.5 days launch duration and 40 coaching visits; 2 days launch duration and 27 or more coaching visits; 2.5 days launch duration and fewer than 20 coaching visits; and 3 days launch duration and fewer than 5 coaching visits. The first, second and third quartiles of the cost distribution within the confidence set were $2462, $4035, and $6797, respectively. We also calculated 95% simultaneous confidence bands for the probability of success under all 360 intervention compositions; plots are shown in Section 6 of the supplementary materials. For the estimated optimal intervention, the obtained interval within the bands for the probability of oxytocin administration was . The mean difference between the top and bottom of the confidence band over all 360 intervention compositions was 0.07.

## 5 Discussion

We developed the LAGO design for multiple component intervention studies with a binary outcome, where the intervention package composition is systematically adapted as part of the design. The goals of studies using the LAGO design are to find the optimal intervention package, to test its effect on the outcome of interest, and to estimate its effect as well as the effects of the individual components.

The methodology in this paper was developed for scenarios with a stagewise analysis that does not include formal interim hypothesis testing. However, the LAGO design allows for futility stops, since stopping the trial for futility between stages preserves the type I error. The type I error can only decrease from the nominal level when futility stops are included because when stopping for futility, the null is not rejected (Snapinn et al., 2006).
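The argument for futility stops can be written out in one line. As a sketch in notation that is ours rather than the paper's, let $R$ denote the event that the final test rejects the null in the design without futility stops, let $F$ denote the event that the trial is stopped for futility between stages, and let $P_0$ denote probability under the null, with the final test assumed to have level $\alpha$. With futility stops, the null is rejected only if the trial is not stopped and the final test rejects, so

$$
P_0(\text{reject with futility stops}) = P_0(R \cap F^{c}) \le P_0(R) \le \alpha .
$$

The inequality holds for any futility rule, because removing outcomes from the rejection region cannot increase its probability.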

For clear presentation of the design, methods, and theory, we focused on a general yet practical design. Our work opens the way for further research. For example, it would be interesting to develop methods for studies with further dependence because centers contribute data to more than one stage. The results in this paper could also be extended to continuous, count, or survival outcome data. Finally, many design problems arise, in terms of identifying the optimal , and for given settings.

Many large effectiveness and implementation trials fail because current design methodology does not permit adaptation in the face of implementation failure as in, for example, the BetterBirth (Semrau et al., 2017) and the TasP (Iwuji et al., 2017) studies. The LAGO design rigorously formalizes practices in public health research that are presently conducted in an ad hoc manner, with unknown consequences for the validity of the subsequent standard analysis (Escoffery et al., 2018). We expect widespread use of the LAGO design as a result, with potential gain for many randomized clinical trials.

## References

- Bauer et al. (2016) Bauer, P., F. Bretz, V. Dragalin, F. König, and G. Wassmer (2016). Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in medicine 35(3), 325–347.
- Bauer and Kohne (1994) Bauer, P. and K. Kohne (1994). Evaluation of experiments with adaptive interim analyses. Biometrics, 1029–1041.
- Brannath et al. (2002) Brannath, W., M. Posch, and P. Bauer (2002). Recursive combination tests. Journal of the American Statistical Association 97(457), 236–244.
- Collins et al. (2007) Collins, L. M., S. A. Murphy, and V. Strecher (2007). The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): new methods for more potent eHealth interventions. American journal of preventive medicine 32(5), S112–S118.
- Collins et al. (2014) Collins, L. M., I. Nahum-Shani, and D. Almirall (2014). Optimization of behavioral dynamic treatment regimens based on the sequential, multiple assignment, randomized trial (SMART). Clinical Trials 11(4), 426–434.
- Escoffery et al. (2018) Escoffery, C., E. Lebow-Skelley, H. Udelson, E. A. Böing, R. Wood, M. E. Fernandez, and P. D. Mullen (2018). A scoping study of frameworks for adapting public health evidence-based interventions. Translational behavioral medicine.
- FDA (2016) FDA (2016). Adaptive Designs for Medical Device Clinical Studies: Guidance for Industry and Food and Drug Administration Staff.
- Gao et al. (2013) Gao, P., L. Liu, and C. Mehta (2013). Exact inference for adaptive group sequential designs. Statistics in medicine 32(23), 3991–4005.
- Gawande (2014) Gawande, A. (2014). Being mortal: medicine and what matters in the end. Metropolitan Books.
- Hallberg and Richards (2015) Hallberg, I. and D. A. Richards (2015). Complex Interventions in Health: An Overview of Research Methods. Routledge.
- Hernan and Robins (2019) Hernan, M. A. and J. M. Robins (2019). Causal inference. CRC Boca Raton, Chapman & Hall/CRC, forthcoming.
- Hirschhorn et al. (2015) Hirschhorn, L. R., K. Semrau, B. Kodkany, R. Churchill, A. Kapoor, J. Spector, S. Ringer, R. Firestone, V. Kumar, and A. Gawande (2015). Learning before leaping: integration of an adaptive study design process prior to initiation of BetterBirth, a large-scale randomized controlled trial in Uttar Pradesh, India. Implementation Science 10(1), 1.
- Iwuji et al. (2017) Iwuji, C. C., J. Orne-Gliemann, J. Larmarange, E. Balestre, R. Thiebaut, F. Tanser, N. Okesola, T. Makowa, J. Dreyer, K. Herbst, et al. (2017). Universal test and treat and the HIV epidemic in rural South Africa: a phase 4, open-label, community cluster randomised trial. The Lancet HIV.
- Kairalla et al. (2012) Kairalla, J. A., C. S. Coffey, M. A. Thomann, and K. E. Muller (2012). Adaptive trial designs: a review of barriers and opportunities. Trials 13(1), 145.
- Lindvall (2002) Lindvall, T. (2002). Lectures on the coupling method. Courier Corporation.
- Müller and Schäfer (2001) Müller, H.-H. and H. Schäfer (2001). Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57(3), 886–891.
- Müller and Schäfer (2004) Müller, H.-H. and H. Schäfer (2004). A general statistical principle for changing a design any time during the course of a trial. Statistics in medicine 23(16), 2497–2508.
- Murphy (2005) Murphy, S. A. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in medicine 24(10), 1455–1481.
- Murphy et al. (2007) Murphy, S. A., K. G. Lynch, D. Oslin, J. R. McKay, and T. TenHave (2007). Developing adaptive treatment strategies in substance abuse research. Drug and alcohol dependence 88, S24–S30.
- O’Quigley et al. (1990) O’Quigley, J., M. Pepe, and L. Fisher (1990). Continual reassessment method: a practical design for phase 1 clinical trials in cancer. Biometrics, 33–48.
- O’Quigley and Shen (1996) O’Quigley, J. and L. Z. Shen (1996). Continual reassessment method: a likelihood approach. Biometrics, 673–684.
- Proschan and Hunsberger (1995) Proschan, M. A. and S. A. Hunsberger (1995). Designed extension of studies based on conditional power. Biometrics, 1315–1324.
- Rosenberger and Haines (2002) Rosenberger, W. F. and L. M. Haines (2002). Competing designs for phase I clinical trials: a review. Statistics in medicine 21(18), 2757–2770.
- Rosenblum and van der Laan (2011) Rosenblum, M. and M. J. van der Laan (2011). Optimizing randomized trial designs to distinguish which subpopulations benefit from treatment. Biometrika 98(4), 845–860.
- Scheffé (1959) Scheffé, H. (1959). The analysis of variance, Volume 72. John Wiley & Sons.
- Semrau et al. (2017) Semrau, K. E., L. R. Hirschhorn, M. Marx Delaney, V. P. Singh, R. Saurastri, N. Sharma, D. E. Tuller, R. Firestone, S. Lipsitz, N. Dhingra-Kumar, et al. (2017). Outcomes of a coaching-based WHO Safe Childbirth Checklist program in India. New England Journal of Medicine 377(24), 2313–2324.
- Simon et al. (1997) Simon, R., L. Rubinstein, S. G. Arbuck, M. C. Christian, B. Freidlin, and J. Collins (1997). Accelerated titration designs for phase I clinical trials in oncology. Journal of the National Cancer Institute 89(15), 1138–1147.
- Snapinn et al. (2006) Snapinn, S., M.-G. Chen, Q. Jiang, and T. Koutsoukos (2006). Assessment of futility in clinical trials. Pharmaceutical Statistics 5(4), 273–281.
- Spiegelman and Zhou (2018) Spiegelman, D. and X. Zhou (2018). Evaluating public health interventions: 8. causal inference for time-invariant interventions. American journal of public health 108(9), 1187–1190.
- Thall et al. (2003) Thall, P. F., R. E. Millikan, P. Mueller, and S.-J. Lee (2003). Dose-finding with two agents in phase I oncology trials. Biometrics 59(3), 487–496.
- Wang and Ivanova (2005) Wang, K. and A. Ivanova (2005). Two-dimensional dose finding in discrete dose space. Biometrics 61(1), 217–222.
- WHO (2012) WHO (2012). WHO recommendations for the prevention and treatment of postpartum haemorrhage. World Health Organization.
- Wong et al. (2016) Wong, K. M., A. Capasso, and S. G. Eckhardt (2016). The changing landscape of phase I trials in oncology. Nature Reviews Clinical Oncology 13(2), 106–117.
