An optimization solver for blocked Response-Adaptive Randomized (RAR) trial design
In clinical trials, response-adaptive randomization (RAR) has the appealing ability to assign more subjects to better-performing treatments based on interim results. The traditional RAR strategy alters the randomization ratio on a patient-by-patient basis; this has been heavily criticized for bias due to time-trends. An alternate approach is blocked RAR, which groups patients together in blocks and recomputes the randomization ratio in a block-wise fashion; the final analysis is then stratified by block. However, the typical blocked RAR design divides patients into equal-sized blocks, which is not generally optimal. This paper presents TrialMDP, an algorithm that designs two-armed blocked RAR clinical trials. Our method differs from past approaches in that it optimizes the size and number of blocks as well as their treatment allocations. That is, the algorithm yields a policy that adaptively chooses the size and composition of the next block, based on results seen up to that point in the trial. TrialMDP is related to past works that compute optimal trial designs via dynamic programming. The algorithm maximizes a utility function balancing (i) statistical power, (ii) patient outcomes, and (iii) the number of blocks. We show that it attains significant improvements in utility over a suite of baseline designs, and gives useful control over the tradeoff between statistical power and patient outcomes. It is well suited for small trials that assign high cost to failures. We provide TrialMDP as an R package on GitHub: https://github.com/dpmerrell/TrialMDP
Randomization is a common technique used in clinical trials to eliminate potential bias and confounders in a patient population. Most clinical trials use fixed randomization, where the probability of assigning a subject to each treatment group stays fixed throughout the trial. Response-adaptive randomization (RAR) designs were developed for their appealing ability to increase the probability of assigning patients to more promising treatments, based on the responses of prior patients. A major limitation of RAR designs is that the time between treatment and outcome must be short enough for earlier outcomes to inform the randomization of later patients.
Traditional RAR designs recompute the randomization ratio on a patient-by-patient basis (Thall and Wathen, 2007), usually after a burn-in period of fixed randomization. However, these designs have been widely criticized (Karrison et al., 2003) because they induce bias when a trial is subject to temporal trends, which are especially likely in long-duration trials. Patient characteristics may drift over the course of a trial, and may differ markedly between its beginning and its end (Proschan and Evans, 2020). Standard RAR analyses nevertheless assume that the sequence of patients entering the trial represents samples drawn at random from two homogeneous populations, with no drift in the probabilities of success (Proschan and Evans, 2020; Chandereng and Chappell, 2020). This assumption is usually violated. For example, in the BATTLE lung cancer elimination trials (Liu and Lee, 2015), more smokers enrolled in the latter part of the trial than at the beginning.
Despite this serious flaw, little of the literature addresses the temporal trend issue in RAR designs. Villar et al. (2018) studied covariate-adjusted hypothesis testing as a way to correct type-I error inflation, along with its effect on power, in two-armed and multi-armed RAR designs with temporal trend effects added to the model. Karrison et al. (2003) introduced a stratified group-sequential method with a simple example of altering the randomization ratio to address this issue. Chandereng and Chappell (2019) further examined the operating characteristics of the blocked RAR approach for two treatment arms proposed by Karrison et al. (2003). They concluded that blocked RAR provides a good trade-off between ethically assigning more subjects to the better-performing treatment group and maintaining high statistical power. They also suggested using a small number of blocks, since designs with many blocks have low statistical power. However, Chandereng and Chappell (2019) designed trials with equal-sized blocks, which is not generally optimal.
Other works formulate adaptive trial design as a Multi-Armed Bandit Problem (MABP), employing ideas that are often associated with reinforcement learning—e.g., sequential decision-making and regret minimization. These entail sophisticated algorithms, such as Gittins index computations (Villar et al., 2015, 2015) and dynamic programming (Hardwick and Stout, 1995, 1999, 2002). These works have important limitations. The Gittins index approaches of Villar et al. assume either (i) a fully sequential trial with similar weaknesses to traditional RAR or (ii) a blocked trial with equal-sized blocks. The dynamic programming algorithms of Hardwick and Stout yield allocation rules that (i) are deterministic, (ii) are fully sequential, or (iii) assume a blocked trial with a fixed number of blocks. At the time, Hardwick and Stout’s approaches were also limited by computer speed and memory, which have improved famously over the years.
This paper presents TrialMDP, an algorithm that designs blocked RAR trials. TrialMDP is most closely related to the MABP-based approaches mentioned above. However, it models a blocked RAR trial as a Markov Decision Process (MDP), a generalization of the MABP. It relies on a dynamic programming algorithm, similar to those of Hardwick and Stout. However, our method differs in that it optimizes the size and number of blocks as well as their treatment allocations. That is, the algorithm yields a policy that adaptively chooses the size and composition of the next block, based on results seen up to that point in the trial. The current version of TrialMDP is tailored for two-armed trials with binary outcomes. Future versions may permit a more general class of trials.
Our paper has the following structure. In Section 2, we describe our problem formulation and algorithmic solution. In Section 3, we compare TrialMDP’s designs with other designs that have been widely used in clinical trials. We use our proposed method to redesign a phase II trial in Section 3.2. We discuss TrialMDP’s limitations and potential improvements in Section 4. Our Supplementary Materials include appendices that justify some of our mathematical and algorithmic choices.
In this paper we focus on blocked RAR trials with two arms and binary outcomes. We label the arms $A$ and $B$ (“treatment” and “control”, respectively) and outcomes 0 and 1 (“failures” and “successes”). A trial has access to some number of available patients, $N$. The trial proceeds in $K$ blocks. We require that all results from the current block are observed before the next block begins. Importantly, we allow $K$ to adapt as the trial progresses. This gives the trial useful flexibility: in general, a trial may attain better characteristics if it permits differently-sized blocks.
Let $p_A$ and $p_B$ denote the treatments’ success probabilities. We assume a frequentist test is performed at the end of the trial, with the following null and alternative hypotheses:
$$H_0: p_A = p_B \qquad H_1: p_A > p_B$$
We focus specifically on the one-sided Cochran-Mantel-Haenszel (CMH) test, which is well-suited for stratified observations; in our setting, the strata are blocks of patients. It has been argued that blocked RAR trials with CMH tests are more robust to temporal trend effects than, e.g., traditional RAR trials with chi-square tests (Chandereng and Chappell, 2019).
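To make the block-stratified test concrete, the following is a minimal sketch of a one-sided CMH z-statistic computed from per-block 2x2 tables. It uses the standard textbook form of the statistic with the hypergeometric variance; the function names and the tuple layout `(a_succ, a_fail, b_succ, b_fail)` are assumptions made for illustration and are not taken from the TrialMDP package.

```python
import math


def cmh_z(tables):
    """One-sided CMH z-statistic; positive values favor arm A.

    `tables` is a list of per-block tuples (a_succ, a_fail, b_succ, b_fail).
    """
    numerator = 0.0
    variance = 0.0
    for a_succ, a_fail, b_succ, b_fail in tables:
        n_a, n_b = a_succ + a_fail, b_succ + b_fail
        m_succ, m_fail = a_succ + b_succ, a_fail + b_fail
        n = n_a + n_b
        if n < 2:
            continue  # a stratum with fewer than 2 patients adds no information
        # Observed minus expected successes on arm A under the null.
        numerator += a_succ - n_a * m_succ / n
        # Hypergeometric variance of the arm-A success count.
        variance += n_a * n_b * m_succ * m_fail / (n * n * (n - 1))
    if variance == 0.0:
        return 0.0
    return numerator / math.sqrt(variance)


def one_sided_p_value(z):
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Two hypothetical blocks of 20 patients each.
z = cmh_z([(8, 2, 4, 6), (7, 3, 5, 5)])
p = one_sided_p_value(z)
```

Because each block is its own stratum, a shift in the overall success rate between blocks does not by itself move the statistic; only within-block differences between the arms contribute.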
Figure 1: (A) Contingency table notation. (B) Trial history notation. A history is a sequence of cumulative contingency tables. A subscript indicates a history’s suffix.
We establish some notation for clarity. A contingency table has the following attributes: the numbers of patients assigned to each treatment; the numbers of successes for each treatment; the total numbers of failures and successes; and the total number of outcomes recorded in the table. We also use point estimates of the treatment success probabilities, computed from the table. See Figure 1 for illustration.
Each block of the trial has its own contingency table with corresponding quantities. We use a subscript to indicate the block. For example, the $k$th block of the trial has its own table, with its own allocations, success counts, and point estimates.
At any point we can summarize the state of the trial in a contingency table, $s$, of cumulative results. That is, $s$ contains all of the trial’s observations up to that point; or, put another way, $s$ is the sum of all preceding block-wise contingency tables. We typically refer to $s$ as a state. We use an underline to indicate a quantity computed from a state. For example, after completing $k$ blocks, the underlined counts are the sums of the corresponding block-wise counts over those $k$ blocks.
The sequence of states occupied by a trial forms a trial history $h = (s_0, s_1, \ldots, s_K)$, where $s_0$ is always the empty contingency table and the final state $s_K$ always has $N$ observations. We use a subscript to denote the suffix of a history; for example, the sequence of states after the $k$th block of the trial. It is useful to think of a history as a random object, subject to uncertainty in the patient outcomes and the values of $p_A$, $p_B$.
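The relationship between block-wise tables, cumulative states, and histories can be sketched in a few lines. This is an illustrative data structure only; the class name `Table`, its field names, and the helper `history` are hypothetical and do not come from the TrialMDP code.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Table:
    """A 2x2 contingency table: successes and failures on arms A and B."""
    a_succ: int = 0
    a_fail: int = 0
    b_succ: int = 0
    b_fail: int = 0

    def __add__(self, other):
        return Table(self.a_succ + other.a_succ, self.a_fail + other.a_fail,
                     self.b_succ + other.b_succ, self.b_fail + other.b_fail)

    @property
    def n_a(self):
        return self.a_succ + self.a_fail

    @property
    def n_b(self):
        return self.b_succ + self.b_fail

    @property
    def n(self):
        return self.n_a + self.n_b

    def p_hat(self, arm):
        """Point estimate of an arm's success probability (NaN if no data)."""
        succ, total = ((self.a_succ, self.n_a) if arm == "A"
                       else (self.b_succ, self.n_b))
        return succ / total if total else float("nan")


def history(blocks):
    """Cumulative states: the empty table, then running sums of the blocks."""
    return list(accumulate(blocks, initial=Table()))


# Two hypothetical blocks of 10 patients each.
h = history([Table(3, 2, 1, 4), Table(4, 1, 2, 3)])
```

Here `h[0]` is the empty table, `h[-1]` holds all 20 observations, and a suffix of the history is simply a slice such as `h[1:]`.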
We aim to design blocked RAR trials that balance (i) statistical power and (ii) patient outcomes. We also recognize that each additional block entails a cost in time and other overhead. As such, we wish to avoid an excessive number of blocks. We formalize these goals with the following utility function:
$$U(h) = f_{\text{pow}}(h) - \lambda_F \, f_F(h) - \lambda_K \, f_K(h) \qquad (1)$$
where $h$ is a trial history; $f_{\text{pow}}$ is a proxy for the trial’s statistical power; $f_F$ measures the number of failures; and $f_K$ is the number of blocks. This utility function promotes a high statistical power while penalizing failures and blocks. The coefficients $\lambda_F$ and $\lambda_K$ control the relative importance of patient outcomes and blocks, respectively.
The functions , , and have the following forms:
Function simply returns the number of blocks in the trial history. Function quantifies bad patient outcomes (i.e., failures) as a fraction of all patients. It is a function only of the final state, , and becomes small when the estimates and are close.
Function serves as a proxy for the trial’s statistical power. It is crafted such that maximizing also maximizes the power of the Cochran-Mantel-Haenszel test (Cochran, 1954) to an acceptable approximation. Each is the harmonic mean of that block’s treatment allocations. takes larger values when the allocations are balanced; and when are close to each other, and far from . The factor makes consistent across trials with differing sample sizes. See Appendix A of the Supplementary Materials for a more detailed justification of .
In our effort to maximize the expected utility (Equation 1), we find it useful to model a blocked RAR trial as a Markov Decision Process (MDP). An MDP is a simple model of sequential decision-making. It consists of an agent navigating a state space. At each time-step, the agent chooses an action. Given the agent’s current state and chosen action, the agent transitions to a new state and collects a reward. In general the transition is stochastic, governed by a transition distribution. One solves an MDP by obtaining a policy that maximizes the expected total reward. We refer the reader to Chapter 38 of Lattimore and Szepesvari’s text for detailed information about MDPs (Lattimore and Szepesvari, 2020).
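Figure 2: (A) The state space of contingency tables. (B) The transition distribution for the next contingency table. Its entries are governed by Beta-Binomial distributions, parameterized by the entries of the current contingency table.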
We model a blocked RAR trial as an MDP with the following components:
State space. In our setting the state space $\mathcal{S}$ consists of every possible contingency table with at most $N$ observations. We can order the states by their numbers of observations. We let $\mathcal{S}_n$ denote the subset of $\mathcal{S}$ containing tables with exactly $n$ observations. The trial always begins at the empty contingency table in $\mathcal{S}_0$ and terminates at some table in $\mathcal{S}_N$. The state space grows quickly with $N$. Figure 2(A) illustrates the state space.
Actions. With each block of the trial we choose an action $a = (m, \phi)$: the block’s size $m$ and allocation $\phi$. The size $m$ may take any integer value from 1 to the number of patients not yet treated. The allocation $\phi$ is the fraction of patients assigned to treatment $A$ in this block. We constrain $\phi$ to a finite set of possible values, $\Phi$. Importantly, exactly $\phi m$ patients (rounded to the nearest integer) are assigned to treatment $A$. In other words, patients are randomized to treatments “without replacement.” Contrast this with other randomized designs (traditional RAR, blocked RAR, etc.) that assign each patient to $A$ with independent probability $\phi$. For example, an action with block size 60 implies that the next block will treat exactly 60 patients, with the number assigned to each treatment fixed in advance by $\phi$.
We let $\mathcal{A}$ denote the set of all actions, and $\mathcal{A}(s)$ denote the actions available at state $s$.
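Transition distributions. Given the current contingency table $s$ and the chosen block design $a$, the next contingency table $s'$ is randomly distributed. This randomness consists of two parts: (i) the stochasticity of patient outcomes given the true success probabilities $p_A$ and $p_B$, and (ii) our uncertainty about the values of $p_A$ and $p_B$. Suppose the block assigns $n_A$ patients to treatment $A$ and $n_B$ patients to treatment $B$. Given the true values for $p_A$ and $p_B$, the block’s numbers of successes $x_A$ and $x_B$ would have Binomial distributions:
$$x_A \sim \text{Binomial}(n_A, p_A), \qquad x_B \sim \text{Binomial}(n_B, p_B).$$
However, we only have imperfect knowledge of $p_A$ and $p_B$, encoded in the entries of the current table $s$. We use Beta distributions to describe this uncertainty about $p_A$ and $p_B$. Writing $\underline{S}_A$ and $\underline{F}_A$ for the cumulative successes and failures on treatment $A$ recorded in $s$ (and likewise for $B$):
$$p_A \sim \text{Beta}(\underline{S}_A + \alpha, \; \underline{F}_A + \alpha), \qquad p_B \sim \text{Beta}(\underline{S}_B + \alpha, \; \underline{F}_B + \alpha).$$
Here $\alpha$ is a smoothing hyperparameter typically set to 1. Together, these two sources of randomness assign independent Beta-Binomial probabilities to $x_A$ and $x_B$, which in turn define the distribution for $s'$. See Figure 2(B) for illustration. We sometimes use the notation $P(s' \mid s, a)$ to indicate the transition distribution for $s'$, given $s$ and $a$.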
Rewards. Given the current state $s$ and the chosen action $a$, the trial transitions to state $s'$ and receives a reward $R(s, a, s')$. In an MDP the goal is to maximize expected total reward. Recall, however, that our ultimate goal is to maximize the expected utility (Equation 1). We therefore craft a reward function consistent with the utility function.
The total reward for a trial history is identical to the utility (Equation 1) of that trial history. With each block, the reward function produces that block’s contribution to the total utility. This includes the block’s term for the power proxy $f_{\text{pow}}$; the block’s cost $\lambda_K$; and the final failure penalty $\lambda_F f_F$ when $s'$ is terminal.
Notice that our particular $R$ is a function only of $a$ and $s'$. We sometimes write $R(a, s')$ for compactness.
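Policy. A policy $\pi$ is a function mapping each state in the MDP to an action. In our setting policies are trial designs. For each state $s$ in the trial, a policy dictates the design of the trial’s next block: $\pi(s) = a$. Our MDP is solved by the optimal policy $\pi^*$, the one that maximizes the expected utility over trial histories:
$$\pi^* = \arg\max_{\pi} \; \mathbb{E}\left[ U(h) \mid \pi \right]$$
We let $V^*(s)$ denote the corresponding maximal expected total reward obtainable from each state $s$.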
Casting our problem into the MDP framework helps us design algorithmic solutions. Our particular MDP lends itself to a straightforward dynamic programming approach, since there are no cycles in its directed graph of possible transitions.
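As an illustration of the action and transition components, the sketch below enumerates block designs at a state and builds the Beta-Binomial distribution over successor tables. The function names, the tuple layout `(a_succ, a_fail, b_succ, b_fail)`, and the half-up rounding convention are assumptions made for this example; the allocation set passed in is hypothetical.

```python
from math import exp, floor, lgamma


def available_actions(n_observed, n_total, allocations):
    """Distinct (block_size, n_A, n_B) designs available to the trial."""
    actions = set()
    for size in range(1, n_total - n_observed + 1):
        for phi in allocations:
            n_a = floor(phi * size + 0.5)  # nearest integer, halves rounded up
            actions.add((size, n_a, size - n_a))
    return sorted(actions)


def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the success rate is Beta(a, b)."""
    log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
             - lgamma(a) - lgamma(b) + lgamma(a + b))
    return exp(log_p)


def transition_distribution(table, n_a, n_b, alpha=1.0):
    """Map each successor table to its probability, given a block design."""
    a_succ, a_fail, b_succ, b_fail = table
    dist = {}
    for x_a in range(n_a + 1):
        prob_a = beta_binomial_pmf(x_a, n_a, a_succ + alpha, a_fail + alpha)
        for x_b in range(n_b + 1):
            prob_b = beta_binomial_pmf(x_b, n_b, b_succ + alpha, b_fail + alpha)
            successor = (a_succ + x_a, a_fail + n_a - x_a,
                         b_succ + x_b, b_fail + n_b - x_b)
            dist[successor] = prob_a * prob_b  # arms are independent
    return dist


# From the empty table, a block with 3 patients on A and 2 on B.
dist = transition_distribution((0, 0, 0, 0), n_a=3, n_b=2)
```

With `alpha = 1` and an empty table, each arm's success count is uniformly distributed, so every one of the 12 successor tables in `dist` has probability 1/12.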
The MDP described in Section 2.1 can be solved by a relatively simple dynamic programming algorithm. This makes our method a close relative of past dynamic programming approaches for trial design (Woodroofe and Hardwick, 1990; Hardwick and Stout, 1995, 1999, 2002). However, our method differs from them in an important respect: we seek to maximize an objective that is a function of the trial history, and not just a function of the final state. Concretely, our objective function (Equation 1) includes and , which are functions of block-wise attributes. Formulating the problem as an MDP gives us the flexibility to consider such an objective.
Like any dynamic programming algorithm, ours divides the problem at hand into subproblems and solves them in an order that efficiently reuses computation. This dependence between subproblems is defined by a set of recurrence relations. In our case we have a single recurrence based on the Bellman equation (Lattimore and Szepesvari, 2020):
$$V^*(s) = \max_{a \in \mathcal{A}(s)} \sum_{s'} P(s' \mid s, a) \left[ R(a, s') + V^*(s') \right] \qquad (3)$$
where $V^*(s)$ is the maximal expected total reward obtainable from state $s$, and $V^*(s) = 0$ at terminal states.
The algorithm computes this recurrence at every state, iterating through the state space in order of decreasing number of observations. In other words the algorithm evaluates the recurrence at each state with $N$ observations, then at each state with $N-1$ observations, and so on, until it finally computes $V^*$ for the empty table and terminates. At each state the algorithm also tabulates the maximizing action. This table of optimal actions is the algorithm’s most important output, as it constitutes $\pi^*$, the optimized trial design. Figure 3 illustrates the algorithm in detail with pseudocode.
The trial design (i.e., policy) yielded by this recurrence is guaranteed to maximize the expected utility (subject to the MDP formulation described in Section 2.1), since our optimization problem has the optimal substructure property. See Appendix B of the Supplementary Materials for more discussion and a proof of optimal substructure.
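The backward sweep over states can be sketched as follows for a very small trial. This is a toy under stated assumptions, not the TrialMDP implementation: `toy_reward` is a hypothetical stand-in (a harmonic-mean-style allocation weight, a per-block cost, and a terminal failure penalty) rather than the paper's reward function, and the default allocation set and cost values are arbitrary.

```python
from math import exp, floor, lgamma


def bb_pmf(k, n, a, b):
    """Beta-Binomial probability of k successes in n trials."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
               - lgamma(a) - lgamma(b) + lgamma(a + b))


def tables_with(n):
    """All tables (a_succ, a_fail, b_succ, b_fail) with exactly n observations."""
    return [(a1, a0, b1, n - a1 - a0 - b1)
            for a1 in range(n + 1)
            for a0 in range(n + 1 - a1)
            for b1 in range(n + 1 - a1 - a0)]


def toy_reward(n_a, n_b, nxt, n_total, fail_cost, block_cost):
    """Hypothetical stand-in reward; NOT the paper's reward function."""
    reward = (n_a * n_b / (n_a + n_b)) / n_total - block_cost
    if sum(nxt) == n_total:  # terminal state: charge the failure penalty
        reward -= fail_cost * (nxt[1] + nxt[3]) / n_total
    return reward


def solve(n_total, allocations=(0.25, 0.5, 0.75),
          fail_cost=1.0, block_cost=0.05, alpha=1.0):
    """Backward induction; returns (value table, optimal-action table)."""
    value = {s: 0.0 for s in tables_with(n_total)}  # terminal states
    policy = {}
    for n in range(n_total - 1, -1, -1):  # decreasing number of observations
        for s in tables_with(n):
            a1, a0, b1, b0 = s
            best_value, best_action = float("-inf"), None
            for size in range(1, n_total - n + 1):
                for n_a in sorted({floor(phi * size + 0.5) for phi in allocations}):
                    n_b = size - n_a
                    expected = 0.0
                    for x_a in range(n_a + 1):
                        prob_a = bb_pmf(x_a, n_a, a1 + alpha, a0 + alpha)
                        for x_b in range(n_b + 1):
                            prob_b = bb_pmf(x_b, n_b, b1 + alpha, b0 + alpha)
                            nxt = (a1 + x_a, a0 + n_a - x_a,
                                   b1 + x_b, b0 + n_b - x_b)
                            r = toy_reward(n_a, n_b, nxt, n_total,
                                           fail_cost, block_cost)
                            expected += prob_a * prob_b * (r + value[nxt])
                    if expected > best_value:
                        best_value, best_action = expected, (size, n_a, n_b)
            value[s], policy[s] = best_value, best_action
    return value, policy


value, policy = solve(6)
first_block = policy[(0, 0, 0, 0)]  # (block_size, n_A, n_B) for the first block
```

Because every successor of a state has strictly more observations, each `value[nxt]` lookup has already been filled in by the time it is needed. Raising `block_cost` far enough pushes the toy solution to a single block covering all patients, which is a convenient sanity check.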
At a high level TrialMDP is a nested loop over every possible state, action, and transition. For each state the algorithm stores a set of values, along with the optimal action. Hence the algorithm uses space. The number of possible actions and transitions varies between states; summing across all states yields total time cost , where is the set of allocation fractions mentioned in Section 2.1.
These complexities apply if we allow the algorithm to consider every possible state and action. However, there are practical ways to prune away states and actions, attaining much lower computational cost without sacrificing much utility. Introducing a minimum block size parameter eliminates all of the states in and ; and reduces the number of possible actions at each remaining state. An additional block increment parameter further constrains the algorithm to states where is an integer multiple of , resulting in a “coarsened” state space. These parameters reduce the algorithm’s space and time cost to and , respectively. See Appendix C of the Supplementary Materials for derivations. We typically set and . Unless specified otherwise, we use . These settings yielded trials with competitive characteristics, without incurring undue computational expense during the evaluations of Section 3.
Empirically, we observe a time cost of 5; ; and seconds for trials with 40, 100, and 140 patients respectively. These measurements used a single-threaded implementation of TrialMDP, on a laptop with Intel 1.1GHz CPUs.
We performed a simulation study to compare TrialMDP against established trial designs. At each point in a grid of values for $p_A$ and $p_B$, we ran 10,000 simulated trials using TrialMDP and a suite of baseline designs. The baselines included (i) a 1:1, fixed randomization design; (ii) a traditional Response-Adaptive Randomized (RAR) design; and (iii) a blocked RAR design.
For null scenarios with $p_A = p_B$, we chose an arbitrary sample size of $N = 100$. For alternative scenarios with $p_A > p_B$, we chose $N$ large enough for a 1:1 design to attain a power of 0.8. See Tables 1 and 2 for the exact values of $p_A$, $p_B$, and $N$ used in our simulated scenarios.
The traditional RAR baseline used a 1:1 randomization ratio for the first of patients, and adaptive randomization thereafter according to the procedure used by Rosenberger et al. (2001). That is, the patient was assigned to treatment with probability
The blocked RAR baseline used two blocks of equal size. The first block used a 1:1 randomization ratio; the second block used the same randomization given by Equation 4. This agrees with the blocked RAR procedure described by Chandereng and Chappell (2019).
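For orientation, the sketch below shows one adaptive allocation rule commonly attributed to Rosenberger et al. (2001): the square-root rule, which assigns a patient to a treatment with probability proportional to the square root of its estimated success probability. Whether this matches Equation 4 exactly is an assumption, since the equation itself is not reproduced here; the smoothing of the estimates and all function names are also hypothetical choices for this example.

```python
from math import sqrt


def smoothed_rate(successes, n, prior=0.5):
    """Success-rate estimate kept away from 0 and 1 (assumed smoothing)."""
    return (successes + prior) / (n + 2.0 * prior)


def sqrt_rule_allocation(succ_a, n_a, succ_b, n_b):
    """Probability of assigning the next patient to treatment A."""
    p_a = smoothed_rate(succ_a, n_a)
    p_b = smoothed_rate(succ_b, n_b)
    return sqrt(p_a) / (sqrt(p_a) + sqrt(p_b))


# Hypothetical interim data: 8/10 successes on A, 2/10 on B.
prob_a = sqrt_rule_allocation(8, 10, 2, 10)
```

In a two-block design of the kind described above, such a rule would be evaluated once, on the first block's results, to set the second block's randomization ratio.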
We used TrialMDP to generate trial designs over a grid of parameter settings: . Each parameter setting implies a different balance between statistical power, patient outcomes, and the number of blocks.
We simulated 10,000 trials for every scenario $(p_A, p_B, N)$, for each baseline design, and for each TrialMDP parameter setting. As an initial sanity check we visualized some trial histories to see whether the designs behaved as expected. Figure 4 shows some examples. TrialMDP always chose 1:1 allocation for the first block, increasing the allocation to $A$ in subsequent blocks when $p_A > p_B$. As $\lambda_F$ increased, the designs reliably increased allocation to $A$, in agreement with our expectations. The baseline designs also yielded trial histories that agreed with our expectations.
Recall that TrialMDP is supposed to optimize the utility function (Equation 1) in expectation. If this were true, we would expect our designs to attain higher utility than the others, averaged over the simulated histories. To verify this we computed the utility for every simulated history and for every design, and tabulated the resulting averages.
Table 1 shows some representative results from the alternative scenarios. These results employed TrialMDP with and . Under these particular parameter settings TrialMDP attained slightly lower power than the other designs, but its superior patient outcomes gave it the greatest utility across all scenarios. Indeed, we found that our algorithm does typically achieve higher average utility than the baseline designs (i) under the alternative hypothesis and (ii) as long as is sufficiently large. When is not large enough, our designs have highest utility among the adaptive designs, but the 1:1 design is mathematically guaranteed to attain highest utility. We show this in Appendix D of the Supplementary Materials.
We highlight the fact that TrialMDP assigned many more patients to the superior treatment on average, in all the scenarios of Table 1. Furthermore, it did so reliably. The 5th percentile of $N_A - N_B$, the difference between the numbers of patients assigned to the two treatments, is higher for TrialMDP than for any other adaptive design, in every alternative scenario.
It is also important to note that TrialMDP’s design yielded slightly biased estimates of the effect size in the alternative scenarios. We hypothesize that this bias—on the order of 0.01—stems from the rapidly changing randomization ratio prescribed by TrialMDP. The user ought to weigh this against other matters, such as vastly improved patient outcomes, when considering TrialMDP.
Table 2 shows the corresponding results for null scenarios. Notice that in some cases TrialMDP’s designs showed somewhat inflated type-I error. The percentiles of $N_A - N_B$ show that TrialMDP is more prone to creating an imbalanced allocation under the null hypothesis. Another salient observation is the relative decrease in utility for all of the adaptive designs. This has a simple explanation. Under the null hypothesis, a 1:1 design always has optimal utility. A 1:1 design attains the maximal power term and the minimal block cost; and under the null hypothesis the expected failure penalty is the same for any design. Hence, every adaptive design will yield lower utility than the 1:1 design.
Table 1: Simulation results for the alternative scenarios.
| $p_A$ | $p_B$ | $N$ | Power (CMH test) | | | | Effect Bias | | | $N_A - N_B$ (5%, 95%) | | | | | | | Utility z-score | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 0.1 | 94 | 0.79 | 0.78 | 0.79 | 0.78 | 0.00 | 0.00 | 0.01 | 16.94 | (-4, 36) | 11.22 | (-8, 30) | 19.88 | (-2, 50) | 3.26 | -60.25 | 6.23 | 8.40 |
| 0.4 | 0.1 | 46 | 0.78 | 0.75 | 0.77 | 0.74 | 0.00 | 0.00 | 0.01 | 6.80 | (-6, 18) | 3.98 | (-8, 16) | 15.26 | (0, 26) | 3.87 | -9.18 | 2.65 | 8.17 |
| 0.4 | 0.2 | 124 | 0.80 | 0.78 | 0.79 | 0.77 | 0.00 | 0.00 | 0.01 | 17.03 | (-6, 42) | 11.18 | (-10, 32) | 31.77 | (-4, 66) | 3.50 | -104.45 | 6.10 | 11.64 |
| 0.5 | 0.3 | 144 | 0.80 | 0.80 | 0.78 | 0.76 | 0.00 | 0.00 | 0.01 | 14.20 | (-10, 38) | 9.54 | (-12, 32) | 42.23 | (-6, 76) | 3.41 | -157.52 | 5.11 | 14.80 |
| 0.7 | 0.4 | 62 | 0.79 | 0.77 | 0.77 | 0.73 | 0.00 | 0.00 | 0.01 | 6.96 | (-8, 22) | 4.60 | (-10, 18) | 23.03 | (0, 34) | 3.85 | -17.25 | 2.89 | 11.22 |
| 0.7 | 0.5 | 144 | 0.81 | 0.79 | 0.79 | 0.77 | 0.00 | 0.00 | 0.01 | 9.22 | (-12, 30) | 6.13 | (-14, 28) | 42.33 | (-6, 76) | 3.34 | -174.98 | 3.16 | 14.78 |
| 0.9 | 0.6 | 46 | 0.79 | 0.78 | 0.78 | 0.76 | 0.00 | 0.00 | 0.01 | 3.72 | (-8, 16) | 2.55 | (-8, 14) | 13.43 | (0, 26) | 3.50 | -9.56 | 1.62 | 6.89 |
| 0.9 | 0.7 | 94 | 0.80 | 0.80 | 0.80 | 0.79 | 0.00 | 0.00 | 0.00 | 4.53 | (-12, 20) | 3.08 | (-14, 20) | 19.17 | (-2, 50) | 3.22 | -73.27 | 1.32 | 7.80 |
Table 2: Simulation results for the null scenarios.
| $p_A = p_B$ | $N$ | Size (CMH test) | | | | Effect Bias | | | $N_A - N_B$ (5%, 95%) | | | | | | | Utility z-score | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 100 | 0.05 | 0.05 | 0.05 | 0.05 | 0.00 | 0.00 | 0.00 | 0.25 | (-22, 24) | 0.14 | (-20, 22) | 0.13 | (-18, 18) | 2.77 | -542.40 | -5.26 | -9.15 |
| 0.3 | 100 | 0.05 | 0.05 | 0.05 | 0.05 | 0.00 | 0.00 | 0.00 | -0.18 | (-22, 20) | -0.07 | (-20, 20) | -0.07 | (-32, 32) | 3.77 | -505.00 | -5.14 | -13.45 |
| 0.5 | 100 | 0.05 | 0.05 | 0.05 | 0.06 | 0.00 | 0.00 | 0.00 | 0.05 | (-18, 18) | 0.12 | (-18, 18) | 0.06 | (-46, 46) | 3.88 | -490.50 | -4.96 | -13.36 |
| 0.6 | 100 | 0.05 | 0.05 | 0.04 | 0.05 | 0.00 | 0.00 | 0.00 | 0.03 | (-18, 18) | 0.07 | (-18, 18) | -0.04 | (-42, 42) | 3.85 | -491.29 | -4.98 | -13.31 |
| 0.7 | 100 | 0.05 | 0.05 | 0.05 | 0.06 | 0.00 | 0.00 | 0.00 | 0.11 | (-18, 18) | -0.01 | (-16, 16) | 0.32 | (-36, 36) | 3.68 | -499.43 | -5.12 | -12.64 |
| 0.9 | 100 | 0.05 | 0.05 | 0.05 | 0.05 | 0.00 | 0.00 | 0.00 | -0.10 | (-16, 16) | -0.01 | (-16, 16) | -0.02 | (-18, 18) | 2.77 | -511.70 | -5.05 | -9.03 |
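Table 3: Simulation results for the redesigned thymoglobulin trial.
| $p_A$ | $p_B$ | $N$ | Power/Size (CMH test) | | | | Effect Bias | | | $N_A - N_B$ (5%, 95%) | | | | | | | Utility z-score | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.4 | 20 | 0.06 | 0.05 | 0.05 | 0.05 | 0.00 | 0.00 | 0.00 | -0.04 | (-8, 8) | -0.03 | (-8, 8) | 0.01 | (-10, 10) | 2.57 | -45.59 | -2.48 | -3.57 |
| 0.8 | 0.4 | 20 | 0.61 | 0.60 | 0.54 | 0.56 | 0.00 | 0.00 | 0.02 | 2.13 | (-6, 10) | 1.47 | (-6, 8) | 5.09 | (-6, 10) | 2.28 | -9.94 | 0.24 | 2.50 |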
Beyond a one-dimensional comparison of utility, it is useful to compare the designs in two dimensions: statistical power and patient outcomes. As we vary the parameter $\lambda_F$, TrialMDP designs trials that balance these quantities differently. We visualize this with frontier plots; trial designs are shown as points in two dimensions, with statistical power on the horizontal axis and allocation to $A$ on the vertical axis. Higher and to the right is better. Figure 5 gives examples. In some scenarios, TrialMDP’s designs dominate the other adaptive designs, attaining higher power and assigning more patients to treatment $A$. Figure 5(A) is one such case. In other scenarios, TrialMDP’s designs do not dominate the others. However, Figure 5(B) shows that even in those cases, TrialMDP still provides a useful way to control the balance between power and patient outcomes. For example, the user is free to choose a trial design with much better patient outcomes at the cost of slightly lower power, by selecting larger values of $\lambda_F$.
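The notion of one design dominating another on a frontier plot can be made precise with a short sketch. The design names and the (power, allocation) values below are made up purely for illustration and are not results from this paper; the function names are likewise hypothetical.

```python
def dominates(d1, d2):
    """True if d1 is at least as good as d2 on both axes and better on one.

    Each design is a (power, allocation_to_A) pair; higher is better on both.
    """
    return d1[0] >= d2[0] and d1[1] >= d2[1] and d1 != d2


def pareto_front(designs):
    """Names of the designs that no other design dominates."""
    return sorted(name for name, point in designs.items()
                  if not any(dominates(other, point)
                             for other in designs.values()))


# Illustrative values only.
designs = {
    "design_x": (0.80, 5.0),
    "design_y": (0.78, 12.0),
    "design_z": (0.77, 10.0),
}
front = pareto_front(designs)
```

Here `design_z` is dominated by `design_y`, while `design_x` and `design_y` trade power against allocation and both remain on the frontier. Choosing between such non-dominated designs is the decision that the failure-cost parameter controls.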
Figure 5: Frontier plots comparing trial designs by statistical power and patient allocations; the error bars show symmetric 90% confidence intervals for the means. Note that there are error bars for the vertical direction, but they are too compact to be seen.
We demonstrate TrialMDP’s practical usage by applying it to a historical trial. We chose the phase-II thymoglobulin trial described by Bashir et al. (2012) because it (i) had two arms, (ii) had a small sample size ($N = 20$), and (iii) used an adaptive design, which the trial designers chose for ethical reasons. This combination made the trial well-suited for testing our algorithm.
We redesigned the trial in two phases: “parameter tuning” and “testing.” In the parameter tuning phase we swept through the same grid of values used in our simulation study, but with the sample size fixed at $N = 20$. We ran our algorithm and simulated 10,000 trials at each grid point, and generated frontier plots similar to those in Figure 5. Visual inspection of these plots identified values of the cost parameters that would yield reasonable power and patient outcomes for a variety of scenarios.
In the testing phase we simulated the thymoglobulin trial, using point estimates of $p_A$ and $p_B$ computed from the original trial’s results. We simulated two scenarios: a null scenario where $p_A = p_B = 0.4$, and an alternative where $p_A = 0.8$ and $p_B = 0.4$. Using the design from TrialMDP with the “tuned” parameter values, we simulated 10,000 trials for each scenario. The results are aggregated in Table 3. Under the alternative scenario we found that TrialMDP’s design, on average, assigned significantly more patients to treatment $A$ with a slightly decreased power of 0.557. Note also that in the null scenario, TrialMDP’s design had a somewhat inflated type-I error of 0.055.
We presented TrialMDP, an algorithm for designing blocked RAR trials. TrialMDP represents a blocked RAR trial as a Markov Decision Process, and solves for the optimal design via dynamic programming. The resulting design dictates the size and treatment allocation of the next block, given the results observed thus far.
Our algorithm allows users to choose the relative importance of (i) statistical power and (ii) patient outcomes. The trial designs generated by TrialMDP consistently attain superior utility against a suite of baselines when (i) the effect size is large and (ii) patient outcomes are given sufficient importance. The simulation study in Section 3.1 demonstrates this.
TrialMDP has some shortcomings worth keeping in mind. It is currently restricted to a narrow class of trials: two-armed trials with binary outcomes. All outcomes for past blocks must be observed before the next block can begin. The MDP formulation assumes a single statistical test (one-sided CMH) is performed at the end of the trial. While interim analyses may be used in trials governed by the current version of TrialMDP, we provide no guarantees of optimality in that case. TrialMDP’s computational cost grows quickly with the number of patients, and becomes impractical for . Setting large values for the minimum block size and block increment parameters ( and ) can ameliorate some of this expense. Simulations showed that in some scenarios, TrialMDP’s designs have modestly inflated type-I error, and may yield a slightly biased estimate of effect size. These weaknesses should be weighed against the vastly superior patient outcomes TrialMDP can deliver.
The user of TrialMDP immediately faces a question: what values of and should be used? Consider the terms of Equation 1. Since is only a proxy for the statistical power, there isn’t a clear way to assign practical meaning to . For example, we cannot interpret as a literal “conversion rate” between units of failure and units of statistical power. This makes it difficult to set in a principled way. Instead we recommend tuning and through a process like the one demonstrated in Section 3.2: (i) use the algorithm to design trials for a grid of values; (ii) simulate trials for each design, for a set of scenarios ; (iii) examine the simulation results and choose that yield acceptable power and patient outcomes across scenarios. As a starting point, , yielded reasonable characteristics across all the scenarios in this paper.
Although TrialMDP’s current implementation is single-threaded, it is highly parallelizable and would have a speedup roughly linear in the number of threads. A multi-threaded parallel implementation is a natural next step.
There are multiple ways that TrialMDP could be extended to a broader class of trials. For instance, it could permit more than two arms and more than two outcomes. This would incur exponentially greater computational expense, but may be useful for some very small trials.
The current MDP formulation assumes that the trial terminates after all patients have been treated. A more sophisticated MDP could incorporate interim analyses, accounting for the possibility of early termination for success or futility.
We implemented TrialMDP in C++ and provide it as an R package on GitHub: https://github.com/dpmerrell/TrialMDP. We also provide the code for our Section 3 evaluations in another repository: https://github.com/dpmerrell/TrialMDP-analyses. This includes a Snakemake workflow (Mölder et al., 2021) that reproduces all results in this paper.
Our Supplementary Material contains four appendices. Appendix A gives our justification for using the function . Appendix B shows that our optimization problem has the optimal substructure property (and hence TrialMDP yields an optimal policy with respect to our MDP assumptions). Appendix C derives the computational complexities given in Section 2.2. Appendix D shows that must be sufficiently large for an adaptive trial to attain higher utility than a single-block trial.
We thank Zhu Xiaojin and Blake Mason for conversations about bandit algorithms. DM was funded by the National Institutes of Health (award T32LM012413).
Conflict of Interest: None declared.
Equation 1 uses the following function, , as a proxy for a trial’s statistical power:
This appendix provides some justification for .
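We’re interested in blocked RAR trials where the final analysis uses a Cochran-Mantel-Haenszel (CMH) superiority test. Recall that the one-sided CMH statistic takes this form:
$$Z = \frac{\sum_{k=1}^{K} \left( x_{A,k} - n_{A,k} \, m_{1,k} / n_k \right)}{\sqrt{\sum_{k=1}^{K} n_{A,k} \, n_{B,k} \, m_{1,k} \, m_{0,k} / \left( n_k^2 (n_k - 1) \right)}}$$
where, in block $k$, $x_{A,k}$ is the number of successes on treatment $A$; $n_{A,k}$ and $n_{B,k}$ are the numbers of patients on each treatment; $m_{1,k}$ and $m_{0,k}$ are the total numbers of successes and failures; and $n_k$ is the block size.
Under the null hypothesis, $Z \sim \mathcal{N}(0, 1)$ asymptotically. Intuitively, we maximize the power of the test by choosing a design such that when $p_A > p_B$, the distribution of $Z$ has large mean without inflated variance. Our goal is to find an objective function that, when maximized, yields trial designs with those characteristics.
As a first candidate we may try maximizing the expected value of $Z$: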
where is the fraction of block ’s patients allocated to . The trial designer has no control over or . So if they wish to maximize this quantity then they may ignore the factor , yielding
as a proxy objective for maximizing power. It’s important to note, however, a subtle property of Expression 5. The denominator is minimized when more patients are allocated to the treatment with more extreme success probability—i.e., success probability closer to 0 or 1. As a result, the maximizer of Expression 5 exhibits a preference toward that treatment. This preference manifested itself in earlier versions of the algorithm, which would do well when , but would do worse when .
As a second candidate, we may try maximizing the related quantity
Cochran uses Expression 6 as a proxy for the power of a CMH test in his original justifications for the CMH statistic (Cochran, 1954). Like Expression 5, the new Expression 8 also exhibits a preference based on extremality of the success probabilities. However, it instead favors the treatment with less extreme success probability, i.e., probability nearer . Versions of the algorithm based on Expression 8 would manifest this preference during simulations. The algorithm would attain superior utility when , but would do worse when .
Note the similarity between Expression 8 and Expression 5. They have identical numerators, and both denominators have the form where is some “combined variance” computed from . They differ precisely in how they compute . This in turn produces their different preferences (toward the treatment with less-extreme and more-extreme success probability, respectively). Neither of these preferences is favorable. We would like a proxy for power that has simpler dependence on and , which are unknown. To that end we propose our final candidate:
or, after squaring,
which is the expression for used in Section 2.1 (up to a factor of ).
We show that our optimization problem—maximizing expected utility—possesses the optimal substructure property. In other words, we prove that the recurrence relation (Equation 3) correctly decomposes the problem into subproblems, and reuses their solutions to solve the original problem.
Suppose the algorithm is evaluating for some state , and that it’s already evaluated for every possible successor state of . Let denote the optimal policy, i.e., the one yielding . Then optimal substructure follows from the linearity of our utility function. Assuming is not terminal:
which agrees exactly with the recurrence in Equation 3. A similar computation covers the case when is terminal.
Put another way, our dynamic program’s recurrence relation computes correctly at each state, and will yield the optimal policy .
We derive the space and time complexities given in Section 2.2.
Let denote the set of all contingency tables containing observations. Define , the set of integers ranging from to in increments of . Then , and the size of the full state space is
The algorithm stores data proportional to , so this gives the space complexity.
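As a sanity check on the size of the unpruned state space, one can count 2x2 tables directly. A table with exactly $n$ observations is a way of splitting $n$ among four cells, so there are $\binom{n+3}{3}$ of them, and summing over $n = 0, \ldots, N$ gives $\binom{N+4}{4}$. The sketch below verifies this by brute force; it ignores the minimum block size and block increment parameters, and the function names are hypothetical.

```python
from math import comb


def count_tables_exact(n):
    """Number of 2x2 tables with exactly n observations."""
    return comb(n + 3, 3)


def count_states(n_total):
    """Number of 2x2 tables with at most n_total observations."""
    return comb(n_total + 4, 4)


def brute_force_count(n_total):
    """Enumerate every table with at most n_total observations."""
    return sum(1
               for a1 in range(n_total + 1)
               for a0 in range(n_total + 1 - a1)
               for b1 in range(n_total + 1 - a1 - a0)
               for b0 in range(n_total + 1 - a1 - a0 - b1))


small = brute_force_count(6)
large = count_states(100)
```

For a 100-patient trial the unpruned state space already holds about 4.6 million tables, and the count grows on the order of $N^4$, which illustrates why coarsening the state space matters in practice.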
The time complexity results from a nested sum over states, actions, and transitions: