A web application for the design of multi-arm clinical trials

06/21/2019 ∙ by Michael J Grayling, et al. ∙ Newcastle University

Multi-arm designs provide an effective means of evaluating several treatments within the same clinical trial. Given the large number of treatments now available for testing in many disease areas, it has been argued that their utilisation should increase. However, for any given clinical trial there are numerous possible multi-arm designs that could be used, and choosing between them can be a difficult task. This task is complicated further by a lack of available easy-to-use software for designing multi-arm trials. To aid the wider implementation of multi-arm clinical trial designs, we have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. Furthermore, the application supports sample size calculation to control several varieties of power, as well as the determination of optimised arm-wise allocation ratios. It is built using the Shiny package in the R programming language, is free to access on any device with an internet browser, and requires no programming knowledge to use. The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multi-arm clinical trial design. We hope that it will assist with the future utilisation of such designs in practice.





Drug development is becoming an increasingly expensive process, with the estimated average cost per approved new compound now standing at over $1 bn [1]. In no small part this is due to the high failure rate of clinical trials, in particular in phases II and III. This is particularly true in the field of oncology, where the likelihood of approval from phase I is only 5.1% [2]. Consequently, the clinical research community is constantly seeking new methods that may improve the efficiency of the drug development process.

One possible method, which has received substantial attention in recent years, is the idea to make use of multi-arm designs that compare several experimental treatments to a shared control group. Several desirable, inter-related, features of such designs have now been described. For example, the number of patients on the control treatment is typically reduced compared to conducting separate two-arm trials, and simultaneously patients are more likely to be randomized to an experimental treatment, which may help with recruitment [3, 4]. Furthermore, the overall required sample size, for the same level of power, will typically be smaller than that which would be required if multiple two-arm trials were conducted [5]. Finally, multi-arm designs offer a fair head-to-head comparison of experimental treatments in the same study [3, 4], and the cost of assessing a treatment in a multi-arm trial is often around half of that for a separate two-arm trial [3].

Based upon these advantages, and their experiences of utilising such designs in several oncology trials, Parmar et al. [3] make a compelling case for the need for more multi-arm designs to be used in clinical research. We are not aware of any systematic evidence on whether this has now permeated through to practice, but a simple search of PubMed Central suggests it may be the case: 859 articles have included the phrases “multi-arm” and “clinical trial” since 2015, as opposed to just 273 in all years prior to this. Considering this result in combination with the findings of Baron et al. [6], who determined 17.9% of trials published in 2009 were multi-arm, as well as the recent publication of a key guidance document on reporting results from multi-arm trials [7], it is clear that there is now much interest within the trials community in such designs.

However, whilst there are numerous advantages of multi-arm trials, it is important to recognise that determining a suitable design for a multi-arm clinical trial can be a substantially more complex process than for a two-arm trial. In particular, a decision must be made on how to account for the multiple comparisons that will be made. Indeed, whether the final analysis should adjust for multiplicity has been a topic of much debate within the literature. In brief, presented arguments primarily revolve around the fact that failing to account for multiplicity can substantially increase the probability of committing a type-I error. Yet, if a series of two-arm trials were conducted, no adjustment would be made to the significance level used in each trial. For brevity, we will not repeat all further arguments on this issue here, and instead refer the reader to several key discussions on multiplicity [5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18].

For the purposes of what follows in this article, the more important consideration is that when a multiple comparison correction (MCC) is to be used, one of a wide selection must actually be chosen (see, e.g., [19, 20, 21] for an overview). MCCs vary widely in their complexity, with Bonferroni’s correction often recommended because of its simplicity [7]. However, other MCCs often perform better in terms of the operating characteristics they impart, as Bonferroni’s correction is known to be conservative [10, 18, 20, 22]. A recent review found that amongst those multi-arm trials that did adjust for multiplicity, 50% used one of the comparatively simple Bonferroni or Dunnett corrections [5]. Thus, there arguably remains the potential for increased efficiency gains to be made in multi-arm trials, if more advanced MCCs can be employed.

Furthermore, regardless of whether a MCC is utilised, there are other complications that must also be addressed in multi-arm trial design, including how to power the trial, and what the allocation ratio to each experimental arm relative to the control arm will be. Indeed, power is not a simple quantity in a multi-arm trial, whilst the literature on how to choose the allocation ratios in an optimal manner is extensive (see, e.g., [23] for an overview), and deciding whether to specify allocation ratios absolutely, or whether they can be optimised to improve trial efficiency may not be an easy decision.

These considerations imply that user-friendly software for designing multi-arm clinical trials would be a valuable tool in the trials community. It is unfortunate therefore that little software is available to assist with such studies. The principal exception to this is the MULTIARM module for East [24], which allows users to compare the operating characteristics of many multi-arm designs with respect to numerous important quantities. However, the cost of this package may be prohibitive to many working within academia. For this reason, we have developed a web application for multi-arm clinical trial design. We hope that the availability of this application will assist with the utilization of more advanced multi-arm designs in future clinical trials.


The web application is written using the Shiny package [25] in the R programming language [26]. It is built using functions from the R package multiarm [27], and is also available as a function in that package for off-line local use. A vignette is provided for multiarm that gives great detail on its formal statistical specifications. A less technical summary is provided here.

Design setting

It is assumed that outcomes will be accrued from patients on treatment arms $k = 0, \dots, K$, with arm $k = 0$ corresponding to a shared control arm, and arms $k = 1, \dots, K$ to several experimental arms. Later, we provide more information on the precise types of outcome that are currently supported by the web application. The hypotheses of interest are assumed to be $H_k : \tau_k \le 0$ for $k = 1, \dots, K$. Here, $\tau_k$ corresponds to a treatment effect for experimental arm $k$ relative to the control arm. Thus, we assume one-sided tests for superiority. Note that in the app, reference is also made to the global null hypothesis, $H_G$, which we define to be the scenario with $\tau_1 = \dots = \tau_K = 0$.

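To test hypothesis $H_k$, we assume that a Wald test statistic, $z_k$, is computed

\[ z_k = \frac{\hat{\tau}_k}{\sqrt{\mathrm{Var}(\hat{\tau}_k)}} = \hat{\tau}_k I_k^{1/2}, \qquad k = 1, \dots, K, \]

where $\hat{\tau}_k$ is the estimate of $\tau_k$ and $I_k = 1/\mathrm{Var}(\hat{\tau}_k)$ is its information.

In what follows, we use the notation $\boldsymbol{z}_k = (z_1, \dots, z_k)^\top \in \mathbb{R}^k$. With this, note that our app supports design in particular scenarios where $\boldsymbol{Z}_K$, the random pre-trial value of $\boldsymbol{z}_K$, has (at least asymptotically) a $K$-dimensional multivariate normal (MVN) distribution, with

\[ E(Z_k) = \tau_k I_k^{1/2}, \qquad \mathrm{Cov}(Z_k, Z_k) = 1, \qquad k = 1, \dots, K, \]
\[ \mathrm{Cov}(Z_{k_1}, Z_{k_2}) = I_{k_1}^{1/2} I_{k_2}^{1/2} \, \mathrm{Cov}(\hat{\tau}_{k_1}, \hat{\tau}_{k_2}), \qquad k_1 \ne k_2. \]

As is discussed further later, this includes normally distributed outcome variable scenarios and, for large sample sizes, other parametric distributions such as Bernoulli outcome data.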

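Ultimately, to test the hypotheses, $\boldsymbol{z}_K$ is converted to a vector of $p$-values, $\boldsymbol{p} = (p_1, \dots, p_K)^\top \in [0,1]^K$, via $p_k = 1 - \Phi_1(z_k, 0, 1)$, for $k = 1, \dots, K$. Here, $\Phi_n\{(a_1, \dots, a_n)^\top, \boldsymbol{\lambda}, \Sigma\}$ is the cumulative distribution function of an $n$-dimensional MVN distribution, with mean $\boldsymbol{\lambda}$ and covariance matrix $\Sigma$. Precisely

\[ \Phi_n\{(a_1, \dots, a_n)^\top, \boldsymbol{\lambda}, \Sigma\} = \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} \phi_n\{\boldsymbol{x}, \boldsymbol{\lambda}, \Sigma\} \, \mathrm{d}x_n \cdots \mathrm{d}x_1, \]

where $\phi_n\{\boldsymbol{x}, \boldsymbol{\lambda}, \Sigma\}$ is the probability density function of an $n$-dimensional MVN distribution with mean $\boldsymbol{\lambda}$ and covariance matrix $\Sigma$, evaluated at vector $\boldsymbol{x} = (x_1, \dots, x_n)^\top$.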
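As a minimal sketch of the conversion from Wald statistics to one-sided p-values (each p-value is one minus the standard normal distribution function evaluated at the statistic), the following Python snippet uses only the standard library. The function name `one_sided_p_values` is hypothetical and is not part of the multiarm package.

```python
from statistics import NormalDist


def one_sided_p_values(z):
    """Convert Wald statistics into one-sided p-values, 1 - Phi(z_k)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [1.0 - std_normal.cdf(z_k) for z_k in z]


# Illustrative statistics for three experimental arms (made-up values).
p_values = one_sided_p_values([0.0, 1.959964, 2.5])
```

Larger test statistics give smaller p-values, so evidence of superiority over control corresponds to a small p-value.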

Then, which null hypotheses are rejected is determined by comparing the $p_k$ to a set of significance thresholds specified based on a chosen MCC, in combination with a nominated significance level $\alpha \in (0, 1)$. Before we describe the currently supported MCCs however, we will first describe the operating characteristics that are currently evaluated by the app.

Operating characteristics

Our app returns a wide selection of statistical operating characteristics that may be of interest when choosing a multi-arm trial design. Specifically, it can compute the following quantities for any nominated multi-arm design and true set of treatment effects $\boldsymbol{\tau} = (\tau_1, \dots, \tau_K)^\top$

  • The conjunctive power ($P_{con}$): The probability that all of the null hypotheses are rejected, irrespective of whether they are true or false.

  • The disjunctive power ($P_{dis}$): The probability that at least one of the null hypotheses is rejected, irrespective of whether they are true or false.

  • The marginal power for arm $k$ ($P_k$): The probability that $H_k$ is rejected, irrespective of whether it is true or false.

  • The per-hypothesis error-rate ($PHER$): The expected value of the number of type-I errors divided by the number of hypotheses.

  • The $a$-generalised type-I familywise error-rate ($FWER_{Ia}$): The probability that at least $a \in \{1, \dots, K\}$ type-I errors are made. Note that $FWER_{I1}$ is the conventional familywise error-rate ($FWER$); the probability of making at least one type-I error.

  • The $b$-generalised type-II familywise error-rate ($FWER_{IIb}$): The probability that at least $b \in \{1, \dots, K\}$ type-II errors are made.

  • The false discovery rate ($FDR$): The expected proportion of type-I errors amongst the rejected hypotheses.

  • The false non-discovery rate ($FNDR$): The expected proportion of type-II errors amongst the hypotheses that are not rejected.

  • The positive false discovery rate ($pFDR$): The expected proportion of type-I errors amongst the rejected hypotheses, given that at least one hypothesis is rejected.

  • The sensitivity ($Sens$): The expected proportion of the number of correct rejections of the hypotheses to the number of false null hypotheses.

  • The specificity ($Spec$): The expected proportion of the number of correctly not rejected hypotheses to the number of true null hypotheses.
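To make several of these definitions concrete, the sketch below estimates a handful of them empirically from a set of replicate rejection decisions. It is an illustration only: the app itself computes these quantities analytically, and the function name `summarise_rejections` and the toy data are assumptions introduced here.

```python
def summarise_rejections(rejections, null_true):
    """Estimate operating characteristics from replicate rejection decisions.

    rejections: list of replicates, each a list of K booleans (True = rejected).
    null_true:  list of K booleans (True = that null hypothesis is true).
    """
    n_rep = len(rejections)
    K = len(null_true)
    conj = dis = fwer = 0
    pher = fdr = 0.0
    marginal = [0] * K
    for rej in rejections:
        n_rej = sum(rej)
        # A type-I error is the rejection of a true null hypothesis.
        n_type1 = sum(1 for r, t in zip(rej, null_true) if r and t)
        conj += n_rej == K
        dis += n_rej >= 1
        fwer += n_type1 >= 1
        pher += n_type1 / K
        # By convention the proportion is zero when nothing is rejected.
        fdr += n_type1 / n_rej if n_rej > 0 else 0.0
        for k, r in enumerate(rej):
            marginal[k] += r
    return {
        "conjunctive_power": conj / n_rep,
        "disjunctive_power": dis / n_rep,
        "marginal_power": [m / n_rep for m in marginal],
        "PHER": pher / n_rep,
        "FWER": fwer / n_rep,
        "FDR": fdr / n_rep,
    }


# Toy example: two hypotheses, the second of which is a true null.
toy = summarise_rejections(
    [[True, True], [True, False], [False, False], [False, True]],
    null_true=[False, True],
)
```

In the toy example the familywise error-rate estimate counts the two replicates in which the true null is rejected, whereas the false discovery rate averages the proportion of rejections that are errors.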

Multiple comparison corrections

Per-hypothesis error-rate control

The simplest method for selecting the significance thresholds against which to compare the $p_k$ is to compare each to the chosen significance level $\alpha$. That is, to reject $H_k$ for $k = 1, \dots, K$ if $p_k \le \alpha$. This controls the per-hypothesis error-rate ($PHER$) to $\alpha$.

A potential problem with this, however, can be that the statistical operating characteristics of the resulting design may not be desirable (e.g., in terms of the familywise error-rate, $FWER$). As discussed earlier, it is for this reason that we may wish to make use of a MCC. Currently, the web application supports the use of a variety of such MCCs, which aim to control either (a) the conventional familywise error-rate, $FWER$ (with these techniques sub-divided into single-step, step-down, and step-up corrections), or (b) the false discovery rate, $FDR$.

Single-step familywise error-rate control

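These MCCs test each of the $p_k$ against a common significance level, $\gamma \in (0, 1)$ say, rejecting $H_k$ if $p_k \le \gamma$. The currently supported single-step corrections are

  • Bonferroni’s correction: This sets $\gamma = \alpha / K$ [28].

  • Sidak’s correction: This sets $\gamma = 1 - (1 - \alpha)^{1/K}$ [29].

  • Dunnett’s correction: This sets $\gamma = 1 - \Phi_1(z_D, 0, 1)$, where $z_D$ is the solution of the following equation

    \[ \alpha = 1 - \Phi_K\{(z_D, \dots, z_D)^\top, \boldsymbol{0}_K, \mathrm{Cov}(\boldsymbol{Z}_K)\}, \]

    with $\boldsymbol{0}_n$ an $n$-dimensional vector of zeroes [30].

Note that each of the above specifies a $\gamma$ such that the maximum probability of incorrectly rejecting at least one of the null hypotheses $H_k$, $k = 1, \dots, K$, over all possible values of $\boldsymbol{\tau} \in \mathbb{R}^K$ is at most $\alpha$. This is referred to as strong control of the $FWER$.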
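The three single-step levels can be sketched in a few lines of standard-library Python. The Dunnett calculation below is a simplified stand-in for general MVN integration: it assumes the test statistics are equicorrelated with a common correlation `rho` (0.5 corresponds to equal allocation and equal known variances across arms), which reduces the MVN probability to a one-dimensional integral that is evaluated here with Simpson's rule. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bonferroni_level(alpha, K):
    return alpha / K


def sidak_level(alpha, K):
    return 1.0 - (1.0 - alpha) ** (1.0 / K)


def prob_all_below(z, K, rho, n_grid=2001, half_width=8.0):
    """P(Z_1 <= z, ..., Z_K <= z) for equicorrelated standard normals.

    Uses Z_k = sqrt(rho) * X + sqrt(1 - rho) * Y_k with X, Y_k independent
    standard normals, and integrates over X with Simpson's rule
    (n_grid must be odd, rho must lie in [0, 1)).
    """
    step = 2.0 * half_width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = -half_width + i * step
        if i == 0 or i == n_grid - 1:
            weight = 1.0
        elif i % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        inner = STD_NORMAL.cdf((z - math.sqrt(rho) * x) / math.sqrt(1.0 - rho))
        total += weight * STD_NORMAL.pdf(x) * inner ** K
    return total * step / 3.0


def dunnett_level(alpha, K, rho=0.5):
    """Solve alpha = 1 - P(all Z_k <= z_D) for z_D by bisection (alpha < 0.5)."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - prob_all_below(mid, K, rho) > alpha:
            lo = mid  # critical value too small: error rate still above alpha
        else:
            hi = mid
    z_d = 0.5 * (lo + hi)
    return 1.0 - STD_NORMAL.cdf(z_d)


gamma_bonferroni = bonferroni_level(0.05, 2)
gamma_sidak = sidak_level(0.05, 2)
gamma_dunnett = dunnett_level(0.05, 2)
```

Under positive correlation between the test statistics, the Dunnett level is the least stringent of the three, which is the sense in which Bonferroni's correction is conservative.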

Step-down familywise error-rate control

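Step-down MCCs work by ranking the $p$-values from smallest to largest. We will refer to these ranked $p$-values by $p_{(1)} \le \dots \le p_{(K)}$, with associated hypotheses $H_{(1)}, \dots, H_{(K)}$. The $p_{(k)}$ are compared to a vector of significance levels $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_K)^\top \in (0, 1)^K$. Precisely, the minimal index $k$ such that $p_{(k)} > \gamma_k$ is identified, and then $H_{(1)}, \dots, H_{(k-1)}$ are rejected and $H_{(k)}, \dots, H_{(K)}$ are not rejected. If $k = 1$ then we do not reject any of the null hypotheses, and if no such $k$ exists then we reject all of the null hypotheses. The currently supported step-down corrections are

  • Holm-Bonferroni correction: This sets $\gamma_k = \alpha / (K + 1 - k)$ [31].

  • Holm-Sidak correction: This sets $\gamma_k = 1 - (1 - \alpha)^{1/(K + 1 - k)}$.

  • Step-down Dunnett correction: This can only currently be used when the $\mathrm{Cov}(Z_{k_1}, Z_{k_2})$ are equal for all $k_1 \ne k_2$. In this case, it sets $\gamma_k = 1 - \Phi_1(z_{Dk}, 0, 1)$, where $z_{Dk}$ is the solution to

    \[ \alpha = 1 - \Phi_{K+1-k}\{(z_{Dk}, \dots, z_{Dk})^\top, \boldsymbol{0}_{K+1-k}, \mathrm{Cov}(\boldsymbol{Z}_{K+1-k})\}. \]

Note that all of the above methods provide strong control of the $FWER$.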
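The step-down logic can be sketched as follows for the Holm-Bonferroni and Holm-Sidak levels (the step-down Dunnett levels are omitted because they require MVN integration). This is an illustrative Python implementation with hypothetical function names, not the code used by the app.

```python
def holm_bonferroni_levels(alpha, K):
    # Level for the k-th smallest p-value is alpha / (K + 1 - k), k = 1..K.
    return [alpha / (K + 1 - k) for k in range(1, K + 1)]


def holm_sidak_levels(alpha, K):
    return [1.0 - (1.0 - alpha) ** (1.0 / (K + 1 - k)) for k in range(1, K + 1)]


def step_down_reject(p_values, levels):
    """Return rejection decisions in the original hypothesis order."""
    K = len(p_values)
    order = sorted(range(K), key=lambda k: p_values[k])
    reject = [False] * K
    for rank, k in enumerate(order):
        # Stop at the first ranked p-value that exceeds its level; that
        # hypothesis and all those with larger p-values are not rejected.
        if p_values[k] > levels[rank]:
            break
        reject[k] = True
    return reject


levels = holm_bonferroni_levels(0.05, 3)
all_rejected = step_down_reject([0.030, 0.004, 0.020], levels)
stops_early = step_down_reject([0.030, 0.004, 0.026], levels)
```

In the second call the second-smallest p-value (0.026) exceeds its level of 0.025, so testing stops and only the hypothesis with the smallest p-value is rejected.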

Step-up familywise error-rate control

Step-up MCCs also work by ranking the $p$-values from smallest to largest, and similarly utilise a vector of significance levels $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_K)^\top$. However, here, the largest $k$ such that $p_{(k)} \le \gamma_k$ is identified. Then, the hypotheses $H_{(1)}, \dots, H_{(k)}$ are rejected, and $H_{(k+1)}, \dots, H_{(K)}$ are not rejected. Currently, one such correction is supported: Hochberg’s correction [32], which sets $\gamma_k = \alpha / (K + 1 - k)$. This method also provides strong control of the $FWER$.
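A minimal Python sketch of the step-up rule with Hochberg's levels is given below; the function names are hypothetical. Searching from the largest ranked p-value downwards is what distinguishes it from a step-down procedure that uses the same levels.

```python
def hochberg_levels(alpha, K):
    # Same levels as Holm-Bonferroni: alpha / (K + 1 - k), k = 1..K.
    return [alpha / (K + 1 - k) for k in range(1, K + 1)]


def step_up_reject(p_values, levels):
    """Return rejection decisions in the original hypothesis order."""
    K = len(p_values)
    order = sorted(range(K), key=lambda k: p_values[k])
    n_reject = 0
    # Find the largest rank whose p-value is at or below its level.
    for rank in range(K, 0, -1):
        if p_values[order[rank - 1]] <= levels[rank - 1]:
            n_reject = rank
            break
    reject = [False] * K
    for k in order[:n_reject]:
        reject[k] = True
    return reject


hochberg = hochberg_levels(0.05, 3)
rejects_all = step_up_reject([0.030, 0.004, 0.026], hochberg)
rejects_one = step_up_reject([0.060, 0.004, 0.026], hochberg)
```

In the first call the largest p-value (0.030) is below its level of 0.05, so all three hypotheses are rejected, even though the second-smallest p-value exceeds its own level.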

False discovery rate control

It may be of interest to instead control the $FDR$, which can offer a compromise between strict $FWER$ control and $PHER$ control, especially when we expect a large proportion of the experimental treatments to be effective. Currently, two methods that will control the $FDR$ to at most $\alpha$ over all possible $\boldsymbol{\tau} \in \mathbb{R}^K$ are supported. They function in the same way as the step-up corrections discussed above, with

  • Benjamini-Hochberg correction: This sets $\gamma_k = k\alpha / K$ [33].

  • Benjamini-Yekutieli correction: This sets $\gamma_k = k\alpha \big/ \big\{ K \big(1 + \tfrac{1}{2} + \dots + \tfrac{1}{K}\big) \big\}$ [34].
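The two sets of levels can be computed and compared with the short Python sketch below. The function names and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg_levels(alpha, K):
    return [k * alpha / K for k in range(1, K + 1)]


def benjamini_yekutieli_levels(alpha, K):
    # The harmonic sum 1 + 1/2 + ... + 1/K makes the levels more stringent.
    harmonic = sum(1.0 / l for l in range(1, K + 1))
    return [k * alpha / (K * harmonic) for k in range(1, K + 1)]


def n_rejected_step_up(p_values, levels):
    """Number of hypotheses rejected by a step-up rule with the given levels."""
    ranked = sorted(p_values)
    for rank in range(len(ranked), 0, -1):
        if ranked[rank - 1] <= levels[rank - 1]:
            return rank
    return 0


example_p = [0.005, 0.02, 0.03, 0.2]
n_bh = n_rejected_step_up(example_p, benjamini_hochberg_levels(0.05, 4))
n_by = n_rejected_step_up(example_p, benjamini_yekutieli_levels(0.05, 4))
```

For these example values the Benjamini-Hochberg levels lead to three rejections, while the more stringent Benjamini-Yekutieli levels lead to one.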

Sample size determination

The sample size required by a design to control several types of power to a specified level $1 - \beta$, under certain specific scenarios, can be computed. Precisely, following for example [35], values for ‘interesting’ and ‘uninteresting’ treatment effects, $\delta_1 > 0$ and $\delta_0 < \delta_1$ respectively, are specified and the following definitions are made

  • The global alternative hypothesis, $H_A$, is given by $\tau_1 = \dots = \tau_K = \delta_1$.

  • The least favourable configuration for experimental arm $k$, $LFC_k$, is given by $\tau_k = \delta_1$, $\tau_1 = \dots = \tau_{k-1} = \tau_{k+1} = \dots = \tau_K = \delta_0$.

Then, the following types of power can be controlled to level $1 - \beta$ by designs determined using the app

  • The conjunctive power under $H_A$.

  • The disjunctive power under $H_A$.

  • The minimum marginal power under the respective $LFC_k$.
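The treatment effect scenarios used for powering can be written down directly. In the Python sketch below, arms are indexed from 1, and the function names and numerical values are hypothetical choices for illustration.

```python
def global_alternative(K, delta1):
    """All K experimental arms have the interesting treatment effect."""
    return [delta1] * K


def least_favourable_configuration(k, K, delta1, delta0):
    """Arm k (1-based) has the interesting effect; all others the uninteresting one."""
    if not 1 <= k <= K:
        raise ValueError("k must lie between 1 and K")
    return [delta1 if j == k else delta0 for j in range(1, K + 1)]


tau_alternative = global_alternative(3, 0.5)
tau_lfc_2 = least_favourable_configuration(2, 3, delta1=0.5, delta0=0.0)
```

The minimum marginal power is then the smallest, over the experimental arms, of the marginal power for each arm evaluated under its own least favourable configuration.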

Allocation ratios

One of the primary goals of the app is to aid the choice of values for the arm-wise sample sizes $n_0, \dots, n_K$, where $n_k$ is the sample size in arm $k$. The app specifically supports the determination of values for these parameters by searching for a suitable $n_0$ via a one-dimensional root solving algorithm, and then sets $n_k = r_k n_0$, $r_k \in (0, \infty)$, for $k = 1, \dots, K$. Here, $r_k$ is the allocation ratio for experimental arm $k$ relative to the control arm.

For this reason, the app also allows the allocation ratios to be specified in a variety of ways: they can be defined explicitly, or alternatively can be determined in an optimal manner. For this optimality problem, many possible optimality criteria have been defined, each with their own merits. Therefore, we refer the reader to Atkinson et al. [23] for further details of optimal allocation in multi-arm designs. Instead, we simply note that in the web application, the allocation ratios can currently be determined for three such criteria

  • $A$-optimality: Minimizes the trace of the inverse of the information matrix of the design. This results in the minimization of the average variance of the treatment effect estimates.

  • $D$-optimality: Maximizes the determinant of the information matrix of the design. This results in the minimization of the volume of the confidence ellipsoid for the treatment effect estimates.

  • $E$-optimality: Maximizes the minimum eigenvalue of the information matrix. This results in the minimization of the maximum variance of the treatment effect estimates.

The optimal allocation ratios are identified in the app using available closed-form solutions where possible (see [36] for a summary of these); otherwise non-linear programming is employed.
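As an illustration of one of these criteria, the sketch below finds the allocation ratio that minimises the average variance of the treatment effect estimates (the $A$-criterion) by a simple golden-section search. It makes several simplifying assumptions that are not taken from the app: outcomes have a known, common variance, all experimental arms share one ratio `r`, and the total sample size is fixed. Under these assumptions the known closed-form answer is `r = 1 / sqrt(K)`, which the numerical search should reproduce. Function names are hypothetical.

```python
import math


def average_variance(r, K, total_n=1.0, sigma=1.0):
    """Variance of each treatment effect estimate when n_k = r * n_0 for all k."""
    n0 = total_n / (1.0 + K * r)  # total_n = n_0 + K * r * n_0
    nk = r * n0
    return sigma ** 2 * (1.0 / n0 + 1.0 / nk)


def a_optimal_ratio(K, lo=1e-3, hi=10.0, tol=1e-8):
    """Golden-section search for the ratio minimising the average variance."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if average_variance(c, K) < average_variance(d, K):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


ratio_four_arms = a_optimal_ratio(4)
```

With four experimental arms the search settles near a ratio of one half, meaning the control arm receives twice as many patients as each experimental arm.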

Other design specifications

Finally, the web application also supports the following options

  • Plot production: Plots can be produced of (a) all of the operating characteristics quantities listed earlier when $\tau_1 = \dots = \tau_K = \theta$, as well as (b) the $P_k$ when $\tau_k = \theta$ and $\tau_l = \theta - (\delta_1 - \delta_0)$ for $l \ne k$. If these are selected for rendering, the quality of the plots, in terms of the number of values of $\theta$ used for line-graph production, can also be controlled.

  • Require $n_k \in \mathbb{N}^+$ for $k = 0, \dots, K$: By default, the sample size determined for each arm will only be required to be a positive number. In practice, such values need to be integers. This can thus be enforced if desired, with the integer $n_k$ specified by rounding up their determined continuous values.

Supported outcome variables

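Currently, the app supports multi-arm trial design for scenarios in which the outcome variables are assumed to be either normally or Bernoulli distributed.

Normally distributed outcome variables

Precisely, for the normal case, it assumes that $X_{ik} \sim N(\mu_k, \sigma_k^2)$, where $X_{ik}$ is the outcome of patient $i = 1, \dots, n_k$ on arm $k$, and that $\sigma_k^2$ is known for $k = 0, \dots, K$. Then, for each $k = 1, \dots, K$

\[ \tau_k = \mu_k - \mu_0, \qquad \hat{\tau}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ik} - \frac{1}{n_0} \sum_{i=1}^{n_0} x_{i0}, \qquad I_k = \frac{1}{\sigma_0^2 / n_0 + \sigma_k^2 / n_k}, \]

where $x_{ik}$ is the realised value of $X_{ik}$.

Note that in this case, $\boldsymbol{Z}_K$ has a MVN distribution, and thus the operating characteristics can be computed exactly and efficiently using MVN integration [37]. Furthermore, the distribution of $\boldsymbol{Z}_K$ depends upon the $\mu_k$, $k = 0, \dots, K$, only through the treatment effects $\tau_k$. Consequently, the $\mu_k$ play no part in the inputs or outputs of the app.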
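To illustrate how a one-dimensional search for the control arm sample size can work in the normal case, the sketch below powers a single experimental arm under a single-step correction. It relies on the fact that, when each p-value is compared to a fixed level `gamma`, the marginal power for an arm with treatment effect `delta1` depends only on that arm's information. Bisection is used here as a simple stand-in for the root solving algorithm; the app's actual algorithm, and all function names below, are assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def information(n0, r, sigma0, sigma_k):
    """Information for the treatment effect when n_k = r * n_0."""
    return 1.0 / (sigma0 ** 2 / n0 + sigma_k ** 2 / (r * n0))


def marginal_power(n0, r, sigma0, sigma_k, delta1, gamma):
    """P(p_k <= gamma) when Z_k ~ N(delta1 * sqrt(I_k), 1)."""
    critical = STD_NORMAL.inv_cdf(1.0 - gamma)
    mean = delta1 * information(n0, r, sigma0, sigma_k) ** 0.5
    return 1.0 - STD_NORMAL.cdf(critical - mean)


def solve_n0(target_power, r, sigma0, sigma_k, delta1, gamma):
    """Bisection for the smallest (continuous) n_0 achieving the target power."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_power(mid, r, sigma0, sigma_k, delta1, gamma) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Two experimental arms with Bonferroni's correction: gamma = 0.05 / 2.
n0_required = solve_n0(0.8, r=1.0, sigma0=1.0, sigma_k=1.0, delta1=0.5, gamma=0.025)
```

If integer sample sizes are required, the continuous solution would then be rounded up, for example with `math.ceil`.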

Bernoulli distributed outcome variables

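In this case, $X_{ik} \sim \mathrm{Bern}(\pi_k)$ for response rates $\pi_k \in [0, 1]$, and for each $k = 1, \dots, K$

\[ \tau_k = \pi_k - \pi_0, \qquad \hat{\tau}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ik} - \frac{1}{n_0} \sum_{i=1}^{n_0} x_{i0}, \qquad I_k = \frac{1}{\pi_0 (1 - \pi_0) / n_0 + \pi_k (1 - \pi_k) / n_k}. \]

Thus, a problem for design determination becomes that the $I_k$ are dependent on the unknown response rates. In practice, this is handled at the analysis stage of a trial by setting

\[ I_k = \frac{1}{\hat{\pi}_0 (1 - \hat{\pi}_0) / n_0 + \hat{\pi}_k (1 - \hat{\pi}_k) / n_k}, \]

for $\hat{\pi}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ik}$, $k = 0, \dots, K$. This is the assumption made where required by the app. With this, $\boldsymbol{Z}_K$ is only asymptotically MVN. Thus, in general it would be important to validate operating characteristics evaluated using MVN integration via simulation.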
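The analysis-stage calculation for Bernoulli outcomes can be sketched as below, with the information evaluated at the observed response rates. The function name and the response counts are made up for illustration and do not come from any trial.

```python
def bernoulli_wald_statistics(responses, sizes):
    """Wald statistics for each experimental arm versus control (arm 0).

    responses[k] and sizes[k] are the number of responders and the number of
    patients in arm k. Information uses the estimated response rates.
    """
    pi_hat = [x / n for x, n in zip(responses, sizes)]
    var_control = pi_hat[0] * (1.0 - pi_hat[0]) / sizes[0]
    statistics = []
    for k in range(1, len(sizes)):
        var_k = pi_hat[k] * (1.0 - pi_hat[k]) / sizes[k]
        if var_control + var_k == 0.0:
            # Both estimated rates are 0 or 1, so the statistic is undefined.
            raise ValueError("estimated variance is zero for arm %d" % k)
        tau_hat = pi_hat[k] - pi_hat[0]
        info = 1.0 / (var_control + var_k)
        statistics.append(tau_hat * info ** 0.5)
    return statistics


# Hypothetical data: 15/50 responders on control, 25/50 and 15/50 on two arms.
z_stats = bernoulli_wald_statistics([15, 25, 15], [50, 50, 50])
```

Because the information is estimated from the data, the resulting statistics are only approximately normal, which is why checking analytically derived operating characteristics by simulation is sensible in this setting.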

In addition, note that the above problem also means that the operating characteristics under the global null hypothesis $H_G$, the global alternative hypothesis $H_A$, and the least favourable configurations $LFC_k$ are not unique without further restriction. Thus, to achieve uniqueness, the app requires a value be specified for the control arm response rate $\pi_0$ for use in the definition of these scenarios. Moreover, for this reason, the inputs and outputs of functions supporting Bernoulli outcomes make no reference to the $\tau_k$, and work instead directly in terms of the $\pi_k$. Finally, note that this problem also means that to determine $A$-, $D$-, or $E$-optimised allocation ratios, a specific set of values for the $\pi_k$ must be assumed.

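In this case, we should also ensure that $\delta_1 \le 1 - \pi_0$ and $\delta_0 \ge -\pi_0$, for the assumed value of $\pi_0$, since each response rate $\pi_k = \pi_0 + \tau_k$ must lie in $[0, 1]$ for $k = 1, \dots, K$.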



The web application is freely available from https://mjgrayling.shinyapps.io/multiarm/. The R code for the application can also be downloaded from https://github.com/mjg211/multiarm. Furthermore, as noted earlier, the app is built in to the package multiarm [27], as the function gui(), for ease-of-use without internet access. The application has a simple interface, and has the capability to

  • Determine the sample size required in each arm in a specified multi-arm clinical trial design scenario;

  • Summarise and plot the operating characteristics of the identified design;

  • Produce a report describing the chosen design scenario, the identified design, and a summary of its operating characteristics.


The outputs (i.e., the identified design and its operating characteristics) are determined based upon the following set of user specified inputs (Figure 1)

  1. The number of experimental treatment arms, $K$.

  2. The chosen multiple comparison correction (e.g., Dunnett’s correction).

  3. The significance level, $\alpha$.

  4. The type of power to control (e.g., the conjunctive power under $H_A$).

  5. The desired power, $1 - \beta$.

  6. For Bernoulli distributed data, the control arm response rate $\pi_0$.

  7. The interesting treatment effect, $\delta_1$.

  8. The uninteresting treatment effect, $\delta_0$.

  9. For normally distributed data, the standard deviations, $\sigma_0, \dots, \sigma_K$. These are allocated by first selecting the type of standard deviations (e.g., that they are assumed to be equal across all arms), and then the actual values for the parameters.

  10. The allocation ratios (e.g., $A$-, $D$-, or $E$-optimal).

  11. For Bernoulli distributed data, when searching for optimal allocation ratios, the response rates to assume in the search.

  12. Whether the sample size in each arm should be required to be an integer.

  13. Whether plots should be produced, and if so the plot quality.

Note that a Reset inputs button is provided to simplify returning the inputs to their default values. Once the inputs have been specified as desired, the outputs can be generated by clicking the Update outputs button.


Here, we demonstrate specification of the input parameters (Figure 1), and then subsequent output generation (Figures 2-4), for parameters motivated by a three-arm phase II randomized controlled trial of treatments for myelodysplastic syndrome patients, described in [38]. This trial compared, via a binary primary outcome, two experimental treatments with conventional azacitidine treatment. The trial was designed with , , , and . For simplicity, we assume that the familiar Dunnett correction will be used, that , and that allocation will be equal across the arms (). Finally, we assume it is the minimum marginal power that should be controlled.

Each input widget in Figure 1 can be seen to have been allocated accordingly based on the description above, whilst we have additionally elected to produce plots (of medium quality), and to not require the arm-wise sample sizes to be integers. Note that in Figure 1 we can see that the input widgets are supported by help boxes that can be opened by clicking on the small question marks beside them.

Figure 2 then depicts the output to the Design summary box once the user clicks on Update outputs. Specifically, a summary of the chosen inputs and the identified design is rendered. Furthermore, in Figure 3 we can see the tables that provide the various statistical quantities under the global null hypothesis $H_G$, the global alternative hypothesis $H_A$, and the least favourable configurations $LFC_k$, as well as the various treatment effect scenarios that are considered for plot production.

Finally, in Figure 4 the plots discussed earlier are shown. Observe that horizontal and vertical lines are added at the values , , , and respectively. Note that these plots are outputted in a manner to allow the user to zoom in on a particular sub-component if desired.

In all, Figures 2-4 provide a set of outputs with a variety of features that should be anticipated given the chosen input parameters. Firstly, the specification that the allocation to all arms should be equal means that . In addition, is equal to 0.15 under , and the minimum marginal power is 0.799, as is approximately desired. Moreover, the specification that means that and are equal for each of the , and .

Finally, as noted above, and as can be seen in Figure 1, a Generate report button is provided that can produce a copy of the outputs in either PDF (.pdf), HTML (.html), or Word (.docx) format. The user can also nominate a name for this file in the Report filename input widget. This allows a record of designs to be stored, presented, and compared to other designs if required.


A possible barrier to previous calls for increased use of multi-arm clinical trial designs is a lack of available easy-to-access user-friendly software that facilitates associated sample size calculations. For this reason, we have created an online web application that supports multi-arm trial design determination for a wide selection of possible input parameters. Its use requires no knowledge of statistical programming languages and is facilitated via a simple user interface. Furthermore, we have made the application available on the internet, so that it is readily accessible, and have also made it freely available for download for local use without an internet connection. Like similar applications that have been released recently for phase I clinical trial design [39, 40], we hope that the availability of this application will assist with the design of future multi-arm studies.

Before we conclude, however, it is important to acknowledge the limitations of our work. Firstly, MVN integration is utilised in all instances to determine the statistical operating characteristics of potential multi-arm designs. This makes the execution time for returning outputs fast for many possible input parameters. However, there is an unavoidable complexity in certain multi-arm designs, which may make execution time long. This is particularly true of scenarios with a large number of experimental arms. It can also be true of designs that utilise the more complex step-wise MCCs. It is for this reason that the application places an upper cap on the number of experimental arms that can be entered, and also returns a warning in scenarios for which a lengthy execution time would be anticipated. Nonetheless, users may have to wait several minutes in certain situations to identify their desired design.

Furthermore, it is crucial that all software for clinical trial design be validated. This is challenging for multi-arm designs because of the aforementioned limited freely available software for designing such studies. We compared the output of our application to that of PASS [41], a validated software package, for a variety of supported input parameters, but output for many possible inputs remained difficult to corroborate because of a lack of equivalent available functionality. For this reason, we have carefully followed recommended good-programming practices and perform all statistical calculations within the application by calling functions from the R package multiarm, in which the code has been modularised [27]. Furthermore, in this package we have created a function that simulates multi-arm clinical trials using a given design. This allows us to perform an additional check on our analytical computations. Specifically, we generated 1000 random combinations of possible input parameters for trials assuming normally distributed outcomes, thus covering an extremely wide range of supported design scenarios. The analytical operating characteristics returned by the web application in the Operating characteristics summary boxes for the global null hypothesis $H_G$, the global alternative hypothesis $H_A$, and the least favourable configurations $LFC_k$ were then compared to those based on trial simulation, using 100,000 replicate simulations in each of the 1000 designs. Across all considered scenarios, the maximum absolute difference between the analytical and simulated operating characteristics was within what would be anticipated due to simulation error. Consequently, it does appear that our application is functioning as desired. Code to replicate this work is available upon request from the corresponding author.

Finally, we note one primary possible avenue for future development of the web application: numerous papers have now provided designs for adaptive multi-arm trials (e.g., [42, 43]), and software for their determination in certain settings [44, 45]. Given the evident increased interest in such designs [46], allowing for their determination would be a valuable extension to our application.


This work was supported by the Medical Research Council [grant number MC_UU_00002/6 to JMSW]. The funding body did not have any role in the design of this study, collection, analysis, and interpretation of data, nor in the writing of the manuscript.


  • [1] DiMasi, J.A., Grabowski, H.G., Hansen, R.W.: Innovation in the pharmaceutical industry: new estimates of R&D costs. Journal of Health Economics 47, 20–33 (2016)
  • [2] Biotechnology Innovation Organization (BIO), Biomedtracker, AMPLION: Clinical development success rates 2006-2015 (2016)
  • [3] Parmar, M.K.B., Carpenter, J., Sydes, M.R.: More multiarm randomised trials of superiority are needed. Lancet 384(9940), 283–4 (2014)
  • [4] Jaki, T., Wason, J.M.S.: Multi-arm multi-stage trials can improve the efficiency of finding effective treatments for stroke: a case study. BMC Cardiovascular Disorders 18(1), 215 (2018)
  • [5] Wason, J.M.S., Stecher, L., Mander, A.P.: Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials 15, 364 (2014)
  • [6] Baron, G., Perrodeau, E., Boutron, I., Ravaud, P.: Reporting of analyses from randomized controlled trials with multiple arms: a systematic review. BMC Medicine 11, 84 (2013)
  • [7] Juszczak, E., Altman, D.G., Hopewell, S., Schulz, K.: Reporting of multi-arm parallel-group randomized trials: extension of the CONSORT 2010 statement. JAMA 321(16), 1610–1620 (2019)
  • [8] Rothman, K.J.: No adjustments are needed for multiple comparisons. Epidemiology 1(1), 43–46 (1990)
  • [9] Cook, R.J., Farewell, V.T.: Multiplicity considerations in the design and analysis of clinical trials. Journal of the Royal Statistical Society (Series A) 159(1), 93–110 (1996)
  • [10] Proschan, M.A., Waclawiw, M.A.: Practical guidelines for multiplicity adjustment in clinical trials. Controlled clinical trials 21(6), 527–539 (2000)
  • [11] Bender, R., Lange, S.: Adjusting for multiple testing - when and how? Journal of Clinical Epidemiology 54(4) (2001)
  • [12] Feise, R.J.: Do multiple outcome measures require p-value adjustment? BMC Medical Research Methodology 2, 8 (2002)
  • [13] Hughes, M.D.: Multiplicity in clinical trials. Encyclopedia of Biostatistics 5, 3446–3451 (2005)
  • [14] Freidlin, B., Korn, E.L., Gray, R., Martin, A.: Multi-arm clinical trials of new agents: some design considerations. Clinical Cancer Research 14 (2008)
  • [15] Li, G., Taljaard, M., Van den Heuvel, E.R., Levine, M.A.H., Cook, D.J., Wells, G.A., Devereaux, P.J., Thabane, L.: An introduction to multiplicity issues in clinical trials: the what, why, when and how. International Journal of Epidemiology 46(2), 746–755 (2016)
  • [16] European Medicines Agency: Guideline on Multiplicity Issues in Clinical Trials. (2017). https://www.ema.europa.eu/en/documents/scientific-guideline/draft-guideline-multiplicity-issues-clinical-trials_en.pdf
  • [17] U.S. Food and Drug Administration: Multiple Endpoints in Clinical Trials Guidance for Industry. (2017). https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry
  • [18] Howard, D.R., Brown, J.M., Todd, S., Gregory, W.M.: Recommendations on multiple testing adjustment in multi-arm trials with a shared control group. Statistical Methods in Medical Research 27(5), 1513–1530 (2018)
  • [19] Hochberg, Y., Tamhane, A.C.: Multiple Comparison Procedures. John Wiley & Sons, New York, NY (1987)
  • [20] Hsu, J.C.: Multiple Comparisons. Chapman & Hall, London (1996)
  • [21] Bretz, F., Hothorn, T., Westfall, P.: Multiple Comparisons using R. CRC Press, Boca Raton, FL (2010)
  • [22] Sankoh, A.J., D’Agostino, R.B.S., Huque, M.F.: Efficacy endpoint selection and multiplicity adjustment methods in clinical trials with inherent multiple endpoint issues. Statistics in Medicine 22(20), 3133–3150 (2003)
  • [23] Atkinson, A., Donev, A., Tobias, R.: Optimum Experimental Designs, with SAS. Oxford University Press, Oxford (2007)
  • [24] East. https://www.cytel.com/software/east. Accessed: 2019-05-04
  • [25] Chang, W., Cheng, J., Allaire, J.J., Xie, Y., McPherson, J.: shiny: Web Application Framework for R. (2019). https://CRAN.R-project.org/package=shiny
  • [26] R Core Team: R: a Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
  • [27] Grayling, M.J.: multiarm: Design and analysis of fixed-sample multi-arm clinical trials (2019). http://www.github.com/mjg211/multiarm/
  • [28] Bonferroni, C.E.: Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze (1936)
  • [29] Šidák, Z.: Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association 62(318), 626–633 (1967)
  • [30] Dunnett, C.W.: A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50(272), 1096–1121 (1955)
  • [31] Holm, S.: A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6(2), 65–70 (1979)
  • [32] Hochberg, Y.: A sharper bonferroni procedure for multiple tests of significance. Biometrika 75(4), 800–802 (1988)
  • [33] Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society (Series B) 57(1), 289–300 (1995)
  • [34] Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29(4), 1165–1188 (2001)
  • [35] Wason, J., Magirr, D., Law, M., Jaki, T.: Some recommendations for multi-arm multi-stage trials. Statistical Methods in Medical Research 25(2), 716–727 (2016)
  • [36] Sverdlov, O., Rosenberger, W.F.: On recent advances in optimal allocation designs in clinical trials. Journal of Statistical Theory and Practice 7(4), 753–773 (2013)
  • [37] Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate normal and t distributions. R package version 1.0-10. (2019). http://CRAN.R-project.org/package=mvtnorm
  • [38] Jacob, L., Uvarova, M., Boulet, S., Begaj, I., Chevret, S.: Evaluation of a multi-arm multi-stage Bayesian design for phase II drug selection trials - an example in hemato-oncology. BMC Medical Research Methodology 16, 67 (2016)
  • [39] Wheeler, G.M., Sweeting, M.J., Mander, A.P.: AplusB: A Web Application for Investigating A + B Designs for Phase I Cancer Clinical Trials. PLoS ONE 11(7), 0159026 (2016)
  • [40] Wages, N.A., Petroni, G.R.: A web tool for designing and conducting phase I trials using the continual reassessment method. BMC Cancer 18, 133 (2018)
  • [41] PASS. https://www.ncss.com/software/pass/. Accessed: 2019-05-04
  • [42] Magirr, D., Jaki, T., Whitehead, J.: A generalized Dunnett test for multi-arm multi-stage clinical studies with treatment selection. Biometrika 99(2), 494–501 (2012)
  • [43] Wason, J., Stallard, N., Bowden, J., Jennison, C.: A multi-stage drop-the-losers design for multi-arm clinical trials. Statistical Methods in Medical Research 26(1), 508–524 (2017)
  • [44] Barthel, F.M.S., Royston, P., Parmar, M.K.B.: A menu-driven facility for sample-size calculation in novel multiarm, multistage randomized controlled trials with a time-to-event outcome. Stata Journal 9(4), 505–523 (2009)
  • [45] Jaki, T., Pallmann, P., Magirr, D.: The R package MAMS for designing multi-arm multi-stage clinical trials. Journal of Statistical Software 88(4), 1–25 (2019)
  • [46] Dimairo, M., Coates, E., Pallmann, P., Todd, S., Julious, S.A., Jaki, T., Wason, J., Mander, A.P., Weir, C.J., Koenig, F., Walton, M.K., Biggs, K., Nicholl, J., Hamasaki, T., Proschan, M.A., Scott, J.A., Ando, Y., Hind, D., Altman, D.G.: Development process of a consensus-driven CONSORT extension for randomised trials using an adaptive design. BMC Medicine 16, 210 (2018)