obtained the exact power formulae for some commonly used t tests in superiority, noninferiority (NI) and equivalence trials. The power determination for the analysis of covariance (ANCOVA) and the t test with unequal variances in equivalence trials involves two-dimensional numerical integration. We show that the calculation can be simplified by using Owen’s Q function, which is available in standard statistical software packages (e.g. SAS and the R package PowerTOST). We extend the method for ANCOVA to unstratified and stratified multi-arm randomized trials, and apply it to the power determination for multi-arm trials and gold standard NI trials (Pigeot et al., 2003).
the cumulative distribution function (CDF) of, the CDF of a central distribution, and Owen’s Q function. Let be the number of subjects in group , the total size, the superiority () or NI margin, and the lower and upper equivalence margins. Without loss of generality, we assume high scores indicate better health.
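For orientation, Owen’s Q function can be evaluated by one-dimensional numerical integration. The sketch below assumes the standard definition, in which Q with f degrees of freedom is the integral over [a, b] of Φ(t·x/√f − δ) against the density of the square root of a chi-square variable with f degrees of freedom. The function names, the argument order, the Simpson rule and the truncation of an infinite upper limit are choices made for this illustration, not the implementation used in SAS or PowerTOST.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def chi_pdf(x, f):
    """Density of sqrt(chi-square with f d.f.) at x >= 0 (assumes f >= 1)."""
    if x <= 0.0:
        # The density at zero is nonzero only for f = 1 (half-normal case).
        return math.sqrt(2.0 / math.pi) if f == 1 else 0.0
    log_pdf = ((1.0 - f / 2.0) * math.log(2.0) - math.lgamma(f / 2.0)
               + (f - 1.0) * math.log(x) - 0.5 * x * x)
    return math.exp(log_pdf)

def owen_q(f, t, delta, a, b, n=2000):
    """Owen's Q: integral over [a, b] of Phi(t*x/sqrt(f) - delta) * chi_pdf(x, f).

    Composite Simpson rule with n (even) panels. An infinite upper limit is
    truncated at sqrt(f) + 12, beyond which the chi density is negligible.
    """
    lo = max(a, 0.0)
    hi = min(b, math.sqrt(f) + 12.0)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * norm_cdf(t * x / math.sqrt(f) - delta) * chi_pdf(x, f)
    return total * h / 3.0

# With a = 0 and b = infinity, Q reduces to the CDF of the t distribution with
# f d.f. and noncentrality delta. For delta = 0 and f = 1 this is the Cauchy CDF.
print(owen_q(1, 1.0, 0.0, 0.0, math.inf), 0.5 + math.atan(1.0) / math.pi)
```

Splitting the integration range at any point R gives two pieces that add up to the full integral, which is the property that lets an equivalence power be written with a finite upper limit.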
2 Two-sample t tests
be the estimated effect and variance with true values in a test based on the t distribution. Suppose is independent of
. In superiority and NI trials, we reject the null hypothesis when. If and are known, the exact power is , or minus the CDF of evaluated at .
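As a concrete instance, and only as a sketch under assumptions not stated in this section, consider the equal-variance two-sample t test. The symbols below (the estimate and its variance estimate, the pooled sample variance s², the group sizes n₁ and n₀, the margin M₀, the degrees of freedom f, the noncentrality parameter λ and the critical value t_{f,1−α}) are chosen here for illustration and may differ from the paper’s notation.

```latex
\begin{align*}
\hat\tau &= \bar y_1 - \bar y_0, \qquad
\hat V = s^2\left(\frac{1}{n_1} + \frac{1}{n_0}\right), \qquad
f = n_1 + n_0 - 2, \\
T &= \frac{\hat\tau - M_0}{\sqrt{\hat V}} \sim t(f, \lambda), \qquad
\lambda = \frac{\tau - M_0}{\sqrt{V}}, \qquad
V = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_0}\right), \\
\text{power} &= \Pr\left(T > t_{f,1-\alpha}\right)
 = 1 - F_{t(f,\lambda)}\left(t_{f,1-\alpha}\right).
\end{align*}
```

Here F denotes the CDF of the noncentral t distribution, so the power is one minus that CDF evaluated at the central critical value.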
An equivalence test is significant if both and . By the change of variable , the exact power equation (26) of Tang (2018b) can be rearranged in terms of Owen’s Q function as
where is the CDF of , , and .
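The equivalence power can be illustrated numerically. The sketch below assumes the usual two one-sided tests setup: the estimate is normal with true variance `var`, the variance estimate is an independent scaled chi-square with `f` degrees of freedom, and equivalence is declared when both one-sided t statistics exceed the critical value. Under those assumptions the power is a difference of two Owen’s Q terms with a finite upper limit. All function and argument names (`owen_q`, `t_quantile`, `equivalence_power`, `tau`, `m_l`, `m_u`, `var`) are hypothetical, and the Simpson rule and bisection are simple stand-ins for library routines.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def chi_pdf(x, f):
    """Density of sqrt(chi-square with f d.f.) at x >= 0 (assumes f >= 1)."""
    if x <= 0.0:
        return math.sqrt(2.0 / math.pi) if f == 1 else 0.0
    log_pdf = ((1.0 - f / 2.0) * math.log(2.0) - math.lgamma(f / 2.0)
               + (f - 1.0) * math.log(x) - 0.5 * x * x)
    return math.exp(log_pdf)

def owen_q(f, t, delta, a, b, n=2000):
    """Owen's Q on [a, b] by composite Simpson; infinite b is truncated."""
    lo = max(a, 0.0)
    hi = min(b, math.sqrt(f) + 12.0)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * norm_cdf(t * x / math.sqrt(f) - delta) * chi_pdf(x, f)
    return total * h / 3.0

def t_quantile(p, f):
    """p-th quantile of the central t distribution, by bisection on its CDF."""
    lo, hi = -1000.0, 1000.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if owen_q(f, mid, 0.0, 0.0, math.inf) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equivalence_power(tau, m_l, m_u, var, f, alpha=0.05):
    """Power of the two one-sided tests procedure via Owen's Q.

    tau: true effect; m_l, m_u: lower and upper margins;
    var: true variance of the effect estimate; f: degrees of freedom.
    """
    c = t_quantile(1.0 - alpha, f)
    se = math.sqrt(var)
    lam_l = (tau - m_l) / se          # noncentrality for the lower test
    lam_u = (tau - m_u) / se          # noncentrality for the upper test
    r = math.sqrt(f) * (lam_l - lam_u) / (2.0 * c)  # both tests can pass only if x < r
    return owen_q(f, -c, lam_u, 0.0, r) - owen_q(f, c, lam_l, 0.0, r)

# Hypothetical design values, chosen only to exercise the function.
print(equivalence_power(tau=0.0, m_l=-1.0, m_u=1.0, var=0.1, f=18))
```

One check on the sketch: when the true effect sits exactly on the upper margin and the lower margin is pushed far away, the computed power falls back to the nominal level alpha.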
where , is the sample variance in group , , , , and
The exact equivalence power (equation (A3) of Tang (2018b)) can be reexpressed as
where , and . Please see Tang (2018b) for numerical examples.
Tang (2018a, b) derived the exact power formulae for ANCOVA analysis of two-arm trials. Below we present more general results for unstratified or stratified multi-arm randomized trials. Suppose subjects are randomized to treatment groups () within each of strata. In an unstratified trial, we set . Subjects in treatment group are modeled by
where () is the indicator variable for the pre-stratification factors, is the effect for treatment group , is the vector of baseline covariates, , and . In general, equals the number of strata . In trials with multiple stratification factors, if there is no interaction between some stratification factors. By the same arguments as the proof of equation (15) in Tang (2018a), we obtain the variance for the linear contrast with coefficients
where , is the mean of in group , , is a function of the covariate ’s, and . In a two-arm trial (Tang, 2018a), if there is no restriction on the stratum effect (i.e. ), where is the number of subjects in stratum , treatment group . A constant treatment allocation ratio is commonly used in practice. Then and . Let , , and . When
’s are normally distributed, and the exact power for the superiority or NI test is
Below we give three hypothetical examples. Sample R code is provided in the Supplementary Material. In each example, the simulated (SIM) power is evaluated based on simulated datasets. There is more than chance that the SIM power lies within of the true power. In example 1, we perform the power calculation for a superiority trial. Subjects are randomized equally into groups ( experimental, or control treatment) stratified by gender ( for male, for female) and age ( if old, otherwise). There are subjects per treatment group per stratum (, ). There is no interaction between age and gender (, ), and the outcome is normally distributed as
where and . We compare each experimental treatment versus control treatment at the Bonferroni-adjusted one-tailed significance level of . The exact power by formula (4) is and , and the SIM power is and respectively for the two tests.
Example 2 has a similar setup to example 1 except that and the sample size is per group per stratum (, ). The aim is to establish the equivalence of each experimental treatment versus control treatment at . The margin is . The exact power by formula (5) is and respectively for the two tests, while the SIM power is and .
In example 3, we design a three-arm “gold standard” NI trial (Pigeot et al., 2003). It consists of placebo (), an active control treatment () and an experimental treatment (). The setup is similar to example except that , and the sample size is per group per stratum (, ). Two tests are conducted at the one-sided significance level of . Test 1 evaluates the superiority of treatment over placebo. The power for this test (exact , SIM ) is very close to . In test 2, we assess the noninferiority of treatment to treatment by demonstrating that treatment preserves at least of the efficacy of treatment compared to placebo (i.e. or ). The exact power of test 2 is (SIM power ). Noninferiority is claimed only if both tests are significant (Pigeot et al., 2003), and the overall power is at least while the simulated power is .
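The precision of a simulated power can be gauged with a normal approximation to the binomial proportion. The sketch below is an illustration under that assumption; the function name `sim_power_margin` and the inputs in the usage lines are hypothetical and are not the simulation settings of the examples above.

```python
import math
from statistics import NormalDist

def sim_power_margin(n_sim, coverage, power=0.5):
    """Half-width of the normal-approximation interval for a simulated power.

    n_sim: number of simulated datasets; coverage: desired probability that
    the simulated power lies within the margin of the true power;
    power: true power (0.5 gives the widest, worst-case margin).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return z * math.sqrt(power * (1.0 - power) / n_sim)

# Hypothetical inputs, not the values used in the examples above.
margin = sim_power_margin(n_sim=10_000, coverage=0.95)
print(f"worst-case margin: {margin:.4f}")
```

Because the margin shrinks with the square root of the number of simulated datasets, halving it requires four times as many datasets.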
- Moser, B. K., G. R. Stevens, and C. L. Watts (1989). The two-sample t test versus Satterthwaite’s approximate F test. Communications in Statistics – Theory and Methods 18, 3963–3975.
- Pigeot, I., J. Schafer, J. Rohmel, and D. Hauschke (2003). Assessing non-inferiority of a new treatment in a three-arm clinical trial including a placebo. Statistics in Medicine 22, 883–899.
- Tang, Y. (2018a). Exact and approximate power and sample size calculations for analysis of covariance in randomized clinical trials with or without stratification. Statistics in Biopharmaceutical Research 10, 274–286.
- Tang, Y. (2018b). A noniterative sample size procedure for tests based on t distributions. Statistics in Medicine 37, 3197–3213.