Many common tests for continuous outcomes are based on t statistics. Examples include the one sample t test, the two sample t test, and the tests associated with the analysis of covariance (ANCOVA) and linear mixed effects models for repeated measures (MMRM). Sample size determination is critical to the success of a clinical trial: an underpowered study has less chance to detect an important treatment effect, whereas an unnecessarily large study wastes time and resources. Sample size calculation for t tests is usually based on the normal approximation and/or the asymptotic variance of the treatment effect [1, 2, 3, 4]. These methods work well in large clinical trials but generally underestimate the required sample size in small trials, because the normal distribution cannot adequately approximate the t distribution, and the asymptotic variance underestimates the true variance of the estimated effect in ANCOVA and MMRM.
In this article, we propose a noniterative sample size procedure for tests based on the t distribution in finite samples. The procedure generalizes Guenther's method for the one sample t test and the two sample t test with equal variances, which Schouten extended to the two sample t test with unequal variances. In Guenther's approach, the normal approximation is improved by adding a correction factor. As Schouten noted, Guenther's approach still underestimates the required sample size, so we also propose a slightly more conservative estimate that adds one lower order correction term to Guenther's formula. For ANCOVA and MMRM, additional correction terms are included to account for lower order variance terms, which are functions of the covariates in the regression. Information about the covariate distribution is often limited at the design stage because of the inclusion/exclusion criteria imposed on patients; the proposed method does not require the covariate distribution to be specified.
The proposed sample size method is suitable for superiority trials, noninferiority (NI) trials, and a special case of trials for demonstrating clinical equivalence or bioequivalence (BE). In Section 2, we present the noniterative sample size procedure for several t tests commonly used in the analysis of superiority trials and assess its performance by simulation. We derive accurate power formulae for ANCOVA and MMRM; the formula for ANCOVA is exact if the covariates are normally distributed. Section 3 studies power and sample size determination for NI, equivalence and BE trials, where we also obtain the exact power of the two sample t test with unequal variances in equivalence trials. Numerical examples indicate that the sample size estimate from the noniterative procedure (after rounding to an integer) is often identical to that obtained by numerically inverting the power equation.
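Throughout the paper, we let $t_\nu(\lambda)$ denote the t distribution with $\nu$ degrees of freedom (d.f.) and noncentrality parameter $\lambda$, $t_\nu$ the central t distribution, $F_{\nu_1,\nu_2}(\lambda)$ the F distribution with $\nu_1$ and $\nu_2$ d.f. and noncentrality parameter $\lambda$, and $F_{\nu_1,\nu_2}$ the central F distribution. Let $z_p$ and $t_{\nu,p}$ be respectively the $p$th percentiles of the standard normal and central $t_\nu$ distributions. Let $G_{\nu,\lambda}(\cdot)$ be the cumulative distribution function of $t_\nu(\lambda)$. Let .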
2 A generalized sample size procedure for t tests in superiority trials
2.1 The generalized sample size procedure
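Let $\theta$ be the parameter of interest. For example, $\theta$ is the difference in the mean response between two treatment groups in comparative clinical trials. Let $\hat\theta$ be the point estimate of $\theta$, $\tau\sigma^2$ the associated variance, and $\hat\sigma^2$ the estimate of the variance parameter $\sigma^2$. Assume that $\hat\theta$ and $\hat\sigma^2$ are independent, $\hat\theta\sim N(\theta,\tau\sigma^2)$ and $\nu\hat\sigma^2/\sigma^2\sim\chi^2_\nu$. Then $(\hat\theta-\theta)/\sqrt{\tau\sigma^2}\sim N(0,1)$ and $(\hat\theta-\theta)/\sqrt{\tau\hat\sigma^2}\sim t_\nu$. Suppose we are interested in the test of equality

$$H_0:\theta=\theta_0 \quad\text{versus}\quad H_1:\theta\neq\theta_0. \qquad (1)$$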
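In comparative superiority trials, the purpose is to show that the test treatment is better than the control, and $\theta_0$ is usually set to 0. The test statistic $T=(\hat\theta-\theta_0)/\sqrt{\tau\hat\sigma^2}$ follows the $t_\nu$ distribution under $H_0$. The null hypothesis $H_0$ is rejected if $|T|>t_{\nu,1-\alpha/2}$.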
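Since $T\sim t_\nu(\lambda)$ with $\lambda=(\theta-\theta_0)/\sqrt{\tau\sigma^2}$ under $H_1$, the power of the two-sided test (1) is

$$1-\beta=\Pr(T>t_{\nu,1-\alpha/2})+\Pr(T<-t_{\nu,1-\alpha/2}), \qquad (2)$$

which is well approximated by the power of the one-sided test unless $\theta-\theta_0$ is too close to 0 to be of practical interest:

$$1-\beta\approx\Pr(T>t_{\nu,1-\alpha/2}) \quad \text{for } \theta>\theta_0.$$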
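Evaluating Equation (2) only requires the distribution function of the noncentral t. As a minimal, dependency-free sketch (our own illustration, not the authors' code), the snippet below computes $\Pr(T>c)$ for $T\sim t_\nu(\lambda)$ from the representation $T=(Z+\lambda)/\sqrt{W/\nu}$ with $W\sim\chi^2_\nu$, so that $\Pr(T>c)=E[\Phi(\lambda-c\sqrt{W/\nu})]$, integrated by Simpson's rule. The function names, the integration range and the grid size are assumptions chosen for illustration; in practice a library routine such as `scipy.stats.nct` would normally be used instead.

```python
import math


def _phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _s_density(s, nu):
    """Density of S = sqrt(W / nu), where W ~ chi-square with nu d.f."""
    if s <= 0.0:
        # S = |Z| when nu == 1, so the density at 0 is positive only then.
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_f = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu) - math.lgamma(0.5 * nu)
             + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
    return math.exp(log_f)


def prob_t_greater(c, nu, lam, m=1000):
    """Pr(T > c) for T ~ t_nu(lam), using Pr(T > c) = E[Phi(lam - c * S)].

    The expectation over S is computed by composite Simpson's rule (m even)
    on an interval that holds essentially all of the mass of S.
    """
    half_width = 12.0 / math.sqrt(2.0 * nu)
    lo, hi = max(0.0, 1.0 - half_width), 1.0 + half_width
    h = (hi - lo) / m
    total = 0.0
    for k in range(m + 1):
        s = lo + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * _s_density(s, nu) * _phi(lam - c * s)
    return total * h / 3.0


def t_quantile(p, nu):
    """p-th percentile of the central t_nu distribution (p > 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while prob_t_greater(hi, nu, 0.0) > 1.0 - p:
        hi *= 2.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if prob_t_greater(mid, nu, 0.0) > 1.0 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_two_sided(theta, theta0, var_theta_hat, nu, alpha=0.05):
    """Equation (2) with T ~ t_nu(lam) and lam = (theta - theta0) / sqrt(Var(theta_hat))."""
    lam = (theta - theta0) / math.sqrt(var_theta_hat)
    c = t_quantile(1.0 - alpha / 2.0, nu)
    return prob_t_greater(c, nu, lam) + (1.0 - prob_t_greater(-c, nu, lam))


def power_one_sided(theta, theta0, var_theta_hat, nu, alpha=0.05):
    """One-sided approximation: keep only the rejection region on the side of theta - theta0."""
    lam = abs(theta - theta0) / math.sqrt(var_theta_hat)
    c = t_quantile(1.0 - alpha / 2.0, nu)
    return prob_t_greater(c, nu, lam)


# One sample t test, n = 10, delta / sigma = 1: Var(theta_hat) = sigma^2 / n, nu = n - 1.
exact = power_two_sided(1.0, 0.0, 1.0 / 10, 9)
approx = power_one_sided(1.0, 0.0, 1.0 / 10, 9)
print(f"two-sided: {exact:.4f}, one-sided approximation: {approx:.4f}")
```

The two printed values differ only by $\Pr(T<-t_{\nu,1-\alpha/2})$, which is negligible once the noncentrality parameter is moderately large, illustrating why the one-sided approximation is adequate.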
The sample size is often obtained by numerically inverting Equation (2) or by the normal approximation. The normal approximation is poor when the resulting sample size is small.
where . If is a random quantity, it will be replaced by its expected value evaluated at . Guenther obtained formula (5) for the one sample t test and the two sample t test with equal variances. Schouten studied the two sample t test with unequal variances and indicated that formula (5) tends to underestimate the required sample size for these simple t tests. For this reason, we also propose the following slightly more conservative estimate,
2.2 Sample size for some commonly used t tests
We illustrate how to use the generalized procedure in Section 2.1 to calculate the power and sample size for the one sample t test, two sample t tests with or without equal variances, ANCOVA and MMRM. These tests are commonly used in the analysis of randomized clinical trials.
2.2.1 One sample t test
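Suppose $y_i\sim N(\mu,\sigma^2)$ for $i=1,\ldots,n$. Let $\bar y=\sum_{i=1}^n y_i/n$ and $s^2=\sum_{i=1}^n (y_i-\bar y)^2/(n-1)$. The test statistic for $H_0:\mu=\mu_0$ can be written as

$$T=\frac{\bar y-\mu_0}{\sqrt{s^2/n}}.$$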
The methods in Section 2.1 can be applied by setting $\theta=\mu$, $\tau=1/n$ and $\nu=n-1$. Guenther obtained the noniterative sample size formula (5) for this test, and formula (2) yields the exact power of the one sample t test.
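For the one sample t test, Guenther's correction is commonly quoted as $n=(z_{1-\alpha/2}+z_{1-\beta})^2\sigma^2/\delta^2+z_{1-\alpha/2}^2/2$, where $\delta$ is the mean difference to be detected; we state this form as an assumption rather than as a transcription of formula (5). The sketch below (plain Python, hypothetical function names) compares the normal approximation, this corrected value, and the smallest $n$ whose exact power from Equation (2), with $\tau=1/n$ and $\nu=n-1$, reaches the target. It repeats the numerical noncentral t helpers so that it runs on its own.

```python
import math
from statistics import NormalDist


def _phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _s_density(s, nu):
    """Density of S = sqrt(W / nu), where W ~ chi-square with nu d.f."""
    if s <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_f = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu) - math.lgamma(0.5 * nu)
             + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
    return math.exp(log_f)


def prob_t_greater(c, nu, lam, m=1000):
    """Pr(T > c) for T ~ t_nu(lam) = E[Phi(lam - c * S)], by Simpson's rule."""
    half_width = 12.0 / math.sqrt(2.0 * nu)
    lo, hi = max(0.0, 1.0 - half_width), 1.0 + half_width
    h = (hi - lo) / m
    total = 0.0
    for k in range(m + 1):
        s = lo + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * _s_density(s, nu) * _phi(lam - c * s)
    return total * h / 3.0


def t_quantile(p, nu):
    """p-th percentile of the central t_nu distribution (p > 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while prob_t_greater(hi, nu, 0.0) > 1.0 - p:
        hi *= 2.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if prob_t_greater(mid, nu, 0.0) > 1.0 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_power_one_sample(n, delta, sigma, alpha=0.05):
    """Equation (2) for the one sample t test: tau = 1/n, nu = n - 1."""
    nu = n - 1
    lam = delta * math.sqrt(n) / sigma
    c = t_quantile(1.0 - alpha / 2.0, nu)
    return prob_t_greater(c, nu, lam) + (1.0 - prob_t_greater(-c, nu, lam))


def one_sample_sizes(delta, sigma, alpha=0.05, power=0.8):
    """Return (normal approximation, Guenther-corrected, exact inversion) sizes."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    n_z = (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2
    n_normal = math.ceil(n_z)
    n_guenther = math.ceil(n_z + z_a ** 2 / 2.0)
    n_exact = 2  # smallest n with at least one d.f.
    while exact_power_one_sample(n_exact, delta, sigma, alpha) < power:
        n_exact += 1
    return n_normal, n_guenther, n_exact


print(one_sample_sizes(delta=1.0, sigma=1.0))
```

For an effect size of one standard deviation, the normal approximation falls short of the size obtained by exact inversion, while the corrected value matches it, consistent with the small-sample behavior described above.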
The methods for the one sample t test can be adapted for crossover trials without a period effect by setting as the difference in two treatment means, and , where is the response for subject in period , and . Please refer to Section 3.3 for details.
2.2.2 Two sample t test with equal variances
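Suppose $y_{gi}\sim N(\mu_g,\sigma^2)$ for $g=1,2$ and $i=1,\ldots,n_g$. Let $n=n_1+n_2$ be the total size, and $p_g=n_g/n$ the proportion of subjects in group $g$. Let $\theta=\mu_1-\mu_2$, $\tau=1/n_1+1/n_2=1/(np_1p_2)$, and $\nu=n-2$. The test statistic is

$$T=\frac{\bar y_1-\bar y_2}{\sqrt{s^2/(np_1p_2)}},$$

where $\bar y_g$ is the sample mean in group $g$ and $s^2$ is the pooled sample variance.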
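The pooled-variance statistic can be computed directly from data. The sketch below (plain Python; the function names are our own) returns $T$ and $\nu=n-2$, and also evaluates Guenther's per-group sample size for equal allocation, which is commonly quoted as $n_g=2(z_{1-\alpha/2}+z_{1-\beta})^2\sigma^2/\delta^2+z_{1-\alpha/2}^2/4$; we state that form as an assumption rather than as formula (5) itself.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_test(y1, y2):
    """Two sample t statistic with equal variances.

    Returns (T, nu) with T = (ybar1 - ybar2) / sqrt(s^2 * tau), where s^2 is the
    pooled variance, tau = 1/n1 + 1/n2 = 1/(n p1 p2), and nu = n - 2.
    """
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    p1, p2 = n1 / n, n2 / n
    # Pooled variance: d.f.-weighted average of the two sample variances.
    s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n - 2)
    tau = 1.0 / (n * p1 * p2)
    t_stat = (mean(y1) - mean(y2)) / math.sqrt(s2 * tau)
    return t_stat, n - 2


def guenther_two_sample_n(delta, sigma, alpha=0.05, power=0.8):
    """Per-group size with equal allocation (Guenther's correction, as commonly quoted)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2 + z_a ** 2 / 4.0)


t_stat, nu = pooled_t_test([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(t_stat, nu)
print(guenther_two_sample_n(delta=1.0, sigma=1.0))
```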
2.2.3 Two sample t test with unequal variances
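Suppose $y_{gi}\sim N(\mu_g,\sigma_g^2)$ for $g=1,2$ and $i=1,\ldots,n_g$. Let $\theta=\mu_1-\mu_2$, $\hat\theta=\bar y_1-\bar y_2$, and let $s_g^2$ be the sample variance in group $g$, $g=1,2$. The t statistic is

$$T=\frac{\bar y_1-\bar y_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}.$$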
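The d.f. of the t test is computed using the Satterthwaite approximation

$$\nu=\frac{(\sigma_1^2/n_1+\sigma_2^2/n_2)^2}{(\sigma_1^2/n_1)^2/(n_1-1)+(\sigma_2^2/n_2)^2/(n_2-1)}.$$

The unknown $\sigma_1^2$ and $\sigma_2^2$ are replaced by $s_1^2$ and $s_2^2$, respectively, in the data analysis.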
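The Satterthwaite d.f. has the same form at the design stage (with the assumed $\sigma_g^2$) and in the analysis (with $s_g^2$), so a single helper serves both. A minimal sketch in plain Python, with hypothetical function names:

```python
import math
from statistics import mean, variance


def satterthwaite_df(v1, n1, v2, n2):
    """Satterthwaite d.f. for Var(ybar1 - ybar2) = v1/n1 + v2/n2.

    At the design stage, v1 and v2 are the assumed sigma_1^2 and sigma_2^2;
    in the analysis they are the sample variances s_1^2 and s_2^2.
    """
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def welch_t_test(y1, y2):
    """Two sample t statistic with unequal variances and its Satterthwaite d.f."""
    n1, n2 = len(y1), len(y2)
    s1, s2 = variance(y1), variance(y2)
    t_stat = (mean(y1) - mean(y2)) / math.sqrt(s1 / n1 + s2 / n2)
    return t_stat, satterthwaite_df(s1, n1, s2, n2)


t_stat, nu = welch_t_test([5.0, 6.0, 7.0], [0.0, 2.0, 4.0, 6.0])
print(t_stat, nu)
# Design stage: sigma_1^2 = 1, sigma_2^2 = 4, 20 subjects per group.
print(satterthwaite_df(1.0, 20, 4.0, 20))
```

Unlike the equal-variance case, $\nu$ is generally non-integer, which is why the power and sample size calculations treat it as a continuous quantity.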
2.2.4 Analysis of covariance (ANCOVA)
Suppose in a clinical trial, $n_g$ subjects are randomized to treatment group $g$ ($g=1$ for experimental, and $g=2$ for placebo). The total sample size is $n=n_1+n_2$. Let $y_{gi}$ be the response, and $x_{gi}$ the $p\times 1$ vector of covariates (excluding the treatment status and intercept) associated with subject $i$ in group $g$. Let and . The data can be analyzed by the ANCOVA

$$y_{gi}=\beta_0+\theta\,I(g=1)+x_{gi}^\top\beta+\varepsilon_{gi},\qquad \varepsilon_{gi}\sim N(0,\sigma^2),$$

where $\beta_0$ is the intercept, $\theta$ is the treatment effect, $\beta$ is the covariate effect, and $\sigma^2$ is the residual variance in $y_{gi}$ that is unexplained by the covariates and treatment.
The least squares estimate of the treatment effect and its variance are given by
where , , , , , , , and . Let and . In ANCOVA, inference is made by treating the covariates as known and fixed. Given the covariates, the test statistic for $H_0:\theta=0$ is distributed as
At the design stage, the covariate values are typically unknown. The power is given by
is the probability density function (PDF) of, and . We assume . The assumption holds exactly, and Equation (10) yields the exact power, if the covariates are normally distributed. For nonnormal covariates, this approximation generally yields very accurate power estimates in randomized trials (i.e., trials with no systematic difference in the covariate distribution between the two groups), as will be demonstrated in Section . To avoid numerical integration, we approximate Equation (10) by replacing by
In large trials, the sample size is commonly estimated based on the normal approximation and the asymptotic variance
Another common approach is to invert the following power formula, which is based on the t distribution and the asymptotic variance:
The sample size based on the normal approximation and the exact variance is
Plugging into Equations (5) and (6) yields the size based on the t distribution (). We use the approximation (15) instead of the explicit solution to Equation (14) to slightly simplify the calculation; the approximation also makes it straightforward to generalize the method to MMRM, as investigated in Section 2.2.5.
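To make the gap between the asymptotic and the finite-sample variance concrete, the sketch below compares two normal-approximation sample sizes for ANCOVA with $p$ covariates. It uses the asymptotic variance $\sigma^2/(np_1p_2)$ and, as an assumption stated here rather than read off the equations above, the expected conditional variance under normally distributed covariates with a common distribution in both groups, $\sigma^2(n-3)/\{np_1p_2(n-p-3)\}$. The latter follows from $\mathrm{Var}(\hat\theta\mid x)=\sigma^2(1/n_1+1/n_2+d^\top S^{-1}d)$, where $d$ is the difference in covariate means and $S$ the pooled within-group sums-of-squares matrix (Wishart with $n-2$ d.f.), together with $E(S^{-1})=\Sigma_x^{-1}/(n-p-3)$. The helper names are hypothetical, and this is not presented as the paper's Equation (13) or its t-based procedure.

```python
import math
from statistics import NormalDist


def ancova_variance(n, p, p1, sigma2, exact=True):
    """Variance of the ANCOVA treatment effect estimate.

    exact=False: asymptotic variance sigma2 / (n p1 p2).
    exact=True:  expected conditional variance under normal covariates
                 (assumption), sigma2 (n - 3) / (n p1 p2 (n - p - 3)); needs n > p + 3.
    """
    base = sigma2 / (n * p1 * (1.0 - p1))
    return base * (n - 3) / (n - p - 3) if exact else base


def ancova_normal_n(delta, sigma2, p, p1=0.5, alpha=0.05, power=0.8, exact=True):
    """Smallest n with delta^2 / V(n) >= (z_{1-alpha/2} + z_{1-beta})^2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    n = p + 4  # smallest n for which the exact variance is finite
    while delta ** 2 / ancova_variance(n, p, p1, sigma2, exact) < z ** 2:
        n += 1
    return n


# Effect 0.5, residual variance 1, three covariates, 1:1 randomization.
print(ancova_normal_n(0.5, 1.0, p=3, exact=False))
print(ancova_normal_n(0.5, 1.0, p=3, exact=True))
```

The factor $(n-3)/(n-p-3)$ exceeds 1 and vanishes only as $n\to\infty$, so the asymptotic variance always yields the smaller (anticonservative) size, in line with the motivation in the introduction.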
In the two step approach, Equation (7) is calculated as
2.2.5 Mixed effects model for repeated measures (MMRM)
Suppose in a clinical trial, $n$ subjects are randomly assigned to the experimental ($g=1$) or control ($g=2$) treatment. Let $n_g$ and $p_g$ be the number and proportion of subjects randomized to group $g$. Let $y_{gi}=(y_{gi1},\ldots,y_{giJ})^\top$ be the outcomes collected at $J$ post-baseline visits, and $x_{gi}$ the vector of covariates for subject $i$ in group $g$. Let . In clinical trials, missing data arise mainly from dropout. At the design stage, it is reasonable to assume that the missing data pattern is monotone, in the sense that if $y_{gij}$ is observed, then $y_{gij'}$ is observed for all $j'<j$. Let $n_{gj}$ and $r_{gj}=n_{gj}/n_g$ be the number and proportion of subjects retained at visit $j$ in group $g$. The total number of subjects retained at visit $j$ is $n_j=n_{1j}+n_{2j}$, and the pooled retention rate at visit $j$ is $r_j=n_j/n$. Without loss of generality, we sort the data so that, within each group, subjects who stay in the trial longer have smaller indices than subjects who discontinue earlier.
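Under monotone dropout, a subject is retained at visit $j$ exactly when the last observed visit is at least $j$, so the retention counts and rates follow directly from each subject's last observed visit. The sketch below is a hypothetical helper in plain Python; it assumes the pooled retention rate is $n_j/n$, i.e., $(n_{1j}+n_{2j})/n$, and also returns each group's last visits sorted in the order used above (longer stayers first).

```python
def retention_summary(last_visit, n_visits):
    """Retention summaries under monotone dropout.

    last_visit: dict mapping group g to a list with, for each subject, the last
    visit j (1..J) at which the outcome is observed.
    Returns (n_gj, r_gj, pooled r_j, sorted last visits per group).
    """
    n_total = sum(len(v) for v in last_visit.values())
    counts, rates, ordered = {}, {}, {}
    for g, visits in last_visit.items():
        # Retained at visit j if and only if the last observed visit is >= j.
        counts[g] = [sum(1 for v in visits if v >= j) for j in range(1, n_visits + 1)]
        rates[g] = [c / len(visits) for c in counts[g]]
        # Subjects who stay longer get smaller indices.
        ordered[g] = sorted(visits, reverse=True)
    pooled = [sum(counts[g][j] for g in counts) / n_total for j in range(n_visits)]
    return counts, rates, pooled, ordered


counts, rates, pooled, ordered = retention_summary({1: [3, 3, 2, 1], 2: [3, 2, 2, 2]}, 3)
print(counts, rates, pooled, ordered)
```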
where $\Sigma$ is an unstructured (UN) covariance matrix. A structured covariance matrix (possibly induced by random effects) can be useful when individuals have a large number of observations or observations at varying time points. In MMRM, inference is often based on restricted maximum likelihood (REML) and the Kenward-Roger adjusted variance estimate, which reduces the small sample bias.
where , , , , and .
The treatment effect estimate at visit is , and its Kenward-Roger variance estimate is
where , , , and .
We use slightly different notation in MMRM. We let denote the treatment effect at the first time point. The true value for under is , and its value under is . The test statistic for vs ,
approximately follows a distribution under , and the d.f. is obtained from the Satterthwaite approximation 
Lu et al. [2, 3] developed power and sample size methods for MMRM. These methods are based on the asymptotic variance of instead of the commonly used Kenward-Roger adjusted variance estimate. The Kenward-Roger variance estimate
provides a roughly unbiased estimate of the variance of while ignoring the lower order term
where is the -th element of .
In the MMRM analysis, the covariates are assumed to be fixed, but they are unknown at the design stage. In the power calculation, we replace the quantities that depend on the covariates by their expected values
where and . A better approximation of the d.f. could be derived, but we do not pursue it here.
The power of the Wald test at a two-sided significance level of is given by
One may approximate by , and/or by to simplify the calculation, where can be interpreted as the fraction of observed information among subjects retained at visit . The following approximation of Tang is only slightly less accurate than Equation (22), even in small samples