As recently shown , time-to-event endpoints such as overall survival (OS) and progression-free survival (PFS) enjoy increasing popularity in Phase II oncology trials. This may be due to changes in the clinical development of oncology treatments, with new treatment types relying on different mechanisms emerging. Additionally, a majority of these Phase II trials are single-armed . However, the traditional testing method for single-arm survival trials, the one-sample log-rank test, is known to be quite conservative , especially for small sample sizes. There have been some efforts to solve this problem. Unfortunately, some of these methods are rather complicated, as they require the estimation of higher moments, and are not suitable for sample size calculation because the distribution of the test statistic under alternative hypotheses is unknown [3, 4], while others lack a clear theoretical motivation and can be anti-conservative in some cases [5, 6]. Nevertheless, the latter approach sheds light on the previously neglected possibility of including the counting process in the variance estimation.
The conservativeness of the classical approach and the anti-conservativeness of the new approach, respectively, are in many scenarios due to the skewness of the underlying test statistic, which is the ratio of the compensated counting process of observed events and the square root of its variance estimator. The skewness of its distribution under the null hypothesis can be attributed to two different causes:
The skewness of the numerator itself.
The dependence between the numerator and its variance estimator.
While the first problem is difficult to handle, we will focus on the second one. Although Basu’s theorem 
guarantees the independence of mean and variance estimators for normally distributed data, this does in general not apply in our case. However, the theoretical result on which the asymptotic correctness is based
leaves us with one degree of freedom concerning the choice of the variance estimator. We develop a framework in which this property is exploited. The classical one-sample log-rank test and already existing improvements [5, 6] can be embedded into it as special cases. Furthermore, we extend existing methodology  to enable sample size and power calculations for any approach fitting into this framework. Building on this, we find a variance estimator which is uncorrelated with the compensated counting process under the null hypothesis of the testing problem.
In particular, recently developed methodology concerning adaptive , multi-stage [10, 11, 12] and multivariate extensions 
of the one-sample log-rank test requires and benefits from proper behaviour of the underlying simple one-sample log-rank test, as these extensions rely on the concordance of the distribution of the test statistic with the normal distribution even for smaller quantiles.
One should note that the computations necessary for our suggestions are neither computationally intensive nor do they require any information beyond what is already requested by commercial software such as PASS  or nQuery . Hence, existing tools for the planning and execution of the one-sample log-rank test can easily be extended to incorporate any approach fitting into our framework.
The paper is organized as follows. We start in Section 2 by settling basic notation and revisiting existing methods. We use this in Section 3 to construct our framework. Afterwards, we present power and sample size calculations for it in Section 4. In Section 5 we derive the uncorrelated variance estimator, which also fits into our framework. Existing approaches and their small-sample properties are compared in several simulations in Section 6, and a real-data example illustrating the application of the procedure is presented in Section 7. We close with a discussion offering some advice for the planning and execution of one-sample log-rank tests.
2 Definitions and preliminary considerations
be the space upon which all random variables are defined. In a study with  subjects,  and  denote the survival and censoring time of the -th subject. Additionally,  denotes the recruitment calendar time of the -th patient. In what follows, it will be important to distinguish between censoring that occurs at a given date of analysis and additional random dropouts. The former is given by  for any analysis date , while  represents only the latter. We assume  and  to be mutually independent for any  and the tuples  to be independent and identically distributed. Of course, at calendar time , only  will be observable for any . Let  and  denote the distribution function, density, survival function, hazard and cumulative hazard of the survival time random variable. Analogously,  and  resp.  and  denote the same entities for the censoring and recruitment random variables. In each of the three cases, the well-known dependencies
are given. In the statistical testing framework, one naturally deals with two different probability distributions  and  on the measurable space
which characterize the null resp. the alternative planning hypothesis. If the distributions of certain random variables differ under the two probability distributions, the index of the functions referring to the time-to-event variable under investigation will also indicate whether it is the function under the null or an alternative hypothesis. Testing problems can now be defined by the two-sided hypothesis
which is equal to the intersection of the two one-sided hypotheses
In the formulation of the hypotheses,  denotes the cumulative hazard function of the survival time random variable under , whereas the trial will be planned under the distribution of the same variable under , which can be characterized by .
The number of events observed by calendar time , for any , is given by
The number of events expected under the null hypothesis by calendar time , for any , is given by
It is uniquely characterized by . We define additionally
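As a numerical illustration (not taken from the text above), the expected number of events under the null hypothesis admits a closed form in a simple special case. The sketch below assumes exponential survival with rate `lam` under the null hypothesis, uniform accrual over `[0, accrual]`, administrative censoring at the analysis date only, and no random dropouts; all function and argument names are ours:

```python
import math

def prob_event(lam, accrual, followup):
    """P(event observed by the analysis date) for exponential survival
    with rate lam, uniform accrual over [0, accrual], a subsequent
    follow-up period, administrative censoring only."""
    horizon = accrual + followup
    if accrual == 0.0:
        # simultaneous entry: plain exponential distribution function
        return 1.0 - math.exp(-lam * horizon)
    # average of 1 - exp(-lam * (horizon - r)) over r ~ U(0, accrual)
    return 1.0 - (math.exp(-lam * followup) - math.exp(-lam * horizon)) / (lam * accrual)

def expected_events(n, lam, accrual, followup):
    """Expected number of events under H0 among n patients."""
    return n * prob_event(lam, accrual, followup)
```

For instance, shrinking the accrual period toward zero recovers the plain exponential distribution function evaluated at the total study duration.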
The following considerations also refer to the null distribution . Obviously,  is a martingale under the null hypothesis w.r.t. the filtration generated by the survival processes of the patients. It follows from Theorem II.5.1 of Andersen et al.  that this martingale converges in distribution to a continuous Gaussian martingale as the number of patients converges to infinity. The same theorem also leaves us with two possible choices for estimating the variance of this process because
as well as
where is the non-decreasing covariance function of the limiting process.
If one fixes an analysis date , both of the trivial choices
lead to asymptotically correct tests when choosing decision bounds according to a standard normal distribution. The latter corresponds to the choice  which has been the historical cornerstone of the one-sample log-rank test . As already mentioned, some problems occur, especially for small sample sizes. There have been some attempts to solve these issues [3, 4, 6, 17]. In one of them , the test statistic
has been suggested. The asymptotic correctness of this approach can even be generalized, as
for any . Of course, this value does not need to be the same for any . We will try to exploit this by choosing an appropriate weight and by making suggestions on how to alter this weight depending on the censoring mechanism and the timing of the analysis.
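To fix ideas, the weighted standardisation described above can be sketched as follows. We assume here (our reading of the framework, since the displayed formulas are only implicit in the text) that the variance estimator is the combination w·N + (1−w)·E of the observed event count N and the null-expected event count E, so that w = 0 recovers the classical test and w = 1 the counting-process variant; the function name is ours:

```python
import math

def one_sample_logrank(n_events, expected_h0, w=0.0):
    """Weighted one-sample log-rank statistic (sketch).

    n_events    : observed number of events (counting process)
    expected_h0 : expected number of events under H0 (compensator)
    w           : weight on the counting process in the variance
                  estimator; w=0 gives the classical test, w=1 the
                  counting-process variant.
    """
    var_hat = w * n_events + (1.0 - w) * expected_h0
    return (n_events - expected_h0) / math.sqrt(var_hat)
```

With 16 observed events against 9 expected, the classical choice w = 0 standardises by √9 while w = 1 standardises by √16, so the two variants can differ noticeably for small trials.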
After briefly summarizing the mathematical foundation of this concept, we will address the sample size and power calculations for this approach. Afterwards, we suggest a concrete weight function which is motivated by introducing an estimator of which is uncorrelated with
and evaluate the performance of the different approaches w.r.t. empirical type I and type II errors in a simulation study.
3 Theoretical foundations
Let, for any , the tuples of random variables  be defined as in the previous section, as well as the functions characterizing their distributions under the different hypotheses. Let  be the filtration comprising all the information available at calendar time , i.e. for any , the -algebra  is generated by the random variables
for all . We know that for any
and the summed process
are -martingales, where  is the cumulative hazard function of the survival time random variables . In the case of a continuous compensator and a converging accrual process for , we can apply Theorem II.5.1 of Andersen et al.  to show that, for any analysis calendar date , we have the convergence
where and is the covariance function of this Gaussian process. Here, we have
where  is identically distributed to  for any . In order to construct a statistical testing procedure, we need to standardise (14) and hence to estimate . Conveniently, the same theorem yields the asymptotic results
Hence, for any we have
Hence, by Slutsky’s theorem, we have
for any , where . Nevertheless, it is necessary to choose the weights  appropriately. Otherwise there is a non-zero probability of  being negative, yielding an undefined test statistic. As we will see later on, it is advisable to choose the weight depending on the time of analysis if assumptions on the accrual mechanism and the mechanism of random dropouts are made. This leads us to a function . Building on that, we can also define
It should be emphasised that the functional form of  must be determined in advance and must not be changed in the course of the study. Under this condition, distributional convergence is guaranteed pointwise in  according to formula (19), independently of the specific choice of . By choosing a suitable function , small-sample properties can be improved (see Section 5). Of course, it is easy to embed the original choice and Wu’s suggestion into this framework, as we have  resp.  in these cases, which leads to
In any case, for any
we now obtain an almost surely well-defined, asymptotically correct two-sided test with type I error level  of the hypothesis  by rejecting it if
where  is defined as in (6) resp. (5) with the cumulative hazard function under the null hypothesis plugged in. Analogously, one obtains a one-sided test with type I error level  of the null hypothesis by rejecting it if
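Concretely, the rejection rules can be sketched with standard normal quantiles as follows (helper names are ours; the left-sided rule rejects for small values of the statistic, i.e. fewer events than expected under the null hypothesis):

```python
from statistics import NormalDist

def reject_two_sided(z, alpha=0.05):
    """Two-sided level-alpha decision via standard normal bounds."""
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)

def reject_left_sided(z, alpha=0.025):
    """Left-sided test: a small z indicates fewer events than expected
    under H0, i.e. superior survival under the new treatment."""
    return z < NormalDist().inv_cdf(alpha)
```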
4 Power and sample size calculation
Fortunately, power and sample size calculations for our general approach can be adopted from the power and sample size calculations for the specific choice . To make our notation fit that work, we need to define  for a previously fixed analysis date . By the assumed independence of  and , we have
for any . To avoid confusion, we use a similar naming convention in this section as in the original work . Hence, under a fixed planning alternative , which can be characterized (among others) by the density  or the survival function  of the survival random variable, we have:
The expectation and variance of under the alternative hypothesis now amount to
while for our variance estimator with the prefixed weight we have
under the alternative hypothesis . Now, it follows from Slutsky’s theorem that
under . Given these quantities, we obtain a sample size of
for a two-sided test with nominal type I error level  and power , where  denotes the -quantile of a standard normal distribution. We already recognize that a decrease of  in (30) leads to a decrease of the sample size required for fixed type I and type II errors. As typically  for all , we have . In terms of (28), it seems advisable to choose . For the reasons explained in Section 5, this deteriorates the testing procedure in terms of the type I error for small sample sizes. However, our suggestion presented in the next section tries to circumvent one of these issues while attributing a positive value to  to increase the power.
Please also note that (30) is only an implicit formula if the accrual rate is the quantity given externally at the planning stage of a trial. Nevertheless, one can use this formula together with standard numerical methods to solve for the accrual duration .
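The numerical step can be sketched as follows: given an externally fixed accrual rate, solve rate · a = n(a) for the accrual duration a by bisection, where `n_required` stands in for the sample size formula (30) evaluated at accrual duration a (the bracket and iteration count are illustrative assumptions, and the names are ours):

```python
def solve_accrual_duration(rate, n_required, lo=1e-6, hi=100.0, iters=200):
    """Solve rate * a = n_required(a) for the accrual duration a by
    bisection. Assumes recruitment eventually overtakes the required
    sample size on [lo, hi], i.e. the gap changes sign there."""
    def gap(a):
        return rate * a - n_required(a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a constant requirement of 50 patients and a rate of 10 patients per year, the solver returns an accrual duration of 5 years, as expected.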
5 Uncorrelated variance estimator for the one sample log-rank test
An obvious problem of the test statistics from (9) is the correlation between the numerator and the denominator, which is the square root of the variance estimator of the former. This dependence structure is one of the causes of the skewness of the distribution of the test statistic under the null hypothesis. This problem has been observed in several cases in which numerator and denominator themselves are symmetrically distributed while their correlation causes a skewness of the ratio [18, 19, 20]. In particular, positive correlation causes a left skew and negative correlation causes a right skew of the emerging distribution. This increases the weight on the left tail and decreases the weight on the right tail in the former case, while in the latter case it is just the other way round. Because of this, it is important not just to examine empirical type I error levels of two-sided tests in simulation studies, but also to consider how this empirical type I error is distributed among the two underlying one-sided tests.
Of course, this is not the only problem when evaluating the concordance of the distribution of the test statistic of the one-sample log-rank test under the null hypothesis with the normal distribution for small sample sizes. Another problem is the skewness of the numerator itself, which is not treated here. In what follows, we will try to solve the problem of the correlation between numerator and denominator.
5.1 General considerations
As for any in any trial where there is a.s. some patient with a positive length of stay, there is a s.t.
We will see later that  for any  for this choice. Together with (11), this yields a consistent variance estimator which is uncorrelated with the martingale in the numerator of the test statistic. As we will see in the simulation scenarios, the choice  is a good first choice w.r.t. decreasing the correlation between numerator and denominator, but it is not difficult to improve upon it in this regard without major disadvantages concerning the power of the trial.
We are looking for a weight s.t.
Obviously, this is equivalent to
The right-hand side of this equation can be rewritten using already derived quantities . Firstly, we have
Secondly, we have
where denotes the first summand of . The weight is thus obtained by
One should note that the analysis date  also plays a role in , as we can see from (25). For this calculation, we require no further assumptions than those needed for sample size calculation anyway . Also, as outlined in Section 3, misspecifications of the accrual or censoring mechanisms in this calculation do not affect the asymptotic properties of the test. With (37), we can already see that  for this choice. Also, after two partial integrations, we obtain
from which we can see that . In the following subsection, we will work out this choice of the weight explicitly for some situations.
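As a cross-check of the analytic derivation, the uncorrelated weight can also be estimated empirically from simulated trials by solving cov(N − E, w·N + (1 − w)·E) = 0 for w; the sketch below assumes this combination form for the variance estimator, and all names are ours:

```python
def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def empirical_uncorrelated_weight(n_events, expected_h0):
    """Weight w solving cov(M, w*N + (1-w)*E) = 0 empirically, where
    M = N - E is the compensated counting process per simulated trial."""
    m = [a - b for a, b in zip(n_events, expected_h0)]
    c_mn = cov(m, n_events)
    c_me = cov(m, expected_h0)
    # w*c_mn + (1-w)*c_me = 0  =>  w = c_me / (c_me - c_mn)
    return c_me / (c_me - c_mn)
```

On a toy sample with perfectly anti-correlated N and E, the solver returns w = 1/2, and one can verify directly that the equal-weight mixture is then uncorrelated with N − E.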
5.2 Trials with simultaneous entry of patients
In the hypothetical case of a trial with simultaneous entry and fixed follow-up (until calendar time ) of  patients and without random dropouts, the weight depends only on  and amounts to
With (1) we can see that in this case
and that the function, whose derivative is given by
is strictly monotonically increasing in  if the distribution has full support on , as . Hence, the weight for the variance estimator is shifted from the compensator to the counting process for increasing length of the follow-up period. This shift is continuous if the distribution of  is absolutely continuous w.r.t. the Lebesgue measure on . Because of this continuity, there must obviously be a case in which the choice  is equal to the choice resulting from our calculations. This is exactly the case if
where  denotes the -th branch of the Lambert W function. Hence, this choice approximately corresponds to our suggestion if about three fourths of the possible events can be observed. In this view, one should note that the numerical examples given in previous publications about the one-sample log-rank test all deal with cases in which the event under consideration is observed for less than half of the study cohort [2, 3, 4, 6, 17].
5.3 Trials with staggered entry of patients
Commonly, we are given an accrual period of length  and a subsequent follow-up period of length , i.e. the final analysis is conducted at calendar time . As is common practice, we assume that the patients are recruited uniformly over the interval . Hence,  and, as we again assume that there are no further random dropouts, . For a given accrual duration , the weight function amounts to
For any fixed , one can show that  is monotonically increasing in  and converges to  for  if the distribution of  has full support on  and is absolutely continuous w.r.t. the Lebesgue measure. For , we have .
In order to illustrate the change of the weight, we show some plots of  for different accrual period lengths in Figure 1. In the underlying scenario,
is exponentially distributed with parameter  under the null hypothesis, s.t. the median survival time is  year.
6 Simulation study
In this section, we want to shed light on the differences between our suggested approach from Section 5 and the other approaches fitting into the framework presented in Section 3, on three different levels. At first, we study the correlation of  and the variance estimator and its impact on the skewness of the resulting distribution. Afterwards, we consider a fixed survival scenario in which the follow-up time is varied to get an impression of how the performance of the different approaches changes with the follow-up time. Finally, we take a look at sample sizes and empirical errors for a wide range of scenarios.
As already explained, a skewed distribution of the test statistic implies a lack of concordance with the normal distribution on both tails. Nevertheless, it is possible that the deviations from the nominal level on both tails cancel each other out, so that the empirical error level may seem close to the nominal level although the test misbehaves at both tails. An example of such behaviour can be found in Section 7. Therefore, we primarily focus on the left tail and report empirical errors of the left-sided test, whose rejection would result in the acceptance of the superiority of the new treatment considered in this analysis. As we naturally carry out two-sided tests with a nominal type I error level of , we consider left-sided tests with a nominal level of  in what follows. All simulations were performed using R, version 4.0.2 .
6.1 Correlation of the unstandardised test statistics and its variance estimators
At first, we want to illustrate the problems outlined in the introduction of Section 5. In the scenario used here,  is exponentially distributed with parameter  s.t. the median survival time is 2 years. For  and , (43) yields a weight of approximately . As we can see in the first row of Figure 2, there is an obvious correlation of  as defined in (14) and the variance estimator following the original approach (as in (21)) or Wu’s approach (as in (22)). The empirical correlation in our simulation with 100 000 runs and  patients amounts to -0.908 resp. 0.591, while it is only -0.002 for our suggested approach. The resulting skew w.r.t. the normal distribution can be seen in the QQ-plots in the second row of Figure 2. As mentioned before, a negative correlation (as with the original approach) leads to a right skew, while a positive correlation (as with Wu’s approach) leads to a left skew. And while the empirical type I errors in terms of the two-sided test look good for all three approaches (5.133%, 5.048% and 4.997%, respectively), there is a noteworthy imbalance between the empirical errors of the two one-sided tests for the first two approaches, as the empirical type I errors for the left-sided test amount to 1.823%, 2.856% and 2.562%, respectively.
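The sign pattern of these correlations can be probed with a small Monte Carlo experiment. The sketch below assumes exponential survival under the null hypothesis (so the cumulative hazard is λt), uniform accrual and purely administrative censoring; the accrual and follow-up lengths are illustrative and not the exact configuration of this subsection, and all names are ours:

```python
import math
import random

def simulate_once(n, lam, accrual, followup, rng):
    """One trial under H0: exponential survival with rate lam, uniform
    accrual over [0, accrual], administrative censoring at the analysis
    date accrual + followup, no random dropouts."""
    horizon = accrual + followup
    n_events, compensator = 0, 0.0
    for _ in range(n):
        r = rng.uniform(0.0, accrual)      # recruitment calendar time
        x = rng.expovariate(lam)           # survival time
        obs = min(x, horizon - r)          # observed time on study
        compensator += lam * obs           # Lambda0(t) = lam * t
        n_events += x <= horizon - r
    return n_events, compensator

def corr(x, y):
    """Empirical Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
lam = math.log(2.0) / 2.0                  # median survival of 2 years
trials = [simulate_once(20, lam, 1.0, 1.5, rng) for _ in range(2000)]
M = [N - E for N, E in trials]             # compensated counting process
Ns = [N for N, _ in trials]                # counting-process variance estimator
Es = [E for _, E in trials]                # compensator variance estimator
```

In runs of this sketch, `corr(M, Es)` comes out clearly negative and `corr(M, Ns)` clearly positive, matching the sign pattern reported above.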
6.2 Variation of follow-up length
Of course, not only our proposed weights, but also the properties of the unnormalised martingale depend on the length of the follow-up. For very small sample sizes and either very long or very short follow-up times, the distribution of for a fixed is already skewed and our proposed standardisation procedure is not able to remove this skew. In Figure 3 we compared the four different approaches fitting into our framework developed in Section 3 concerning their empirical type I error for a left-sided test of level . The underlying scenario is the same as above: is exponentially distributed with parameter and the patients are recruited uniformly over 1 year. While the original approach overestimates the type I error for any of the considered follow-up times, Wu’s approach underestimates it. Our suggestion performs quite well in an area in which about one to two thirds of the patients experience an event, whereas Wu’s approach appears to be superior in scenarios with larger event rates.
As we can see from Figure 3, the empirical type I errors of our approach show a slight deviation from the desired nominal type I error in some cases, too. This can be attributed to the skewness of the distribution of
for small sample sizes, as one can see from the curve for the empirical type I error in the case of a standardisation with the true underlying standard deviation. In general, this is not advisable, as it cannot be specified without uncertainty (see Section 8). Nevertheless, in a range of practically relevant scenarios, our suggested approach works quite well. All in all, our results suggest that a combination of Wu’s and our variance estimator promises a nearly optimal performance concerning the type I error rate, independent of the event rate. Such a combination corresponds to choosing the weight according to
The share of patients with events expected under the null hypothesis until the different follow-up times can be seen under the y-axis.
6.3 Power and sample size
In order to compare the performance of the different approaches concerning sample size, empirical type I and type II error, we conducted a simulation study. To ensure comparability to already existing literature on this topic, we considered scenarios inspired by previous simulation studies [2, 6].
Hence, the survival distribution under the null hypothesis is taken as a Weibull distribution with distribution function  and cumulative hazard function , where the shape parameter  and the median survival time  are given. We assume that the survival time under the alternative is also given by a Weibull distribution with the same shape parameter and a different median survival time , which is determined by the hazard ratio  through . The censoring mechanism has been implemented as in previous simulations  with  and
We will investigate shape parameters , but, different from other publications [2, 6], we will not only consider the fixed median survival time , but values of . On the other hand, we restrict the possible values of  to one small (1.2), one medium (1.5) and one large (2) hazard ratio.
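For the Weibull model with common shape parameter, the relation between the two median survival times can be made explicit: reading the hazard ratio Δ as null hazard over alternative hazard (so Δ > 1 means improved survival), the alternative median is m₁ = m₀ · Δ^{1/k}. A one-line sketch (the sign convention is our reading of the setup; the function name is ours):

```python
def alt_median(median_h0, shape, hazard_ratio):
    """Median survival under the alternative for a Weibull model with
    common shape parameter, with hazard_ratio = null hazard divided by
    alternative hazard (> 1 means improved survival)."""
    return median_h0 * hazard_ratio ** (1.0 / shape)
```

For shape 1 (the exponential case), halving the hazard doubles the median, as expected.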
The reason for extending the range of median survival times under the null hypothesis can be seen in Table 1. As the first row indicates, leads to trials with high event rates. In each of these scenarios event rates of more than 50% are expected and two of the six scenarios lead to event rates of more than 90%. By including larger median event times, we want to broaden the range of expected events under the null hypothesis. With this we can clearly distinguish which approach is most useful in which setting.
As already mentioned, we focus on the empirical errors of the one-sided tests with a nominal level of . The sample size was planned with a power of . The results can be found in Table 2. We conducted 100 000 simulation runs for each scenario.
As already seen in the previous subsection, our approach works quite well for small and medium event rates and is, in most cases, the best choice concerning the absolute deviation from the nominal type I error level, with some exceptions. The first exception concerns scenarios with high event rates, which lead to weights greater than . One can see from Table 1 to which scenarios this applies. Recall that the computed weight does not depend on the hazard ratio used to plan the trial. Here, the type I error inflation for the uncorrelated variance estimation exceeds the error inflation for Wu’s suggestion. The other exception concerns scenarios with low hazard ratios (which lead to high sample sizes) and rather small event rates. This is the case for the scenarios with ,  and . In these cases, the original version of the one-sample log-rank test performs best in terms of the absolute deviation from the nominal type I error.
As one can already see from the flexible sample size formula (30), the approach using only the counting process to estimate the variance requires the smallest sample sizes. Nevertheless, the type I error is inflated in every scenario, ranging from 2.82% to 5.16%. The original variance estimation behaves just the other way round: the required sample size is always the highest, while the type I error is always deflated, ranging between 1.57% and 2.4%, depending on the scenario. The remaining approaches are located in between concerning the sample size, whereby our proposed approach requires higher sample sizes if and only if the weight (see Table 1) is smaller than .
Concerning compliance with the given type II error rate, the uncorrelated variance estimation works best on average, with empirical values lying between 79.7% and 81.7%. Here, the highest deviations also occur for high event rates, i.e. in scenarios in which Wu’s approach is also superior concerning the empirical type I error. These results confirm that a combination of Wu’s and our variance estimator, as given in (44), promises the most satisfying performance.
7 Practical example
We illustrate the differences between the approaches using a practical example. We employ the setting of the Mayo Clinic trial in primary biliary cirrhosis of the liver (PBC), a rare but fatal chronic disease whose cause is still unknown . In this double-blinded randomized trial, the drug D-penicillamine (DPCA) was compared with a placebo. The study data is publicly available via the survival package in R [21, 23].
Among the 158 patients of the cohort treated with DPCA, 65 died during the trial. For the sake of comparability, we adopt the previous parameter estimation of their survival curve , which states that a Weibull distribution with shape parameter  and median survival time  fits the data well. We now suppose that a new treatment becomes available and that the data from this trial shall be used to compare survival under this treatment to survival under treatment with DPCA. This shall be accomplished in a trial in which patients are recruited uniformly over an accrual period of length  and followed up in an additional period of length . As in the preceding simulations, the planning hypothesis is supposed to fulfill the proportional hazards assumption, and hence  for any . A hazard ratio of  shall be detected with a power of  via a two-sided test with significance level 
The equation (43) yields a weight of 0.1923 for this scenario and hence, our suggested test statistic amounts to
The results of a simulation with 100 000 runs, displayed in Table 3, show that in this real-world example, our proposed approach is closest to the nominal type I error level in terms of the empirical type I error of the two-sided test as well as of the left-sided test. The original one-sample log-rank test and Wu’s suggestion look similar in terms of the former, while they perform remarkably worse in terms of the latter. The phenomenon of unbalanced left- and right-sided type I errors which cancel each other out quite well in their sum is remarkable here.
Although the sample size for our suggested approach is about 10% higher than for Wu’s suggestion, there is still a saving in sample size compared to the standard approach.
8 Discussion
We introduced a simple but extensive framework enabling a continuum of consistent variance estimators for the one-sample log-rank test. Asymptotic correctness and asymptotically correct power and sample size calculations are provided. The classical one-sample log-rank test  as well as a practical alternative  naturally fit into it. Please note that one could still extend the options to estimate the variance and also use  itself in its estimation, as it is explicitly given in (16), if the accrual mechanism and the distribution of the ‘s are known. This would yield
for any . But it is important to note that a misspecification of the accrual mechanism and possible additional random dropouts would lead to a wrong value , s.t. the values of  on the left- and right-hand sides of (46) do not coincide and the asymptotic results no longer apply. Therefore, we do not pursue this approach any further, but rather focus on the choice made in (18).
In addition, we elaborated only one special choice for the weight function, which guarantees that the variance estimator is uncorrelated with the compensated counting process under the null hypothesis. In several simulations and in an example based on real-world data, we saw that the emerging test is superior to other approaches concerning adherence to the nominal type I error level. This superiority is most remarkable in small trials with small to medium event rates.
Nevertheless, we saw in our simulation studies that Wu’s suggested weight works better than the uncorrelated variance estimation in scenarios with high event rates. To prevent a remarkable anti-conservativeness in such cases, one could choose the weight according to (44).
One can also conduct multiple simulations to find the ideal weight for the envisaged scenario, which can cancel out a possible skewness of  under the null hypothesis. Once the weight is determined this way, the theory from Section 3 provides the asymptotic correctness, and the sample size calculation can be done as outlined in Section 4. To avoid anti-conservativeness, one could also perform an exact calculation of the third moment of  under the null hypothesis and only use the uncorrelated estimation if it is positive. This should prevent any left skew, which causes anti-conservativeness on the left-hand side, but still ensure a sample size reduction compared to the classical one-sample log-rank test. If the accrual and censoring mechanisms are again given via a uniform accrual during a period of length  and a subsequent follow-up period of length , the third moment of  under the null hypothesis is given by
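The simulation-based tuning suggested here can be sketched as a grid search: simulate trials under the null hypothesis, form the standardised statistic for each candidate weight, and select the weight whose empirical distribution is least skewed. All modelling choices below (exponential survival, uniform accrual, the weight grid) are illustrative assumptions, and all names are ours:

```python
import math
import random

def simulate_trial(n, lam, accrual, followup, rng):
    """One trial under H0: exponential survival, uniform accrual,
    administrative censoring at the analysis date only."""
    horizon = accrual + followup
    n_events, compensator = 0, 0.0
    for _ in range(n):
        r = rng.uniform(0.0, accrual)
        x = rng.expovariate(lam)
        obs = min(x, horizon - r)
        compensator += lam * obs           # Lambda0(t) = lam * t
        n_events += x <= horizon - r
    return n_events, compensator

def z_stat(n_events, compensator, w):
    """Weighted standardisation; degenerate trials map to 0."""
    var = w * n_events + (1.0 - w) * compensator
    return (n_events - compensator) / math.sqrt(var) if var > 0.0 else 0.0

def skewness(z):
    """Empirical (biased) skewness coefficient."""
    n = len(z)
    m = sum(z) / n
    s2 = sum((v - m) ** 2 for v in z) / n
    return sum((v - m) ** 3 for v in z) / (n * s2 ** 1.5)

def least_skewed_weight(n, lam, accrual, followup, reps=3000, seed=7):
    """Grid search over w in [0, 1] minimising |empirical skewness|."""
    rng = random.Random(seed)
    trials = [simulate_trial(n, lam, accrual, followup, rng) for _ in range(reps)]
    grid = [i / 20.0 for i in range(21)]
    return min(grid, key=lambda w: abs(skewness([z_stat(N, E, w) for N, E in trials])))
```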
A more cautious planning could also incorporate the consideration of additional random dropouts. Under the assumption of independence of  and , the distribution function of the overall censoring variable at analysis date  is given by . This could in turn be plugged into (38) and would in most cases lead to a lower weight for the counting process and hence a more conservative approach concerning the distribution of the test statistic.
In conclusion, our framework provides a solid foundation for these and possible further considerations, including extensions to multi-stage [10, 11, 12], multivariate  and other variations  of the classical one-sample log-rank test.
The work of the corresponding author was supported by the German Science Foundation (DFG, grant number 413730122).
-  Ivanova A et al. Nine-year change in statistical design, profile, and success rates of Phase II oncology trials. Journal of Biopharmaceutical Statistics 2016; 26(1): 141–149.
-  Wu J. Sample size calculation for the one-sample log-rank test. Pharmaceutical Statistics 2015; 14(1): 26–33.
-  Tu D and Gross AJ. A Bartlett-type correction for the subject-years method in comparing survival data to a standard population. Statistics & Probability Letters 1996; 29(2): 149–157.
-  Sun X, Peng P and Tu D. Phase II cancer clinical trials with a one-sample log-rank test and its corrections based on the Edgeworth expansion. Contemporary Clinical Trials 2011; 32(1): 108–113.
-  Wu J. Single-arm Phase II cancer survival trial designs. Journal of Biopharmaceutical Statistics 2016; 26(4): 644–656.
-  Wu J. A new one-sample log-rank test. Journal of Biometrics and Biostatistics 2014; 5(4): 1–5.
-  Basu D. On Statistics Independent of a Complete Sufficient Statistic. Sankhyā: The Indian Journal of Statistics (1933-1960) 1955; 15(4): 377–380.
-  Andersen PK et al. Statistical Models Based on Counting Processes. Springer, 1993.
-  Schmidt R, Faldum A and Kwiecien R. Adaptive designs for the one-sample log-rank test. Biometrics 2018; 74(2): 529–537.
-  Shan G and Zhang H. Two-stage optimal designs with survival endpoint when the follow-up time is restricted. BMC Medical Research Methodology 2019; 19(1): 74.
-  Belin L, De Rycke Y and Broet P. A two-stage design for phase II trials with time-to-event endpoint using restricted follow-up. Contemporary Clinical Trials Communications 2017; 8: 127–134.
-  Kwak M and Jung S. Phase II clinical trials with time-to-event endpoints: Optimal two-stage designs with one-sample log-rank test. Statistics in Medicine 2014; 33(12): 2004–2016.
-  Danzer MF, Terzer T, Berthold F, Faldum A and Schmidt R. Confirmatory adaptive group sequential designs for single-arm phase II studies with multiple time-to-event endpoints. Biometrical Journal 2021; DOI:10.1002/bimj.202000205.
-  PASS 16. Power and Sample Size Software. NCSS, LLC, Kaysville, Utah, USA, 2018. ncss.com/software/pass.
-  nQuery. Sample Size and Power Calculation. Statsols (Statistical Solutions Ltd.), Cork, Ireland, 2017. statsols.com/nquery.
-  Breslow N. Analysis of survival data under the proportional hazards model. International Statistical Review 1975; 43: 45–48.
-  Kerschke L, Faldum A and Schmidt R. An improved one-sample log-rank test. Statistical Methods in Medical Research 2020; 29(10): 2814–2829.
-  Hinkley DV. On the Ratio of Two Correlated Normal Random Variables. Biometrika 1969; 56(3): 635–639.
-  On the ratio x/y for some elliptically symmetric distributions. Journal of Multivariate Analysis 2006; 97(2): 342–358.
-  Ly S, Pho KH, Ly S and Wong WK. Determining distribution for the quotients of dependent and independent random variables by using copulas. Journal of Risk and Financial Management 2019; 12(1): 42.
-  R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2020. https://www.R-project.org/.
-  Fleming TR and Harrington DP. Counting Processes and Survival Analysis. Wiley, 2011.
-  Therneau TM. A Package for Survival Analysis in R. R package version 3.2-7, 2020.
-  Chu C, Liu S and Rong A. Study design of single-arm phase II immunotherapy trials with long-term survivors and random delayed treatment effect. Pharmaceutical Statistics 2020; 19: 358–369.