1 Introduction
In many clinical studies, two or more endpoints are investigated as co-primary, with the aim of providing a comprehensive picture of the treatment's benefits and harms. Clinical trial research has often focused sharply on the time until one relevant event. However, when there is more than one event of interest, the time until the event is not always the center of attention for all endpoints; for some of them, the occurrence of an event over a fixed time period is the outcome of interest.
One example, which motivates this paper, arises in cancer immunotherapy trials, where short-term binary endpoints based on tumor size, such as objective response, are common in early-phase trials, whereas overall survival remains the gold standard in late-phase trials (Wilson et al., 2015; Ananthakrishnan and Menon, 2013). Since traditional oncology endpoints may not capture the clinical benefit of cancer immunotherapies, the idea of looking at both tumor response and survival has grown from the belief that, together, they may achieve a better characterization of the clinical response (Thall, 2008).
The problem of how to analyze multiple outcomes has been widely discussed in the literature. Although how to construct a test of an individual hypothesis is well established, how to combine tests of multiple hypotheses is often difficult. If one ignores the multiplicity when more than one null hypothesis is tested simultaneously and tests each hypothesis at the nominal level, the probability of one or more false rejections generally increases with the number of hypotheses and may be much greater than the nominal level. The classical approach is to restrict attention to multiple testing procedures that control the probability of one or more false rejections, the so-called familywise error rate, which guarantees the nominal significance level (Lehmann and Romano, 2005). The most used multiple testing procedures are those based on correcting the significance level to control the prespecified nominal level (e.g., the Bonferroni procedure (Bland and Altman, 1995)), which only require testing the individual hypotheses and thus are straightforward to apply. However, many such approaches may lead to conservative designs, since they do not take the potential correlation between the statistics into account.
Other alternative approaches have been developed. Hothorn et al. (2008) extended linear model theory to multiple comparisons within parametric and semiparametric models. Their approach corrects the significance level by means of the simultaneous asymptotic normality of commonly used test statistics. Pipper et al. (2012) established the asymptotic joint distribution of the test statistics for the effect of a covariate from models for different endpoints. Pocock et al. (1987) derived a global test statistic by combining asymptotically normal test statistics through their sum. All of these approaches require knowledge of the multivariate distribution of the test statistics. However, the asymptotic distribution of the statistics might be hard to obtain, especially when there are different types of endpoints.
Within the context of cancer trials, several authors have considered both objective response and overall survival as co-primary endpoints. Lai and Zee (2015) proposed a single-arm phase II trial design with tumor response rate and a time-to-event outcome, such as overall survival or progression-free survival. In their design, the dependence between the probability of response and the time-to-event outcome is modeled through a Gaussian copula. Lai et al. (2012) proposed a two-step sequential design in which the response rate and the time to the event are jointly modeled. Their approach relates the response rate and the time to the event by means of a mixture model and is built on the Cox proportional hazards model assumption.
Another characteristic of immunotherapy trials is that delayed effects are likely to be found, bringing the additional challenge of non-proportional hazards into the statistical analysis (Mick and Chen, 2015). Statistics that look at differences between integrated weighted survival curves, such as those defined by Pepe and Fleming (1989, 1991) and extended by Gu et al. (1999), are better suited to detect early or late survival differences and do not depend on the proportional hazards assumption.
In this paper, we follow the idea put forward by Pocock et al. (1987) of combining multiple test statistics into a single hypothesis test. Specifically, we propose a general class of statistics based on a weighted sum of a difference-in-proportions test and a weighted Kaplan-Meier test statistic for the difference of survival functions. Our proposal adds versatility to the study design by enabling different follow-up periods for each endpoint, and flexibility by incorporating weights. We define these weights to specify unequal priorities for the different endpoints and to anticipate the type of time-to-event difference to be detected. The proposed class of statistics could be used in seamless phase II/III designs to jointly evaluate the efficacy on binary and survival endpoints, even in the presence of delayed treatment effects.
This article is organized as follows. In Section 2 we present the class of statistics for binary and time-to-event outcomes. In Section 3 we set out the assumptions and present the large-sample distribution theory for the proposed statistics. In Section 4 we introduce different weights and discuss their choice. We give an overview of our R package SurvBin in Section 5 and illustrate our proposal with a recent immunotherapy trial in Section 6. In Section 7 we evaluate the performance of these statistics in terms of the significance level with a simulation study. We conclude with a discussion.
All the required functions to use these statistics have been implemented in R and have been made available at: https://github.com/MartaBofillRoig/SurvBin.
2 A general class of binary and survival test statistics
Consider a study comparing a control group and an intervention group. Suppose that both groups are followed over a common time interval and are compared on the basis of the following two endpoints: the occurrence of an event before a prespecified timepoint (the binary endpoint), and the time to a different event within a given interval (the survival endpoint). For each group, the parameters of interest are the probability of having the binary event before the prespecified timepoint and the survival function of the time to the event.
We consider the problem of simultaneously testing the equality of proportions and the equality of survival functions, aiming to demonstrate either a higher probability of occurrence of the binary event or improved survival in the intervention group. The hypothesis testing problem can then be formalized as:
(1) 
We propose a class of statistics, hereafter called the class, as a weighted linear combination of the difference-in-proportions statistic for the binary outcome and the integrated weighted difference of two survival functions for the time-to-event outcome, as follows,
(2) 
for some real numbers , such that , and where:
(3)  
where we denote the estimated proportion of events before the prespecified timepoint and the Kaplan-Meier estimator of the survival function for each group. The standardizing variance estimates are such that they converge in probability to the true variances of the binary and survival statistics, respectively, as the sample size grows. Both theoretical and estimated expressions for these variances will be given in Section 3 (see equations (7, 8) for the theoretical expressions and (12, 13) for the estimates). The weight is a possibly random function which converges pointwise in probability to a deterministic function. For ease of notation, we will suppress the dependence on the timepoints in what follows. Note also that the statistics depend on the sample size, but this has been omitted from the notation for short.
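To fix ideas, the construction in (2)-(3) can be sketched in code. The following Python snippet is an illustrative translation (the authors' implementation is the R package SurvBin): it computes a difference in proportions for the binary endpoint and an integrated Kaplan-Meier difference over the survival follow-up window, and combines them with weights. The unit weight function and the square-root scaling are illustrative simplifications, not the paper's exact standardization.

```python
import numpy as np

def km_curve(time, status, grid):
    """Kaplan-Meier survival estimates evaluated on a grid of timepoints."""
    surv = 1.0
    km_t, km_s = [0.0], [1.0]
    for u in np.unique(time[status == 1]):        # distinct event times
        n_risk = np.sum(time >= u)
        n_event = np.sum((time == u) & (status == 1))
        surv *= 1.0 - n_event / n_risk
        km_t.append(u)
        km_s.append(surv)
    idx = np.searchsorted(km_t, grid, side="right") - 1   # step interpolation
    return np.asarray(km_s)[idx]

def combined_statistic(bin0, bin1, time0, status0, time1, status1,
                       tau0, tau, omega_b=0.5, omega_s=0.5, n_grid=200):
    """Weighted sum of a difference-in-proportions statistic and an
    integrated Kaplan-Meier difference over [tau0, tau] (unit weight Q)."""
    n0, n1 = len(time0), len(time1)
    scale = np.sqrt(n0 * n1 / (n0 + n1))
    u_b = scale * (bin1.mean() - bin0.mean())             # binary part
    grid = np.linspace(tau0, tau, n_grid)
    diff = km_curve(time1, status1, grid) - km_curve(time0, status0, grid)
    area = np.sum((diff[:-1] + diff[1:]) / 2) * (grid[1] - grid[0])  # trapezoid rule
    u_s = scale * area                                    # survival part
    return omega_b * u_b + omega_s * u_s
```

With identical samples in both groups the statistic is exactly zero; in practice it would be standardized by a variance estimate, as in Theorem 3.5 below.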
The weights control the relative relevance of each outcome, if any, and the random weight function serves two purposes: to specify the type of survival differences that may exist between groups and to stabilize the variance of the difference of the two Kaplan-Meier functions. Some well-known special cases are:

The pooled Kaplan-Meier estimator for the censoring distribution. This choice down-weights the contributions at those times where the censoring is heavy.

A power function of the pooled Kaplan-Meier estimator for the survival function. This corresponds to the weights of the Fleming-Harrington family (Fleming and Harrington, 1991). For instance, setting the first exponent positive and the second to zero emphasizes early differences between survival functions, whereas the reverse choice highlights late differences.

The proportion of individuals at risk at each time. In this case the weight accentuates the information at the beginning of the survival curve, allowing early failures to receive more weight than later failures.
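The three special cases above can be sketched as follows (an illustrative Python translation; the function names are ours, and the inputs are assumed to be pooled Kaplan-Meier estimates and at-risk counts already evaluated on a common time grid):

```python
import numpy as np

def q_censoring(G_pooled):
    """Pooled KM estimate of the censoring distribution as weight:
    down-weights times where censoring is heavy."""
    return np.asarray(G_pooled)

def q_fleming_harrington(S_pooled, rho, gam):
    """Fleming-Harrington-type weight S(t)^rho * (1 - S(t))^gam:
    a positive rho with gam = 0 stresses early differences,
    rho = 0 with positive gam stresses late differences."""
    S = np.asarray(S_pooled)
    return S**rho * (1.0 - S)**gam

def q_at_risk(n_at_risk, n):
    """Proportion at risk Y(t)/n: early failures get more weight."""
    return np.asarray(n_at_risk) / n
```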
We state the precise conditions for the weight function in Section 3 and postpone the discussion about the choice of and to Section 4.
The statistics in the class are defined for different follow-up configurations based on different choices of the overall follow-up period, the time at which the binary event is evaluated, and the origin time for the survival outcome. There are, however, no restrictions on whether or not these periods overlap and, if they do, how much and when. We illustrate two situations with different configurations in Figure 1. The first is exemplified by an HIV therapeutic vaccination study where a safety-tolerability response (binary outcome) and the time to viral rebound (survival outcome) are outcomes of interest. Whereas safety-tolerability is evaluated at week 6, the time to viral rebound is evaluated from week 6 to week 18 (de Jong et al., 2019). The second example, in the area of immunotherapy trials, includes a binary outcome (objective response), evaluated at month 6, and overall survival, evaluated from randomization until year 4 (Hodi et al., 2010).
The class includes several statistical tests as particular cases. With one choice of weights and timepoints, it corresponds to the global test statistic proposed by Pocock et al. (1987). With another, the statistic is equivalent to the linear combination test of Logan et al. (2008), when there is no censoring before the survival origin time, for testing differences in survival curves after a prespecified timepoint.
3 Large sample results
In this section, we derive the asymptotic distribution of the class of statistics given in (2) under the null hypothesis and under contiguous alternatives, present an estimator of their asymptotic variance, and discuss the consistency of the statistics against alternative hypotheses of the form given in (1). We start the section with the conditions we require for the class of statistics. To make the paper more concise and readable, proofs and technical details are deferred to the Appendix and Supplementary Material.
3.1 Further notation and Assumptions
We consider two independent random samples and, for each individual, we denote the binary response indicating whether the binary event has occurred, the time to the survival event, and the censoring time. Assuming that the survival time is non-informatively right-censored, the observable data are summarized by the binary response, the observed (possibly censored) time, and the event indicator. Suppose as well that the censoring time is independent of the survival time and that the occurrence of the survival and censoring times does not prevent assessing the binary response.
We denote the censoring survival function and the Kaplan-Meier estimator for the censoring times. As we will see in the next section, the distribution of the statistics relies, among other quantities, on the survival function for those patients who respond to the binary endpoint. We therefore introduce here the survival function for responders, defined as a conditional survival probability given response.
Furthermore, we assume that: (i) , and ; (ii) the limiting fraction of the total sample size is non-negligible, i.e., ; and (iii) is a non-negative piecewise continuous function with finitely many discontinuity points. At all continuity points in , converges in probability to as . Moreover, and are functions of bounded variation in probability.
Finally, we introduce the counting process as the number of observed events that have occurred by time t for the th group () and as the number of subjects at risk at time for the th group. We define and suppose that .
Remark: Throughout the paper, to refer to the group, we will use subscripts for individual observations and stochastic processes, while we will use superscripts in parentheses for functions and parameters.
3.2 Asymptotic distribution
To derive the asymptotic distribution of the statistic, we first note that it can be approximated by the same statistic with the random weights replaced by their deterministic limits.
Lemma 3.1.
Let be the statistic defined by:
(5) 
where is the statistic given in (3) and is the statistic given in (2) with replaced by , that is:
for some real numbers , such that , and for a function satisfying the conditions outlined in Section 3.1. Then, the statistic , given in (2), can be written as:
where
converges in probability to . Hence, the asymptotic distribution of the statistic is the same as that of .
Roughly speaking, thanks to this lemma we can ignore the randomness of the weight function and use its deterministic limit to obtain the limiting distribution of the statistic. In what follows, we state the asymptotic distributions under the null hypothesis in Theorem 3.2 and under a sequence of contiguous alternatives in Theorem 3.3.
Theorem 3.2.
Let be the statistic defined in (2). Under the conditions outlined in 3.1, if the null hypothesis holds, the statistic converges in distribution, as the sample size grows, to a normal distribution as follows:
where , stand for the variances of and , respectively, and is the covariance between and . Their corresponding expressions are given by:
(7)  
(8)  
where
,
( or ), , and
for .
Recall that , , and depend on , but we omit them for notational simplicity.
Theorem 3.3.
Let be the statistic defined in (2). Under the conditions outlined in 3.1, consider the following sequences of contiguous alternatives for both the binary and time-to-event hypotheses satisfying, as :
and
for some constant and bounded function , and . Then, under contiguous alternatives of the form:
and  (10) 
we have that:
in distribution as , where the variances and covariance are given in (7), (8) and (9), respectively.
The covariance in (9) involves two conditional probabilities: the survival function for responders (patients who have had the binary event) and the probability of being a responder among the patients at risk. Also note that, if the survival follow-up starts after the binary event has been evaluated, only the second integral in (9) is involved.
We note that the efficiency of the statistics under contiguous alternatives is driven by the non-centrality parameter, that is, by the sum of the weighted non-centrality parameters of the binary and survival statistics.
3.3 Variance estimation and consistency
We now describe how to use the statistics to test the global null hypothesis against the alternative given in (1). Theorem 3.4 gives a consistent estimator of the asymptotic variance, and Theorem 3.5 presents the standardized statistic used for testing.
Theorem 3.4.
Let be the statistic defined in (2), and let the variances and covariance be as given in (7), (8) and (9), respectively. The asymptotic variance, given in Theorem 3.2, can be consistently estimated by:
(11) 
where , , and denote the estimates of , and , and are given by:
(12)  
(13)  
where ( or ), is the Kaplan-Meier estimator of ; and is the estimator of . Kernel density methods are used in this estimation.
Theorem 3.5.
Let be the statistic defined in (2), and let be the variance estimator given in (11). Consider the global null hypothesis (1) and let the normalized statistic be:
(15) 
Then, the statistic defined in (15) converges in distribution to a standard normal distribution. Moreover, for positive weights, the statistic is consistent against any alternative hypothesis of the form given in (1), which contemplates a difference in proportions for the binary outcome and stochastic ordering for the time-to-event outcome.
We present here a pooled variance estimator. An unpooled variance estimator is proposed in Theorem A.1 in the Appendix.
Theorem 3.5 can be used to test the global null hypothesis by comparing the standardized statistic to a standard normal distribution.
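This final step can be sketched in a few lines (an illustrative Python snippet; `u_stat` and `var_hat` stand for the combined statistic and a variance estimate as in Theorem 3.4):

```python
import math

def global_test(u_stat, var_hat, alpha=0.05):
    """Standardize the combined statistic and perform the one-sided test:
    large positive values are evidence against the global null hypothesis."""
    z = u_stat / math.sqrt(var_hat)
    # upper-tail probability of a standard normal via the complementary error function
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value, p_value < alpha
```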
4 On the choice of weights
An important consideration when applying the statistics proposed in this paper is the choice of the weight functions. The class of statistics involves the already mentioned random weight function and the deterministic weights. These weights are defined according to different purposes and play different roles in the statistic. In this section, we present different weights and discuss some of their strengths as well as their shortcomings. The list provided is not exhaustive; other weights are possible and might be useful in specific circumstances.
4.1 Choice of
The purpose of these weights is to prioritize the binary and time-to-event outcomes. They have to be specified in advance according to the research questions. Whenever the two outcomes are equally relevant, equal weights should be chosen. In this case the statistic will be optimal whenever the standardized effects on both outcomes coincide.
4.2 Choice of
The choice of the random weight function can be very general as long as it converges in probability to a deterministic function and both satisfy the conditions outlined in 3.1. In this section, we center our attention on a family of weights that factor into two terms, where: (i) the first term is a data-dependent function that converges in probability to a non-negative piecewise continuous function with bounded variation on the follow-up interval. This term accounts for the expected differences between survival functions and can also be used to emphasize some parts of the follow-up according to the timepoints; (ii) the second term converges in probability to a deterministic positive bounded weight function. Its main purpose is to ensure the stability of the variance of the difference of the two Kaplan-Meier functions. To do so, we make the additional assumption that:
for all , and for some constants .
Different choices of these terms yield other known statistics. For instance, with unit weights the statistic corresponds to the weighted Kaplan-Meier statistics (Pepe and Fleming, 1989, 1991). Whenever the two terms correspond to the weights (17) and (16), respectively, introduced below, we recover the statistic proposed by Shen and Cai (2001). Furthermore, note that weight functions of this form are similar to those proposed by Shen and Cai (2001); while they assume a bounded continuous function, we assume a non-negative piecewise continuous function with bounded variation, and instead of only considering the Pepe and Fleming weight function corresponding to (17), we also allow for different weight functions. Finally, if the random weight is omitted, the survival statistic corresponds to the difference of restricted mean survival times over the survival follow-up interval.
In what follows, we outline different choices for both weight terms, together with a brief discussion of each:

We require the stabilizing weight to be small towards the end of the observation period if censoring is heavy. The usual weight functions involve Kaplan-Meier estimators of the censoring survival functions. The most common weight functions are:
(16) and its pooled counterpart, both proposed by Pepe and Fleming. Among other properties, the resulting test has been proved to be a competitor to the log-rank test for the proportional hazards alternative (Pepe and Fleming, 1989). Note that if the censoring survival functions are equal in both groups and the sampling design is balanced, the differences in Kaplan-Meier estimators are weighted by the censoring survival function. Also note that the weight equals one for uncensored data.

Analogously to the Fleming and Harrington (1991) statistics, the shape term can be used to specify the type of expected differences between survival functions. That is, if we set:
(17) one choice of exponents leads to a test to detect early differences, while another leads to a test to detect late differences; setting both exponents to zero leads to a test evenly distributed over time, corresponding to the weight function of the log-rank test.

In order to put more emphasis on the times after the binary follow-up period, we might consider:
for .
5 Implementation
We have developed the SurvBin package to facilitate the use of the statistics; it is available on GitHub. The SurvBin package contains three key functions: lstats, to compute the standardized statistic; and bintest and survtest, for the univariate binary and survival statistics, respectively. The SurvBin package also provides the function survbinCov, which can be used to calculate the covariance, and simsurvbin, for simulating bivariate binary and survival data.
The main function lstats can be called by:
lstats(time, status, binary, treat, tau0, tau, taub, rho, gam, eta, wb, ws, var_est)
where time, status, binary and treat
are vectors of the rightcensored data, the status indicator, the binary data and the treatment group indicator, respectively;
tau0, tau, taub denote the follow-up configuration; wb, ws are the weights; rho, gam, eta are scalar parameters that control the weight function; and var_est indicates the variance estimator to use (pooled or unpooled).
6 Examples
Melanoma has been considered a good target for immunotherapy, and its treatment has been a key goal in recent years. Here we consider a randomized, double-blind, phase III trial whose primary objective was to determine the safety and efficacy of the combination of a melanoma immunotherapy (gp100) together with an antibody (ipilimumab) in patients with previously treated metastatic melanoma (Hodi et al., 2010). Although the original primary endpoint was the objective response rate at week 12, it was amended to overall survival, with objective response then considered a secondary endpoint. A total of 676 patients were randomly assigned to receive ipilimumab plus gp100, ipilimumab alone, or gp100 alone. The study was designed to have a prespecified power to detect a difference in overall survival between the ipilimumab-plus-gp100 and gp100-alone groups at a two-sided significance level, using a log-rank test. Cox proportional-hazards models were used to estimate hazard ratios and to test their significance. The results showed that ipilimumab with gp100 improved overall survival as compared with gp100 alone in patients with metastatic melanoma. However, the treatment had a delayed effect, and an overlap between the Kaplan-Meier curves was observed during the first six months. Hence, the proportional hazards assumption appeared to be no longer valid, and a different approach would have been advisable.
In order to illustrate our proposal, we consider the comparison between the ipilimumab-plus-gp100 and gp100-alone groups based on overall survival and objective response as co-primary endpoints. For this purpose, we have reconstructed individual observed times by scanning the overall survival Kaplan-Meier curves reported in Figure 1A of Hodi et al. (2010) using the reconstructKM package (Sun, 2020) (see Figure 2), and, afterwards, we have simulated the binary response to mimic the percentage of responses obtained in the study.
Using the data obtained, we employ the statistic by means of the function lstats in the SurvBin package. To do so, we need to specify the weights and the timepoints to be used. In our particular case, we take the timepoints according to the trial design, choose the weight function to account for censoring and delayed effects at late times, and set the weights to emphasize the importance of overall survival over objective response. The results are summarized in Figure 2.
Since the standardized statistic exceeded the critical value, we have a basis to reject the null hypothesis and conclude that ipilimumab either improved overall survival or increased the percentage of tumor reductions in patients with metastatic melanoma, or both.
7 Simulation study
7.1 Design
We have conducted a simulation study to evaluate our proposal in terms of type I error. We have generated bivariate binary and time-to-event data through a copula-based framework, using conditional sampling as described in Trivedi and Zimmer (2007). The parameters used for the simulation (summarized in Table 1) have been the following: Frank's copula with varying association parameter; Weibull survival functions; varying probabilities of having the binary endpoint; and varying sample sizes per arm. The censoring distributions in the two groups were assumed equal and uniform. Two different follow-up configurations were considered, together with several choices of the weights and weight-function parameters. For each scenario, we ran 1000 replicates and estimated the significance level.
We note that the chosen values of the association parameter correspond to an increasing association between the binary and time-to-event outcomes, equivalent to increasing values of Spearman's rank correlation coefficient between the marginal distributions of the two outcomes. We have not considered higher values of the association parameter, as they do not fulfill the required compatibility condition between the marginals.
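The conditional-sampling step for Frank's copula can be sketched as follows. This is an illustrative Python translation (the authors' simulations are in R, and the marginal parameters here are placeholders): a uniform pair is drawn, the conditional copula distribution is inverted in closed form, and the binary and Weibull marginals are obtained by thresholding and quantile transformation.

```python
import numpy as np

def frank_conditional_sample(n, theta, rng):
    """Sample (U, V) from a Frank copula (theta > 0) by conditional inversion:
    draw U, P ~ Uniform(0,1) and solve P = C_{V|U}(v | u) for v."""
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)
    a = np.expm1(-theta)               # e^{-theta} - 1
    x = np.expm1(-theta * u)           # e^{-theta u} - 1
    y = p * a / (x + 1.0 - p * x)      # closed-form inverse of the conditional CDF
    v = -np.log1p(y) / theta
    return u, v

def sim_bin_surv(n, theta, p_bin, shape, scale, rng):
    """Correlated binary response and Weibull event time (illustrative marginals:
    S(t) = exp(-(t/scale)^shape), response probability p_bin)."""
    u, v = frank_conditional_sample(n, theta, rng)
    x = (u <= p_bin).astype(int)                   # binary endpoint via thresholding
    t = scale * (-np.log1p(-v)) ** (1.0 / shape)   # Weibull quantile transform
    return x, t
```

Censoring times and group labels would be added on top of this, following the uniform censoring scheme described above.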
We have performed all computations using the R software (version 3.6.2) on a computer with an Intel(R) Core(TM) i7-6700 CPU, 3.40 GHz, 8.00 GB RAM, and a 64-bit operating system. The time required to perform the simulations was 52 hours.
Table 1: Parameters and values used in the simulation study.
7.2 Size properties
The empirical results show that the type I error is very close to the nominal level across a broad range of situations. The empirical sizes had a median of 0.049, with first and third quartiles of 0.043 and 0.055, respectively. Table 2 summarizes the results according to the parameters of the simulation study. The results show that the statistics have the appropriate size and are not especially influenced by the censoring distribution nor by the selection of weights. Figure 3 displays how the empirical sizes behave according to the association parameter and the follow-up configuration. We observe that in some configurations the empirical size is slightly smaller than 0.05. We compared the performance of the pooled and unpooled variance estimators and noticed that the empirical sizes do not substantially differ between them.
Table 2: Empirical significance levels by simulation parameter and variance estimator.

Parameter value   Pooled   Unpooled
                  0.052    0.050
                  0.046    0.047
0.001             0.049    0.050
2                 0.048    0.048
3                 0.049    0.048
0.2               0.049    0.049
0.4               0.049    0.048
0.5               0.052    0.050
1                 0.048    0.049
2                 0.046    0.047
1                 0.049    0.049
3                 0.049    0.048
(0,1,0)           0.048    0.049
(1,1,0)           0.049    0.049
(0,0,1)           0.048    0.048
(0,1,1)           0.050    0.049
(1,1,1)           0.048    0.050
8 Discussion
We have proposed a class of statistics for a two-sample comparison based on two different outcomes: a dichotomous one capturing, in most occasions, short-term effects, and a second one addressing long-term differences in a survival endpoint. These statistics test the equality of proportions and the equality of survival functions. The approach combines a score test for the difference in proportions and a weighted Kaplan-Meier test statistic for the difference of survival functions. The statistics are fully nonparametric and attain the nominal level for testing the null hypothesis of no effect on either of these two outcomes. The statistics in the class are appealing in situations where both outcomes are relevant, regardless of the follow-up periods of each outcome, and even when the hazards are not proportional for the time-to-event outcome or in the presence of delayed treatment effects, although the survival curves are assumed not to cross. We have incorporated weight functions in order to control the relative relevance of each outcome and to specify the type of survival differences that may exist between groups.
The testing procedure using the class of statistics satisfies a property called coherence, which says that the non-rejection of an intersection hypothesis implies the non-rejection of any sub-hypothesis it implies (Romano and Wolf, 2005). However, the procedure does not fulfill the consonance property, which states that the rejection of the global null hypothesis implies the rejection of at least one of its sub-hypotheses. Bittman et al. (2009) faced the problem of how to combine tests into a multiple testing procedure that satisfies both the coherence and consonance principles. An extension of this work to obtain such a testing procedure could be an important line of future research.
This work has been restricted to the case in which censoring does not prevent assessing the binary endpoint response. We are currently working on a more general censoring scheme in which the binary endpoint could be censored. Last but not least, extensions to sequential and adaptive procedures in which the binary outcome could be tested at more than one timepoint remain open for future research.
Acknowledgements
We would like to thank Prof. Yu Shen and Prof. María Durbán for their helpful comments and suggestions. This work is partially supported by grants MTM2015-64465-C2-1-R (MINECO/FEDER) from the Ministerio de Economía y Competitividad (Spain) and 2017 SGR 622 (GRBIO) from the Departament d'Economia i Coneixement de la Generalitat de Catalunya (Spain). M. Bofill Roig acknowledges financial support from the Ministerio de Economía y Competitividad (Spain), through the María de Maeztu Programme for Units of Excellence in R&D (MDM-2014-0445).
Supplementary Materials
Web Appendix A, referenced in Section 3, is available with this paper at the Biometrics website on Wiley Online Library.
References
 Ananthakrishnan, R. and Menon, S. (2013). Design of oncology clinical trials: A review. Critical Reviews in Oncology/Hematology, 88(1), 144–153.
 Bauer, P. (1991). Multiple testing in clinical trials. Statistics in Medicine, 10, 871–890.
 Bland, J. M. and Altman, D. G. (1995). Multiple significance tests: the Bonferroni method. BMJ, 310(6973), 170.
 Bittman, R. M., Romano, J. P., Vallarino, C. and Wolf, M. (2009). Optimal testing of multiple hypotheses with common effect direction. Biometrika, 96(2), 399–410.
 de Jong, W., Aerts, J., Allard, S., et al. (2019). iHIVARNA phase IIa, a randomized, placebo-controlled, double-blinded trial to evaluate the safety and immunogenicity of iHIVARNA-01 in chronically HIV-infected patients under stable combined antiretroviral therapy. Trials, 20(1), 361.
 Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley.
 Gu, M., Follmann, D. and Geller, N. L. (1999). Monitoring a general class of two-sample survival statistics with applications. Biometrika, 86(1), 45–57.
 Hodi, F. S., O'Day, S. J., McDermott, D. F., et al. (2010). Improved survival with ipilimumab in patients with metastatic melanoma. The New England Journal of Medicine, 363(8), 711–723.
 Hothorn, T., Bretz, F. and Westfall, P. (2008). Simultaneous inference in general parametric models. Biometrical Journal, 50(3), 346–363.
 Lachin, J. M. (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2, 92–113.
 Lai, X. and Zee, B. C. Y. (2015). Mixed response and time-to-event endpoints for multistage single-arm phase II design. Trials, 16(1), 1–10.
 Lai, T. L., Lavori, P. W. and Shih, M. C. (2012). Sequential design of phase II-III cancer trials. Statistics in Medicine, 31(18), 1944–1960.
 Logan, B. R., Klein, J. P. and Zhang, M. J. (2008). Comparing treatments in the presence of crossing survival curves: An application to bone marrow transplantation. Biometrics, 64(3), 733–740.
 Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
 Hess, K. and Gentleman, R. (2019). Package 'muhaz': Hazard Function Estimation in Survival Analysis. R package version 1.2.6.1.
 Mick, R. and Chen, T.-T. (2015). Statistical challenges in the design of late-stage cancer immunotherapy studies. Cancer Immunology Research, 3(12), 1292–1298.
 Müller, H.-G. and Wang, J.-L. (1994). Hazard rate estimation under random censoring with varying kernels and bandwidths. Biometrics, 50(1), 61–76.
 Pepe, M. S. and Fleming, T. R. (1989). Weighted Kaplan-Meier statistics: A class of distance tests for censored survival data. Biometrics, 45(2), 497–507.
 Pepe, M. S. and Fleming, T. R. (1991). Weighted Kaplan-Meier statistics: Large sample and optimality considerations. Journal of the Royal Statistical Society, Series B, 53(2), 341–352.
 Pipper, C. B., Ritz, C. and Bisgaard, H. (2012). A versatile method for confirmatory evaluation of the effects of a covariate in multiple models. Journal of the Royal Statistical Society, Series C, 61(2), 315–326.
 Pocock, S. J., Geller, N. L. and Tsiatis, A. A. (1987). The analysis of multiple endpoints in clinical trials. Biometrics, 43(3), 487–498.
 Romano, J. P. and Wolf, M. (2005). Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469), 94–108.
 Shen, Y. and Fleming, T. R. (1997). Weighted mean survival test statistics: A class of distance tests for censored survival data. Journal of the Royal Statistical Society, Series B, 59(1), 269–280.
 Shen, Y. and Cai, J. (2001). Maximum of the weighted Kaplan-Meier tests with application to cancer prevention and screening trials. Biometrics, 57(3), 837–843.
 Sun, R. (2020). reconstructKM. GitHub repository: https://github.com/ryanrsun/reconstructkm
 Thall, P. F. (2008). A review of phase 2-3 clinical trial designs. Lifetime Data Analysis, 14(1), 37–53.
 Trivedi, P. K. and Zimmer, D. M. (2007). Copula modeling: an introduction for practitioners. Foundations and Trends in Econometrics, 1(1), 1–111.
 Wilson, M. K., Collyar, D., Chingos, D. T., Friedlander, M., Ho, T. W., Karakasis, K. and Oza, A. M. (2015). Outcomes and endpoints in cancer trials: Bridging the divide. The Lancet Oncology, 16(1), e43–e52.
Appendix A Proof of theorems
A sequence of random vectors that converges in probability to as will be denoted by . The convergence in distribution will be written as .
Proof of Lemma 3.1.
The proof is a direct consequence of the asymptotic representation of the timetoevent statistic which can be written as , where:
and