1 Introduction
With rapid advances in the understanding of human diseases, the paradigm of medicine shifts from “onefitsall” to “targeted therapies” or “precision medicine”. One aspect of precision medicine research is to develop new therapies that target a subgroup of patients. The subgroups are usually defined by “markers”, where the markers could be genotype information (such as mutation of a certain gene), expression level of certain protein(s), disease severity or any biologically plausible factors. For example, the status of patients with metastatic colorectal cancer is associated with the progression free survival when treated by panitumumab monotherapy (Amado et al., 2008), and the vemurafenib (Zelboraf) was approved for treating the mutated metastatic melanoma (Bollag et al., 2012). For another example, multiple genetic variations including have been reported to be associated with the drug response among asthma patients using inhaled corticosteroids (GarcíaMenaya et al., 2019).
The drug development process typically involves comparing a new treatment () with a control (, such as a standardofcare) through randomized clinical trials (RCTs), and treatment efficacy is the “relative” effect between and . Finding the subgroups of patients that exhibit enhanced treatment efficacy is a problem at the heart of modern drug development. To confidently identify subgroups, it is often necessary to infer treatment efficacy in each group and some combination of groups. For example, for a single nucleotide polymorphism (SNP) that separates patients into three groups (denoted by , and ), one may have to decide whether the new treatment should target a single group (e.g., ) or a combination of groups (e.g., ). In this case, the treatment efficacy in both single genetic groups and their combinations need to be assessed. As shown by Lin et al. (2019), when population is a mixture of subgroups with heterogeneous efficacy, the efficacy measure needs to respect the logic relationship among subgroups and their combinations. Take the SNP case as an example, if the treatment efficacy for is , and for is , then the efficacy for should be within (assuming
). This may sound trivial. However, several commonly used efficacy measures such as the hazard ratio (for timetoevent data) and the odds ratio (for binary data) do not satisfy this relationship
(Ding et al., 2016; Lin et al., 2019). Note that this logicrespecting is a property of an efficacy measure, not a property of the type of data that quantifies the outcome, nor a statistical model for analyzing such data.Our research is highly motivated from analyzing the AgeRelated Eye Disease Study (AREDS) data, a RCT to study the risk factors for the agerelated macular degeneration (AMD) and assess the effect of micronutrients on delaying the progression to late AMD (AgeRelated Eye Disease Study Research Group, 1999). AMD is a polygenic and progressive neurodegenerative disease, which is a leading cause of blindness in elderly. Patients can progress to one or both forms of lateAMD – central geographic atrophy (GA) and choroidal neovascularization (CNV). The AREDS study collected DNA samples of consenting participants and performed genomewide genotyping (Fritsche et al., 2016). Many genetic studies have shown that the development or the progression of AMD is associated with various genetic risk factors (Seddon et al., 2007; Fritsche et al., 2016; Yan et al., 2018; Sun and Ding, 2019). Specifically, in two recent genomewide association studies for AMD progression using the AREDS data (Yan et al., 2018; Sun and Ding, 2019), where timetolateAMD is the outcome, multiple variants from ARMS2HTRA1 and CFH gene regions have been discovered to be associated with AMD progression. Besides association analyses where no treatment is involved, multiple research groups also investigated whether variants from these two gene regions are associated with differential treatment responses. A recent review article by Cascella et al. (2018) summarized the controversial findings. Research groups such as Klein et al. (2008) and Seddon et al. (2016) reported that genetic variants from CFH and ARMS2 regions were found to be associated with differential responses to the antioxidants plus zinc treatment. However, the AREDS investigators reported no significant associations between CFH and ARMS2 regions and the nutritional supplements, when multiplicity adjustment has been taken into account . To fully understand the effects of those nutritional supplements on AMD progression and to infer whether there are genetic subgroups with enhanced treatment efficacy, a rigorous statistical procedure that can simultaneously identify and infer subgroups for timetoevent outcome is required.
There are existing methods for detecting heterogeneous treatment effects across groups for timetoevent outcome. One simple but broadly used method is to test the treatmentbymarker interaction in the Cox Proportional Hazards (CoxPH) model (Cox, 1972). However, this method cannot provide which group to target directly, nor can it provide inference on subgroupspecific efficacy. The second type of approach focuses on testing the existence of a subgroup (with an enhanced treatment effect) using either a logisticCox model for the response in each subgroup and the latent subgroup membership (Wu et al., 2016) or a new CoxPH model including a nonparametric component for the covariate in the control group and a subgrouptreatmentinteraction effect defined by a change plane (Kang et al., 2017). Similar to the interaction test, additional steps are needed to provide inference in the targeted and nontargeted group. The third common approach utilizes treebased regression models, including RECursive Partition (Ciampi et al., 1995; Negassa et al., 2005), interaction trees (Su et al., 2008), SIDES (Lipkovich et al., 2011), GUIDE (Loh et al., 2015) and etc. One appealing feature of using these methods is that it can provide the tree structure which incorporates multiple covariates at the same time. However, none of these methods provides inference for treatment efficacy in both targeted group and nontargeted group, and some approaches use illogical efficacy measures. The targeted treatment development process involves the codevelopment of a drug compound and a companion diagnostic tool that identifies the suitable subgroup of patients for the drug to target. Therefore, the subgroup has to be “simple” (usually defined by one or two biomarkers) for clinical and regulatory feasibility. In this article, we develop a multipletestingbased approach which aims to simultaneous identify and infer “simple” subgroups with enhanced treatment efficacy.
The article is organized as follows. Section 2 introduces the logicrespecting efficacy measure for timetoevent outcomes that we choose to use and its associated properties, and with that efficacy measure how we formulate the contrasts to identify subgroups and adjust for the multiplicity. Section 3 presents simulations to show finite sample performance of the proposed method and uses realistic simulations to summarize practical rules for the use of the method. Then we apply our method on the Agerelated Eye Disease Study (AREDS) data and present our findings in Section 4. Finally, we discuss and conclude in Section 5.
2 Methods
In this article, we deal with ordinal categorical markers which separate the population into a few groups (e.g., three groups by a SNP, or four groups by the immunohistochemistry test). Below we use the scenario of “threecategory” marker to illustrate our method. Brief discussions are provided in Section 5 regarding how to generalize the method to handle markers with more categories or continuous markers.
2.1 Issues with hazard ratio
First, we demonstrate why the commonly used HR is not a suitable efficacy measure when the population is a mixture of subgroups. Let denote the hazard ratio for each subgroup defined by , and let denote the HR for the combined. HR is not logicrespecting in the sense that even if both and are constant, the combined population typically does not have a constant HR. In fact, the HR of the mixture population is usually a complex function of time, with values at some time points outside of . This is because can not be expressed as a weighted combination of and . The combination can only be made on density or cumulative density functions, not on the hazard ratio scale. For example, we can generate data from a Weibull distribution where , with an equal prevalence of the two subgroups. However, the true is a smooth function of time and goes above 0.82 when is large. In this case, it is possible that another subpopulation has and then at some large time (e.g., ), , meaning exhibits more efficacy than combined, but does not exhibit more efficacy than either or . Thus using HR as the efficacy measure can lead to paradoxical findings in patient targeting.
2.2 Ratio of quantile survival times and its property
Realizing that HR is not suitable to use in the presence of mixture population, a different measure needs to be considered. Ding et al. (2016) demonstrated that the ratio or difference of mean or median survival times (between and ) is logicrespecting by guaranteeing , where denotes the efficacy measure. In this manuscript, we choose ratio of quantile survival times as our efficacy measure and demonstrate that it has a unique property under the CoxPH model that we consider. Assume the timetoevent data fit the following model:
(1) 
where or is the treatment assignment, , , or is the marker we consider for testing, and is the hazard function for the group receiving , with additional covariates set to 0. Further we assume that the baseline survival function is from a Weibull distribution with scale and shape , which corresponds to . Denote by the corresponding quantile () survival time in each markerbytreatment group. Let and with s from (2.2), then by setting the survival function for each group equal to , the corresponding survival and their ratios can be directly calculated as follows:
It can be seen that the quantile survival time for each group depends on the baseline characteristics (, but the ratio does not. We name it as the covariate invariant property. This property is unique to this efficacy measure, which is attractive as it makes the comparison (between and ) simple. Further it can be shown that this property also holds in the combined groups. For example, suppose we are interested in the ratio of quantile survival times in the mixture population of (denoted as ). We can calculate from its definition, , where and can be obtained by solving the following equations,
(2) 
with representing the prevalence of in the combined population
. By combining the two groups at the probability level, this calculation follows the subgroup mixable estimation (SME) principle
(Ding et al., 2016). Let and , then we have . Since the solutions for and from equation (2.2) are free of , we also have the covariateinvariant property for .2.3 Confident Effect 4 contrasts (CE4) for ratio of quantile survival times
In targeted therapy development, researchers are interested in (1) whether there exists a subgroup with enhanced treatment efficacy and (2) the treatment efficacy in both targeted and nontargeted subgroups (for appropriate drug labeling) (Lin et al., 2019). To answer both questions simultaneously, we propose to use contrasts to compare efficacy between different subgroups and combination of subgroups. Under the same scenario with three groups defined by a marker (, , or ), in order to get a complete ordering of the treatment efficacy in all possible groups, we propose the following four contrasts:
(3) 
We drop in the notation as is prespecified. Moreover, these contrasts are built on the log scale of the efficacy measure since previous experience demonstrates the normality approximation seems to work better on the log scale (as compared to the original scale) (Ding et al., 2016). In fact these four contrasts are analogous to the contrasts proposed in Ding et al. (2018) where the efficacy in their case is measured by a continuous outcome. As pointed by Ding et al. (2018), they are equivalent to testing the following eight onesided null hypotheses where each one is to test an inequality against its complement (i.e., vs ) rather than testing a zero null (such as ).
From these eight onesided tests, we are able to tell which subgroup or combination of subgroups exhibits a differential efficacy than its complementary group.
We propose to use simultaneous confidence intervals on these four contrasts so that we can identify differential subgroup(s) and infer their efficacy simultaneously. Note that level simultaneous confidence intervals for those contrasts effectively form a level
interaction test: reject the null hypothesis of no interaction between Treatment effect (
) and marker group () if at least one of the confidence intervals does not contain zero. Moreover, this formulation of assessing “interaction” effect is advantageous toward patient targeting as it allows decisionmaking based on clinically meaningful differences (reflected from confidence intervals on efficacy comparisons) instead of a mere statistical significance (such as the value from the typical interaction test). To estimate the four contrasts from (2.3) under our model (2.2), we propose the following three steps and name this approach as “CE4Weibull”:
Step 1. Estimate all the parameters in the Weibull model (e.g., ).

Step 2. Estimate the quantile survival times , , , , , or
, based on the parameter estimates obtained in Step 1 and their associated variance covariance matrix.

Step 3. Estimate and and their variance covariance using estimators obtained from Step 2 and then calculate the four contrasts CE4.
The estimated variance covariance matrices in Step 2 and Step 3 can be obtained using the Delta method. Note that in Step 2, the Delta method for implicitly defined random variables
(Benichou and Gail, 1989) needs to be applied since the quantile survival times in the combined groups (e.g., and ) are not explicitly defined, but rather from solving equations like (2.2).The estimated CE4 will asymptotically follow a multivariate normal distribution and the simultaneous confidence intervals can be then derived as follows. We compute the quantile
such that the four simultaneous confidence intervalshave a coverage probability , that is, the joint probability
where is the variance estimator for . The R package {mvtnorm} can be used to obtain this value. Then the value (from testing the eight onesided null hypotheses simultaneously) can be obtained from the multivariate normal distribution as well, which corresponds to the smallest values from the four single contrasts. If any of the four contrasts does not cover 0, it suggests that there exists subgroup(s) with differential treatment efficacy.
2.4 Multiplicity adjustment across biomarkers
In targeted treatment development, typically a large collection of markers need to be tested in order to identify subgroups. Therefore, there are two families of inferences need to be considered: within a marker and across markers. Specifically, strong control of familywise error rate (FWER) for inference within a marker is desired, since the consequence of an incorrect inference may target a wrong subgroup, which is serious. The simultaneous confidence intervals obtained from our CE4Weibull method appropriately controls the withinmarker FWER. While the error rate for inference across multiple markers can be controlled less stringently, since multiple candidate markers can be identified for tailoring (which may indicate largely overlapped subgroups to target), and therefore the error rate seems acceptable.
Suppose there are a total of markers to be tested. Denote by the number of confidence intervals that fail to cover the true values for the marker. Then the FWER for the SNP is . For inference across SNPs, denote by the number of markers that have at least one of its confidence intervals failing to cover its true value. Then the error rate is . We suggest to use the simple additive adjustment as proposed by Ding et al. (2018), of which by setting the desired error rate as (a prespecified positive integer), the familywise for each marker is then set to be (same across all markers).
When SNPs are the biomarkers to define subgroups, the screening process seems similar to a genomewide association study (GWAS). However, our proposal controls for error rate instead of the commonly used false discovery rate (FDR). In GWAS, it is plausible biologically that the vast majority of the SNPs are not associated with the specific disease. However, when treatments are involved, the biological processes become more complex, and zeronulls of nodifference (e.g., phrased as ) are statistically false, which was first observed in the setting of Ding et al. (2018), where the treatment efficacy was simulated based on a single causal SNP with no random error being added. It was found that practically all other SNPs would appear “associated” with the outcome (as sample size reaches infinity) when analyzed in a SNPbySNP fashion. The reason is that most SNPs are not “orthogonal” to each other, and thus any SNP will appear somewhat associated with treatment outcome as long as the distribution for proportions of being in this SNP and the causal SNP are not independent, which is most of the cases. When there are no zeronulls statistically, the “false” discovery seems lame, and the rate is preferred by providing more meaningful candidates. We have more discussions about the zeronull issue in Section 3.
3 Realistic Simulations
3.1 Single SNP simulations
We use SNPs as the biomarkers in all simulation studies. First, simulation studies based on a single SNP were conducted to investigate the finite sample performance of the proposed CE4Weibull method. We considered three scenarios: (1) No SNP effect, i.e., Rx is not efficacious for any genotype group; (2) The allele a has a dominant beneficial effect on Rx; (3) The allele a has a recessive beneficial effect on Rx. The SNP was simulated from a multinomial distribution with (corresponding to minor allele frequency (MAF) of 0.4). Survival times were simulated from model (2.2) with scale and shape . The parameters were set to be , and
for the three scenarios respectively. The censoring times were generated from an independent uniform distribution
with a and b chosen to yield 20% and 50% censoring rates. We chose the quantile as which corresponds to the median survival. The true values of the CE4 contrasts using the ratio of median survival as the efficacy measure for each scenario are: (1) , (2) , and (3) .We ran 1000 simulations for each scenario, with sample size 500 for each treatment arm and the results are summarized in Figure 1. Across all the scenarios, the biases of the CE4 estimates are minimal and the coverage probabilities for the simultaneous confidence intervals (SCP) are all close to 95. Larger variations are observed in biases of and , especially under scenario 3. This is because under the recessive effect setting, is only efficacious in patients, which is a small proportion of the total population (). Therefore, the contrasts involving the comparison between and other group (or the combination of other groups) have larger variances.
3.2 Realistic simulations
To understand the performance of the proposed CE4Weibull method in the real genetic setting with a large number of SNPs, we used the chromosomewide data from AREDS. Among those () participants who had DNA collected and genotyped, we randomly selected 1000 Caucasian participants and randomly “assigned” them in a 1:1 ratio to the new treatment Rx and a standard care C. Since AMD is known as a polygenic disorder and many studies have discovered or confirmed genetic risk variants associated with AMD, in this realistic simulation, we selected a variant from the wellknown AMD risk gene region on Chromosome 10 as the causal SNP, and assumed the minor allele of this variant has a dominant beneficial effect on . We kept the three genotype groups (defined by the causal SNP) balanced between and . Similar as in the single SNP simulations, the progression times were simulated from the Weibull model with and . The parameters for were set to be , corresponding to . The censoring rate was set to be 25%. Therefore, the SNP data were real (from real subjects) and the progression times and SNP effects were simulated.
We analyzed chromosome 10 using our CE4Weibull model and filtered the SNPs with less than three patients in each genotype group within each treatment arm, which resulted in a total of 268,053 SNPs. We set , allowing on average 10 out of 270,000 SNPs with at least one confidence interval failing to cover its true value, which is equivalent to setting the level at (). A total of 37 SNPs were identified by CE4Weibull. Among those 37 SNPs, 30 of them are from the ARMS2HTRA1 region, including the causal SNP. Other seven SNPs belong to six different gene regions, which are distance away from the causal gene region. Figure 2A plotted the positions of these SNPs relative to the causal SNP, with axis (.CE4)) showing the significance level of each SNP. Figure 2B plotted MaxEff vs .CE4), where MaxEff (maximal effect) is defined as the maximum absolute value among the estimated CE4 contrasts that do not cover zero. The causal SNP has the smallest value () and with MaxEff of 2.20. Note that some top SNPs have very large MaxEff values. For example, SNP from the C10orf128C10orf71AS1 region has the largest MaxEff of 29.7, while its pvalue is relatively large (, close to the threshold). We caution against the situation when a huge effect size is seen, since such a huge effect for treatment efficacy is clinically unlikely. For this specific SNP, it is not surprising to see the corresponding confidence interval for is very wide and the effective patient population only consists of the total population.
To further investigate the relationship between the top SNPs and the causal SNP, we proposed a novel SNP crosstalk plot. It is based on a ternary diagram using barycentric coordinates to display the proportion of three variables that sum to one. Specifically, we projected the percentages of the , , and categories of the causal SNP in each of these categories of a given top SNP onto the triangular diagram, and connect the points with lines. If the SNP is highly correlated with the causal SNP in terms of the distribution of AA, Aa, and aa, the percentages will be close to (1,0,0), (0,1,0) and (0,0,1), and thus the connected line segments will be long and lie closely to the two edges of the triangle. Otherwise, the three dots will be close to each other to give a short angle. For example, in Figure 2C, the causal SNP has a perfect match in terms of the percentages with itself so the three points are the vertexes of the triangle, which makes the connected line segment coincide with the edges and (denoted by the dashed lines). From the plot, all 30 SNPs from ARMS2HTRA1 region are highly correlated with the causal SNP, indicated by the long red line segments, which explains why they have been identified by CE4Weibull. For the 7 SNPs from other regions, their line segments are all short, indicating they might have been identified due to randomness. Then we repeated this chromosomewide realistic simulation for 100 times.
3.3 Results from 100 realistic simulations
In these 100 repeated runs, the SNP data are all the same but the progression times and censoring times are different due to randomness from the model. By setting , on average there are 61 SNPs identified per run with a total of 3292 SNPs being picked at least once. The causal SNP were picked 90 out of 100 times and the distribution of the ranks is shown in the stemandleaf plot (upper panel in Figure 3). Note that 84 out of 90 times the rank of the causal SNP was among top 30 and 52 times it was among top 10, indicating that our CE4Weibull is robust in identifying the true causal SNP. The lower panel in Figure 3 summarizes all the identified SNPs from all 100 runs in terms of their relative position to the causal SNP and their frequencies of being picked up. We found that 98.9% of the 3292 SNPs were only picked less than 5 times, which are highly likely due to randomness. While for SNPs close to the causal SNP and located in the same ARMS2HTRA1 gene region, the probability of being selected is much higher, among which 27 SNPs were identified for more than 80% of the times. From this repeated chromosomewide simulations, we confirmed that there are possibilities that some SNPs are picked by random error but the true causal SNP and its surrounding SNPs can be identified with very high probabilities by CE4Weibull. Moreover, due to the existence of linkage disequilibrium among SNPs, it is very unlikely that an isolated SNP will be the true causal SNP.
Based on the observations from our realistic simulations, we recommend the following rules to guide the selection of “candidate” SNPs from those identified by CE4Weibull. (1) There are multiple SNPs (e.g.,) being picked from the same gene region; (2) The MaxEff should not be unrealistically large; and (3) The targeted group should be a reasonable proportion (not too small or large, e.g., ) of the total population.
4 Application to AREDS Data
4.1 Background
AREDS is a large multicenter RCT sponsored by the National Eye Institute to evaluate the effect of antioxidants and/or zinc on delaying the progression of AMD (AgeRelated Eye Disease Study Research Group, 1999). The original study includes four treatment arms: placebo, antioxidants, zinc and the combination of antioxidants and zinc, where the last treatment then becomes the “AREDS formula” dietary supplements which are now available in various drug stores. However, the treatment effects of the nonplacebo arms on delaying the late AMD progression are not statistically significant (Ding et al., 2017). Among the four arms, we specifically investigated participants in the placebo arm () and the combination of antioxidants and zinc treatment arm (), which included 1,170 Caucasian participants with both eyes free of advanced AMD progression when entering the study. The outcome is the timetolateAMD from the first progressed eye, where lateAMD is defined as the severity score reaches 9 or above (9=GA, 10=central GA, 11=CNV, 12=central GA and CNV). As shown in Table 1, age, sex and smoking status do not differ between the two treatment arms. However, the baseline AMD severity score is significantly higher among patients who were randomized to the group, as compared to patients who were randomized to placebo ( vs ). This is as expected and is due to the randomization design: patients free of AMD at baseline can only be randomized to the placebo or antioxidants arms, but not the combination arm, and thus the baseline severity score needs to be adjusted in the analysis. The overall censoring rate is about 75%, so we used as our quantile.
4.2 CE4Weibull on AREDS
We first evaluated the treatment effect of “AREDS formula” on timetoprogression in the overall population using a Weibull regression model, adjusting for the known risk factors including age, smoking status, and baseline severity score. The estimated HR between and is 1.15 with , and the ratio of 75 quantile progressionfree time for and is 0.91 with . It suggests that the combination of antioxidants and zinc does not seem to be effective in slowing down the disease progression in the overall population, which is consistent with previous findings (Ding et al., 2017). Then we applied CE4Weibull method to analyze all common variants (i.e., MAF 0.05) across 22 autosomal chromosomes, resulting in a total of 3,837,556 SNPs. Similarly, baseline age, smoking status and severity score were adjusted. The upper panel of Figure 4 presents the Manhattan plot of this genomewide CE4Weibull analysis. By setting , a total of 46 SNPs meet the significance threshold of . These SNPs are from nine gene regions on seven chromosomes. Following the recommendation rule we proposed in Section 3.3, there are three gene regions each with at least four SNPs meeting the value threshold and they are labeled in the Manhattan plot: CHST3SPOCK2 on CHR 10 (4 SNPs), ESRRBVASH1 on CHR 14 (30 SNPs), and C19orf44CALR3 on CHR 19 (6 SNPs). We examined the correlation between all 46 identified SNPs using the crosstalk plot and presented the result in the lower panel of Figure 4. We picked (from ESRRBVASH1 region on CHR14), which has the smallest pvalue () as the reference SNP. It can be seen that the other 29 SNPs from the same ESRRBVASH1 region are highly correlated with the top SNP , indicated by the long edges of the red segments. The other two gene regions, CHST3SPOCK2 and C19orf44CALR3 are not highly correlated with the ESRRBVASH1 region, although the multiple SNPs within each region are highly dependent on each other (denoted by overlapped segments in green or blue color). In this case, there may be more than one causal SNP that leads to the differential treatment effects.
Number of subjects 



pvalue*  

Age  0.309  
Mean (SD)  68.4 (4.9)  68.3 (4.8)  68.6 (4.9)  
Median (Range)  68.2 (55.381.0)  68.0 (55.381.0)  68.7 (55.579.5)  
Sex (n, %)  0.289  
Female  655 (56.0)  413 (54.8)  242 (58.2)  
Male  515 (44.0)  341 (45.2)  174 (41.8)  
Smoking (n, %)  0.758  
Never Smoked  571 (48.8)  371 (49.2)  200 (48.1)  
Former/Current Smoker  599 (51.2)  383 (50.8)  216 (51.9)  
Baseline AMD severity score  0.001  
Mean (SD)  3.2 (2.2)  2.6 (2.2)  4.0 (2.1)  
Median (Range)  2.0 (1.08.0)  1.0 (1.08.0)  4.0 (1.08.0)  
Status (n, %)  0.001  
Progressed  269 (23.0)  133 (17.6)  136 (32.7)  
*p value is based on twosample t test or Pearson Chisquare test for continuous or categorical variables 
We picked the top SNP ( on CHR14) as our candidate marker for further discussion. Figure 5 demonstrates the treatment effect profiles and simultaneous confidence intervals for , where the efficacy profile may suggest a dominant beneficial effect of a. The CE4 simultaneous confidence intervals confirm that the targeted group is {Aa,aa} combined since the confidence intervals of and are above the zero line. This targeted group consists about 52% of the total patients, a reasonably high proportion of the entire population. Moreover, the estimated ratio of 75 quantile progressionfree times in the targeted and nontargeted groups (between and ) can be obtained, which are 1.44 and 0.57 for {Aa,aa} and {AA}, respectively, indicating that the combination of antioxidants and zinc treatment extends the progression time for 44% compared to the placebo in the targeted group. The corresponding simultaneous 95% confidence intervals for treatment efficacy (on the log scale) for the targeted and nontargeted population were also constructed and plotted in the right panel of Figure 5. Finally, the characteristics of targeted and nontargeted population based on are summarized in Table 2. As for the baseline characteristics and treatment assignment, the patients in the targeted group do not differ from the patients in the nontargeted group, indicating that the enhanced benefit from the treatment in the targeted population is plausibly due to the genetic difference rather than the demographic or clinical differences.
: chr14, ESRRBVASH1 region  


pvalue  

605 (51.7)  565 (48.3)  
Treatment efficacy (SE)  1.44 (1.01)  0.57 (1.01)  ^{⋆}  
Age  0.560  


68.4 (4.9)  


68.2 (55.880.5)  

0.982  
Female 

317 (56.1)  
Male 

248 (43.9)  

0.169  


288 (51.0)  


277 (49.0)  

0.510  


370 (65.5)  


195 (34.5)  

0.487  


3.2 (2.2)  


3.0 (1.08.0)  
: denotes the estimated quantile progression time  
: p value is from the corresponding CE4 contrast when simultaneous type I error is controlled, without adjusting for crossSNP multiplicity 
To help elucidate the controversial findings regarding whether genetic polymorphisms of CFH and ARMS2 alter the treatment efficacy of AREDS formula, we closely checked 6 SNPs from these two regions and their results are presented in Table 3. Note that , , and from CFH and from ARMS2HTRA1 have been previously investigated in Seddon et al. (2016); Vavvas et al. (2018); Assel et al. (2018). We also reported the SNPs with the smallest CE4based value from each region, which are and . None of the 6 SNPs meets the significance threshold of , although 3 SNPs from CFH region meet the nominal level of . We further investigated from and it seems our CE4Weibull result suggests the combination group exhibiting better treatment efficacy compared to its complementary group , which is similar to the findings from Seddon et al. (2016) and Assel et al. (2018). However, it is worthwhile to note that from our genomewide CE4 analysis, none of these SNPs ranked top (Table 3). Therefore, with appropriate multiplicity adjustment, neither CFH or ARMS2HTRA1 region has SNPs showing significant association with treatment efficacy, which is consistent with the conclusion indicated by Chew et al. (2015).
Gene  SNP  .CE4  rank.CE4 

CFH  3843  
9021  
71651  
0.38  1614412  
ARMS2HTRA1  378198  
0.77  3059694 
It should be noted that based on different SNPs, the suggested targeted population may vary. In this example, if a top SNP from CHST3SPOCK2 on CHR 10 is considered as the biomarker (e.g., ), the targeted population is about 65.8% of the total population, which overlaps with the targeted population indicated by by 67.1%. Our CE4Weibull method provides reliable and interpretable candidate targeted populations for consideration, while the final decision on which population to target involves many other considerations such as development of companion diagnostics, labeling, marketing, and reimbursement.
4.3 Validation on AREDS2
AREDS2 was another independent large multicenter RCT of AMD (Chew et al., 2012). It was designed to evaluate the effect of refined AREDS formulations on AMD progression, as compared to the original AREDS formulation. Participants of AREDS2 were more severe at baseline and the followup time was only about half of the AREDS’s followup time. Four arms were included: AREDS formulation, AREDS formulation plus Lutein/Zeaxanthin, AREDS formulation plus DHA/EPA, and AREDS formulation plus Lutein/Zeaxanthin and DHA/EPA. Since there is no real placebo arm in AREDS2, we cannot apply CE4Weibull to identify subgroups with enhanced efficacy of AREDS formulation directly in AREDS2. Instead, we specifically investigated the patient’s response to the same antioxidants and zinc combination treatment to check whether we observe similar differential response patterns between the AREDS identified targeted and nontargeted groups in AREDS2 as compared to AREDS. We used the same SNP to determine whether a patient belongs to targeted {Aa,aa} group or nontargeted {AA} group in both studies. Table 4 presents the patient characteristics within the targeted and nontargeted groups, separately for AREDS and AREDS2. None of the baseline risk factors differs between targeted and nontargeted populations in each study. Between the two studies, AREDS2 patients are older and more severe and thus are anticipated to progress faster. Figure 6 compares the progressionfree KaplanMeier curves between the targeted and nontargeted groups within each study. In both studies, the targeted population shows an obvious better progressionfree profile than the nontargeted population, with the logrank test value of 0.00011 and 0.013, respectively. Therefore, we successfully validated our identified targeted group by SNP in AREDS2. We also checked the subgroups indicated by reported SNPs from the CFH region, for example . In AREDS, the two groups ( vs ) exhibit differential treatment response profiles (logrank ). However, such differential response profiles were not observed in AREDS2. This further emphasizes appropriate multiplicity adjustment is crucial for robust subgroup identification.
4.4 Data Availability
The phenotype and genotype data of AREDS and AREDS2 are available from the online repository dbGap (accession: , and , respectively). The proposed method and its applications are implemented in R. The key functions can be obtained from GitHub upon the acceptance of this manuscript.
5 Conclusion and Discussion
In this work, we develop a new statistical method to confidently identify and infer subgroups in modern RCTs with timetoevent outcomes. Different from machine learning based approaches, our CE4Weibull approach, derived from the fundamental multiple testing principle, provides simultaneous confidence intervals on contrasts that directly compare the treatment efficacy between subgroups or combination of subgroups. The contrasts are built upon a logicrespecting efficacy measure, the ratio of quantile survival times (between
and ), which enjoys the unique covariate invariant property in addition to its interpretationfriendly feature.Our CE4Weibull adjusts for multiplicity both within and across the markers. It rigorously combines two error rate controls, familywise error rate control within each marker, and per family error rate control across the markers. Such error control is appropriate in a drug development process, as it allows flexibility in the exploration of multiple candidate markers, while being confident in the patient subgroup to target from any selected marker(s). Such a novel and rigorous multiplicity adjustment contributes to the reduction in the socalled “reproducibility crisis” in which many discoveries in markers or effective subgroups turn out to be false positive findings.
From our realistic simulation studies where the SNP data are taken from true individuals, we recommend practically useful rules for identifying “candidate” markers. Finally, we successfully applied our method on AREDS data to identify subgroups that exhibit enhanced treatment efficacy with combination of antioxidant supplements and zinc in delaying AMD progression. After conducting the genomewide analysis using our CE4Weibull method, three gene regions were discovered to suggest subgroups with significantly enhanced efficacy: CHST3SPOCK2 on CHR 10, ESRRBVASH1 on CHR 14, and C19orf44CALR3 on CHR 19. We further validated the subgroups defined by the top SNP from ESRRBVASH1 region using the data from an independent AREDS2 study. SanGiovanni et al. (2013) found that the estrogen related receptor beta (ESRRB) was (weakly) associated with CNV AMD. Wakusawa et al. (2008) first demonstrated the angiogenesis modulation of VASH is involved in the pathological process of AMD. Later Zeng et al. (2012) inferred that the treatment with AREDS formulation is likely to affect both angiogenesis and endothelialmacrophage interactions. Thus our findings provide new perspectives on the differential treatment efficacy, suggested by genetic polymorphisms for delaying AMD progression.
Although we use the SNP testing scenario to demonstrate our method, the key elements of the method apply to broader scenarios with all kinds of markers to consider. When the marker separates the patient population into more groups (), additional contrasts need to be considered to obtain the complete ordering of the treatment efficacy, which will then be used to identify the subgroups. The current version of our method only handles discrete markers and more work is required to generalize it for continuous markers. In doing that, one may borrow the idea from Liu et al. (2016) which considers all candidate thresholds for a continuous marker when deriving simultaneous confidence intervals.
References
 AgeRelated Eye Disease Study Research Group (1999) AgeRelated Eye Disease Study Research Group (1999). The agerelated eye disease study (AREDS): design implications. AREDS report no. 1. Control Clinical Trials, 20(6):573–600.
 Amado et al. (2008) Amado, R. G., Wolf, M., Peeters, M., Van Cutsem, E., Siena, S., Freeman, D. J., Juan, T., Sikorski, R., Suggs, S., Radinsky, R., Patterson, S. D., and Chang, D. D. (2008). Wildtype KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. Journal of Clinical Oncology, 26(10):1626–1634.
 Assel et al. (2018) Assel, M. J., Li, F., Wang, Y., Allen, A. S., Baggerly, K. A., and Vickers, A. J. (2018). Genetic polymorphisms of CFH and ARMS2 do not predict response to antioxidants and zinc in patients with agerelated macular degeneration. Ophthalmology, 125(3):391–397.
 Benichou and Gail (1989) Benichou, J. and Gail, M. H. (1989). A delta method for implicitly defined random variables. The American Statistician, 43(1):41–44.
 Bollag et al. (2012) Bollag, G., Tsai, J., Zhang, J., Zhang, C., Ibrahim, P., Nolop, K., and Hirth, P. (2012). Vemurafenib: the first drug approved for BRAFmutant cancer. Nature Review Drug Discovery, 11:873–886.
 Cascella et al. (2018) Cascella, R., Strafella, C., Caputo, V., Errichiello, V., Zampatti, S., Milano, F., Potenza, S., Mauriello, S., Novelli, G., Ricci, F., Cusumano, A., and Giardina, E. (2018). Towards the application of precision medicine in agerelated macular degeneration. Progress in Retinal and Eye Research, 63:132–146.
 Chew et al. (2012) Chew, E. Y., Clemons, T., SanGiovanni, J. P., Danis, R., Domalpally, A., McBee, W., Sperduto, R., and Ferris, F. L. (2012). The agerelated eye disease study 2 (AREDS2): Study design and baseline characteristics (AREDS2 report number 1). Ophthalmology, 119(11):2282 – 2289.
 Chew et al. (2015) Chew, E. Y., Klein, M. L., Clemons, T. E., Agrón, E., and Abecasis, G. R. (2015). Genetic testing in persons with agerelated macular degeneration and the use of the AREDS supplements: to test or not to test? Ophthalmology, 122(1):212–215.
 Ciampi et al. (1995) Ciampi, A., Negassa, A., and Lou, Z. (1995). Treestructured prediction for censored survival data and the cox model. Journal of Clinical Epidemiology, 48:675–689.
 Cox (1972) Cox, D. R. (1972). Regression models and lifetables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220.
 Ding et al. (2018) Ding, Y., Li, Y. G., Liu, Y., Ruberg, S. J., and Hsu, J. C. (2018). Confident inference for snp effects on treatment efficacy. Ann. Appl. Stat., 12(3):1727–1748.
 Ding et al. (2016) Ding, Y., Lin, H.M., and Hsu, J. C. (2016). Subgroup mixable inference on treatment efficacy in mixture populations, with an application to time‐to‐event outcomes. Stat Med., 35(10):1580–1594.
 Ding et al. (2017) Ding, Y., Liu, Y., Yan, Q., Fritsche, L. G., Cook, R. J., Clemons, T., Ratnapriya, R., Klein, M. L., Abecasis, G. R., Swaroop, A., Chew, E. Y., Weeks, D. E., and Chen, W. (2017). Bivariate analysis of agerelated macular degeneration progression using genetic risk scores. Genetics, 206:119–133.
 Fritsche et al. (2016) Fritsche, L. G., Igl, W., Baileyet, J. N. C., et al. (2016). A large genomewide association study of agerelated macular degeneration highlights contributions of rare and common variants. Nature Genetics, 48:134–143.
 GarcíaMenaya et al. (2019) GarcíaMenaya, J. M., CordobésDurán, C., GarcíaMartín, E., and Agúndez, J. A. G. (2019). Pharmacogenetic factors affecting asthma treatment response. potential implications for drug therapy. Frontiers in Pharmacology, 10:520.
 Kang et al. (2017) Kang, S., Wenbin, L., and Rui, S. (2017). A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine, 36:4646–4659.
 Klein et al. (2008) Klein, M. L., Francis, P. J., Rosner, B., Reynolds, R., Hamon, S. C., Schultz, D. W., Ott, J., and Seddon, J. M. (2008). CFH and LOC387715/ARMS2 genotypes and treatment with antioxidants and zinc for agerelated macular degeneration. Ophthalmology, 115(6):1019–1025.
 Lin et al. (2019) Lin, H.M., Xu, H., Ding, Y., and Hsu, J. C. (2019). Correct and logical inference on efficacy in subgroups and their mixture for binary outcomes. Biometrical Journal, 61(1):8–26.
 Lipkovich et al. (2011) Lipkovich, I., Dmitrienko, A., Denne, J., and Enas, G. (2011). Subgroup identification based on differential effect search—a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30:2601–2621.
 Liu et al. (2016) Liu, Y., Tang, S.Y., Man, M., Li, Y. G., Ruberg, S. J., Kaizar, E., and Hsu, J. C. (2016). Thresholding of a continuous companion diagnostic test confident of efficacy in targeted population. Statistics in Biopharmaceutical Research, 8(3):325–333.
 Loh et al. (2015) Loh, W.Y., He, X., and Man, M. (2015). A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine, 34:1818–1833.
 Negassa et al. (2005) Negassa, A., Ciampi, A., Abrahamowicz, M., Shapiro, S., and Boivin, J.F. (2005). Treestructured subgroup analysis for censored survival data: Validation of computationally inexpensive model selection criteria. Statistics and Computing, 15(3):231–239.
 SanGiovanni et al. (2013) SanGiovanni, J. P., Chen, J., Sapieha, P., Aderman, C. M., Stahl, A., Clemons, T. E., Chew, E. Y., , and Smith, L. E. H. (2013). DNA sequence variants in PPARGC1A, a gene encoding a coactivator of the ω3 LCPUFA sensing PPARRXR transcription complex, are associated with nv amd and amdassociated loci in genes of complement and VEGF signaling pathways. PLoS One, 8(1):e53155.
 Seddon et al. (2016) Seddon, J., Silver, R., and B, R. (2016). Response to AREDS supplements according to genetic factors: survival analysis approach using the eye as the unit of analysis. The British journal of ophthalmology, 100(12):1731–1737.
 Seddon et al. (2007) Seddon, J. M., Francis, P. J., George, S., Schultz, D. W., Rosner, B., and Klein, M. L. (2007). Association of CFH Y402H and LOC387715 A69S With Progression of AgeRelated Macular Degeneration. JAMA, 297(16):1793–1800.
 Su et al. (2008) Su, X., Zhou, T., Yan, X., Fan, J., and Yang, S. (2008). Interaction trees with censored survival data. The International Journal of Biostatistics, 4(1):2.
 Sun and Ding (2019) Sun, T. and Ding, Y. (2019). Copulabased semiparametric regression method for bivariate data under general interval censoring. Biostatistics.
 Vavvas et al. (2018) Vavvas, D. G., Small, K. W., Awh, C. C., Zanke, B. W., Tibshirani, R. J., and Kustra, R. (2018). CFH and ARMS2 genetic risk determines progression to neovascular agerelated macular degeneration after antioxidant and zinc supplementation. Proceedings of the National Academy of Sciences of the United States of America, 115(4):E696–E704.
 Wakusawa et al. (2008) Wakusawa, R., Abeb, T., Satoa, H., Yoshidaa, M., Kunikataa, H., Satoc, Y., and Nishidaa, K. (2008). Expression of vasohibin, an antiangiogenic factor, in human choroidal neovascular membranes. American Journal of Ophthalmology, 146(2):235–243.
 Wu et al. (2016) Wu, R.f., Zheng, M., and Yu, W. (2016). Subgroup analysis with timetoevent data under a logisticcox mixture model. Scandinavian Journal of Statistics, 43:863–878.
 Yan et al. (2018) Yan, Q., Ding, Y., Liu, Y., Sun, T., Fritsche, L. G., Clemons, T., Ratnapriya, R., Klein, M. L., Cook, R. J., Liu, Y., Fan, R., Wei, L., Abecasis, G. R., Swaroop, A., Chew, E. Y., Group, A. R., Weeks, D. E., and Chen, W. (2018). Genomewide analysis of disease progression in agerelated macular degeneration. Human Molecular Genetics, 27(5):929–940.
 Zeng et al. (2012) Zeng, S., Hernandez, J., and Mullins, R. F. (2012). Effects of antioxidant components of AREDS vitamins and zinc ions on endothelial cell activation: Implications for macular degeneration. Investigative Ophthalmology & Visual Science, 53(2):1041–1047.
Comments
There are no comments yet.