1 Introduction
Bivariate time-to-event endpoints are frequently used as co-primary outcomes in biomedical and epidemiological studies. For example, two time-to-event endpoints often arise in clinical trials studying the progression (or recurrence) of bilateral diseases (e.g., eye diseases) or complex diseases (e.g., cancer and psychiatric disorders). The two endpoints are correlated as they come from the same individual. Bivariate interval-censored data arise when neither event time is precisely observed due to intermittent assessment times. A further complication is that the event status can be indeterminate (i.e., right-censored) for individuals who are event-free at their last assessment time. The special case in which there is only one assessment time, leading to bivariate current status data, can also occur for some individuals. Therefore, the bivariate data we are interested in modeling are under general interval censoring, which includes a mixture of case 1 (current status) interval censoring, case 2 interval censoring, and right censoring.
Our motivating example of such bivariate general interval-censored data comes from a large clinical trial studying the progression of a bilateral eye disease, Age-related Macular Degeneration (AMD), in which the two eyes of each patient were periodically examined for late-AMD. The study aims to discover genetic variants that are significantly associated with AMD progression, as well as to characterize both the joint and conditional risks of AMD progression. For example, the joint 5-year progression-free probability for both eyes is a clinically important measure for grouping patients into different risk categories. Similarly, for patients who have one eye already progressed, the conditional 5-year progression-free probability for the non-progressed eye (given that its fellow eye has already progressed) is important to both clinicians and patients. Therefore, a desirable statistical approach should be able to characterize and predict both joint and conditional risk profiles, as well as to assess covariate effects on them.
There have been several popular approaches for modeling bivariate interval-censored data. For example, Goggins & Finkelstein (2000), Kim & Xue (2002), Chen et al. (2007), Tong et al. (2008) and Chen et al. (2013) fitted various marginal models for multivariate interval-censored data. All these approaches model the marginal distributions under a working independence assumption, and thus cannot produce joint or conditional distributions. Another popular approach is based on frailty models (for example, Oakes, 1982), which are mixed-effects models with a latent frailty variable applied to the conditional hazard functions. For example, Chen et al. (2009) and Chen et al. (2014) built frailty proportional hazards models with piecewise-constant baseline hazards for multivariate current status data and interval-censored data, respectively. Wen & Chen (2013) and Wang et al. (2015) developed gamma-frailty proportional hazards models for bivariate interval-censored data through a nonparametric maximum likelihood estimation approach and for bivariate current status data through a spline-based sieve estimation approach, respectively. Recently, Zhou et al. (2017) and Zeng et al. (2017) proposed frailty-based transformation models for bivariate or multivariate interval-censored data, and obtained parameter estimators through nonparametric maximum likelihood estimation and sieve maximum likelihood estimation, respectively.
The third popular approach is based on copula models (for example, Clayton, 1978). Unlike the marginal or frailty approaches, copula-based methods directly connect the two marginal distributions through a copula function to construct the joint distribution, whose dependence is determined by the copula parameter. This unique feature makes the modeling of the margins separable from the copula function, which is attractive from both modeling and interpretation perspectives. Moreover, the challenge from censoring can be naturally handled through the marginal distributions within the copula function. Both joint and conditional distributions can be obtained from copula models. Various copula models have been proposed in the literature.
Wang et al. (2008) used sieve estimation in a copula model with proportional hazards margins for bivariate current status data. Cook & Tolusso (2009) and Kor et al. (2013) developed estimating equations for copula models with piecewise-constant baseline marginal hazards for clustered current status and interval-censored data, respectively. Hu et al. (2017) developed a sieve estimation approach for a copula model with proportional hazards margins for bivariate current status data. To date, most copula-based regression models handle only a specific interval censoring type (e.g., case 1 current status or case 2 interval censoring) and are often limited to the proportional hazards assumption. Also, the most frequently used copula models, such as the Clayton, Gumbel and Frank copulas, all use only one parameter to characterize the between-margin dependence, which can lack flexibility.

In this work, we propose a general class of copula-based semiparametric transformation models for bivariate data subject to general interval censoring. Specifically, we build a two-parameter copula model framework, which can handle more flexible dependence structures than one-parameter copulas. Our method incorporates a broad class of semiparametric regression models that do not assume any specific distribution for the margins. We approximate the infinite-dimensional nuisance parameters using sieves with Bernstein polynomials, and propose a novel two-step maximum likelihood estimation procedure that is computationally stable and efficient. We establish the asymptotic normality and efficiency of the sieve estimators of the finite-dimensional model parameters using an M-theorem that can handle two separate infinite-dimensional parameters. Moreover, we develop a generalized score test with numerical approximations of the score function and observed Fisher information for testing covariate effects.
The paper is organized as follows. Section 2 introduces the model and the joint likelihood function. Section 3 presents the sieve maximum likelihood estimation procedure, the asymptotic properties and the generalized score test. Section 4 presents extensive simulation studies of the estimation and testing performance of our proposed methods. We analyze the AREDS data and present the findings in Section 5. Finally, we discuss and conclude in Section 6. The regularity conditions and the proofs of the two theorems are included in the Appendix, with additional lemmas and a general M-theorem presented in the supplementary material.
2 Notation and Likelihood
2.1 Copula model for bivariate censored data
Let $(T_1, T_2)$ be the true bivariate event times, with marginal survival functions $S_1(t_1)$ and $S_2(t_2)$ and joint survival function $S(t_1, t_2)$. Assume there are $n$ independent subjects in a study. For subject $i$, we observe $D_i = \{(L_{ij}, R_{ij}], Z_{ij} : j = 1, 2\}$, where $(L_{ij}, R_{ij}]$ is the time interval that $T_{ij}$ lies in and $Z_{ij}$ is the covariate vector. When $R_{ij} = \infty$, $T_{ij}$ is right-censored, and when $L_{ij} = 0$, $T_{ij}$ is left-censored.

By Sklar's theorem (Sklar, 1959), so long as the marginal survival functions are continuous, there exists a unique function $C_\eta$ that connects the two marginal survival functions into the joint survival function:
$$S(t_1, t_2) = \Pr(T_1 > t_1, T_2 > t_2) = C_\eta\{S_1(t_1), S_2(t_2)\}, \quad t_1, t_2 \ge 0.$$
Here, the function $C_\eta$ is called a copula, which maps $[0,1] \times [0,1]$ onto $[0,1]$, and its parameter $\eta$ measures the dependence between the two margins. A signature feature of the copula is that it allows the dependence to be modeled separately from the marginal distributions (Nelsen, 2006).
One popular copula family for bivariate censored data is the Archimedean copula family, which usually has an explicit formula. Two frequently used Archimedean copulas are the Clayton (Clayton, 1978) and Gumbel (Gumbel, 1960) copulas, which account for lower or upper tail dependence between the two margins, respectively, using a single parameter.
Here, we consider a more flexible two-parameter Archimedean copula model, which is formulated as

(1) $C_{\alpha,\kappa}(u, v) = \left[1 + \left\{(u^{-1/\alpha} - 1)^{1/\kappa} + (v^{-1/\alpha} - 1)^{1/\kappa}\right\}^{\kappa}\right]^{-\alpha}, \quad \alpha > 0, \; \kappa \in (0, 1],$

where $u = S_1(t_1)$ and $v = S_2(t_2)$ are two uniformly distributed margins. The two dependence parameters ($\alpha$ and $\kappa$) account for the correlation between $u$ and $v$ at the lower and upper tails, respectively, and they are explicitly connected to Kendall's $\tau$ through $\tau = 1 - 2\alpha\kappa/(2\alpha + 1)$. In particular, when $\kappa = 1$, the two-parameter copula (1) becomes the Clayton copula, and when $\alpha \to \infty$, it becomes the Gumbel copula. Thus, the two-parameter copula model provides more flexibility in characterizing the dependence than the Clayton or Gumbel copula.

2.2 Joint likelihood for bivariate data under general interval censoring
Using the notation introduced in Section 2.1, the joint likelihood function under the two-parameter copula model can be written as

(2) $L_n = \prod_{i=1}^{n} \Pr(L_{i1} < T_{i1} \le R_{i1}, L_{i2} < T_{i2} \le R_{i2}) = \prod_{i=1}^{n} \Big[ C_{\alpha,\kappa}\{S_1(L_{i1}), S_2(L_{i2})\} - C_{\alpha,\kappa}\{S_1(L_{i1}), S_2(R_{i2})\} - C_{\alpha,\kappa}\{S_1(R_{i1}), S_2(L_{i2})\} + C_{\alpha,\kappa}\{S_1(R_{i1}), S_2(R_{i2})\} \Big].$

For a given subject $i$, if $T_{ij}$ is right-censored, then any term involving $S_j(R_{ij})$ becomes 0 (since $R_{ij}$ is set to be $\infty$). The likelihood contribution for subject $i$ then reduces to either only one term (if both $T_{i1}$ and $T_{i2}$ are right-censored) or two terms (if one of them is right-censored). The special case of current status data also fits into this model frame, where either $L_{ij}$ is 0 (if the event has already occurred before the examination time, which is $R_{ij}$ in this case) or $R_{ij}$ is $\infty$ (if the event has not occurred by the examination time, which is $L_{ij}$ in this case). Therefore, the likelihood function (2) can handle the general form of bivariate interval-censored data.
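As a concrete illustration of how one subject's likelihood contribution is evaluated, here is a minimal sketch. It assumes a BB1-type form for the two-parameter copula (the paper's exact parameterization may differ), takes the marginal survival probabilities as given, and uses our own function names:

```python
import numpy as np

def copula2(u, v, alpha, kappa):
    """Two-parameter Archimedean copula (assumed BB1-type form):
    C(u,v) = [1 + {(u^(-1/a) - 1)^(1/k) + (v^(-1/a) - 1)^(1/k)}^k]^(-a)."""
    # Boundary cases: C(u,0) = C(0,v) = 0, C(u,1) = u, C(1,v) = v.
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return v
    if v >= 1.0:
        return u
    x = (u ** (-1.0 / alpha) - 1.0) ** (1.0 / kappa)
    y = (v ** (-1.0 / alpha) - 1.0) ** (1.0 / kappa)
    return (1.0 + (x + y) ** kappa) ** (-alpha)

def subject_loglik(SL1, SR1, SL2, SR2, alpha, kappa):
    """Log of P(L1 < T1 <= R1, L2 < T2 <= R2) for one subject, where
    SLj = S_j(L_j) and SRj = S_j(R_j); SRj = 0 encodes right censoring
    (R_j = infinity) and SLj = 1 encodes left censoring (L_j = 0)."""
    p = (copula2(SL1, SL2, alpha, kappa) - copula2(SL1, SR2, alpha, kappa)
         - copula2(SR1, SL2, alpha, kappa) + copula2(SR1, SR2, alpha, kappa))
    return np.log(max(p, 1e-300))  # guard against numerical underflow
```

Under this assumed form, setting kappa = 1 reduces to the Clayton copula with parameter 1/alpha, and letting alpha grow recovers the Gumbel copula, matching the two special cases of the two-parameter family noted in Section 2.1.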
Next, we model the dependence parameters ($\alpha$, $\kappa$) and the two marginal survival functions ($S_1$, $S_2$) together.
2.3 Semiparametric linear transformation model for marginal survival functions
We consider the semiparametric transformation model for the marginal survival functions:

(3) $S_j(t \mid Z_j) = \exp\left[-G_j\{\Lambda_j(t)\, e^{Z_j^{\top}\beta_j}\}\right], \quad j = 1, 2,$

where $G_j(\cdot)$ is a pre-specified strictly increasing function, $\beta_j$ is a vector of unknown regression coefficients, and $\Lambda_j(\cdot)$ is an unknown non-decreasing function of $t$. In model (3), the transformation function $G_j$, the regression parameter $\beta_j$ and the infinite-dimensional parameter $\Lambda_j$ are all denoted as margin-specific (indexed by $j$) for generality. In practice, some or all of them can be the same for the two margins, in which case the corresponding index can be dropped.
Model (3) contains a broad class of survival models. For example, when $G_j(x) = x$, the marginal survival function follows a proportional hazards model; when $G_j(x) = \log(1 + x)$, it becomes a proportional odds model. In practice, the transformation function can also be "estimated" from the data. For example, the commonly used Box-Cox transformation $G_j(x) = \{(1 + x)^r - 1\}/r$, $r > 0$, or the logarithmic transformation $G_j(x) = \log(1 + rx)/r$, $r > 0$, can be assumed. The proportional hazards and proportional odds models are special cases within both transformation classes. The parameter $r$ in $G_j(\cdot)$ can then be estimated together with the other parameters in the likelihood, as we demonstrate in our simulation studies.
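For intuition, the two transformation classes can be sketched as follows. This is a minimal illustration with an identity baseline function and hypothetical parameter values, not the authors' code:

```python
import numpy as np

def G_boxcox(x, r):
    """Box-Cox-type transformation: G(x) = {(1+x)^r - 1}/r for r > 0,
    with the limit G(x) = log(1+x) as r -> 0."""
    x = np.asarray(x, dtype=float)
    return np.log1p(x) if r == 0 else ((1.0 + x) ** r - 1.0) / r

def G_log(x, r):
    """Logarithmic transformation: G(x) = log(1 + r*x)/r for r > 0,
    with the limit G(x) = x as r -> 0."""
    x = np.asarray(x, dtype=float)
    return x if r == 0 else np.log1p(r * x) / r

def surv(t, z, beta, G, Lam=lambda t: t):
    """Marginal model (3): S(t|Z) = exp[-G{Lambda(t) * exp(Z'beta)}].
    Lam is a placeholder baseline (identity here) for illustration only."""
    return np.exp(-G(Lam(t) * np.exp(np.dot(z, beta))))

# r = 1 in the Box-Cox class (or r -> 0 in the log class) gives G(x) = x,
# i.e., proportional hazards; r -> 0 in the Box-Cox class (or r = 1 in the
# log class) gives G(x) = log(1+x), i.e., proportional odds.
t, z, beta = 2.0, np.array([1.0]), np.array([0.5])
ph = surv(t, z, beta, lambda x: G_boxcox(x, 1.0))
po = surv(t, z, beta, lambda x: G_log(x, 1.0))
```

Note that with G(x) = log(1+x) the survival function becomes S(t|Z) = 1/{1 + Lambda(t) exp(Z'beta)}, which is exactly the proportional odds form with baseline odds Lambda(t).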
3 Estimation and Inference
3.1 Sieve likelihood with Bernstein polynomials
In our likelihood function, we are interested in estimating the unknown parameter $\theta = (\alpha, \kappa, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$. Here $(\alpha, \kappa, \beta_1, \beta_2)$ lies in a compact subset of Euclidean space whose dimension is determined by the dimensions of $\beta_1$ and $\beta_2$, bounded by a positive constant. We denote by $\mathcal{M}$ the collection of all bounded, continuous, non-decreasing and non-negative functions over $[l, u]$, where $0 \le l < u < \infty$. In practice, $[l, u]$ can be chosen as the range of all observed $L_{ij}$ and $R_{ij}$.
In our log-likelihood function, there are finite-dimensional parameters of interest $(\alpha, \kappa, \beta_1, \beta_2)$ and two infinite-dimensional nuisance parameters $(\Lambda_1, \Lambda_2)$, which need to be estimated simultaneously. Unlike right-censored data, interval-censored data do not admit tools such as the partial likelihood and martingale theory, due to the absence of exact event times. Instead, following Huang & Rossini (1997), we employ the sieve approach and form a sieve likelihood. Specifically, similar to Zhou et al. (2017), we use Bernstein polynomials to build a sieve space for $(\Lambda_1, \Lambda_2)$, in which each $\Lambda_j$ is approximated by a function of the form
$$\Lambda_{jn}(t) = \sum_{k=0}^{m} \phi_{jk} B_k(t, m, l, u),$$
where $B_k(t, m, l, u)$ represents the Bernstein basis polynomial defined as:

(4) $B_k(t, m, l, u) = \binom{m}{k} \left(\frac{t - l}{u - l}\right)^{k} \left(1 - \frac{t - l}{u - l}\right)^{m - k}, \quad k = 0, \ldots, m,$

with degree $m$ growing with the sample size $n$. We assume the basis polynomials are the same between the two margins, while the coefficients $\{\phi_{jk}\}$ can be margin-specific. In practice, with a pre-specified $m$, we solve for $\{\phi_{jk}\}$ together with the other parameters $(\alpha, \kappa, \beta_1, \beta_2)$.
The use of Bernstein polynomials naturally fits interval-censored data, as they are defined over an interval $[l, u]$. One big advantage of Bernstein polynomials is that they can achieve the non-negativity and monotonicity properties of $\Lambda_j$ naturally through reparameterization (Zhou et al., 2017). Another advantage is that they do not require the specification of interior knots, as seen from (4), making them easier to work with.
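A small sketch of the sieve construction follows. The cumulative-sum-of-exponentials device is one common way to enforce ordered, non-negative coefficients; the paper only states that monotonicity is achieved through reparameterization, so this specific choice is our illustration:

```python
import numpy as np
from math import comb

def bernstein_basis(t, m, l, u):
    """Bernstein basis polynomials B_k(t, m, l, u), k = 0..m, on [l, u]."""
    s = (np.asarray(t, dtype=float) - l) / (u - l)   # rescale t to [0, 1]
    return np.array([comb(m, k) * s**k * (1.0 - s)**(m - k)
                     for k in range(m + 1)])

def lambda_sieve(t, phi, l, u):
    """Monotone sieve approximation of Lambda(t): non-decreasing, non-negative
    Bernstein coefficients obtained as cumulative sums of exponentials of the
    unconstrained parameters phi."""
    coef = np.cumsum(np.exp(phi))   # 0 < coef_0 <= coef_1 <= ... <= coef_m
    return coef @ bernstein_basis(t, len(phi) - 1, l, u)
```

Because the basis functions sum to one and the coefficients are ordered, the resulting function is automatically non-negative and non-decreasing on [l, u], whatever unconstrained values the optimizer proposes for phi.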
With the sieve space defined above, the parameter $\theta$ will be approximated by its sieve counterpart. In the next section, we propose an estimation procedure that maximizes the sieve log-likelihood over the sieve space to obtain the sieve maximum likelihood estimators $\hat\theta_n$.
3.2 Estimation procedure for sieve maximum likelihood estimators
We develop a novel two-step sieve maximum likelihood estimation procedure that applies to any choice of Archimedean copula and marginal model. In principle, we could obtain the sieve maximum likelihood estimators by maximizing the joint likelihood function (2) in one step. However, due to the complex structure of the joint likelihood function, we recommend a separate first step to obtain appropriate initial values for all the unknown parameters. In essence, the marginal parameters $(\beta_j, \Lambda_j)$ are first estimated marginally in step 1(a). Their estimates are then plugged into the joint likelihood to form a pseudo-likelihood, and in step 1(b) the dependence parameters $(\alpha, \kappa)$ are estimated by maximizing this pseudo-likelihood. Finally, using the initial values from steps 1(a) and 1(b), we update all the unknown parameters simultaneously under the joint log-likelihood function in step 2. The estimation procedure is described below:

Step 1(a). Obtain initial estimates of the marginal parameters $(\beta_j, \Lambda_j)$, $j = 1, 2$:
$(\hat\beta_j^{(0)}, \hat\Lambda_j^{(0)}) = \arg\max_{(\beta_j, \Lambda_j)} \ell_j(\beta_j, \Lambda_j)$, where $\ell_j$ denotes the sieve log-likelihood for the $j$th marginal model.

Step 1(b). Obtain initial estimates of the dependence parameters $(\alpha, \kappa)$:
$(\hat\alpha^{(0)}, \hat\kappa^{(0)}) = \arg\max_{(\alpha, \kappa)} \ell(\alpha, \kappa, \hat\beta_1^{(0)}, \hat\Lambda_1^{(0)}, \hat\beta_2^{(0)}, \hat\Lambda_2^{(0)})$, where $(\hat\beta_j^{(0)}, \hat\Lambda_j^{(0)})$ are the initial estimates from step 1(a), and $\ell$ is the joint sieve log-likelihood.

Step 2. Simultaneously maximize the joint sieve log-likelihood over all parameters to get the final estimates:
$(\hat\alpha, \hat\kappa, \hat\beta_1, \hat\Lambda_1, \hat\beta_2, \hat\Lambda_2) = \arg\max \ell(\alpha, \kappa, \beta_1, \Lambda_1, \beta_2, \Lambda_2)$, with initial values obtained from steps 1(a) and 1(b).
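The steps above can be sketched as follows. This is a schematic with hypothetical log-likelihood callables and parameter layout, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def two_step_fit(marg_loglik, joint_loglik, p_marg, p_dep):
    """Two-step sieve MLE sketch.
    marg_loglik(j, par) : sieve log-likelihood of margin j (par stacks beta_j
                          and the Bernstein parameters for Lambda_j)
    joint_loglik(par)   : joint sieve log-likelihood, par stacked as
                          (dependence params, margin-1 params, margin-2 params)
    p_marg, p_dep       : number of parameters per margin / dependence part."""
    # Step 1(a): fit each margin separately under working independence.
    inits = []
    for j in (0, 1):
        fit = minimize(lambda par: -marg_loglik(j, par),
                       x0=np.zeros(p_marg), method="BFGS")
        inits.append(fit.x)
    # Step 1(b): maximize the pseudo-likelihood over the dependence
    # parameters only, holding the marginal estimates fixed.
    pseudo = lambda dep: -joint_loglik(np.concatenate([dep, inits[0], inits[1]]))
    dep0 = minimize(pseudo, x0=np.full(p_dep, 0.5), method="Nelder-Mead").x
    # Step 2: maximize the joint log-likelihood over all parameters, starting
    # from the step-1 values; the Hessian at this step yields the
    # variance-covariance estimates for all parameters.
    x0 = np.concatenate([dep0, inits[0], inits[1]])
    final = minimize(lambda par: -joint_loglik(par), x0=x0, method="BFGS")
    return final.x
```

In practice the step-1 fits are cheap and land close to the joint maximizer, which is what makes the final joint optimization in step 2 fast and stable.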
The two-step procedure converges reliably in practice, and our simulations show that the computing speed in step 2 is significantly improved by using initial values obtained from step 1. In fact, the initial estimates from step 1 are already consistent (Sun et al., 2006). Step 2 then produces correct variance-covariance estimates for all the parameters using the joint likelihood. Standard optimization algorithms such as the Newton-Raphson algorithm or the conjugate gradient algorithm can be employed to obtain the maximizers and the Hessian matrix. Due to the complex structure of the joint sieve log-likelihood, instead of analytically deriving the first- and second-order derivatives, we propose to use Richardson extrapolation (Lindfield & Penny, 1989) to approximate the score function and observed Fisher information numerically.

3.3 Asymptotic Properties of Sieve Estimators
To establish the asymptotic properties of the sieve maximum likelihood estimators $\hat\theta_n$, we need to define a distance between two values of $\theta$. Let $\|b\|$ denote the Euclidean norm for a vector $b$, and $\|\Lambda\|_\infty = \sup_t |\Lambda(t)|$ the supremum norm for a function $\Lambda$. Also define the $L_2(P)$ norm $\|\Lambda\|_{L_2(P)} = \{\int |\Lambda(t)|^2 \, dP(t)\}^{1/2}$ for a function $\Lambda$ under the probability measure $P$, where $P$ denotes the joint cumulative distribution function of the observed data. The distance between two parameter values is then defined by combining the Euclidean norm on the finite-dimensional components with the $L_2(P)$ norm on $(\Lambda_1, \Lambda_2)$.

Let $\theta_0$ denote the true value of $\theta$. The following theorems present the convergence rate, asymptotic normality and efficiency of the sieve estimators.
Theorem 1
(Convergence rate) Assume that Conditions 1-2 and 4-5 in the Appendix hold. Let the Bernstein degree satisfy $m = O(n^{\nu})$ for some $\nu \in (0, 1)$, and let $p$ be the smoothness parameter of the true $\Lambda_{0j}$ as defined in Condition 4. Then the distance between $\hat\theta_n$ and $\theta_0$ is $O_p\big(n^{-\min\{p\nu/2,\, (1-\nu)/2\}}\big)$.
Theorem 1 states that the sieve estimator has a polynomial convergence rate. Although this overall rate is slower than $n^{-1/2}$, the following theorem shows that the proposed estimators of the finite-dimensional parameters $(\alpha, \kappa, \beta_1, \beta_2)$ are nonetheless asymptotically normal and semiparametrically efficient.
Theorem 2
(Asymptotic normality and efficiency) Suppose Conditions 1-5 in the Appendix hold, and let $b = (\alpha, \kappa, \beta_1, \beta_2)$ with true value $b_0$. Under a suitable rate condition on $\nu$ and $p$, $\sqrt{n}(\hat b_n - b_0)$ converges in distribution to $N\{0, I^{-1}(b_0)\}$, where $I(b_0)$ is the information matrix based on the efficient score function defined in the proof. Therefore, the estimators $\hat b_n$ are asymptotically normal and efficient.
3.4 Generalized score test
We now separate the finite-dimensional parameters into two parts: $b = (\gamma, \xi)$, where $\gamma$ is the parameter of interest for hypothesis testing and $\xi$ denotes the remaining parameters, including the regression coefficients not under test. Likelihood-based tests such as the Wald, score, and likelihood-ratio tests can be constructed to test $H_0: \gamma = 0$, and they are asymptotically equivalent. In our motivating study, we aim to perform a GWAS of AMD progression, which involves testing millions of SNPs one by one, so computing speed is critical. We therefore propose to use the generalized score test. One big advantage of the score test in a GWAS setting is that one only needs to estimate the unknown parameters once, under the null model without any SNP (i.e., $\gamma = 0$), since the non-genetic risk factors are the same no matter which SNP is being tested. Therefore, the score test is faster than the likelihood-ratio or Wald test. Moreover, convergence of the fitting algorithm is easily guaranteed, since the estimation is performed only once.
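Schematically, given the restricted estimate under the null and a log-likelihood callable (hypothetical here), the score statistic takes the usual quadratic form. The sketch below uses Richardson-extrapolated central differences in place of analytic derivatives, echoing the numerical approximation proposed in Section 3.2:

```python
import numpy as np

def richardson_grad(f, x, h=1e-3):
    """Gradient via Richardson extrapolation of central differences:
    D(h) has error O(h^2), so {4 D(h/2) - D(h)} / 3 is accurate to O(h^4)."""
    g = np.zeros(len(x))
    for j in range(len(x)):
        e = np.zeros(len(x)); e[j] = 1.0
        d = lambda s: (f(x + s * e) - f(x - s * e)) / (2.0 * s)
        g[j] = (4.0 * d(h / 2) - d(h)) / 3.0
    return g

def score_test(loglik, theta0, h=1e-3):
    """Generalized score statistic U' I^{-1} U at the restricted MLE theta0
    (fitted once under H0, with the tested coordinates held at their null
    value). The observed information is the numerical Jacobian of the score."""
    U = richardson_grad(loglik, theta0, h)
    p = len(theta0)
    I = np.zeros((p, p))
    for k in range(p):                       # column k: -d(score)/d(theta_k)
        e = np.zeros(p); e[k] = 1.0
        I[:, k] = -(richardson_grad(loglik, theta0 + h * e)
                    - richardson_grad(loglik, theta0 - h * e)) / (2.0 * h)
    I = (I + I.T) / 2.0                      # symmetrize numerical roundoff
    return U @ np.linalg.solve(I, U)         # ~ chi-squared, df = dim tested
```

In a GWAS loop, theta0 is estimated once under the null model without any SNP, and only the cheap derivative evaluations above are repeated per variant.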
With the sieve joint likelihood, we can obtain the restricted sieve maximum likelihood estimators under $H_0: \gamma = 0$ (with the rest of the parameters left unrestricted), and then calculate the generalized score test statistic as defined in Cox & Hinkley (1979). As in our two-step estimation procedure, we propose to use Richardson extrapolation to numerically approximate the first- and second-order derivatives when calculating the score test statistic.

4 Simulation study
We first evaluated the parameter estimation performance of our proposed two-parameter copula sieve model for bivariate data under general interval censoring. We then assessed the type-I error control, power and computational speed of the proposed generalized score test.
4.1 Generating bivariate interval-censored times
The data were simulated from various Archimedean copula models (i.e., Clayton, Frank, Ali-Mikhail-Haq (AMH) and Joe) with Log-logistic margins. We first generated bivariate true event times using the conditioning approach described in Sun et al. (2018). To obtain interval-censored data, we followed the censoring procedure in Kiani & Arasan (2012), which fits the study design of the AREDS data. Specifically, we assumed each subject was assessed a fixed number of times, with the length between two adjacent assessment times following an Exponential distribution. In the end, for each subject $i$, $L_{ij}$ was defined as the last assessment time before $T_{ij}$ and $R_{ij}$ was the first assessment time after $T_{ij}$. The overall right-censoring rate was held at a fixed level.

For the dependence strength between margins, we set Kendall's $\tau$ at 0.2 or 0.6, indicating weak or strong dependence. We assumed that the two event times share a common baseline distribution, for example Log-logistic or Weibull with given scale and shape parameters.
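The generating mechanism can be sketched as follows, using a Clayton copula via the conditioning approach and exponential gaps between assessments; the distributional parameters here are hypothetical, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def clayton_pair(theta, rng):
    """Generate (U1, U2) from a Clayton copula by the conditioning approach:
    draw U1 and W ~ Uniform(0,1), then invert P(U2 <= u2 | U1 = u1) = W."""
    u1, w = rng.uniform(size=2)
    u2 = ((w ** (-theta / (1.0 + theta)) - 1.0) * u1 ** (-theta)
          + 1.0) ** (-1.0 / theta)
    return u1, u2

def interval_censor(t, tau, mean_gap, rng):
    """Periodic assessments until follow-up end tau, with exponential gaps.
    Returns (L, R]: R = inf encodes right censoring at the last visit."""
    visits, s = [], 0.0
    while True:
        s += rng.exponential(mean_gap)
        if s > tau:
            break
        visits.append(s)
    L = max([v for v in visits if v < t], default=0.0)
    R = min([v for v in visits if v >= t], default=np.inf)
    return L, R

# Bivariate Log-logistic event times (scale 5, shape 2 -- illustrative values)
# linked by a Clayton copula with theta = 2, i.e., Kendall's tau = 0.5.
u1, u2 = clayton_pair(theta=2.0, rng=rng)
t1 = 5.0 * (u1 / (1.0 - u1)) ** (-1.0 / 2.0)   # inverse marginal survival
t2 = 5.0 * (u2 / (1.0 - u2)) ** (-1.0 / 2.0)
obs = [interval_censor(t, tau=12.0, mean_gap=1.0, rng=rng) for t in (t1, t2)]
```

By construction each observed pair satisfies L < T <= R, with L = 0 giving left censoring and R = inf giving right censoring, matching the general interval censoring structure of Section 2.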
We included both genetic and non-genetic covariates in the simulations. Specifically, each SNP, coded as 0, 1 or 2, was generated from a multinomial distribution with genotype probabilities determined by the minor allele frequency (MAF). We also included a margin-specific continuous variable, generated from a Normal distribution, and a subject-specific binary variable, generated from a Bernoulli distribution.

Under all scenarios, the sample size was held fixed. For simplicity, we assumed the same covariate effects for the two margins: a margin-specific non-genetic effect, a subject-specific non-genetic effect, and the SNP effect, all set at fixed true values. For evaluating estimation performance, we used a non-zero SNP effect and replicated 1,000 times. For evaluating type-I error control when testing the null SNP effect, we replicated 100,000 times and evaluated at various tail levels: 0.05, 0.01, 0.001 and 0.0001. For power evaluation, we replicated 1,000 times under each SNP effect size, where a range of effect sizes was selected to represent weak to strong SNP effects.
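For the genotype part, a minimal sketch follows. The Hardy-Weinberg proportions for the three genotype probabilities are our assumption, and the MAF value is hypothetical:

```python
import numpy as np

def draw_snp(n, maf, rng):
    """Draw additive genotype codes 0/1/2 for n subjects from a multinomial
    distribution; probabilities follow Hardy-Weinberg proportions
    {(1-q)^2, 2q(1-q), q^2} for MAF q (our assumption for this sketch)."""
    q = maf
    return rng.choice([0, 1, 2], size=n, p=[(1 - q) ** 2, 2 * q * (1 - q), q ** 2])

snp = draw_snp(500, maf=0.4, rng=np.random.default_rng(7))
```

The sample mean of the genotype codes divided by two then estimates the MAF, which is a quick sanity check on the generator.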
4.2 Simulation I: parameter estimation
In this section, we evaluated the estimation performance of our proposed method under both correctly specified and misspecified settings. For the sieve margins, we used the true linear transformation function. We assumed the same Bernstein coefficients, with a common degree, for both margins. For the event time range $[l, u]$, we chose $l = 0$ and set $u$ as the largest value of all $L_{ij}$ and $R_{ij}$ plus a constant.

In Table 1, the true model is the Clayton copula with Log-logistic (proportional odds) or Weibull (proportional hazards) margins, with a fixed Kendall's $\tau$. We fitted three models: the true parametric copula model (i.e., Clayton copula with Log-logistic or Weibull margins), a two-parameter copula model with sieve margins ("Copula2S") and a marginal sieve model with the robust variance-covariance estimate ("RobustS") (Zhou et al., 2017). We obtained estimation biases and 95% coverage probabilities for the regression coefficients and dependence parameters. Under the two-parameter copula model, the sieve maximum likelihood estimators are all virtually unbiased and all empirical coverage probabilities are close to the nominal level. Moreover, their standard errors are virtually the same as those under the true parametric model, indicating that our proposed method works well. For the robust marginal sieve model, the regression coefficient estimates are also unbiased with correct coverage probabilities, but their standard errors are appreciably larger.
Table 1. Estimation results under the true parametric model, Copula2S and RobustS (SE: empirical standard error; SEE: estimated standard error; CP: 95% coverage probability).
True  Copula2S  RobustS
Param  Bias  SE  SEE (CP)  Bias  SE  SEE (CP)  Bias  SE  SEE (CP) 

proportional odds  
0.0013  0.0171  0.0163 (0.942)  0.0003  0.0176  0.0165 (0.938)  0.0024  0.0293  0.0300 (0.930)  
0.0120  0.1300  0.1300 (0.945)  0.0006  0.1330  0.1310 (0.939)  0.0110  0.1510  0.1500 (0.944)  
0.0007  0.0927  0.0942 (0.953)  0.0110  0.0951  0.0947 (0.950)  0.0012  0.1050  0.1090 (0.955)  
0.0005  0.0210  0.0208 (0.944)  0.0045  0.0223  0.0221 (0.950)  NA  NA  NA  
proportional hazards  
0.0012  0.0097  0.0103 (0.958)  0.0013  0.0099  0.0105 (0.957)  0.0009  0.0182  0.0187 (0.957)  
0.0043  0.0780  0.0789 (0.952)  0.0040  0.0782  0.0788 (0.951)  0.0043  0.0960  0.0969 (0.957)  
0.0005  0.0606  0.0569 (0.935)  0.0002  0.0608  0.0569 (0.938)  0.0003  0.0722  0.0701 (0.945)  
0.0003  0.0220  0.0219 (0.952)  0.0012  0.0224  0.0221 (0.951)  NA  NA  NA 
In real settings, the value of the transformation function parameter is often unknown. Therefore, we also examined our method when estimating the transformation parameter together with the other parameters in our proposed model. Table 2 shows satisfactory estimation performance for all parameters, including the transformation parameter, in both the proportional hazards and proportional odds settings.
Table 2. Estimation results when the transformation parameter is estimated jointly with the other parameters.
Param  Bias  SE  SEE (CP)  Bias  SE  SEE (CP)

proportional odds  proportional hazards  
0.0011  0.0169  0.0159 (0.934)  0.0004  0.0109  0.0104 (0.942)  
0.0111  0.1289  0.1278 (0.938)  0.0032  0.0816  0.0806 (0.945)  
0.0030  0.0902  0.0923 (0.961)  0.0035  0.0584  0.0582 (0.952)  
0.0384  0.1294  0.1274 (0.919)  0.0463  0.1532  0.1574 (0.952)  
0.0008  0.0224  0.0216 (0.944)  0.0006  0.0225  0.0227 (0.946) 
We also simulated bivariate current status data, with a single assessment time per subject, to examine how the proposed method works in the special case of case 1 interval censoring. As shown in Table 3, Copula2S works as well as the true model in this setting too. The larger standard errors are due to the reduced information in current status data as compared with standard case 2 interval-censored data.
Table 3. Estimation results for bivariate current status data.
True  Copula2S  RobustS
Param  Bias  SE  SEE (CP)  Bias  SE  SEE (CP)  Bias  SE  SEE (CP) 

proportional odds  
0.0031  0.0399  0.0394 (0.956)  0.0033  0.0400  0.0393 (0.958)  0.0002  0.0536  0.0538 (0.942)  
0.0203  0.2563  0.2516 (0.948)  0.0219  0.2563  0.2527 (0.946)  0.0249  0.2646  0.2608 (0.946)  
0.0002  0.1819  0.1816 (0.947)  0.0001  0.1808  0.1822 (0.944)  0.0008  0.1855  0.1938 (0.948)  
0.0038  0.0587  0.0575 (0.947)  0.0025  0.0680  0.0660 (0.939)  NA  NA  NA  
proportional hazards  
0.0008  0.0330  0.0321 (0.947)  0.0005  0.0333  0.0322 (0.946)  0.0003  0.0397  0.0420 (0.944)  
0.0001  0.1785  0.1813 (0.949)  0.0025  0.1831  0.1830 (0.945)  0.0030  0.1857  0.1958 (0.956)  
0.0043  0.1298  0.1312 (0.959)  0.0052  0.1319  0.1324 (0.959)  0.0038  0.1346  0.1406 (0.956)  
0.0013  0.0637  0.0628 (0.943)  0.0029  0.0682  0.0665 (0.931)  NA  NA  NA 
We further evaluated the estimation performance of the proposed model on bivariate interval-censored data generated from copula models that do not belong to the two-parameter copula family: the Frank, AMH and Joe copulas (the dependence parameter of the AMH copula is restricted in range, limiting the strength of dependence it can express). In Table 4, the regression coefficient estimates from the two-parameter copula are all unbiased with coverage probabilities close to 95%. The biases for the dependence estimates are also minimal with good coverage probabilities, except in the scenario where data were generated from a Joe copula (coverage probability = 83%). Overall, the two-parameter copula model family demonstrates good robustness to misspecification of the copula function.
Table 4. Estimation results under misspecified copulas (data generated from the Frank, AMH and Joe copulas).
Frank  AMH  Joe

Param  Bias  SE  SEE (CP)  Bias  SE  SEE (CP)  Bias  SE  SEE (CP) 
0.0002  0.0177  0.0176 (0.950)  0.0011  0.0262  0.0267 (0.953)  0.0016  0.0160  0.0166 (0.962)  
0.0018  0.1480  0.1470 (0.944)  0.0013  0.1250  0.1250 (0.951)  0.0027  0.1388  0.1438 (0.954)  
0.0001  0.1050  0.1060 (0.952)  0.0001  0.0885  0.0901 (0.959)  0.0037  0.0984  0.1043 (0.962)  
0.0036  0.0219  0.0198 (0.937)  0.0056  0.0318  0.0304 (0.934)  0.0168  0.0195  0.0185 (0.830) 
4.3 Simulation II: type-I error
We evaluated the type-I error control of our proposed generalized score test under Copula2S. Specifically, we tested the SNP effect under different dependence strengths (Kendall's $\tau$) and two different MAFs (40%, 5%). The true model is the Clayton copula with Log-logistic margins. We included score tests from two misspecified copula models: one with misspecified margins but the correct copula (i.e., Clayton copula with Weibull margins) and the other with a misspecified copula but correct margins (i.e., Gumbel copula with Log-logistic margins). We also included the score test under the true parametric copula model (i.e., Clayton copula with Log-logistic margins), which served as the benchmark. In addition, we examined Wald tests from the marginal Log-logistic model with the variance-covariance matrix taken either from the independence estimate or from the robust sandwich estimate.
Table 5 shows type-I errors at different tail levels. In the top part, where Kendall's $\tau = 0.6$, our proposed score test controls type-I errors as well as the true parametric model at all tail levels and MAFs. However, type-I errors for the two misspecified copula models are inflated in all scenarios, especially when the margins are misspecified at MAF = 40%. It is not surprising that the greatest inflation occurs with the marginal approach under the independence assumption. After applying the robust variance-covariance estimate, the type-I errors appear controlled at MAF = 40%, but are still slightly inflated at MAF = 5%. When Kendall's $\tau = 0.2$, the proposed two-parameter model still performs as well as the correct parametric model and outperforms the other models, although the type-I error inflations of the other models are smaller due to the weaker dependence.
Table 5. Type-I error rates at different tail levels.
MAF  Tail  IndepLL  RobustLL  ClaytonW  GumbelLL  Copula2S  ClaytonLL

Kendall’s $\tau = 0.6$
40%  0.05  0.141  0.051  0.131  0.065  0.052  0.050 
0.01  0.053  0.010  0.041  0.015  0.010  0.010  
0.001  0.0131  0.0012  0.0074  0.0022  0.0013  0.0012  
0.0001  0.0037  0.0002  0.0012  0.0003  0.0001  0.0001  
5%  0.05  0.141  0.056  0.059  0.066  0.053  0.051 
0.01  0.053  0.014  0.012  0.016  0.012  0.011  
0.001  0.0136  0.0018  0.0013  0.0020  0.0013  0.0012  
0.0001  0.0034  0.0003  0.0002  0.0003  0.0002  0.0002  
Kendall’s $\tau = 0.2$
40%  0.05  0.083  0.051  0.103  0.061  0.051  0.050 
0.01  0.022  0.010  0.029  0.013  0.010  0.010  
0.001  0.0036  0.0012  0.0045  0.0017  0.0011  0.0010  
0.0001  0.0006  0.0002  0.0006  0.0003  0.0002  0.0002  
5%  0.05  0.083  0.056  0.054  0.060  0.053  0.052 
0.01  0.023  0.013  0.011  0.014  0.012  0.011  
0.001  0.0036  0.0017  0.0013  0.0018  0.0014  0.0013  
0.0001  0.0007  0.0003  0.0001  0.0002  0.0002  0.0001 
4.4 Simulation III: power
We compared the score test under our Copula2S model with the score tests from two other models that appropriately control the type-I error in Table 5: the true parametric copula model and the RobustS model. Figure 1 presents the power curves of these three tests over a range of SNP effect sizes. Our proposed model yields essentially the same power as the true parametric copula model, and is considerably more powerful than the robust marginal sieve model.
4.5 Simulation IV: computing speed
We examined the computational advantage of our two-step sieve estimation procedure over one-step estimation in the setting where data were simulated from a Clayton copula with Log-logistic margins. For the same number of replications, the one-step method took considerably longer than our two-step procedure, which saved a substantial fraction of the computing time.
We also compared the computing speed of the three likelihood-based tests for testing 1,000 SNPs under three models: the true Clayton model with Log-logistic margins, our proposed Copula2S model and the RobustS model. The 1,000 genetic variants were simulated with a fixed MAF. The results are shown in Table 6. We found that the score test is about 3-5 times faster than the Wald test or the likelihood ratio test on average. Among the three score tests, although the score test under our Copula2S model is the slowest due to model complexity, it is still faster than the Wald test under the RobustS model. Given its advantages in robustness, type-I error control and power, we recommend the proposed Copula2S model with the score test for large-scale testing of bivariate data subject to general interval censoring.
Table 6. Computing time of the three tests under the three models.
Time (seconds)  Score  Wald  LRT

RobustS  148  693  NA 
ClaytonLL  286  1254  1146 
Copula2S  475  1679  1570 
5 Real data analysis
We applied our proposed method to analyze the AREDS data. AREDS was designed to assess the clinical course of, and risk factors for, the development and progression of AMD. DNA samples were collected from consenting participants and genotyped by the International AMD Genomics Consortium (Fritsche et al., 2016). In this study, each participant was examined every 6 months during the first 6 years and every year thereafter. To measure disease progression, a severity score, scaled from 1 to 12 (with larger values indicating more severe AMD), was determined for each eye of each participant at every examination. The outcome of interest is the bivariate progression time-to-late-AMD, where late-AMD is defined as the stage at which the severity score reaches a pre-defined threshold. Both phenotype and genotype data of AREDS are available from the online repository dbGaP (accessions phs000001.v3.p1 and phs001039.v1.p1, respectively). To date, all the studies that analyzed the AREDS data for AMD progression treated the outcome as right-censored (e.g., Ding et al. (2017), Yan et al. (2018), and Sun et al. (2018)), and some used data from only one eye (e.g., Seddon et al. (2014)).
We analyzed 2718 Caucasian participants, including 2295 subjects who were free of late-AMD in both eyes at enrollment (i.e., time 0; denoted as group A), and 423 subjects who had one eye already progressed to late-AMD by enrollment (denoted as group B). For the $j$th eye (free of late-AMD at time 0) of subject $i$, we observe $L_{ij}$, the last assessment time when the eye was still free of late-AMD, and $R_{ij}$, the first assessment time when the eye was diagnosed as late-AMD. For an eye that did not progress to late-AMD by the end of study follow-up, we assigned a large number to $R_{ij}$. Since there are two groups of subjects (groups A and B), we implemented a two-part model. Specifically, we created a covariate for each eye to indicate whether its fellow eye had already progressed by time 0 (i.e., 0 for both eyes of group A subjects and 1 for group B subjects). The joint likelihood is then the product of the copula sieve model for group A subjects and the marginal sieve model for group B subjects.
We included four risk factors as non-genetic covariates: baseline age, severity score, smoking status, and fellow-eye progression status. We checked various transformation functions and Bernstein polynomial degrees, and chose the model that produced the smallest AIC, which is the proportional odds model with the same specification for both margins.
We performed GWAS on 6 million SNPs (either from exome chip or imputed) passing the MAF filter across the 22 autosomal chromosomes and plotted the association p-values in Figure 2. As highlighted in the figure, the PLEKHA1–ARMS2–HTRA1 region on chromosome 10 and the CFH region on chromosome 1 have variants reaching the “genome-wide” significance level. These two regions were previously found to be significantly associated with AMD onset in multiple case-control studies (Fritsche et al., 2016). Moreover, we identified SNPs in a previously unrecognized ATF7IP2 region on chromosome 16, showing moderate to strong association with AMD progression. As a comparison, we also fitted the robust marginal sieve model (RobustS) and performed the corresponding score test for each SNP. Overall, the results are consistent with our Copula2S model, but the p-values are generally larger (as shown in Table 7). Note that the CFH region did not reach the “genome-wide” significance level under the RobustS model.
Table 7 presents the top significant variants in the three regions denoted in Figure 2. The odds ratio of a SNP was calculated by fitting a model including this SNP and the non-genetic factors. For example, a known AMD risk variant from the HTRA1 region has an estimated odds ratio of 1.66 (95% CI ), which implies its minor allele has a “harmful” effect on AMD progression. Under this model, the estimated dependence parameters correspond to a Kendall’s tau indicating moderate dependence in AMD progression between the two eyes. In addition, the estimated upper and lower tail Kendall’s tau values suggest a stronger between-eye dependence at the later stage of progression.
Table 7. Top significant variants in the three regions highlighted in Figure 2.

SNP   Chr   Gene           MAF    OR     p (Copula2S)   p (RobustS)
      10    HTRA1          0.33   1.66
      10    ARMS2–HTRA1    0.33   1.65
      10    ARMS2–HTRA1    0.34   1.62
      10    PLEKHA1        0.24   1.63
      1     CFH            0.28   0.64
      1     CFH            0.28   0.64
      1     CFH            0.28   0.64
      1     CFH            0.28   0.64
      16    ATF7IP2        0.13   1.62
      16    ATF7IP2        0.13   1.62
For the variant , we obtained both estimated joint and conditional survival functions from the fitted Copula2S model. The left panel of Figure 3 plots the joint progression-free probability contours for subjects who are smokers with the same age (68.6) and AMD severity score (3.0 for both eyes) but different genotypes of this variant. The right panel of Figure 3 plots the corresponding conditional progression-free probability over the remaining years (after year 5) for one eye, given that its fellow eye had progressed by year 5. In both plots, the three genotype groups are clearly separated, with the group having the largest progression-free probabilities. These estimated progression-free probabilities provide valuable information for characterizing and predicting the progression profiles of AMD patients with different genetic characteristics.
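Both quantities plotted in Figure 3 can be computed from any fitted survival copula. As a hedged sketch (Clayton copula and a hypothetical exponential margin standing in for the fitted Copula2S model), the joint curve is the copula of the two marginal survival probabilities, and the conditional curve is P(T2 > t0 + s | T1 <= t0, T2 > t0):

```python
import math

def clayton(u, v, theta):
    """Clayton survival copula; a stand-in for the fitted two-parameter copula."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)

def joint_free(S1, S2, t1, t2, theta):
    """P(T1 > t1, T2 > t2): the joint progression-free probability."""
    return clayton(S1(t1), S2(t2), theta)

def cond_free(S1, S2, t0, s, theta):
    """P(T2 > t0 + s | T1 <= t0, T2 > t0): progression-free probability of
    one eye for s more years, given its fellow eye progressed by time t0.
    Uses P(T1 <= t0, T2 > t) = S2(t) - C(S1(t0), S2(t))."""
    num = S2(t0 + s) - clayton(S1(t0), S2(t0 + s), theta)
    den = S2(t0) - clayton(S1(t0), S2(t0), theta)
    return num / den

S = lambda t: math.exp(-0.1 * t)   # hypothetical marginal survival
```

By construction, `cond_free` equals 1 at s = 0 and decreases in s, mirroring the shape of the curves in the right panel.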
6 Conclusion and Discussion
We proposed a flexible copula-based semiparametric transformation model for analyzing and testing bivariate (general) interval-censored data. We estimated the model parameters through a computationally efficient and stable two-step sieve estimation approach using Bernstein polynomials. We studied the asymptotic properties of the proposed sieve estimators and established asymptotic normality and efficiency for the finite-dimensional model parameters. Our extensive simulations demonstrated satisfactory estimation and testing performance of the proposed method under various practical settings. We implemented our proposed methods in R; the key functions can be found in the GitHub repository https://github.com/yingding99/Copula2S.
As we mentioned in the Introduction section, frailty models are another popular approach for analyzing bivariate censored data. Goethals et al. (2008) present the connection and distinction between copula and frailty models. For example, the Clayton copula has the same mathematical expression as the Gamma frailty model in terms of the joint survival distribution. However, their marginal survival functions are modeled differently: the marginal function under the Clayton model involves only the time and covariate effects, whereas the marginal function under the Gamma frailty model involves not only the time and covariate effects but also the frailty parameter. As a result, the joint distribution functions of the Clayton copula and Gamma frailty models are not equivalent; the two joint distribution functions are identical only when the two margins are independent. Moreover, the parameter estimation strategies usually differ between copula and frailty models. More discussion can be found in Wienke (2010). In this paper, the objectives of our real study led us to choose copula-based models, which offer a straightforward interpretation of covariate effects and dependence strength, and produce joint and conditional survival distributions.
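The shared expression of the joint survival function can be verified numerically. Under a Gamma frailty Z with mean 1 and variance theta, the joint survival is the Laplace transform of Z evaluated at the summed conditional cumulative hazards, and the frailty-model margins are S_i = L(H_i); plugging those margins into the Clayton copula recovers the same joint survival. This is a sketch of the algebra, not the paper's code:

```python
def gamma_laplace(s, theta):
    """Laplace transform of a Gamma(shape=1/theta, rate=1/theta) frailty
    (mean 1, variance theta): L(s) = (1 + theta*s)^(-1/theta)."""
    return (1.0 + theta * s)**(-1.0 / theta)

def frailty_joint(H1, H2, theta):
    """Gamma frailty joint survival: E[exp(-Z*(H1 + H2))] = L(H1 + H2),
    with H1, H2 the conditional cumulative hazards."""
    return gamma_laplace(H1 + H2, theta)

def clayton(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)
```

Note that the frailty-model margins depend on theta through the Laplace transform, which is exactly the modeling distinction drawn above: the two approaches coincide in form only after the margins are matched this way.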
Several model selection procedures have been proposed for copula-based methods. For example, the AIC is widely used for model selection in copula models. Wang & Wells (2000) proposed a model selection procedure based on nonparametric estimation of the bivariate joint survival function within Archimedean copulas. For model diagnostics, Chen et al. (2010) proposed a penalized pseudo-likelihood ratio test for copula models with complete data. Recently, Zhang et al. (2016) proposed a goodness-of-fit test for copula models using the pseudo in-and-out-of-sample method. To the best of our knowledge, there is no existing goodness-of-fit test for copula models of bivariate interval-censored data. In our real data analysis, we used the AIC to guide model selection for simplicity. However, a formal goodness-of-fit test is desirable, especially for bivariate interval-censored data in the regression setting, and is a worthwhile direction for future research.
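The AIC-based selection used in our analysis can be sketched generically. In the snippet below, `fit` is a hypothetical stand-in for fitting one candidate model (a transformation function g and a Bernstein degree m) and returning its maximized log-likelihood and parameter count; the grid search itself is the only point being illustrated.

```python
import itertools

def select_by_aic(fit, transformations, degrees):
    """Return (aic, transformation, degree) minimizing AIC = 2k - 2*loglik
    over the candidate grid; `fit` is user-supplied (hypothetical here) and
    must return (loglik, n_params) for one candidate model."""
    best = None
    for g, m in itertools.product(transformations, degrees):
        loglik, k = fit(g, m)
        aic = 2.0 * k - 2.0 * loglik
        if best is None or aic < best[0]:
            best = (aic, g, m)
    return best
```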
We applied our method to a GWAS of AMD progression and successfully identified variants from two known AMD risk regions (CFH on chromosome 1 and PLEKHA1–ARMS2–HTRA1 on chromosome 10) as significantly associated with AMD progression. Moreover, we discovered variants from a previously unreported region (ATF7IP2 on chromosome 16) showing moderate to strong association with AMD progression. In contrast, we found that some known AMD risk loci (e.g., from ARHGAP21 on chromosome 10) are not associated with AMD progression. Therefore, variants associated with higher (or lower) risk of developing AMD are not necessarily associated with disease progression, while some variants may be associated only with AMD progression and not with disease onset. This is the first research on AMD progression adopting a solid statistical model that appropriately handles bivariate interval-censored data. Our findings provide new insights into the genetic basis of AMD progression, which are critical for the next step of establishing novel and reliable predictive models of AMD progression to accurately identify high-risk patients at an early stage. Our proposed method is applicable to general bilateral diseases and complex diseases with co-primary endpoints.
7 Appendix
7.1 Regularity Conditions
Condition 1. (i) There exists such that ; (ii) The union of the supports for distributions of and is an interval with .
Condition 2. The distribution of covariate has a bounded support and is not concentrated on any proper subspace of , .
Condition 3. Let be the likelihood function with being substituted by . Define
with . There exist for which there are different sets of such that if with for each of these sets of values, then .
Condition 4. (i) The function is continuously differentiable up to order with in and satisfies for some positive constant . Also, is an interior point of . (ii) The transformation is a strictly increasing function with and is three-times continuously differentiable in .
Condition 5. For every in a neighborhood of , , where is the log-likelihood function given in Section 3 and means that “the left-hand side is bounded above by a constant times the right-hand side”.
Conditions 1, 2, 4(i) and 5 are commonly used in studies of interval-censored data (e.g., Huang & Rossini, 1997; Wen & Chen, 2013; Zhou et al., 2017). Condition 4(ii) comes from the definition of the linear transformation model (Cheng et al., 1995). Condition 3 ensures both the identifiability of the parameters and the positivity of the efficient Fisher information matrix (Chang et al., 2007; Wen & Chen, 2013; Zhou et al., 2017).
7.2 Proof of Theorem 1
We derive the convergence rate by verifying Conditions C1–C3 of Theorem 1 in Shen & Wong (1994). Define , where is the collection of with smoothness as defined in our Condition 4. Similarly, is the corresponding sieve space containing . Then Condition C1 holds automatically due to our Condition 5, which states that for any , . Next, we verify Condition C2 of Shen & Wong (1994). Similar to the arguments in the proof of Lemma A3 (using our Conditions 1, 2, and 4), we can show that for any ,
where Then, it follows that for any
It implies that
Thus, Condition C2 from Shen & Wong (1994) holds (with in their notation).
Finally, we verify Condition C3 of Shen & Wong (1994). By Lemma A3, for , we have , where is the dimensionality of . Using the fact that the covering number is bounded by the bracketing number, it follows that
Hence, Condition C3 of Shen & Wong (1994) (page 583) holds with the constants and in their notation.
Therefore, the constant in Theorem 1 of Shen & Wong (1994) (page 584) is . Since as , we can pick a slightly larger than such that for large ; we still denote it as , so that . Since maximizes over , satisfies inequality (1.1) in Shen & Wong (1994) with in their notation. By Lemma A2, there exists a such that . Thus, the sieve approximation error in Shen & Wong (1994) is . In addition, since is maximized at , its first derivative at equals . Then, applying a Taylor expansion of around and plugging in , the Kullback–Leibler pseudo-distance between and satisfies
The second-to-last inequality holds due to the boundedness of all second-order derivatives of the log-likelihood (Lemma A1), together with derivations similar to those in Lemma A3. The last inequality holds since .
Therefore, . Hence, by Theorem 1 of Shen & Wong (1994), we obtain the convergence rate for as
This completes the proof of our Theorem 1.
7.3 Proof of Theorem 2
We prove the theorem by checking Assumptions A1–A6 of the general theorem in the supplementary materials. Assumption A1 is verified by applying our Theorem 1 with and the norm. A2 holds automatically under the model assumptions. For Assumption A3, we need to verify both the existence of and the non-singularity of the matrix . Following arguments similar to those in Wen & Chen (2013) (pages 402–405), the existence of can be verified, which satisfies that for all ,