The proportional hazards model, proposed by Cox (1972), is one of the most commonly used models in survival analysis. In a fixed-dimensional setting, i.e., the case where the number $p$ of covariates is fixed, Andersen and Gill (1982) proved that the maximum partial likelihood estimator of the regression parameter is consistent and asymptotically normal. They also studied the asymptotic properties of the Breslow estimator of the cumulative baseline hazard function.
Recently, many researchers have become interested in a high-dimensional and sparse setting for the regression parameter, that is, the case where $p$ is much larger than the sample size $n$ and the number of nonzero components of the true value is relatively small. In this setting, several estimation methods have been proposed for various regression-type models. In particular, penalized methods such as the Lasso (Tibshirani (1997), Huang et al. (2013), Bradic et al. (2011) and others) have been well studied. Huang et al. (2013) derived oracle inequalities for the Lasso estimator in the proportional hazards model, which imply that the Lasso estimator is consistent even in a high-dimensional setting. Bradic et al. (2011) considered general penalized estimators, including the Lasso, SCAD and others, and proved that these estimators are consistent and asymptotically normal. On the other hand, the Dantzig selector, proposed by Candès and Tao (2007) for the linear regression model, was also applied to the proportional hazards model by Antoniadis et al. (2010), who studied the consistency of the estimator. Fujimori and Nishiyama (2017) extended these consistency results for the Dantzig selector in this model to $\ell_q$ consistency for every $q \in [1,\infty]$, using a method similar to that of Huang et al. (2013). However, to the best of our knowledge, the asymptotic normality of estimators of a high-dimensional regression parameter and of the Breslow estimator has not yet been studied.
In this paper, we focus on the asymptotic normality of estimators in a high-dimensional setting. To address this problem, we need to reduce the dimension of the regression parameter. We show that the Dantzig selector has the variable selection consistency property, which enables us to reduce the dimension. We then construct a new maximum partial likelihood estimator based on this variable selection consistency result and show that this estimator is asymptotically normal. In addition, we prove that a Breslow type estimator, obtained by plugging in the maximum partial likelihood estimator after dimension reduction, satisfies asymptotic normality.
This paper is organized as follows. The model setup, some regularity conditions, and matrix conditions for dealing with a high-dimensional and sparse setting are introduced in Section 2. In Section 3, we prove asymptotic properties of estimators of the high-dimensional regression parameter, that is, the variable selection consistency of the Dantzig selector and the asymptotic normality of the maximum partial likelihood estimator after dimension reduction. The asymptotic property of the Breslow type estimator is established in Section 4.
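Throughout this paper, for every $q \in [1,\infty]$ we denote by $\|v\|_q$ the $\ell_q$ norm of a vector $v = (v_1,\dots,v_p)^\top \in \mathbb{R}^p$, i.e.,
$$\|v\|_q = \Big(\sum_{j=1}^p |v_j|^q\Big)^{1/q} \quad (q < \infty), \qquad \|v\|_\infty = \max_{1 \le j \le p} |v_j|.$$
In addition, for a matrix $A = (A_{ij}) \in \mathbb{R}^{p \times p}$, we define $\|A\|_\infty$ by
$$\|A\|_\infty = \max_{1 \le i,j \le p} |A_{ij}|,$$
where $A_{ij}$ denotes the $(i,j)$-component of the matrix $A$. For a vector $v \in \mathbb{R}^p$ and an index set $T \subset \{1,\dots,p\}$, we write $v_T$ for the $|T|$-dimensional sub-vector of $v$ restricted to the index set $T$, where $|T|$ is the number of elements in $T$. Similarly, for a matrix $A \in \mathbb{R}^{p \times p}$ and index sets $T, T' \subset \{1,\dots,p\}$, we define the sub-matrix $A_{T,T'}$ by
$$A_{T,T'} = (A_{ij})_{i \in T,\ j \in T'}.$$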
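The norms and index-set notation above can be mirrored in a few lines of NumPy. This is a minimal sketch: the function names are ours, indices are 0-based rather than 1-based, and the matrix norm is implemented as the entrywise maximum $\max_{i,j}|A_{ij}|$.

```python
import numpy as np

def lq_norm(v, q):
    """ell_q norm of a vector, for q in [1, inf]."""
    v = np.asarray(v, dtype=float)
    if np.isinf(q):
        return float(np.max(np.abs(v)))
    return float(np.sum(np.abs(v) ** q) ** (1.0 / q))

def max_norm(A):
    """Entrywise maximum max_{i,j} |A_ij| of a matrix."""
    return float(np.max(np.abs(np.asarray(A, dtype=float))))

def sub_vector(v, T):
    """v_T: the components of v indexed by T (0-based indices)."""
    return np.asarray(v)[sorted(T)]

def sub_matrix(A, T, T_prime):
    """A_{T,T'}: rows indexed by T and columns indexed by T'."""
    return np.asarray(A)[np.ix_(sorted(T), sorted(T_prime))]

v = np.array([3.0, -4.0, 0.0])
print(lq_norm(v, 1), lq_norm(v, 2), lq_norm(v, np.inf))  # 7.0 5.0 4.0
print(sub_vector(v, {0, 2}))                             # [3. 0.]
```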
2.1 Model setup
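Let $T_i$ be the survival time and $C_i$ the censoring time of the $i$-th individual for every $i = 1,\dots,n$. Assume that each $i$-th individual has an $\mathbb{R}^p$-valued covariate process $Z_i = \{Z_i(t)\}_{t \in [0,1]}$, and that the survival time $T_i$ is conditionally independent of the censoring time $C_i$ given $Z_i$. Moreover, we assume that the $T_i$’s never occur simultaneously. For every $i$ and $t \in [0,1]$, we observe $(X_i, D_i, Z_i(t))$, where $X_i = T_i \wedge C_i$ and $D_i = 1\{T_i \le C_i\}$. We define the counting process $N_i$ and the at-risk process $Y_i$ for every $i$ as follows:
$$N_i(t) = 1\{X_i \le t,\ D_i = 1\}, \qquad Y_i(t) = 1\{X_i \ge t\}, \qquad t \in [0,1].$$
Let $\{\mathcal{F}_t\}_{t \in [0,1]}$ be the filtration defined as follows:
$$\mathcal{F}_t = \sigma\{N_i(s), Y_i(s), Z_i(s) : 0 \le s \le t,\ i = 1,\dots,n\}.$$
Suppose that $Y_i$ and $Z_i$, $i = 1,\dots,n$, are predictable processes. In Cox’s proportional hazards model, it is assumed that each $N_i$ has the following intensity:
$$\lambda_i(t) = Y_i(t)\lambda_0(t)\exp\big(Z_i(t)^\top\beta_0\big),$$
where $\lambda_0$ is the unknown deterministic baseline hazard function and $\beta_0 \in \mathbb{R}^p$ is the unknown regression parameter. Then, for every $i$, the following process is a square integrable martingale:
$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\lambda_0(s)\exp\big(Z_i(s)^\top\beta_0\big)\,ds.$$
Note that the predictable variation process of $M_i$ is given by
$$\langle M_i \rangle(t) = \int_0^t Y_i(s)\lambda_0(s)\exp\big(Z_i(s)^\top\beta_0\big)\,ds.$$
Hereafter, we write $\Lambda_0$ for the cumulative baseline hazard function, i.e.,
$$\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds.$$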
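As a numerical sanity check of the martingale property, one can simulate the model in its simplest form: a time-constant scalar covariate, a constant baseline hazard $\lambda_0(t) \equiv \lambda$ (so $\Lambda_0(t) = \lambda t$) and uniform censoring. These choices and the function name are assumptions made only for illustration. Because the at-risk indicator is $1\{X_i \ge s\}$, the compensator of $N_i$ at $t = 1$ reduces to $\lambda e^{z_i \beta_0} \min(X_i, 1)$, so the sample mean of $M_i(1)$ should be close to zero.

```python
import numpy as np

def mean_martingale_at_one(n=200_000, beta0=0.5, lam=1.0, seed=0):
    """Simulate (X_i, D_i) from a Cox model with constant baseline hazard `lam`
    and a time-constant scalar covariate, then return the sample mean of
    M_i(1) = N_i(1) - int_0^1 Y_i(s) lam exp(z_i beta0) ds."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=n)          # covariate Z_i
    rate = lam * np.exp(z * beta0)              # hazard of T_i given Z_i
    T = rng.exponential(scale=1.0 / rate)       # survival times
    C = rng.uniform(0.0, 2.0, size=n)           # censoring, independent of T given Z
    X = np.minimum(T, C)                        # X_i = T_i ^ C_i
    D = (T <= C).astype(int)                    # D_i = 1{T_i <= C_i}
    N1 = ((X <= 1.0) & (D == 1)).astype(float)  # N_i(1)
    compensator = rate * np.minimum(X, 1.0)     # int_0^1 1{X_i >= s} * rate ds
    return float(np.mean(N1 - compensator))

print(mean_martingale_at_one())  # close to 0
```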
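The aim of this paper is to estimate the regression parameter $\beta_0$ and the cumulative baseline hazard function $\Lambda_0$ in a high-dimensional and sparse setting for $\beta_0$, i.e., $p$ is much larger than $n$ while $S := |T_0|$ is small, where
$$T_0 := \{j \in \{1,\dots,p\} : \beta_{0,j} \neq 0\}$$
is the support index set of the true value $\beta_0$. To estimate $\beta_0$, we use Cox’s log-partial likelihood, which is given by
$$l_n(\beta) = \sum_{i=1}^n \int_0^1 Z_i(t)^\top\beta\,dN_i(t) - \int_0^1 \log\Big\{\sum_{i=1}^n Y_i(t)\exp\big(Z_i(t)^\top\beta\big)\Big\}\,d\bar N(t),$$
where $\bar N(t) := \sum_{i=1}^n N_i(t)$. For $k = 0, 1, 2$, put
$$S_n^{(k)}(\beta,t) = \frac{1}{n}\sum_{i=1}^n Y_i(t) Z_i(t)^{\otimes k}\exp\big(Z_i(t)^\top\beta\big),$$
where $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$ and $a^{\otimes 2} = aa^\top$. We write $U_n(\beta)$ for the gradient of $l_n(\beta)/n$ and $J_n(\beta)$ for the Hessian of $-l_n(\beta)/n$, i.e.,
$$U_n(\beta) = \frac{1}{n}\sum_{i=1}^n \int_0^1 \Big\{Z_i(t) - \frac{S_n^{(1)}(\beta,t)}{S_n^{(0)}(\beta,t)}\Big\}\,dN_i(t), \qquad J_n(\beta) = \frac{1}{n}\int_0^1 V_n(\beta,t)\,d\bar N(t),$$
with
$$V_n(\beta,t) = \frac{S_n^{(2)}(\beta,t)}{S_n^{(0)}(\beta,t)} - \Big(\frac{S_n^{(1)}(\beta,t)}{S_n^{(0)}(\beta,t)}\Big)^{\otimes 2}.$$
Note that $U_n(\beta_0)$ is the terminal value $U_n(\beta_0, 1)$ of the following square integrable martingale:
$$U_n(\beta_0, t) = \frac{1}{n}\sum_{i=1}^n \int_0^t \Big\{Z_i(s) - \frac{S_n^{(1)}(\beta_0,s)}{S_n^{(0)}(\beta_0,s)}\Big\}\,dM_i(s), \qquad t \in [0,1].$$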
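The log-partial likelihood, its gradient divided by $n$ and the Hessian of its negative divided by $n$ can be computed directly from the observed data. The sketch below assumes time-constant covariates (so that $Z_i(t) \equiv Z_i$), no tied event times and the observation window $[0, \tau]$ with $\tau = 1$; the function name is hypothetical. Each observed event at time $X_i$ contributes one term to the integrals against $dN_i$, with risk set $\{j : X_j \ge X_i\}$.

```python
import numpy as np

def cox_partial_likelihood(beta, Z, X, D, tau=1.0):
    """Return (l_n(beta), U_n(beta), J_n(beta)) for time-constant covariates.

    Z : (n, p) covariates, X : (n,) observed times, D : (n,) event indicators.
    U_n is the gradient of l_n / n and J_n the Hessian of -l_n / n.
    Assumes no tied event times.
    """
    n, p = Z.shape
    eta = Z @ beta
    w = np.exp(eta)
    loglik, U, J = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero((D == 1) & (X <= tau)):  # jumps of N_i on [0, tau]
        risk = X >= X[i]                             # Y_j(X_i) = 1{X_j >= X_i}
        w_r, Z_r = w[risk], Z[risk]
        s0 = w_r.sum()                               # n * S_n^(0)(beta, X_i)
        zbar = (w_r @ Z_r) / s0                      # S_n^(1) / S_n^(0)
        s2 = (Z_r * w_r[:, None]).T @ Z_r / s0       # S_n^(2) / S_n^(0)
        loglik += eta[i] - np.log(s0)
        U += Z[i] - zbar
        J += s2 - np.outer(zbar, zbar)               # V_n(beta, X_i)
    return loglik, U / n, J / n
```

A central finite difference of $l_n/n$ in each coordinate should match $U_n$ up to rounding error, which is a quick way to check such an implementation.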
2.2 Regularity conditions and matrix conditions
We assume the following conditions.
The true value satisfies that . Moreover, there exists a global constant such that
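The covariate processes $Z_i$, $i = 1,\dots,n$, are uniformly bounded, i.e., there exists a global constant $K > 0$ such that
$$\sup_{n}\max_{1 \le i \le n}\sup_{t \in [0,1]} \|Z_i(t)\|_\infty \le K.$$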
The baseline hazard function $\lambda_0$ is integrable, i.e.,
$$\int_0^1 \lambda_0(t)\,dt < \infty.$$
For every , there exist deterministic -valued function , valued function and - valued function which satisfy the following conditions:
The functions , , satisfy the following conditions:
For every , the following matrix is nonnegative definite:
For every , it holds that
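Note that the last condition above ensures that Lindeberg’s condition holds. Recalling that $T_0$ is the support index set of the true value $\beta_0$, we introduce the following factor for the matrix $J_n(\beta_0)$.
Define the set $\mathcal{C}_{T_0}$ as follows:
$$\mathcal{C}_{T_0} = \{h \in \mathbb{R}^p : \|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1\}.$$
- Compatibility factor:
$$\kappa(T_0; J_n(\beta_0)) = \inf_{0 \neq h \in \mathcal{C}_{T_0}} \frac{|T_0|^{1/2}\{h^\top J_n(\beta_0) h\}^{1/2}}{\|h_{T_0}\|_1}.$$
Matrix factors of this kind appear in many papers dealing with high-dimensional and sparse settings; see, e.g., Bickel et al. (2009), van de Geer and Bühlmann (2009) and Huang et al. (2013) for details. We assume the following condition on $\kappa(T_0; J_n(\beta_0))$.
The compatibility factor is asymptotically positive, i.e., there exists a constant $\kappa_0 > 0$ such that
$$P\big(\kappa(T_0; J_n(\beta_0)) \ge \kappa_0\big) \to 1 \qquad (n \to \infty).$$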
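Assuming the compatibility factor takes the standard form, i.e., the infimum of $|T|^{1/2}(h^\top A h)^{1/2}/\|h_T\|_1$ over nonzero $h$ in the cone $\{h : \|h_{T^c}\|_1 \le \|h_T\|_1\}$, computing it exactly means minimizing a ratio over a nonconvex set. Any direction in the cone, however, gives an upper bound. The sketch below (hypothetical function names, 0-based indices) evaluates the ratio for random cone directions and returns the smallest value found. A value near zero signals that the condition may fail for a given matrix, while a large value proves nothing.

```python
import numpy as np

def compatibility_ratio(h, A, T):
    """|T|^{1/2} (h' A h)^{1/2} / ||h_T||_1 for a single direction h."""
    T = sorted(T)
    return np.sqrt(len(T)) * np.sqrt(max(float(h @ A @ h), 0.0)) / np.abs(h[T]).sum()

def kappa_upper_bound(A, T, n_draws=5000, seed=0):
    """Monte Carlo upper bound on inf_{h in cone, h != 0} compatibility_ratio(h, A, T),
    where cone = {h : ||h_{T^c}||_1 <= ||h_T||_1}."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    T = sorted(T)
    Tc = [j for j in range(p) if j not in T]
    best = np.inf
    for _ in range(n_draws):
        h = np.zeros(p)
        h[T] = rng.normal(size=len(T))
        if Tc:
            g = rng.normal(size=len(Tc))
            # scale the off-support part so that ||h_{T^c}||_1 <= ||h_T||_1
            g *= rng.uniform() * np.abs(h[T]).sum() / np.abs(g).sum()
            h[Tc] = g
        best = min(best, compatibility_ratio(h, A, T))
    return best
```

For the identity matrix the ratio is at least 1 by the Cauchy-Schwarz inequality, so the returned bound should be slightly above 1; for a matrix with a null direction inside the cone it should be close to 0.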
3 The estimator for the regression parameter
3.1 The Dantzig selector for the proportional hazards model
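For reference, the Dantzig selector for Cox’s model, as studied by Antoniadis et al. (2010), replaces the linear-model constraint of Candès and Tao (2007) by a constraint on the partial likelihood score. Writing $U_n(\beta)$ for the gradient of $l_n(\beta)/n$ and $\gamma_n > 0$ for a tuning sequence (the symbol $\gamma_n$ is our notation, and the normalization of the score may differ from the original), a sketch of the definition is
$$\hat\beta_n \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \|\beta\|_1 \quad \text{subject to} \quad \|U_n(\beta)\|_\infty \le \gamma_n.$$
If $\gamma_n$ is such that $\|U_n(\beta_0)\|_\infty \le \gamma_n$, then $\beta_0$ is feasible, so $\|\hat\beta_n\|_1 \le \|\beta_0\|_1$. With $h = \hat\beta_n - \beta_0$ and $T_0$ the support of $\beta_0$, this gives
$$\|\beta_{0,T_0}\|_1 \ge \|\hat\beta_n\|_1 = \|\beta_{0,T_0} + h_{T_0}\|_1 + \|h_{T_0^c}\|_1 \ge \|\beta_{0,T_0}\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^c}\|_1,$$
that is, $\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1$, which is why a cone of this form appears in compatibility-type conditions for the Dantzig selector.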
3.2 The consistency of the Dantzig selector
In this subsection, we discuss the consistency of the estimator in the sense of the $\ell_q$-norm for every $q \in [1,\infty]$. Assume that and satisfy the following conditions:
where the parameters appearing in these conditions are constants. Suppose that the sparsity $S$ is a fixed constant that does not depend on $n$. Moreover, we define the random sequence by:
Then, we can show that (see Fujimori and Nishiyama (2017)).
3.3 The variable selection consistency of the Dantzig selector
The aim of this subsection is to show that the Dantzig selector $\hat\beta_n$ selects the nonzero components of $\beta_0$ correctly. To do this, we define the following estimator $\hat T_n$ of the support index set $T_0$ of the true value $\beta_0$:
An estimator similar to $\hat T_n$ can be found in Fujimori (2017), which considers a linear model of diffusion processes in a high-dimensional and sparse setting. The following theorem states that $\hat T_n$ has the variable selection consistency property, i.e., $P(\hat T_n = T_0) \to 1$ as $n \to \infty$.
Note that and that the sparsity is assumed to be fixed. We have that
by the bound in Theorem 3.1. Therefore, it suffices to show that the following inequality
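For every $j \in T_0$, it follows from the triangle inequality that
$$|\hat\beta_{n,j}| \ge |\beta_{0,j}| - |\hat\beta_{n,j} - \beta_{0,j}| \ge |\beta_{0,j}| - \|\hat\beta_n - \beta_0\|_\infty.$$
Then, we have that
for sufficiently large $n$, which implies that $T_0 \subset \hat T_n$. On the other hand, for every $j \in T_0^c$, we have that
$$|\hat\beta_{n,j}| = |\hat\beta_{n,j} - \beta_{0,j}| \le \|\hat\beta_n - \beta_0\|_\infty,$$
since $\beta_{0,j} = 0$. Then, we can see that $j \notin \hat T_n$, which implies that $\hat T_n \subset T_0$. We thus obtain the conclusion.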
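The deterministic step of this argument can be checked numerically. The sketch below assumes, for illustration only, that the support estimator is a hard-threshold rule $\{j : |\hat\beta_{n,j}| > \tau\}$ for some level $\tau$; this form, the symbol $\tau$ and the function names are assumptions. If $\|\hat\beta_n - \beta_0\|_\infty < \tau$ and $\min_{j : \beta_{0,j} \neq 0}|\beta_{0,j}| > 2\tau$, the two triangle-inequality bounds give $|\hat\beta_{n,j}| > \tau$ on the support of $\beta_0$ and $|\hat\beta_{n,j}| < \tau$ off it, so thresholding returns the support exactly.

```python
import numpy as np

def threshold_support(beta_hat, tau):
    """Hard-threshold support estimate {j : |beta_hat_j| > tau}, 0-based indices."""
    return set(np.flatnonzero(np.abs(beta_hat) > tau).tolist())

def proof_conditions_hold(beta_hat, beta0, tau):
    """The two inequalities the argument needs:
    ||beta_hat - beta0||_inf < tau and min_{j in T0} |beta0_j| > 2 * tau."""
    err = np.max(np.abs(beta_hat - beta0))
    beta_min = np.min(np.abs(beta0[beta0 != 0]))
    return bool(err < tau and beta_min > 2 * tau)

# Whenever both conditions hold, thresholding recovers T0 = {0, 2} exactly.
rng = np.random.default_rng(0)
beta0 = np.array([1.0, 0.0, -0.8, 0.0, 0.0])
tau = 0.2
for _ in range(1000):
    beta_hat = beta0 + rng.uniform(-0.19, 0.19, size=beta0.size)
    assert proof_conditions_hold(beta_hat, beta0, tau)
    assert threshold_support(beta_hat, tau) == {0, 2}
```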
3.4 The maximum partial likelihood estimator for the regression parameter after dimension reduction
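Using the set $\hat T_n$, we construct a new estimator $\tilde\beta_n$ as the solution to the following equation:
$$U_n(\beta)_{\hat T_n} = 0, \qquad \beta_{\hat T_n^c} = 0. \qquad (2)$$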
We now prove the asymptotic normality of $\tilde\beta_n$. In this subsection, we assume that the following matrix is positive definite:
The following theorem states that this estimator is consistent.
We have that
It follows from Lemma 3.1 of Andersen and Gill (1982) that the first term on the right-hand side is since the sparsity is assumed to be fixed. Moreover, we have that
by the definition of $\tilde\beta_n$. Noting that $P(\hat T_n = T_0) \to 1$ as $n \to \infty$, we obtain the conclusion by Slutsky’s theorem.
To show the asymptotic normality of $\tilde\beta_n$, we need the following lemma.
For every random sequence $\{\beta_n^*\}$ that satisfies
as $n \to \infty$, it holds that
We have for every and that
The right-hand side of this inequality converges to $0$ in probability when $\beta_n^* \to \beta_0$ in probability as $n \to \infty$. Then, we obtain the conclusion in a similar way to the proof in Andersen and Gill (1982).
Then, in a similar way to Andersen and Gill (1982), we can prove the asymptotic normality in the following sense.
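For comparison, in the fixed-dimensional case Andersen and Gill (1982) show that $\sqrt{n}(\hat\beta - \beta_0)$ converges in distribution to $N(0, \Sigma^{-1})$, where, in their notation, $\Sigma = \int_0^1 v(\beta_0,t)s^{(0)}(\beta_0,t)\lambda_0(t)\,dt$ with $v = s^{(2)}/s^{(0)} - (s^{(1)}/s^{(0)})^{\otimes 2}$. The following heuristic is only a sketch, not the paper’s proof, of why the same structure is expected for the refitted estimator $\tilde\beta_n$ on the support $T_0$. On the event $\{\hat T_n = T_0\}$, whose probability tends to one, $(\tilde\beta_n - \beta_0)_{T_0^c} = 0$, so a row-by-row mean-value expansion of the score $U_n$ (the gradient of $l_n/n$, with $J_n$ the Hessian of $-l_n/n$) gives, with the usual abuse of notation for intermediate points $\beta_n^*$ between $\tilde\beta_n$ and $\beta_0$,
$$0 = U_n(\tilde\beta_n)_{T_0} = U_n(\beta_0)_{T_0} - J_n(\beta_n^*)_{T_0,T_0}(\tilde\beta_n - \beta_0)_{T_0},$$
$$\sqrt{n}(\tilde\beta_n - \beta_0)_{T_0} = \{J_n(\beta_n^*)_{T_0,T_0}\}^{-1}\sqrt{n}\,U_n(\beta_0)_{T_0}.$$
If $\sqrt{n}\,U_n(\beta_0)_{T_0}$ converges in distribution to $N(0, \Sigma_{T_0,T_0})$ by the martingale central limit theorem and $J_n(\beta_n^*)_{T_0,T_0}$ converges in probability to $\Sigma_{T_0,T_0}$ (the role of the lemma above), Slutsky’s theorem yields the limit $N(0, \Sigma_{T_0,T_0}^{-1})$.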
4 The estimator for the cumulative baseline hazard function
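We define the estimator $\hat\Lambda_n$ of $\Lambda_0$ by the following Breslow type estimator:
$$\hat\Lambda_n(t) = \int_0^t \frac{d\bar N(s)}{\sum_{i=1}^n Y_i(s)\exp\big(Z_i(s)^\top\tilde\beta_n\big)}, \qquad t \in [0,1],$$
where $\tilde\beta_n$ is defined by equation (2) and $0/0 := 0$. We discuss the asymptotic property of $\hat\Lambda_n$ in this section. For every $t \in [0,1]$, we have that
$$\hat\Lambda_n(t) - \Lambda_0(t) = \int_0^t \frac{R_n(s)}{nS_n^{(0)}(\tilde\beta_n,s)}\,d\bar M(s) + \int_0^t R_n(s)\Big\{\frac{S_n^{(0)}(\beta_0,s)}{S_n^{(0)}(\tilde\beta_n,s)} - 1\Big\}\lambda_0(s)\,ds - \int_0^t \{1 - R_n(s)\}\lambda_0(s)\,ds,$$
where $\bar M = \sum_{i=1}^n M_i$ and $R_n(s) = 1\{\sum_{i=1}^n Y_i(s) > 0\}$.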
The third term is asymptotically negligible because it follows from Assumption 2.1 that
Moreover, we have that equals the following process:
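which is a square integrable martingale. Using the Taylor expansion, we have, whenever $S_n^{(0)}(\tilde\beta_n,s) > 0$, that
$$\frac{S_n^{(0)}(\beta_0,s)}{S_n^{(0)}(\tilde\beta_n,s)} - 1 = -\frac{S_n^{(1)}(\beta_n^*,s)^\top(\tilde\beta_n - \beta_0)}{S_n^{(0)}(\tilde\beta_n,s)},$$
where $\beta_n^* = \beta_n^*(s)$ lies between $\tilde\beta_n$ and $\beta_0$. Since it holds that by Theorem 3.3, we can see that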
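A Breslow type estimator of this form is straightforward to evaluate once the refitted regression estimate is available. The sketch below assumes time-constant covariates and no tied event times, and the function name is hypothetical: the estimate at $t$ is the sum, over observed event times $X_i \le t$, of $1/\sum_{j : X_j \ge X_i}\exp(Z_j^\top\beta)$. With $\beta = 0$ it reduces to the Nelson-Aalen estimator, which gives a simple check.

```python
import numpy as np

def breslow(t_grid, beta, Z, X, D):
    """Breslow-type estimate of the cumulative baseline hazard on t_grid.

    Lambda_hat(t) = sum_{i : D_i = 1, X_i <= t} 1 / sum_{j : X_j >= X_i} exp(Z_j' beta).
    Assumes time-constant covariates Z (n, p) and no tied event times.
    """
    w = np.exp(Z @ beta)
    ev = np.flatnonzero(D == 1)
    ev = ev[np.argsort(X[ev])]                       # event indices in time order
    jumps = np.array([1.0 / w[X >= X[i]].sum() for i in ev])
    cum = np.concatenate([[0.0], np.cumsum(jumps)])  # value after each event
    # the number of event times <= t selects the right partial sum
    pos = np.searchsorted(X[ev], t_grid, side="right")
    return cum[pos]

# With beta = 0 the estimator is the Nelson-Aalen estimator:
# jumps of 1/4, 1/3 and 1/1 at the event times 1, 2 and 4.
X = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([1, 1, 0, 1])
print(breslow(np.array([0.5, 1.0, 2.0, 3.5, 4.0]), np.zeros(1), np.zeros((4, 1)), X, D))
```

For time-varying covariates, the risk-set sums would instead use $Z_j(X_i)$ evaluated at each event time.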
Acknowledgements. The author would like to express appreciation to Prof. Y. Nishiyama of Waseda University and Dr. K. Tsukuda of the University of Tokyo for many hours of discussion about this work.
- Andersen and Gill (1982) Andersen, P.K. and Gill, R.D. (1982). Cox’s regression model for counting processes: a large sample study. Ann. Statist. 10, no. 4, 1100-1120.
- Antoniadis et al. (2010) Antoniadis, A., Fryzlewicz, P. and Letué, F. (2010). The Dantzig selector in Cox’s proportional hazards model. Scand. J. Stat. 37, no. 4, 531-552.
- Bickel et al. (2009) Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37, no. 4, 1705-1732.
- Bradic et al. (2011) Bradic, J., Fan, J. and Jiang, J. (2011). Regularization for Cox’s proportional hazards model with NP-dimensionality. Ann. Statist. 39, no. 6, 3092-3120.
- Candès and Tao (2007) Candès, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist. 35, no. 6, 2313-2351.
- Cox (1972) Cox, D.R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. Ser. B 34, 187-220.
- Fujimori and Nishiyama (2017) Fujimori, K. and Nishiyama, Y. (2017). The consistency of the Dantzig selector for Cox’s proportional hazards model. J. Statist. Plann. Inference 181, 62-70.
- Fujimori (2017) Fujimori, K. (2017). The Dantzig selector for a linear model of diffusion processes. arXiv:1709.00710
- Huang et al. (2013) Huang, J., Sun, T., Ying, Z., Yu, Y. and Zhang, C-H. (2013). Oracle inequalities for the LASSO in the Cox model. Ann. Statist. 41, no. 3, 1142-1165.
- Tibshirani (1997) Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Stat. Med. 16, 385-395.
- van de Geer (1995) van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23, no. 5, 1779-1801.
- van de Geer and Bühlmann (2009) van de Geer, S.A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3, 1360-1392.