The standard approach to (parametric) regression is to relate covariates to one parameter of specific interest, for example, consider generalized linear models (McCullagh and Nelder, 1989) where covariates enter through the location parameter while the dispersion parameter is simply a constant. We will refer to this standard approach as single parameter regression (SPR). The question we ask is why any one parameter should be deemed the “interest” parameter in terms of covariate analysis? This standard SPR approach neglects the potential influence of covariates on other distributional parameters which may also be important in describing the phenomenon under study. A more flexible approach is to allow covariates to enter the model through multiple distributional parameters, e.g., location and dispersion simultaneously. Hereafter, we refer to this approach as multi-parameter regression (MPR).
Multi-parameter regression modelling has appeared several times in the literature. In the context of normal linear regression,Park (1966) and Harvey (1976)
modelled the dispersion parameter as a function of covariates to address heteroscedasticity, whileSmyth (1989) modelled dispersion in the more general case of generalized linear models. More recently, structured dispersion models have been considered by Lee and Nelder (2001, 2006). The generalized additive models for location, scale and shape (GAMLSS) framework (Rigby and Stasinopoulos, 2005; Stasinopoulos and Rigby, 2007) goes beyond the exponential family to include a large variety of distributions in which covariates may enter through multiple parameters simultaneously; see also http://www.gamlss.org. The MPR approach has also been considered in longitudinal data analysis (joint mean-covariance models) (Pan and MacKenzie, 2003, 2006, 2007)
and in quantile regression(Noufaily and Jones, 2013).
In the setting of survival analysis, fully parametric regression models have been much less popular than Cox’s (1972) semi-parametric proportional hazards (PH) model where the hazard function is given by where and
are vectors of covariates and regression coefficients, respectively, andis a non-parametric function. Due to its wide acceptance of and availability in standard software, the Cox model is often fitted to data where the proportional hazards assumption does not hold (Schemper, 1992). Although the Cox model can be generalized in various ways (McGilchrist and Aisbett, 1991; Gray, 1992; Klein, 1992; Nielsen et al., 1992; Grambsch and Therneau, 1994; Therneau and Grambsch, 2000; Martinussen et al., 2002; Tian et al., 2005), our focus will be on fully parametric multi-parameter regression models.
We note that the basic Cox model is in fact a single parameter regression model since the scale of is modelled via the component whereas its shape has no specific structure, being absorbed entirely by . Although the stratified Cox model is somewhat analogous to a multi-parameter regression model in the sense that covariates enter the shape of , this is limited to categorical covariates. Moreover, since is not modelled directly within the Cox model, we cannot estimate the effects of such stratification variables on the hazard. However, if instead we consider a fully parametric hazard, with parameters controlling its scale and shape, it is straightforward to develop a model where covariates enter through these scale and shape parameters. Indeed, many popular survival distributions have two such parameters, for example, Weibull, log-normal, gamma and log-logistic (Kalbfleisch and Prentice, 2002; Lawless, 2003)
. Within these two-parameter survival distributions, covariates are typically introduced via the scale parameter leading to proportional hazards (PH), accelerated failure time (AFT) or proportional odds (PO) models depending on the distribution. However, further flexibility could clearly be achieved by modelling the scaleand shape parameters (i.e., multi-parameter regression), thereby accommodating a wider variety of survival data.
Multi-parameter regression is not wholly novel in a survival setting; early references include Taulbee (1979) and Gaynor (1987) who used polynomial and Gompertz hazard models respectively. Anderson (1991) considered a log-linear model for the survival time where both the location and dispersion parameters depended on covariates (generalizing the AFT model) and, as an example, applied a Weibull MPR model (albeit different from the one in this paper) to the Framingham Heart Study data; interestingly, this model has been adopted in coronary heart disease literature where it is known as a “Framingham equation” (McEwan et al., 2004). More recently, the MPR approach has been used in the context of the inverse Gaussian model (Lee and Whitmore, 2006; Aalen et al., 2008; Lee and Whitmore, 2010). Furthermore, in their development of semi-parametric maximum likelihood estimation, Zeng and Lin (2007) considered heteroscedastic transformation models for survival data encompassing multi-parameter regression. It is also noteworthy that cure models (Farewell, 1982; Maller and Zhou, 1995; Lambert et al., 2007; De Castro et al., 2010)
fall within the MPR framework, i.e., covariates typically enter via the scale parameter and the cure probability.
Notwithstanding these developments, we note that multi-parameter regression in survival has not previously received much attention as a general procedure and is certainly not in mainstream use. We therefore explore and discuss the consequences of the multi-parameter regression approach using the popular Weibull model and show its flexibility in real data analysis. In particular, the MPR model provides a useful alternative to the proportional hazards assumption and produces a new test of deviation from proportionality. It may be fitted using our new mpr package (Burke, 2016) in R which implements the techniques developed in this paper. See also the gamlss.cens package (Stasinopoulos et al., 2016) which extends the GAMLSS framework to survival data.
The paper is organized as follows. In §2 we introduce and develop the Weibull MPR model. Hypothesis testing and variable selection are discussed, including a simulation study to assess our proposed MPR selection procedure, in §3. An application of the methodology to lung cancer data appears in §4 and, finally, we conclude with some remarks in §5.
2 The Weibull multi-parameter regression model
The Weibull distribution is one of the most popular survival distributions and its hazard function is given by
where , the scale parameter, controls the overall magnitude of the hazard and , the shape parameter, controls its time-evolution; it can increase (), decrease () or remain constant () over time. The Weibull MPR model is generated by introducing covariates into both parameters as follows:
where the log-link is used to ensure positivity of both parameters, and are scale and shape covariate vectors which may or may not contain covariates in common and and are the corresponding regression coefficients. Maximum likelihood estimation of the regression coefficients is straightforward and has been implemented in the mpr package (Burke, 2016).
Under the Weibull MPR model, the hazard ratio for a binary covariate, , common to both scale and shape regression components, is
where and are the scale and shape coefficients of respectively and represents all of the other terms in the shape linear predictor excluding . The hazard ratio is time-dependent where the time-evolution depends on the other shape-covariates via . This dependence on can be dealt with by inserting a “typical” shape covariate profile or by integrating (2.3) over the empirical distribution of in the dataset (cf. Nelder and Lane (1982); Karrison (1987); Shen and Fleming (1997); Martinussen and Pipper (2013)). Most importantly, it is the sign of which solely determines the path of the hazard ratio. When , the ratio of hazards reduces to the usual PH constant, . Thus, the Weibull MPR model directly extends the proportional hazards model, providing a new test of proportionality adjusted for other scale and shape covariates. Furthermore, for , setting (2.3) equal to one and solving yields an explicit solution for the time-point at which crossing hazards occur: .
We note that Anderson (1991) considered a different Weibull MPR model. Anderson’s model is based on an alternative hazard parametrization, , more suited to AFT modelling. One easily finds that it implies a hazard ratio whose functional form is somewhat more complicated than (2.3) and which does not reduce to when . Thus, the model presented here provides a more direct extension to the PH model. However, just as our model produces a PH test for a given covariate, Anderson’s model produces an analogous AFT test – but this was not explored in Anderson (1991). Furthermore, the primary model emphasized by Anderson (1991) is a restricted and asymmetric parametrization in which the shape component is related to the scale component via (although the unrestricted parametrization was also developed and applied further in O’Dell et al. (1994)). Such parametrizations do not permit variable selection in both regression components, a valuable procedure which enables one to determine which covariates have non-PH effects. The identification of such variables is important in practice (see Section 3). Finally, it is worth highlighting that, although both we and Anderson (1991) consider the Weibull distribution as an example, the MPR approach is not limited to any one distribution and, indeed, the accompanying mpr package (Burke, 2016) includes a variety of popular survival distributions.
3 Hypothesis testing and variable selection
We now consider the implications of correlation between estimated regression coefficients within MPR models on hypothesis testing and variable selection strategies. These issues do not appear to have been considered in depth by other authors.
3.1 Hypothesis testing
In standard single-parameter regression (SPR) survival models, covariates only appear in the scale and, therefore, the importance of a particular covariate, , is determined by testing if its regression coefficient, , differs significantly from zero. However, in the MPR model there are two hypotheses of interest, namely: (i) and (ii) . Furthermore, the estimated coefficients for the same covariate in different regression components (i.e., and ) are typically quite correlated (see Appendix A). The consequence of this higher correlation is that the scale effect, , is most heavily adjusted for the shape effect, , and vice versa. Thus, we recommend that covariates should be considered in both regression components simultaneously, since we can imagine scenarios where a covariate only becomes significant when it is present in both components.
It is also of interest to determine the total effect of , i.e., by testing (iii) . Using the asymptotic result , where is the relevant covariance sub-matrix of , we have that
Thus, a p-value can be obtained by setting in (3.4) and comparing this statistic to the distribution. Furthermore, a confidence ellipse for is given by the set of points defining the contour line such that (3.4) is equal to (see Friendly et al. (2013) for a detailed account of confidence ellipses).
3.2 Selection procedure
We now define to be a model with scale and shape covariate vectors and respectively. Hence, let be the model where does not appear (i.e., ). Now, there are three models which include : , and , respectively. Therefore, we can test hypotheses (i), (ii) and (iii) above by comparing with , with and with , respectively. This can be done by means of likelihood ratio tests or information criteria (see Burnham and Anderson (2002)); this approach is more in line with variable selection procedures (Miller, 2002). Of course inferential problems associated with variable selection methods are well documented (Miller, 1984; Hurvich and Tsai, 1990; Zhang, 1992) and automatic selection procedures are much criticized. Nonetheless, such procedures are useful when the number of covariates is large, so that methods for MPR models are required. Accordingly, an algorithm for MPR forward selection is described in Appendix B which also includes a comparison with the gamlss stepwise procedure.
3.3 Simulation study
We now evaluate the performance of a stagewise MPR variable selection procedure which involves starting from the null model, contains both forward and backward (scale, shape and simultaneous) steps and uses information criteria (AIC and BIC) as the basis of selection.
We simulated data from the Weibull MPR model with
is a vector of independent binary variables. Thus, the first four covariates,, , and , affect both the scale and the shape, and affect the shape only, and affect the scale only and, finally, and have no affect on either component. This setup gives a good variety of scale and shape effects of different strengths where the coefficient values were chosen to lead to realistic survival data. Furthermore, we varied the sample size ( 100, 500 and 1000) and censored proportion ( 20%, 50% and 80%) giving 9 scenarios in total, each of which was repeated 500 times.
At the th repetition of a particular scenario, , the selection procedure was applied: let and denote the sets of scale and shape covariates selected in this repetition. Over these 500 simulation replicates, the relative (scale and shape) selection frequencies for the covariate , , are given by
respectively. The relative frequencies, for selection based on AIC and BIC, respectively, in each of the 9 simulation scenarios, are displayed in Fig. 1.
We first consider the results for selection based on AIC (Fig. 1, top panel). It is clearly difficult to determine which covariates affect survival when the information available is low, i.e., . However, even when the sample size is small, the covariates with stronger effects (i.e., those with larger regression coefficients) are still selected 75% - 85% of the time provided that censoring is low (). When a moderate level of information is available, , the covariates with stronger effects are selected 100% of the time while those with weaker effects are selected over 80% of the time. Furthermore, as the sample size increases, the relative frequency of selection of any covariate with nonzero effect approaches one. Although important covariates were identified quite successfully, covariates with no effect were selected approximately 20% of the time across all scenarios using AIC. This can easily be combatted by using BIC (Fig. 1, bottom panel) but it comes at the expense of identifying important covariates less often than when using AIC (particularly in low information scenarios) which is not an unexpected finding.
4 Analysis of lung cancer data
Here we analyse a lung cancer dataset which appeared in a 1995 Queen’s University Belfast PhD thesis by P. Wilkinson and was further analysed by MacKenzie (1996). The data contains individuals diagnosed with lung cancer in Northern Ireland during the period October 1st 1991 to September 30th 1992. Time was measured, in months, from date of diagnosis until the earlier of the occurrence of death or the study end date, which was May 30th 1993. In this observational study there were 855 individuals, of whom 673 died and the remainder were right-censored. A number of (categorical) covariates were measured on each individual, namely: treatment type, age group, WHO status, sex, smoking status, cancer cell type, metastases, and albumen and sodium levels in the blood.
We analyse the data using the Weibull MPR model from the mpr
package. Single factor and multi-factor analyses are presented to illustrate both the capabilities of the MPR approach and the application of MPR variable selection. It is worth noting that a number of the categorical variables have missing values; in these cases, an extra category labelled “Missing” was created. Although this ad-hoc approach retains observations in the analysis, it can lead to bias in the estimated parameters(Hortan and Kleinman, 2007; Jones, 1996; Vach and Blettner, 1991). However, addressing missingness is not our aim here.
4.1 Single factor analysis: treatment
We first investigate the unadjusted effect of treatment which has five levels, namely: palliative care (the reference category), surgery, chemotherapy, radiotherapy and chemotherapy + radiotherapy combined. Two models were fitted to the data: the multi-parameter regression Weibull model, and the standard, but more restrictive, shape-constant single parameter regression model, i.e., the proportional hazards Weibull model. These models (labelled MPR and PH) are summarized in Table 1.
Single factor analysis results: standard errors are in brackets.
In the PH model, the coefficients are negative which means that treatment reduces the hazard (compared to palliative care). All treatments are statistically significant (at the 5% level) and, furthermore, the relative merit of each treatment is easily determined from the magnitude of the coefficients. However, the situation is more complex in the MPR model. The coefficients are also negative in this case but the coefficients are positive. Thus, although the hazard is reduced through treatment, it is increasing relative to palliative care over time. In other words, the effectiveness of each treatment reduces over time. All treatments are statistically significant in both the scale and shape components apart from chemotherapy (neither component). This latter statement is based on considering the scale and shape effects separately; joint effects can be assessed using (3.4). To this end, joint confidence ellipses for the regression coefficients are shown in Fig. 2
along with the individual confidence intervals for comparison. Although the chemotherapy effect is non-significant, its confidence ellipse only just covers the. Furthermore, while zero is well within the confidence interval, the confidence interval only just includes zero. This motivates consideration of a reduced MPR model, fixing . Indeed, we find that this reduced MPR model (not shown) has improved AIC (3894.5) and BIC (3937.2), respectively and, interestingly, we then get , i.e., very close to the PH analysis, while all other coefficients are virtually unchanged from the original MPR model. This latter finding highlights that the MPR analyis permits the inclusion of simpler PH effects (as expected from Section 2), however, for the sake of brevity, we will not consider the reduced model further.
Although the - and -coefficients in the MPR model provide some preliminary insights, the hazard ratios, (2.3), combine information from both of these components which allows us to determine the overall treatment effects. Thus, Fig. 3 shows the hazard ratio for each treatment relative to palliative care along with along with 95% confidence intervals calculated using the delta method. For comparison, the PH hazard ratios and confidence intervals are also shown. Based on the MPR analysis, the only treatment superior to palliative care over the full range of time is surgery. Chemotherapy appears to have a slight early effect in the first 4 months. Radiotherapy and the combined treatment are both superior to palliative care initially, but their effectiveness wears off after about 7 months. Although there appears to be some evidence that the combined treatment may become more hazardous than palliative care later in time, we note that there is much less information in the tails (15 - 20 months) and, indeed, this could also be a consequence of different frailties in the groups. Of course, being an observational study, these are not the true treatment effects since they depend on the net interplay of other factors, for example, many of the individuals selected for surgery were younger, healthier and free from metastases.
Figure 4 shows the Kaplan-Meier survivor curves (Kaplan and Meier, 1958) for each treatment group along with the model-based (MPR and PH) predicted survivor curves calculated using the well-known relationship where and are the survivor and hazard functions respectively. We can see that the MPR model fits the data more closely than the PH model. More formally, this improvement in fit is confirmed by carrying out a likelihood ratio test and the fact that the MPR model has much lower AIC and BIC values. It is noteworthy that the regression coefficients, standard errors and predicted survivor curves arising from the semi-parametric Cox model (not shown) are almost identical to those of the PH Weibull model.
4.2 Multi-factor analysis: variable selection
We now consider a full covariate analysis of the lung cancer data. In order to initially determine the importance of each covariate, first the full model (scale and shape components saturated) was fitted to the data; this model had and AIC respectively. Then, 27 reduced models were fitted which arise through excluding each of the nine covariates from scale, the shape and simultaneously from the scale and shape. These models were compared to the full model using likelihood ratio tests and AIC differences.
|L.R. Test (p-value)|
The results are shown in Table 2. Both the p-values arising from the likelihood ratio tests and the AIC differences agree, i.e., those with smaller p-values have positive values and vice versa. It is interesting to note that although age is significant in the shape component, the overall (joint) effect of age is non-significant. Conversely, smoking status is not significant in either component, when judged individually, but the joint effect is significant. These findings are a consequence of the correlation that exists between scale and shape coefficients (discussed in Section 3 and Appendix A) and, furthermore, the need for considering scale, shape and joint effects is clear. The joint values give us a sense of the relative importance of each covariate and, in particular, WHO status, treatment and the presence of metastases appear to be the most important factors. Cancer cell type, sodium level and albumen level are also significant, but to a lesser degree. Finally, smoking status has a weak effect whereas age and sex have little impact on survival.
|AIC Selection||BIC Selection|
|Note: = “selected in scale” and = “selected in shape”|
Our stagewise selection procedure (based on both AIC and BIC with scale, shape and simultaneous steps) was applied to the full MPR model and, for comparison, the full PH model (which, of course, only had scale steps). The results are summarized in Table 3. Both the PH and MPR models are in agreement on the significant covariates where, in particular, age and sex are not selected; smoking status is also not selected using BIC. Although the PH and MPR models are in agreement on which covariate effects are significant, the nature of the effects will differ under the two models. Specifically, any covariate with an effect in the MPR model is judged to have a non-PH effect. Thus, using AIC, treatment, smoking status, metastases and albumen level are judged to be non-PH whereas, using BIC, only treatment is judged to be non-PH. Thus, there is strong evidence that the effect of treatment is non-PH which is also supported by the large value for its shape effect in Table 2. The adjusted treatment hazard ratios from this multi-factor MPR model (not shown) are quite similar to the unadjusted hazard ratios in Fig. 3 although the effects of surgery and radiotherapy are reduced (i.e., the hazard ratios are closer to one).
Multi-parameter regression produces flexible parametric models which can, among other things, relax the proportional hazards assumption by allowing time-dependent hazard ratios. Furthermore, this is achieved without the need for specialized estimation procedures. Naturally, the process of variable selection is more involved in this multi-component setting due to correlation between estimated regression coefficients – an aspect which does not appear to have been considered in detail by other authors. However, our proposed selection procedure generalizes standard approaches (see Appendix B). Although we have focussed only on the two-parameter Weibull model in this paper, the use of alternative models is advisable in practice. In particular, more general parametric models can facilitate more general functional forms for the hazard ratio. Accordingly, the mpr package (Burke, 2016) includes a variety of distributions. Finally, we believe that it is important for practitioners to diversify beyond standard Cox analyses and we suggest that multi-parameter regression models provide a flexible alternative.
We greatly appreciate the insightful comments made by the referees which allowed us to improve on an earlier draft of this paper. The first author would like to thank the Irish Research Council (www.research.ie) for supporting this work.
- Aalen et al. (2008) Aalen, O. O., Borgan, Ø., and Gjessing, H. K. (2008). Survival and event history: A process point of view. Springer.
- Anderson (1991) Anderson, K. (1991). A nonproportional hazards Weibull accelerated failure time regression model. Biometrics, 47:281–288.
- Burke (2016) Burke, K. (2016). mpr: Multi-Parameter Regression (MPR). R package version 1.0.4 (http://cran.r-project.org/package=mpr).
- Burnham and Anderson (2002) Burnham, K. P. and Anderson, D. R. (2002). Model selection and multi-model inference: A practical information-theoretic approach. Springer, second edition.
- Cox (1972) Cox, D. R. (1972). Regression models and life tables (with discussion). J. R. Statist. Soc. B, 34:187–220.
- De Castro et al. (2010) De Castro, M., Cancho, V., and Rodrigues, J. (2010). A hands-on approach for fitting long-term survival models under the GAMLSS framework. Computer Methods and Programs in Biomedicine, 97:168–177.
- Farewell (1982) Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38:1041–1046.
- Friendly et al. (2013) Friendly, M., Monette, G., and Fox, J. (2013). Elliptical insights: Understanding statistical methods through elliptical geometry. Statist. Sci., 29:1–39.
- Gaynor (1987) Gaynor, J. J. (1987). The use of time dependent covariates in modelling data from an occupational cohort study. Appl. Statist., 36:340–351.
- Grambsch and Therneau (1994) Grambsch, P. M. and Therneau, T. M. (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81:515–526.
- Gray (1992) Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J. Am. Statist. Ass., 87:942–951.
- Harvey (1976) Harvey, A. C. (1976). Estimating regression models with multiplicative heteroscedasticity. Econometrica, 44:461–465.
- Hortan and Kleinman (2007) Hortan, N. J. and Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. Am. Statistician, 61:77–90.
- Hurvich and Tsai (1990) Hurvich, C. M. and Tsai, C. L. (1990). The impact of model selection on inference in linear regression. Am. Statistician, 44:214–217.
- Jones (1996) Jones, M. P. (1996). Indicator and stratification methods for missing explanatory variables in multiple linear regression. J. Am. Statist. Ass., 91:222–230.
- Kalbfleisch and Prentice (2002) Kalbfleisch, J. D. and Prentice, R. L. (2002). The statistical analysis of failure time data. Wiley, second edition.
- Kaplan and Meier (1958) Kaplan, E. L. and Meier, P. (1958). Non-parametric estimation from incomplete observations. J. Am. Statist. Ass., 53:457–481.
- Karrison (1987) Karrison, T. (1987). Restricted mean life with adjustment for covariates. J. Am. Statist. Ass., 82:1169–1176.
- Klein (1992) Klein, J. P. (1992). Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics, 48:795–806.
- Lambert et al. (2007) Lambert, P. C., Thompson, J. R., Weston, C. L., and Dickman, P. W. (2007). Estimating and modeling the cure fraction in population-based cancer survival analysis. Biostatistics, 8:576–594.
- Lawless (2003) Lawless, J. (2003). Statistical models and methods for lifetime data. Wiley, second edition.
- Lee and Whitmore (2006) Lee, M. L. T. and Whitmore, G. A. (2006). Threshold regression for survival analysis: Modeling event times by a stochastic process reaching a boundary. Statist. Sci., 21:501–513.
- Lee and Whitmore (2010) Lee, M. L. T. and Whitmore, G. A. (2010). Proportional hazards and threshold regression: Their theoretical and practical connections. Lifetime Data Analysis, 16:196–214.
- Lee and Nelder (2001) Lee, Y. and Nelder, J. (2001). Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and strucured dispersions. Biometrika, 88:987–1006.
- Lee and Nelder (2006) Lee, Y. and Nelder, J. A. (2006). Double hierarchical generalized linear models. Appl. Statist., 55:139–185.
- MacKenzie (1996) MacKenzie, G. (1996). Regression models for survival data: the generalized time-dependent logistic family. The Statistician, 45:21–34.
- Maller and Zhou (1995) Maller, R. A. and Zhou, S. (1995). Testing for the presence of immune or cured individuals in censored survival data. Biometrics, 4:1197–1205.
- Martinussen and Pipper (2013) Martinussen, T. and Pipper, C. B. (2013). Estimation of odds of concordance based on the aalen additive model. Lifetime Data Analysis, 19:100–116.
- Martinussen et al. (2002) Martinussen, T., Scheike, T. H., Skovgaard, I., and Matinerssen, T. (2002). Efficient estimation of fixed and time-varying covariate effects in multiplicative intensity. Scand. J. Statist., 29:57–74.
- McCullagh and Nelder (1989) McCullagh, P. and Nelder, J. (1989). Generalized linear models. Chapman and Hall, second edition.
- McEwan et al. (2004) McEwan, P., Williams, J., Griffiths, J., Bagust, A., Peters, J., Hopkinson, P., and Currie, C. (2004). Evaluating the performance of the Framingham risk equations in a population with diabetes. Diabetic Medicine, 21:318–323.
- McGilchrist and Aisbett (1991) McGilchrist, C. A. and Aisbett, C. W. (1991). Regression with frailty in survival analysis. Biometrics, 47:461–466.
- Miller (1984) Miller, A. J. (1984). Selection of subsets of regression variables. J. R. Statist. Soc. A, 147:389–425.
- Miller (2002) Miller, A. J. (2002). Subset selection in regression. Chapman and Hall, second edition.
- Nelder and Lane (1982) Nelder, J. A. and Lane, P. W. (1982). Analysis of covariance and standardization as instances of prediction. Biometrics, 38:613–621.
- Nielsen et al. (1992) Nielsen, G. G., Gill, R. D., Andersen, P. K., and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist., 19:24–43.
Noufaily and Jones (2013)
Noufaily, A. and Jones, M. C. (2013).
Parametric quantile regression based on the generalized gamma distribution.J. R. Statist. Soc. C, 62:723–740.
- O’Dell et al. (1994) O’Dell, P., Anderson, K., and Kannel, W. (1994). New models for predicting cardiovascular events. J. Clinical Epidemiology, 47:583–592.
Pan and MacKenzie (2003)
Pan, J. and MacKenzie, G. (2003).
On modelling mean-covariance structures in longitudinal studies.Biometrika, 90:239–244.
- Pan and MacKenzie (2006) Pan, J. and MacKenzie, G. (2006). Regression models for covariance structures in longitudinal studies. Statistical Modelling, 6:43–57.
Pan and MacKenzie (2007)
Pan, J. and MacKenzie, G. (2007).
Modelling conditional covariance in the linear mixed model.Statistical Modelling, 7:49–71.
- Park (1966) Park, R. E. (1966). Estimation with heteroscedastic error terms. Econometrica, 34:888.
- Rigby and Stasinopoulos (2005) Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape. Appl. Statist., 54:507–554.
- Schemper (1992) Schemper, M. (1992). Cox analysis of survival data with non-proportional hazard functions. The Statistician, 41:455–465.
Shen and Fleming (1997)
Shen, Y. and Fleming, T. R. (1997).
Weighted mean survival test statistics: A class of distance tests for censored survival data.J. R. Statist. Soc. B, 59:269–280.
- Smyth (1989) Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51:47–60.
- Stasinopoulos and Rigby (2007) Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. J. Statist. Software, 23:1–46.
Stasinopoulos et al. (2016)
Stasinopoulos, D. M., Rigby, R. A., and Mortan, N. (2016).
gamlss.cens: Fitting an Interval Response Variable Using ‘gamlss.family’ Distributions. R package version 4.3-5 (http://CRAN.R-project.org/package=gamlss.cens).
- Taulbee (1979) Taulbee, J. D. (1979). A general model for the hazard rate with covariables. Biometrics, 35:439–450.
- Therneau and Grambsch (2000) Therneau, T. M. and Grambsch, P. M. (2000). Modeling survival data: Extending the Cox model. Springer.
- Tian et al. (2005) Tian, L., Zucker, D., and Wei, L. J. (2005). On the Cox model with time-varying regression coefficients. J. Am. Statist. Ass., 100:172–183.
- Vach and Blettner (1991) Vach, W. and Blettner, M. (1991). Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. American Journal of Epidemiology, 134:895–907.
- Zeng and Lin (2007) Zeng, D. and Lin, D. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Statist. Soc. B, 69:507–564.
- Zhang (1992) Zhang, P. (1992). Inference after variable selection in linear regression models. Biometrika, 79:741–746.
Appendix A: Correlation between estimated regression coefficients
In this simulation study we investigate the correlation structure for estimated regression coefficients. The data were simulated from the Weibull MPR model with
where and are independent binary covariates. We fix regression coefficients and sample size () here as the correlation structure is similar for other values. However, we varied the proportion censored at three different levels, 20%, 50% and 80%, and each of these simulation scenarios was repeated 500 times. At each repetition the Weibull MPR model was fitted to the simulated data giving a matrix of estimated coefficients for each of the three simulation scenarios.
Figure A.1 below shows scatter matrices of the estimated regression coefficients for both models in each of the three scenarios. We have not included the intercepts in these scatter matrices as we are primarily interested in coefficients of covariates. We can see that the “same-covariate-pairs”, and , i.e., pairs corresponding to the same covariate, are highly correlated supporting the discussion in Section 3. All other coefficients are relatively uncorrelated, owing to the fact that and are independent in this simulation.
|20% Censored||50% Censored||80% Censored|
Appendix B: Multi-parameter regression selection algorithm
Using the notation of Section 3, we define to be a model with scale and shape covariates and respectively. Figure B.1 below describes a forward selection procedure starting from the null model . At each iteration of the while loop the algorithm searches for a model which is better than the current model; “better” in this context may be based on likelihood ratio tests, Akaike information criterion, bayesian information criterion, etc. Of course, at each such iteration, there are candidate covariates to be considered for inclusion in the scale , the shape and for simultaneous inclusion . Hence, this main loop is composed of three for loops where all of these new models are fitted and compared to the current model. If, at the end of the while loop, a better model is found, then this becomes the current model and another iteration of the loop commences. Otherwise, the loop ends as the best model has been found. A more general stagewise procedure has been implemented in the mpr package (the stepmpr function) which includes backward steps in addition to forward steps within the main while loop, i.e., there are three additional for loops carrying out backward steps in the scale, shape and simultaneously in the scale and shape.
We note that our selection procedure is more general than that which appears in the gamlss package. Within gamlss there are two strategies: “A” and “B”. Strategy A considers only one parameter at a time as follows: first apply forward selection to the scale, then (given the scale model) apply forward selection to the shape and, finally, return to the scale to apply backward selection. Strategy “B” in gamlss considers simultaneous steps only: apply forward and backward selection simultaneously to both the scale and shape. Our procedure generalizes both of these strategies as, at all points in our selection process, forward and backward steps are applied to both the scale and shape parameters individually and simultaneously.