1 Introduction
The Weibull model weibull1951 is a generalization of the exponential model kondo1930 ; weibull1939 through the inclusion of a positive shape parameter. In reliability or survival analysis, this venture allows the hazard function to change over time, as opposed to the constant hazard function from the exponential model. The Weibull model is still popular in many applied sciences, including reliability and survival analysis ataee2018 ; erango2019 ; brandl2020influence ; chilikov2020 ; marin2005 , wind speed modeling shu2015 ; de2019 ; galarza2019assessment and quality engineering tsiamyrtzis2006 ; colosimo2008
, to mention a few. However, this flexibility should also propel the cautious estimation of the shape parameter. An estimation method should be based on clear and transparent principles that are easily communicated
gelman2017 . This ensures that the analyst conducts the modeling in a principled way and can incorporate useful prior information. Especially in reliability analysis or survival data, objectivity is improbable since the analyst will have some inherent subjectivity based on the subjects’ nature and their generated data, being it a mechanical system or the physiological process of some disease. This prior information (or lack thereof) should be used to establish a principled estimation method for the shape parameter, which is what we propose here.The shape of the hazard function in a survival or reliability study describes the instantaneous risk of the event over time. If the hazard is constant over time, i.e. the exponential model is appropriate, then the hazard is equal at any point in time. However, if the hazard function is not constant then certain practical consequences will follow as a result.
For example, in a study on aggressive cancer, the time until relapse might be of interest. If the hazard function is constant then the followup times could be equally spaced as to minimize the time and cost commitments. If the hazard function is not constant, then followup times should either be condensed (increasing hazard) or prolonged (decreasing hazard), to be economically efficient.
The shape of the hazard function thus plays a pivotal role in the use of the model. We would thus aim to estimate the shape based on principles, and with a good understanding of the estimation process. In the case of constant hazard, we want to be able to recover the exponential model from the Weibull model. However, most current priors do not encourage the shrinkage to the exponential model or provide a mechanism to quantify the a priori belief about the strength of this shrinkage.
Since the work of soland1969
, various priors for the shape parameter has been proposed. Traditionally, assigning a prior to the shape parameter has been done through defining or deriving a distribution for the shape parameter itself. Often, an interpretable function of a hyperparameter has been proven to be a more natural parameter to estimate, and incorporate prior information for. A
parameterization of the shape parameter could be useful with a mode at zero (which implies the exponential model), although the contraction towards zero would be hard to quantify. Another issue is the (a)symmetry of the prior around the exponential model. Do we want (a)symmetric priors for the shape parameter, why and in what sense should it be asymmetric?Improper or noninformative priors have been popular for some time, but the need for weakly informative priors has been realised lemoine2019 . The improper uniform and the gamma priors have been used in various studies for the shape of the Weibull model, see abrams1996 , erango2019 , ataee2018 , marin2005 , chaturvedi2014 and gupta2017 amongst others. A joint Jeffrey’s prior proposed by guure2012 ; gupta2017 is also a popular choice amongst practitioners aslam2014 ; gupta2017 . We will show in this paper that the use of these priors, without strong prior information, could lead to unreliable results and thus misspecified models. This type of result is not surprising and has been presented by simpson2017 and klein2016 , for different model setups. Our aim in this paper, is to propose a prior that provides reliable and predictable results in the Bayesian inference of the Weibull model and can quantify the contraction to the exponential model. Invariance under reparameterizations of the shape parameter would be beneficial, as is not the case for most popular priors.
We propose a prior that is based on a distance metric in a principled and transparent way, from an information theoretic perspective. We use the work of simpson2017 to derive the penalized complexity (PC) prior for the shape parameter of the Weibull distribution. Even though our results are valid for all Weibull models, we provide the details for survival data. In Section 2 the Bayesian Weibull regression (survival) model is introduced and some of its properties are discussed. Then in Section 3, we derive and define the PC prior for the shape parameter and in Section 4 discuss the effect of some popular priors on the model estimation, and posit the need for the consideration of the PC prior as a reliable prior. We conclude the paper with an application of the PC prior in a joint survivallongitudinal model setting in Section 5, and a discussion in Section 6.
2 Bayesian Weibull regression model for survival data
We focus our attention on the Weibull model in the context of survival data of the remainder of this paper. There are two parameterizations for the density function of a Weibull distributed random variable
, with shape and scale parameters and , respectively, as follows,(1) 
and
(2) 
The parameterization in (1) is still popular in practice but we will show that for orthogonal interpretation of the parameters, the parameterization in (2) is essential.
In survival analysis, the hazard function, is a useful measure to understand the instantaneous risk of the event. For the Weibull model in (1) the hazard function is
where , and for (2)
The effect of the value of is illustrated in Figure 1 for a constant scale , without loss of generality. In this case the two parameterizations (1) and (2) lead to identical hazard functions. The flexibility of the Weibull model imposed by is clear from Figure 1. We note that decreasing, constant (exponential model) and increasing hazard functions are possible with this model. The type of hazard is determined by the value of and care should thus be taken in the estimation of .
2.1 Estimation of parameters
In the presence of covariates, covariate information is incorporated in and then we define the scale parameter in (1) or (2) as . A prior distribution is then explicitly formulated for instead of . Suppose the joint prior for and is represented by .
The likelihood function for a sample of individuals with noncensoring indicators (i.e. if is a complete observation) and covariates , is
(3) 
where is from (1) or (2) and .
Censored observations lead to the inclusion of the survival function in the likelihood function to account for incomplete observations, which is unique to the survival Weibull model, as opposed to a Weibull model for other strictly positive responses where all observations are observed completely.
The prior and the likelihood information is then combined to form the joint posterior,
, after which Bayesian inference is possible. Often, also here, the explicit form of the posterior density is not analytically tractable and scientists revert to simulationbased methods for Bayesian inference, like Markov chain Monte Carlo (MCMC) methods
brooks2011 or approximate methods like integrated nested Laplace approximations (INLA) rue2009 .2.2 Popular prior choices
There is no joint conjugate prior for
(or ) if all parameters are assumed to be unknown. It is common to assign independent priors to and banerjee2008 ; chaturvedi2014 ; ataee2018 ; gupta2017 , respectively or use the joint Jeffrey’s prior guure2012 ; gupta2017 .In our current work, we consider the Gaussian prior for and mainly focus on the prior for the shape parameter, . The two most common priors of are thus:

Improper uniform prior.
The improper uniform prior translates into a constant density for all values of , i.e.(4) 
Gamma prior.
A gamma prior with equal parameters is assigned to i.e. such that the prior density function of is given by(5)
The resulting priors (4) and (5) for some values, are illustrated in Figure 2. Even though all the gamma priors have a priori , it is clear from Figure 2 that the mode of the priors is not at , or in the vicinity. Actually, the density at is small for most values, which results in a prior belief that the base model is very unlikely, even though the prior expected value indicates the base model. As a default prior choice, this is undesirable. Only if expert knowledge justifies the gamma prior to exhibit some particular behaviour of the parameter, the gamma prior might be appropriate.
This leaves the user with the challenge to formulate a default prior for which will contract to the base model value at a quantifiable rate, since neither the improper or gamma priors satisfies this desideratum. It is thus our aim in this paper to propose a principled prior based on the distance to the base model rather than just assigning certain prior density values to arbitrary values. This is achieved in Section 3, particularly in Section 3.3.
3 Penalized complexity prior
Penalized complexity priors (PC priors) as defined by simpson2017 have been shown to be principled and sensible prior choices for hyperparameters where the definition of a base model is natural. Currently, the RINLA software available in the INLA library in R is the only computational tool that contains the PC priors as prior options for various modeling elements. In this section we firstly present an overview of the methodology that underpins the PC priors and then we develop the PC prior for the shape of the Weibull model.
3.1 Methodology
The PC prior is derived from the KullbackLeibler divergence (KLD,
kullback1951 ) between the proposed (complex) model and a natural base model. In the context of the Weibull model, the natural base model is the exponential model, where in (1) and (2).The KLD between two density functions and for a random variable , is defined as
(6) 
where is the support of . The KLD is a measure of the information lost when choosing over . Now in our case, let be the complex model and be the base (exponential) model, i.e. , then the KLD for either parameterization is defined as
(7) 
for from (1) or (2). Then define the distance function
This distance function is a measure of the complexity of the more complex model with respect to the exponential model, in the sense that it is a unidirectional distance between the complex model and the exponential model . The PC prior is then constructed through and hence,
where is a userdefined hyperparameter such that where is small. This statement incorporates the information of the tail behaviour of the prior, based on the information from the user.
3.2 PC prior for both parameterizations
In this section we derive the KLD and subsequently the PC prior for for both parameterizations (1) and (2). Suppose that the information from covariates are incorporated in for the complex model and in for the base model as described in Section 2.
3.2.1 Parameterization 1
From (1) the KLD for the first parameterization is
where is Euler’s constant (see euler1735 ). Furthermore, suppose (which is a realistic assumption since the covariates will be the same for an individual no matter the model) then
(8) 
It is clear from (8) that the KLD for is dependent on the value of . This is undesirable and impractical and we will thus not use this parameterization.
3.2.2 Parameterization 2
The KLD for the second parameterization from (2) is
where is Euler’s constant (see euler1735 ). Now, again suppose that , then
(9) 
Now, from (9) we can see that the PC prior for is independent of the value for . This result advocates the use of the second parameterization as the Weibull model.
Subsequently the PC prior is derived from the KLD as
(10) 
where is such that , i.e. . The choice of gives the user control over the tail behaviour of the prior and hence the rate of contraction towards the base model. We show the influence of in the next section.
3.3 PC prior for the shape parameter of the Weibull model
As discussed in the previous subsection we will only consider parameterization 2 of the Weibull model as in (2) for the remainder of this paper. From (9) and (10) the PC prior is defined as
(11)  
where is the digamma function abramowitz1988 .
In Figure 3 we present the PC priors on the distance scale for different values of . Similarly we present the PC priors on the scale in Figure 4.
The exponential prior for the distance to the base model is clear from Figure 3. We observe that smaller values of have weaker contraction to the base model as a result of the heavier exponential tails.
The PC priors are shown in Figure 4 on the original scale. This figure has some interesting features. It is clear that the modal density is at the base model for all , but not for . This is contradictory to what we expect in the PC prior in the sense that the modal density on the original scale might not be at , although the PC prior still has its mode at a distance of zero. Some of the resulting shapes of the PC prior on the scale is quite unexpected and shows how important it is to define a prior on a meaningful invariant scale, based on principles and transparency. Surprisingly, a mode at is not necessary for a good principled prior.
Also, as showed by simpson2017 , the PC prior is invariant under transformations of the shape parameter. This invariance is necessary for proper inference with consistent results amongst reparameterizations of .
In the next section we investigate some popular priors and illustrate their differences to the PC prior (11).
4 The PC prior as a principled prior for
In this section we advocate the use of the proposed PC prior as a default prior when the user does not have strong enough prior information to formulate another prior. In Sections 1 and 2.2 we hinted to the challenges presented by some of the popular prior options and now we present these challenges from the perspective of the distance to the base model, i.e. the exponential model.
4.1 Performance of the priors on the distance scale
To properly understand the behaviour of the other priors we need to reformulate these priors; from a prior in terms of to a prior in terms of the distance between the proposed Weibull model that stems from the chosen prior, to the exponential base model where . To this end we will again use the distance measure from (9). We present the gamma prior here for brevity but the same can be done for the other priors.
We need to reparameterize the gamma prior (5), , in terms of the distance, to find the corresponding prior densities based on the distance scale. Analytically, this endeavour is timeconsuming and of little value so we investigate this numerically. Some distance values to the base model, their corresponding values and densities from the gamma prior (5) are presented in Tables 1 and 2.
Distance  value and  value and 

From Table 1 we can see that for each positive distance value, there are two different density values, each one corresponding to a value of and , respectively. These two values of are not equidistant. This implies that the Gamma prior results in different contractions to the base model for equidistant values of and . This discriminating behaviour is again evident in Table 2 for the Gamma prior. This prior is still commonly used in the literature chaturvedi2014 ; gupta2017 maybe because it seems to be quite uninformative from Figure 2, but in Table 2 we can see clearly that this is not a good option without proper justification since the density at the base model (distance of zero) is very close to zero, and the rate of the discrimination will be very hard to motivate.
Distance  value and  value and 

We present these results graphically in Figure 5. The modal a priori density is not at the base model but some other ”random” configuration. The discriminating behaviour of the Gamma prior for values smaller or larger than as shown in Tables 1 and 2, is very clear from Figure 5. We envision that it would be very hard to properly motivate such a discrimination in terms of the distance to the base model and this shows clearly why the Gamma prior should not be used without prior justification, even for .
4.2 Computational considerations for the PC prior
The Integrated Nested Laplace Approximation (INLA) method was developed by rue2009 as a computational tool for the (approximate) Bayesian inference of latent Gaussian models. For more details on latent Gaussian models and INLA in the context of survival models, see rue2005 and martino2011 amongst others.
The proposed PC prior (11) is currently implemented in the INLA library in R through the pc.alphaw option for the prior of the hyperparameter
. Functions to evaluate, sample, compute quantiles and percentiles of this prior is also included in the current version of the
INLA library.Currently, the default prior for the Weibull model is the PC prior with a parameter value . Various other Bayesian software can be used for inference using the PC prior (11).
5 Application
In this section we apply the proposed PC prior in the context of a survival model. In the preceding sections we motivated and proposed the PC prior by only considering classical survival models. This restriction is not necessary. The PC prior can be applied as a prior for the hyperparameter of the Weibull survival model in the context of competing risks models, joint survivallongitudinal models, multistate models, spatial survival models and many more. Here we show a joint survivallongitudinal application purely for illustration, as we could have simply presented various other models.
In prostate cancer studies, Prostatespecific Antigen (PSA) has been
identified as a biomarker, measured at multiple timepoints, for the status of prostate cancer. High
levels of PSA are indicative of increased risk of prostate cancer or
recurrence. Radiation therapy is a common course of treatment often
prescribed for patients with prostate cancer. If successful, the PSA
levels are expected to drop and remain at a low level. If not, then PSA levels will still drop initially but then rise again
zagars1995 . The estimation of the longitudinal trajectory of PSA is further complicated by the dropout process whereby patients’ are no longer part of the study, either due to salvage hormone therapy or recurrence of the cancer. This is known as informative dropout, which in turn implies informative censoring of the data. If this informative dropout is unaccounted for, it can lead to
considerable bias in the PSA trajectory estimation.
The objective of
this application is thus to identify the trajectory of postradiation
PSA change, while correctly accounting for the informative dropout. To achieve this, a joint longitudinalsurvival model is defined with the PSA levels as the longitudinal process and the time to informative dropout as the survival process with the logarithm of the base PSA value as a linear covariate, .
More details about the clinical impact of such a study can be found in
proust2009 .
The joint model under consideration in this application is of the form:
where , and are the longitudinal and dropout linear predictors, respectively. We assume a Weibull hazard function for the dropout process. The exponential case with constant hazard can be achieved as a special case when . The linear predictors, and , are formulated as:
(12) 
where
and is a
Bayesian smoothing spline with hyperparameter , that captures the nonlinear temporal trend in the PSA level lindgren2008 .
The priors for and are Gaussian, and penalizing complexity priors as presented in simpson2017
are assigned to the precision/variance hyperparameters
, , , , , ,and the longitudinal model error standard deviation
. Also, we now assume the PC prior for the shape parameter , proposed in this paper (11) (currently the default prior in RINLA for the Weibull model). We also include a sensitivity analysis using different values of .5.1 Results
The resulting estimated joint model is presented, with the PC prior with (the default value in RINLA). The results are summarized in Table 3.
Parameter  Posterior Mode  Posterior SD 

It is quite clear from Table 3 that the hazard of
informative dropout is correlated with the longitudinal PSA biomarker
since with credible interval . This
result confirms that the joint model approach is supported by the data
and should be preferred to the separate models. The structure of the
association term as in (12) is quite restrictive but has been
used extensively. We can assume various other association structures and do the inference using RINLA, but this is outside the scope of the current paper.
The posterior and prior densities for is presented in Figure 6.
5.2 Sensitivity analysis
In this section we redo the inference of Section 5.1 for . Note that the value of influences the tail behaviour of the PC prior. As mentioned previously, a lower value of
implies higher a priori probability for values away from
, as in Figures 3 and 4.We omit the tables similar to Table 3 since the results are very similar. In Figure 7 the marginal posterior density of for the different values are presented, and the robustness of the PC prior is clear.
6 Discussion
Prior elicitation is one of the crucial elements in Bayesian analysis, and influences the inference substantially, especially if the data is not very informative. The proper elicitation of prior information, however, is often hard or not possible. In this case, popular or uninformative priors are reverted to since they have either been used extensively in the literature, or they are seemingly weakly/non informative. Here we did not debate informativeness or uninformativeness of priors, but rather the reliability, robustness and transparency of the prior.
In this paper we viewed the Weibull model as a generalization of the exponential model through the shape parameter, and as such the Weibull model is the complex counterpart of the exponential base model. This naturally leads to the provocation of a prior based on the distance from the more complex Weibull model to the base model, and thus the contraction to the base model. The penalizing complexity prior framework as proposed in simpson2017 forms the basis of this venture.
We derived the PC prior for the shape parameter of the Weibull model and investigated its performance with respect to the a priori belief about the applicability of the base model. Some popular priors were also investigated, and some surprising features were discovered, including their inability to contract to the base model for most hyperparameter values were illustrated. From these results we posit that the new PC prior is a reliable, transparent, principled and robust prior, even when prior information is weak or scarce.
An application to data from a prostate cancer study invoking the proposed PC prior is presented (with a sensitivity study) to illustrate the use of this prior in various models (simple and more complicated), like the joint survivallongitudinal model.
This paper provides the user with a principled prior for the Weibull model that behaves in a predictable and reliable way, also in terms of its contraction to the exponential base model, even with little or no prior information. It was shown to be robust with regards to the hyperparameter specification and we advocate the use of the PC prior in forthcoming applications of the Weibull model.
References
 (1) Abramowitz, M., Stegun, I.A., Romer, R.H.: Handbook of mathematical functions with formulas, graphs, and mathematical tables (1988)
 (2) Abrams, K., Ashby, D., Errington, D.: A bayesian approach to weibull survival models—application to a cancer clinical trial. Lifetime Data Analysis 2(2), 159–174 (1996)
 (3) de Araújo, P.H.M., da Nóbrega Marinho, M.H.: Weibull parameters estimation methods for analysis of hydrowind complementarity in northeast brazil. IEEE Latin America Transactions 17(4), 556–563 (2019)
 (4) Aslam, M., Kazmi, S.M.A., Ahmad, I., Shah, S.H.: Bayesian estimation for parameters of the weibull distribution. Sci. Int (Lahore) 26(5), 1915–1920 (2014)
 (5) Ataee, P., Rahgozar, M., Bakhshi, E., Shahrokhi, A.: Application of parametric and nonparametric copula marginal models in recurrent failure times of childhood seizures: Bayesian approach. risk 10(1) (2018)
 (6) Banerjee, A., Kundu, D.: Inference based on typeii hybrid censored data from a weibull distribution. IEEE Transactions on reliability 57(2), 369–378 (2008)
 (7) Brandl, S., Paul, C., Knoke, T., Falk, W.: The influence of climate and management on survival probability for germany’s most important tree species. Forest Ecology and Management 458, 117,652 (2020)
 (8) Brooks, S., Gelman, A., Jones, G., Meng, X.L.: Handbook of markov chain monte carlo. CRC press (2011)
 (9) Chaturvedi, A., Pati, M., Tomer, S.K.: Robust bayesian analysis of weibull failure model. Metron 72(1), 77–95 (2014)
 (10) Chilikov, A., Wainstock, T., Sheiner, E., Pariente, G.: Perinatal outcomes and longterm offspring cardiovascular morbidity of women with congenital heart disease. European Journal of Obstetrics & Gynecology and Reproductive Biology (2020)
 (11) Colosimo, B., Ruggeri, Kenett, Faltin: Bayesian control charts. encyclopedia of statistics in quality and reliability (2008)

(12)
Erango, M.A., Gergiso, K.T., Hebo, S.H.: Survival time analysis of hypertension patients using parametric models.
Advances in Research pp. 1–10 (2019)  (13) Euler, L.: De Progressionibus harmonicus observationes (1735)
 (14) Galarza, J., Condezo, D., Camayo, B., Mucha, E., et al.: Assessment of wind power density based on weibull distribution in region of junin, peru. Energy and Power Engineering 12(01), 16 (2019)
 (15) Gelman, A., Hennig, C.: Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180(4), 967–1033 (2017)

(16)
Gupta, P.K., Singh, A.K.: Classical and bayesian estimation of weibull distribution in presence of outliers.
Cogent Mathematics 4(1), 1300,975 (2017)  (17) Guure, C.B., Ibrahim, N.A.: Bayesian analysis of the survival function and failure rate of weibull distribution with censored data. Mathematical problems in engineering 2012 (2012)
 (18) Klein, N., Kneib, T., et al.: Scaledependent priors for variance parameters in structured additive distributional regression. Bayesian Analysis 11(4), 1071–1106 (2016)
 (19) Kondo, T.: A theory of the sampling distribution of standard deviations. Biometrika pp. 36–64 (1930)
 (20) Kullback, S., Leibler, R.A.: On information and sufficiency. The annals of mathematical statistics 22(1), 79–86 (1951)
 (21) Lemoine, N.P.: Moving beyond noninformative priors: why and how to choose weakly informative priors in bayesian analyses. Oikos 128(7), 912–928 (2019)
 (22) Lindgren, F., Rue, H.: On the secondorder model for irregular locations. Scandinavian Journal of Statistics 35(4), 691–700 (2008)
 (23) Marín, J.M., RodriguezBernal, M., Wiper, M.P.: Using weibull mixture distributions to model heterogeneous survival data. Communications in StatisticsSimulation and Computation 34(3), 673–684 (2005)
 (24) Martino, S., Akerkar, R., Rue, H.: Approximate Bayesian inference for survival models. Scandinavian Journal of Statistics 38(3), 514–528 (2011)
 (25) ProustLima, C., Taylor, J.M.: Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics 10(3), 535–549 (2009)
 (26) Rue, H., Held, L.: Gaussian Markov random fields: theory and applications. CRC press (2005)
 (27) Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(2), 319–392 (2009)
 (28) Shu, Z., Li, Q., Chan, P.: Statistical analysis of wind characteristics and wind energy potential in hong kong. Energy Conversion and Management 101, 644–657 (2015)
 (29) Simpson, D., Rue, H., Riebler, A., Martins, T.G., Sørbye, S.H., et al.: Penalising model component complexity: A principled, practical approach to constructing priors. Statistical Science 32(1), 1–28 (2017)
 (30) Soland, R.M.: Bayesian analysis of the weibull process with unknown scale and shape parameters. IEEE Transactions on Reliability 18(4), 181–184 (1969)
 (31) Tsiamyrtzis, P., Hawkins, D.M.: A bayesian approach to statistical process control. In: Bayesian process monitoring, control and optimization, pp. 99–120. Chapman and Hall/CRC (2006)
 (32) Weibull, W.: A statistical theory of strength of materials. IVBHandl. (1939)
 (33) Weibull, W.: Wide applicability. Journal of applied mechanics 103(730), 293–297 (1951)
 (34) Zagars, G.K., Pollack, A., Kavadi, V.S., von Eschenbach, A.C.: Prostatespecific antigen and radiation therapy for clinically localized prostate cancer. International Journal of Radiation Oncology, Biology, Physics 32(2), 293–306 (1995)