1. Introduction
Phase–type distributions are defined as absorption times of pure-jump Markov processes on a finite state space. Many classical distributions admit such representations, including the exponential, Erlang, Coxian and hyperexponential distributions, as well as any finite mixture of them. In the works of Marcel F. Neuts and coauthors, cf. for instance [neuts75, neuts1981matrix], the modern framework of phase–type distributions as matrix distributions was systematically studied in a unified manner. Perhaps one of the most attractive features of phase–type distributions from a statistical perspective is that they are dense, in the sense of weak convergence, in the class of distributions on the positive real line, cf. [asmussen2008applied]. However, for the latter convergence to manifest itself for a given histogram, a large number of states – or phases – is often required, making the estimation procedure a complex one. Several methods of estimation have been proposed in the literature, including the method of moments in [bobbio2005matching, horvath2007matching] and references therein, Bayesian inference in [bladt2003estimation], and maximum likelihood estimation via the expectation–maximisation (EM) algorithm in [asmussen1996fitting], extended to the right-censored case in [olsson1996estimation]. The latter method considers the full likelihood arising from the path representation of a phase–type distribution and has been the most popular approach for various applications in applied probability.
Recently, a resurgence of statistical modelling using absorption times of stochastic processes has been motivated by the main drawback of phase–type distributions: they all have exponentially decaying tails. In [bladt2017fitting], scale mixtures of phase–type distributions were considered for fitting heavy-tailed distributions. In [mfph, mml, mmml], a time transform consisting of a deterministic scaling and a random stable factor was considered, leading to heavy-tailed distributions that enjoy the Markovian calculus of phase–type distributions, but in a fractional-derivative framework. Finally, in [albrecher2019inhomogeneous], general deterministic transforms were considered, and an extension which includes the multivariate case was given in [albrecher2020iphfit]. The latter distributions were coined inhomogeneous phase–type distributions (IPH), since they have the interpretation of being absorption times of inhomogeneous Markov processes where time evolves according to an underlying deterministic clock.
The first exploration into survival analysis with phase–type distributions can be found in [aalen1995phase], although covariates were not considered. Instead, the focus was on the hazard function, which was shown to be very flexible. Indeed, it follows from the denseness property that it can approximate any hazard shape arbitrarily closely. The focus was also on the identification of a phase–type survival model, that is, associating the phases and phase transitions with physical states that an individual goes through before death. A generalisation of that rationale, which makes phase–type distributions not only practical but also physically well-motivated, is to think in terms of block transitions. We may model an individual's lifetime and evolution through a fixed number of states before death by associating a block of phases with each of these states and treating the phases within each block as unobservable. This philosophy extends to statistical modelling of lifetimes using IPH distributions and is the primary motivation for their consideration in this work.
In the life sciences, particular instances of phase–type distributions, namely Coxian phase–type distributions, have been considered as survival regression models which fall into the accelerated failure time (AFT) specification, cf. [mcgrory2009fully, rizk2019alternative, tang2012modeling] and references therein. Their popularity over several decades is the product of their natural interpretation in terms of states and their statistical efficacy. Markov chain Monte Carlo (MCMC) methods seem to be particularly popular for this type of application, but we will not follow this estimation path.
In this paper, we propose survival models and survival regression models based on general IPH distributions. Under our construction, many classical methods arise as special cases, such as the proportional hazards model for any parametric family, the AFT model, and the Coxian regression. The combination of inhomogeneous time transformations and general sub-intensity matrices, however, makes our models much more flexible, both in terms of tails and of hazard-function shapes. When regressing on the inhomogeneity parameters, matrix generalisations of models such as the one of Hsieh, cf. [hsieh2000sample], become available. We follow the footsteps of [albrecher2020iphfit, asmussen1996fitting, olsson1996estimation] to obtain estimation procedures based on the EM algorithm. Throughout, we also give relevant parametric examples of inhomogeneity functions (equivalently, of the intensity function) which can be used in practice for a wide range of applications and which generalise classical specifications.
The structure of the remainder of the paper is as follows. In Section 2, we revisit the necessary background on IPH distributions, with an emphasis on the intensity function, which plays a central role in subsequent sections. We also work out some new examples relevant to the survival analysis framework and give a survival curve fit to COVID-19 positive patients in Mexico. Section 3 deals with the specification of the proportional intensities model, the derivation of its main properties and asymptotics, and the proposal of a modified EM algorithm which is subsequently shown to increase the likelihood at each iteration. A simulated example illustrates the flexibility that our method enjoys. A goodness-of-fit analysis and some important extensions are provided as well. Section 4 applies the AFT model to IPH distributed random variables and, together with the previous models, illustrates how it is possible to seamlessly improve badly fitting survival regression models by simply considering their matrix versions. In Section 5 we outline more specialised extensions of our models, and we conclude in Section 6.
2. Inhomogeneous phase–type distributions
This section revisits some of the theory of inhomogeneous phase–type distributions and, along the way, provides new examples. We do not aim to be exhaustive, and we place special emphasis on the intensity function, which will later play an important role when considering covariates. In particular, we do not expand on the mathematical representation of the general inhomogeneous intensity matrix in terms of product integrals, nor on the form of the transition matrix of the underlying inhomogeneous Markov process. For the interested reader, we refer to the seminal paper [albrecher2019inhomogeneous].
2.1. Basic properties
Let $( J(t) )_{t \ge 0}$ denote a time–inhomogeneous Markov jump process on a state space $\{1, \dots, p, p+1\}$, where states $1, \dots, p$ are transient and state $p+1$ is absorbing. In other words, $( J(t) )_{t \ge 0}$ has an intensity matrix of the form
\[ \Lambda(t) = \begin{pmatrix} T(t) & t(t) \\ 0 & 0 \end{pmatrix}, \quad t \ge 0, \]
where $T(t)$ is a $p \times p$ matrix and $t(t)$ is a $p$–dimensional column vector. Here, for any time $t \ge 0$, $t(t) = -T(t)\, e$, where $e$ is the $p$–dimensional column vector of ones. Let $\pi = (\pi_1, \dots, \pi_p)$, $\sum_{k=1}^{p} \pi_k = 1$, be a probability vector corresponding to the initial distribution of the jump process, and assume that $\mathbb{P}(J(0) = p+1) = 0$. Then we say that the time until absorption
\[ X = \inf\{ t \ge 0 : J(t) = p+1 \} \]
has an inhomogeneous phase–type distribution with representation $(\pi, T(\cdot))$, and we write $X \sim \mathrm{IPH}(\pi, T(\cdot))$. This setting, however, is too general for applications. Hence, we will consider the following simplified version of the intensity matrix: $T(t) = \lambda(t)\, T$, where $\lambda$ is some known non–negative real function and $T$ is a sub–intensity matrix. In this case we simply write $X \sim \mathrm{IPH}(\pi, T, \lambda)$.
Note that for $\lambda \equiv 1$ one returns to the time–homogeneous case, which corresponds to the conventional phase–type distributions, with notation $X \sim \mathrm{PH}(\pi, T)$; a comprehensive and modern account of phase–type distributions can be found in Bladt & Nielsen [Bladt2017]. It is straightforward to see that if $X \sim \mathrm{IPH}(\pi, T, \lambda)$, then there exists a function $g$ such that
(2.1) \[ X = g(Y), \]
where $Y \sim \mathrm{PH}(\pi, T)$. Specifically, $g$ is defined in terms of $\lambda$ by
\[ g^{-1}(x) = \int_0^x \lambda(s)\,\mathrm{d}s, \quad x \ge 0, \]
or, equivalently,
\[ \lambda(s) = \frac{\mathrm{d}}{\mathrm{d}s}\, g^{-1}(s), \quad s \ge 0. \]
Two recurring quantities that will be useful are the density $f$ and survival function $S$ of $X \sim \mathrm{IPH}(\pi, T, \lambda)$, which are given by
\[ f(x) = \lambda(x)\, \pi \exp\!\big( g^{-1}(x)\, T \big)\, t \quad \text{and} \quad S(x) = \pi \exp\!\big( g^{-1}(x)\, T \big)\, e, \]
where $t = -T e$.
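As a concrete illustration of these two formulas, the following sketch evaluates $f$ and $S$ numerically for a small, hypothetical example; all parameter values and helper names are our own, and a Weibull-type inhomogeneity $\lambda(s) = \beta s^{\beta-1}$, so that $g^{-1}(x) = x^{\beta}$, is assumed.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 2-phase sub-intensity matrix T, exit vector t = -T e,
# initial law pi, and a Weibull-type inhomogeneity (toy values).
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
e = np.ones(2)
t = -T @ e
pi = np.array([0.7, 0.3])
beta = 1.5

def g_inv(x):
    # g^{-1}(x) = int_0^x lambda(s) ds = x^beta for lambda(s) = beta*s^(beta-1)
    return x ** beta

def lam(x):
    return beta * x ** (beta - 1)

def iph_density(x):
    # f(x) = lambda(x) * pi exp(T g^{-1}(x)) t
    return lam(x) * pi @ expm(T * g_inv(x)) @ t

def iph_survival(x):
    # S(x) = pi exp(T g^{-1}(x)) e
    return pi @ expm(T * g_inv(x)) @ e
```

A quick sanity check of the pair $(f, S)$ is that $S(0) = 1$ and $f = -S'$, which can be verified by finite differences.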
For further reading on inhomogeneous phase–type distributions and motivations for their use in modelling we refer to Albrecher & Bladt [albrecher2019inhomogeneous]. See also [albrecher2020iphfit] for extensions to the multivariate case and [mfph, mml, mmml] for inhomogeneous phase–type distributions with stochastic clocks.
The tail behaviour of IPH distributions can be derived in a straightforward manner from the corresponding tail behaviour of PH distributions. Thus, we first recall an explicit expression for the survival function of a PH distribution $Y \sim \mathrm{PH}(\pi, T)$:
(2.2) \[ S_Y(u) = \sum_{j=1}^{J} \sum_{k=1}^{m_j} c_{jk}\, u^{k-1}\, e^{\eta_j u}, \]
where $\eta_1, \dots, \eta_J$ are the eigenvalues of the Jordan blocks $J_1, \dots, J_J$ of $T$, with corresponding dimensions $m_1, \dots, m_J$, and the $c_{jk}$ are constants depending on $\pi$ and $T$. If $\eta$ is the largest real eigenvalue of $T$ and $m$ is the dimension of the Jordan block of $\eta$, then it is easy to see from (2.2) that
(2.3) \[ S_Y(u) \sim c\, u^{m-1}\, e^{\eta u}, \quad u \to \infty, \]
where $c$ is a positive constant. That is, all phase–type distributions have exponential tails with an Erlang-like second-order bias term.
As illustrated in [albrecher2019inhomogeneous], a number of IPH distributions can be expressed as classical distributions with matrix-valued parameters. For the representation of such distributions we make use of functional calculus: if $f$ is an analytic function and $A$ is a matrix, define
\[ f(A) = \frac{1}{2 \pi i} \oint_{\gamma} f(z)\, (z I - A)^{-1}\, \mathrm{d}z, \]
where $\gamma$ is a simple path enclosing the eigenvalues of $A$; cf. [Bladt2017, Sec. 3.4] for details. In particular, the matrix exponential can be defined in this way.
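For diagonalizable matrices, a common computational shortcut evaluates $f(A)$ through the eigendecomposition $f(A) = V f(D) V^{-1}$ rather than the contour integral; the sketch below (our own helper name and an illustrative matrix) checks this against SciPy's matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

def matfun(f, A):
    """Evaluate an analytic f at a diagonalizable matrix A via its
    eigendecomposition, f(A) = V f(D) V^{-1}; a special case of the
    Cauchy-integral definition (sketch, assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(f(w)) @ np.linalg.inv(V)).real

# Sanity check on an illustrative sub-intensity matrix: the matrix
# exponential computed both ways should coincide.
A = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
E_eig = matfun(np.exp, A)
E_scipy = expm(A)
```

For defective (non-diagonalizable) matrices the contour-integral or Jordan-form definitions remain necessary; the eigendecomposition route is only a convenient special case.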
We now revisit three examples which can be used as building blocks for our regression approach, and propose an additional novel distribution which is relevant for modelling heavy tails.
Example 2.1 (Matrix–Pareto distributions).
Let $X = \beta\,( e^{Y} - 1 )$, where $Y \sim \mathrm{PH}(\pi, T)$ and $\beta > 0$. Then $X \sim \mathrm{IPH}(\pi, T, \lambda)$.
Here $g(y) = \beta\,(e^{y} - 1)$ and $\lambda(x) = (x + \beta)^{-1}$. Consequently,
\[ S(x) = \pi \left( \frac{x}{\beta} + 1 \right)^{T} e \quad \text{and} \quad f(x) = \pi \left( \frac{x}{\beta} + 1 \right)^{T - I} \frac{t}{\beta}. \]
We refer to the distribution of $X$ as a matrix–Pareto distribution. It is easy to see from (2.3) that $S(x) \sim L(x)\, x^{\eta}$ as $x \to \infty$, where $L$ is a slowly varying function and $\eta$ is the largest real eigenvalue of $T$.
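Since a matrix–Pareto variable is an explicit transform of a PH absorption time, it can be simulated by running the underlying Markov jump process; a minimal sketch with illustrative parameter values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy PH(pi, T) specification and a Pareto-type scale beta.
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
t = -T @ np.ones(2)             # exit rate vector
pi = np.array([0.7, 0.3])
beta = 2.0

def sample_ph():
    """One absorption time of the chain with sub-intensity T, initial law pi."""
    state = rng.choice(2, p=pi)
    y = 0.0
    while True:
        rate = -T[state, state]
        y += rng.exponential(1.0 / rate)
        # Jump probabilities: to the other transient states, then absorption.
        probs = np.append(T[state] / rate, t[state] / rate)
        probs[state] = 0.0
        nxt = rng.choice(3, p=probs)
        if nxt == 2:            # absorbed
            return y
        state = nxt

def sample_matrix_pareto():
    # X = beta * (exp(Y) - 1) with Y ~ PH(pi, T)
    return beta * (np.exp(sample_ph()) - 1.0)
```

Comparing the empirical survival of such samples with $\pi ( x/\beta + 1 )^{T} e$ provides a quick sanity check of the representation.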
Example 2.2 (Matrix–Weibull distributions).
Let $X = Y^{1/\beta}$, where $Y \sim \mathrm{PH}(\pi, T)$ and $\beta > 0$. Then $X \sim \mathrm{IPH}(\pi, T, \lambda)$.
Here $g(y) = y^{1/\beta}$ and $\lambda(x) = \beta\, x^{\beta - 1}$. Thus,
\[ S(x) = \pi \exp\!\big( x^{\beta}\, T \big)\, e \quad \text{and} \quad f(x) = \beta\, x^{\beta - 1}\, \pi \exp\!\big( x^{\beta}\, T \big)\, t. \]
This is called the matrix–Weibull distribution, since the scale parameter of the usual Weibull distribution is now replaced by a matrix. One can show that $S(x) \sim c\, x^{\beta(m-1)}\, e^{\eta x^{\beta}}$ as $x \to \infty$, with $\eta$ and $m$ as in (2.3). In particular, the tails belong to the Gumbel domain of attraction and can be lighter or heavier than exponential.
Example 2.3 (Matrix–Gompertz distributions).
Let $X = \log( \beta Y + 1 ) / \beta$, where $Y \sim \mathrm{PH}(\pi, T)$ and $\beta > 0$. Then $X \sim \mathrm{IPH}(\pi, T, \lambda)$.
Here $g(y) = \log(\beta y + 1)/\beta$ and $\lambda(x) = e^{\beta x}$. Thus,
\[ S(x) = \pi \exp\!\left( \frac{e^{\beta x} - 1}{\beta}\, T \right) e. \]
This is called the matrix–Gompertz distribution, and it has tails much lighter than exponential. Observe that, from an inhomogeneous Markov point of view, the intensity matrix of the underlying jump process is given by $e^{\beta t}\, T$, which for scalar $T = -\theta$, $\theta > 0$, gives the hazard function $\theta e^{\beta t}$ of the Gompertz distribution. The latter has been shown to be effective for the modelling of human mortality over long age ranges, motivating the use of $X$ as a model for the lifetime of an individual (see [macdonald2018modelling] for a recent account of mortality modelling).
We now introduce a new type of IPH distribution that is tail-equivalent to a Lognormal-type distribution.
Example 2.4 (Matrix–Lognormal distributions).
Let $X = e^{Y^{1/\gamma}} - 1$, where $Y \sim \mathrm{PH}(\pi, T)$ and $\gamma > 1$. Then $X \sim \mathrm{IPH}(\pi, T, \lambda)$.
Here $g(y) = e^{y^{1/\gamma}} - 1$ and $\lambda(x) = \gamma\, (\log(1+x))^{\gamma - 1} / (1 + x)$. Thus,
\[ S(x) = \pi \exp\!\big( (\log(1+x))^{\gamma}\, T \big)\, e. \]
We call this the matrix–Lognormal distribution. Concerning its asymptotic behaviour, we have that $S(x) \sim c\, (\log x)^{\gamma(m-1)}\, e^{\eta (\log x)^{\gamma}}$ as $x \to \infty$, with $\eta$ and $m$ as in (2.3). This, in fact, corresponds to a Lognormal-type tail behaviour where the heaviness of the tail is controlled by the parameter $\gamma$ of the function $g$. The usual lognormal tail behaviour arises when $\gamma = 2$.
2.2. Parameter estimation
An EM algorithm for maximum likelihood estimation of IPH distributions was recently introduced in Albrecher et al. [albrecher2020iphfit]. The key is to apply a parameter-dependent transformation after each step of the EM algorithm. We briefly describe such an algorithm.
Let $x_1, \dots, x_N$ be an i.i.d. sample from an inhomogeneous phase–type distribution with representation $\mathrm{IPH}(\pi, T, \lambda(\cdot\,; \beta))$, where $\lambda(\cdot\,; \beta)$ is a parametric non–negative function depending on the vector $\beta$. The EM algorithm for fitting is derived from the observation that $g^{-1}(X; \beta) \sim \mathrm{PH}(\pi, T)$, where $g$ is defined in terms of its inverse function $g^{-1}(x; \beta) = \int_0^x \lambda(s; \beta)\,\mathrm{d}s$. The EM algorithm works as follows.
Algorithm 2.5 (EM algorithm for IPH distributions).
0. Initialize with some “arbitrary” $(\pi, T, \beta)$.
1. Transform the data into $y_i = g^{-1}(x_i; \beta)$, $i = 1, \dots, N$, and apply the E– and M–steps of the EM algorithm of Asmussen et al. [asmussen1996fitting] for PH distributions, by which we obtain the estimators $(\hat\pi, \hat T)$.
2. Compute $\hat\beta = \arg\max_{\beta} \sum_{i=1}^{N} \log f(x_i; \hat\pi, \hat T, \beta)$.
3. Assign $(\pi, T, \beta) = (\hat\pi, \hat T, \hat\beta)$ and GOTO 1.
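The structure of steps 1 and 2 can be sketched as follows for a Weibull inhomogeneity $\lambda(s; \beta) = \beta s^{\beta - 1}$, i.e. $g^{-1}(x; \beta) = x^{\beta}$. Here the PH parameters are held fixed, standing in for the output of the E– and M–steps (which we do not reimplement), and only the $\beta$-update of step 2 is carried out; all names and values are illustrative.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize_scalar

# Fixed toy (pi, T), playing the role of the current E-/M-step output.
pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
t = -T @ np.ones(2)

def transform(x, b):
    """Step 1: map data to (approximate) PH observations y = g^{-1}(x; b) = x^b."""
    return x ** b

def iph_loglik(b, x):
    """IPH log-likelihood sum_i log[ b x_i^(b-1) * pi exp(T x_i^b) t ]."""
    dens = np.array([pi @ expm(T * g) @ t for g in transform(x, b)])
    return np.sum(np.log(b) + (b - 1.0) * np.log(x) + np.log(dens))

def update_beta(x):
    """Step 2: beta-hat = argmax of the IPH log-likelihood given (pi, T)."""
    res = minimize_scalar(lambda b: -iph_loglik(b, x),
                          bounds=(0.3, 3.0), method="bounded")
    return res.x
```

In the full algorithm this update alternates with a genuine PH E-/M-step on the transformed data; the sketch only illustrates the transform-and-maximise mechanics.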
Remark 2.1.
In [albrecher2020iphfit], it was shown that the likelihood function increases at each iteration and, consequently, converges to a (possibly local) maximum. Moreover, the algorithm can also be employed to estimate distributions that are not IPH in a strict sense, but that are defined as the distribution of a transformed phase-type random variable. This is the case, for instance, of the matrix-GEV distribution; see [albrecher2020iphfit] for further details.
2.2.1. Censored data
In many applications, a large proportion of the data is not fully observed, i.e., it is censored. It turns out that the EM Algorithm 2.5 works in much the same way as for uncensored data. In this paper, we consider only the case of right-censoring, since it arises naturally in survival analysis applications; the cases of left-censoring and interval-censoring can be treated in much the same way. The main difference is that, for a censored observation, we do not observe the exact data point $x_i$, but only a lower bound for it. In particular, by monotonicity of $g$, the transformed bound $g^{-1}(x_i; \beta)$ can be interpreted as a right-censored observation of a random variable with conventional phase–type distribution. Thus, we can employ the EM algorithm in Olsson [olsson1996estimation] to derive the following adaptation of the EM algorithm for inhomogeneous phase–type distributions in the presence of censored data.
Algorithm 2.6 (EM algorithm for IPH distributions with right-censoring).
0. Initialize with some “arbitrary” $(\pi, T, \beta)$.
1. Transform the data into $y_i = g^{-1}(x_i; \beta)$, $i = 1, \dots, N$ (censoring times are transformed in the same way), and apply the E– and M–steps of the EM algorithm of Olsson [olsson1996estimation], by which we obtain the estimators $(\hat\pi, \hat T)$.
2. Compute $\hat\beta = \arg\max_{\beta} \ell(\hat\pi, \hat T, \beta)$, where $\ell$ denotes the data log-likelihood, with survival terms in place of density terms for censored observations.
3. Assign $(\pi, T, \beta) = (\hat\pi, \hat T, \hat\beta)$ and GOTO 1.
Similarly to [albrecher2020iphfit], one can show that the likelihood function increases at each iteration. In what follows, we will provide a proof for a more general algorithm that includes covariates, thus generalising the previous algorithm.
Example 2.7.
We consider a respiratory disease dataset available at https://datos.gob.mx/busca/dataset/informacionreferenteacasoscovid19enmexico, which contains information from the Epidemiological Vigilance System of Viral Respiratory Diseases of the government of Mexico.
The dataset is updated regularly; the version used for this article was downloaded on October 11th, 2020. At the time of analysis, the data consisted of observations on subjects deemed suspected of having a viral respiratory disease upon arrival at the health sector's medical units. Subsequently, the subjects were hospitalised or remained ambulatory. The dataset does not contain the patients' evolution, but it does record the time of entry, the time of leaving, and death, when applicable.
We consider only COVID-19 positive patients. The variable measuring the duration of hospitalisation can help us estimate the disease's mortality among the patients that managed to arrive in the health sector. A sizeable proportion of the subjects died during their hospitalisation. The distribution of the stay duration has a shape that is difficult to fit, since a large number of patients do not stay long. We consider fitting the following parametric models: Weibull, Pareto, Log-Logistic and Matrix–Pareto. For the latter, a Coxian structure for a 3-dimensional matrix is chosen. The fits are summarised in Figure 2.1. We see that a simple matrix enlargement for the intensity of the underlying distribution can help to correctly capture the shape at upper and lower quantiles alike. In short, it is not that the hazard fails to be asymptotically Pareto-like, but rather that in the scalar case the optimisation makes a global compromise. An in-depth analysis of this dataset will be the subject of future research.
3. The proportional intensities model
This section considers a generalisation of the proportional hazards model for inhomogeneous phase–type distributions. The main idea is that the intensity is the function which varies proportionally with the covariates.
We propose a regression model employing IPH distributions with representation $\mathrm{IPH}(\pi, T, \lambda(\cdot\,; \beta))$ and considering proportional intensities among subjects. Specifically, we assume that $\lambda(\cdot\,; \beta)$ is a parametric non–negative function depending on the vector $\beta$, and incorporate the predictor variables $Z = (Z_1, \dots, Z_d)^{\mathsf T}$ by considering
(3.1) \[ \lambda(t \mid Z) = \lambda(t; \beta)\, \exp(\theta^{\mathsf T} Z), \]
and we call this model a proportional intensities model. The interpretation in terms of the underlying inhomogeneous Markov structure is that time is scaled proportionally among different subjects before the associated PH distribution is time-transformed into an IPH one. In the sequel we denote by $Z$ either a vector of covariates at the population level or a matrix of covariates for finite samples. In the latter case, $Z_i$ denotes the $i$th row of the matrix, corresponding to the $i$th individual in the sample.
Remark 3.1.
The role of the exponential function in the above specification is, in a sense, merely for illustration purposes. In the remainder of the paper it is possible to consider models where the multiplication by $\exp(\theta^{\mathsf T} Z)$ – be it in the intensity or on the variable itself – can be replaced by $m(\theta^{\mathsf T} Z)$ for any non-negative function $m$ (monotonicity is not required).
The density and survival function of a random variable $X$ with distribution $\mathrm{IPH}(\pi, T, \lambda(\cdot \mid Z))$ are given by
\[ f(x \mid Z) = \lambda(x; \beta)\, e^{\theta^{\mathsf T} Z}\, \pi \exp\!\big( e^{\theta^{\mathsf T} Z}\, g^{-1}(x; \beta)\, T \big)\, t \quad \text{and} \quad S(x \mid Z) = \pi \exp\!\big( e^{\theta^{\mathsf T} Z}\, g^{-1}(x; \beta)\, T \big)\, e. \]
Note that the function $\lambda$ in the representation of an IPH distribution is not the same as the hazard function. In fact, the hazard function is given by
\[ h(x \mid Z) = \frac{f(x \mid Z)}{S(x \mid Z)}. \]
This distinction is subtle, but it leads to increased flexibility in the interplay of hazards between subjects. A closely related function is the cumulative hazard function $H(x) = \int_0^x h(s)\,\mathrm{d}s = -\log S(x)$. Thus, for an $\mathrm{IPH}(\pi, T, \lambda(\cdot \mid Z))$ distribution, it takes the form
\[ H(x \mid Z) = -\log\!\Big( \pi \exp\!\big( e^{\theta^{\mathsf T} Z}\, g^{-1}(x; \beta)\, T \big)\, e \Big). \]
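A numerical sketch of these quantities (illustrative parameters and helper names of our own, with a Weibull-type baseline $g^{-1}(x) = x^{1.5}$) shows how the implied hazard ratio between two subjects varies over the body of the distribution while stabilising in the tail, in line with Proposition 3.1.

```python
import numpy as np
from scipy.linalg import expm

# Proportional intensities: lambda(x|Z) = lambda(x) exp(theta'Z), so the
# integrated intensity is scaled, g^{-1}(x|Z) = exp(theta'Z) g^{-1}(x).
pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
e = np.ones(2)
t = -T @ e

def g_inv(x):
    return x ** 1.5

def lam(x):
    return 1.5 * x ** 0.5

def survival(x, theta_z):
    return pi @ expm(np.exp(theta_z) * g_inv(x) * T) @ e

def hazard(x, theta_z):
    c = np.exp(theta_z)
    f = c * lam(x) * (pi @ expm(c * g_inv(x) * T) @ t)
    return f / survival(x, theta_z)
```

Evaluating `hazard(x, 0.7) / hazard(x, 0.0)` for small and large `x` exhibits a non-constant ratio that approaches $e^{\theta^{\mathsf T} Z}$ only in the upper tail.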
Remark 3.2.
In the proportional hazards model one assumes that the hazard function is such that
\[ h(x \mid Z) = h_0(x)\, \exp(\theta^{\mathsf T} Z). \]
The factor $h_0$ is sometimes called the baseline hazard function, or the hazard function of a standard subject, corresponding to a subject with $Z = 0$. Common parametric models employed in this context are based on the exponential, Weibull and Lognormal distributions.
One of the advantages of the proportional intensities model is that the implied hazard functions of different subjects can deviate from proportionality in the body of the distribution, as can be observed in Figure 3.1. This addresses one of the common practical drawbacks of the proportional hazards model. If $\lambda \equiv 1$, we have that $h(x \mid Z) \to c\, e^{\theta^{\mathsf T} Z}$ as $x \to \infty$, with $c$ a positive constant. More generally, $e^{\theta^{\mathsf T} Z} \lambda(x; \beta)$ is, up to a constant, asymptotically equivalent to the hazard function in the upper tail, as shown in the following result.
Proposition 3.1.
Let $X \sim \mathrm{IPH}(\pi, T, \lambda(\cdot \mid Z))$. Then the hazard function and cumulative hazard function of $X$ satisfy, respectively,
\[ h(x \mid Z) \sim c_1\, e^{\theta^{\mathsf T} Z}\, \lambda(x; \beta) \quad \text{and} \quad H(x \mid Z) \sim c_2\, e^{\theta^{\mathsf T} Z}\, g^{-1}(x; \beta), \quad x \to \infty, \]
where $c_1, c_2$ are positive constants.
Proof.
We have that if $Y \sim \mathrm{PH}(\pi, T)$, then its density $f_Y$ has a representation of the same form as (2.2) (cf. [Bladt2017, Section 4.1.1] for details), differing only in the multiplicative constants. Thus,
\[ h_Y(u) = \frac{f_Y(u)}{S_Y(u)} \sim \frac{c_1'\, u^{m-1}\, e^{\eta u}}{c_2'\, u^{m-1}\, e^{\eta u}} = c_1 \]
as $u \to \infty$, where $\eta$ is the largest real eigenvalue of $T$, $m$ is the dimension of the Jordan block of $\eta$, and $c_1', c_2'$ are positive constants. The first expression then follows by recalling that $h(x \mid Z) = \lambda(x \mid Z)\, h_Y\big( g^{-1}(x \mid Z) \big)$. The second one can be shown in an analogous manner and is omitted for brevity. ∎
Maximum likelihood estimation for the proportional intensities model is possible via an adapted EM algorithm. Let $g(\cdot\,; \beta)$ and $g(\cdot \mid Z; \beta, \theta)$ be defined in terms of their inverse functions
\[ g^{-1}(x; \beta) = \int_0^x \lambda(s; \beta)\,\mathrm{d}s \quad \text{and} \quad g^{-1}(x \mid Z; \beta, \theta) = \int_0^x \lambda(s; \beta)\, e^{\theta^{\mathsf T} Z}\,\mathrm{d}s, \]
respectively. Note also that
(3.2) \[ g^{-1}(x \mid Z; \beta, \theta) = e^{\theta^{\mathsf T} Z}\, g^{-1}(x; \beta). \]
In particular, $g^{-1}(X \mid Z; \beta, \theta) \sim \mathrm{PH}(\pi, T)$. Then, the EM algorithm for fitting the proportional intensities survival regression model works as follows.
Algorithm 3.2 (EM algorithm for IPH survival regression models).
0. Initialize with some “arbitrary” $(\pi, T, \beta, \theta)$.
1. Transform the data into $y_i = e^{\theta^{\mathsf T} Z_i}\, g^{-1}(x_i; \beta)$, $i = 1, \dots, N$, and apply the E– and M–steps of the EM algorithm of Olsson [olsson1996estimation], by which we obtain the estimators $(\hat\pi, \hat T)$.
2. Compute $(\hat\beta, \hat\theta) = \arg\max_{(\beta, \theta)} \ell(\hat\pi, \hat T, \beta, \theta)$, where $\ell$ denotes the data log-likelihood.
3. Assign $(\pi, T, \beta, \theta) = (\hat\pi, \hat T, \hat\beta, \hat\theta)$ and GOTO 1.
Remark 3.3.
The E-steps require the computation of matrix exponentials, which can be performed in multiple ways (see [moler1978nineteen]). In Asmussen et al. [asmussen1996fitting] this is done by converting the problem into a system of ODEs, which is then solved via a fourth-order Runge–Kutta method (a C implementation, called EMpht, is available online [olsson1998empht]). We employed a new Rcpp implementation of this method for the illustrations here and everywhere else in the paper. An important consideration is that the Runge–Kutta method requires the sample to be ordered. This is relevant since the transformed data of the above algorithm will, in general, no longer be ordered, given that covariate information varies across subjects. Thus, intermediate ordering steps have to be introduced before proceeding to the numerical implementation of the EM algorithm of [olsson1996estimation] by the Runge–Kutta method. The reordering also modifies the ties in the data, which must be accounted for. Additionally, the evaluation of the likelihood in step 2 may be done using by-products of step 1, which implies that reordering and reweighting will also be necessary at each iteration of the optimisation inside step 2. These additional steps pose a significant computational burden relative to the non-covariate algorithm. The uniformisation method ([albrecher2020iphfit]) is an alternative, but a comparison of the two methods' performance is beyond this paper's scope.
Proposition 3.3.
Employing Algorithm 3.2, the likelihood function increases at each iteration. Consequently, it is guaranteed to converge to a (possibly local) maximum.
Proof.
Note that the data log-likelihood is given by
\[ \ell(\pi, T, \beta, \theta) = \sum_{i :\, \Delta_i = 1} \log f(x_i \mid Z_i; \pi, T, \beta, \theta) + \sum_{i :\, \Delta_i = 0} \log S(x_i \mid Z_i; \pi, T, \beta, \theta), \]
where $\Delta_i$ denotes the censoring indicator. Consider the parameter values $(\pi_n, T_n, \beta_n, \theta_n)$ after the $n$th iteration. In the $(n+1)$th iteration, we first obtain $(\pi_{n+1}, T_{n+1})$ in step 1, so that the likelihood of the transformed (censored) PH data $y_i = g^{-1}(x_i \mid Z_i; \beta_n, \theta_n)$ does not decrease, by the standard EM argument.
Then, adding the remaining terms $\sum_{i :\, \Delta_i = 1} \log \lambda(x_i \mid Z_i; \beta_n, \theta_n)$, which are unchanged by the EM step 1, we obtain
\[ \ell(\pi_{n+1}, T_{n+1}, \beta_n, \theta_n) \ge \ell(\pi_n, T_n, \beta_n, \theta_n). \]
Hence, by step 2,
\[ \ell(\pi_{n+1}, T_{n+1}, \beta_{n+1}, \theta_{n+1}) \ge \ell(\pi_{n+1}, T_{n+1}, \beta_n, \theta_n) \ge \ell(\pi_n, T_n, \beta_n, \theta_n), \]
proving that the log-likelihood increases at each iteration. ∎
Next, we consider two important distributions which, in the presence of covariates, generalise the commonly employed models for parametric proportional hazards modelling: the exponential and the Weibull baseline hazard models.
Example 3.4 (Phase–type regression).
We consider $\lambda(t; \beta) \equiv 1$, thus $g^{-1}(x; \beta) = x$ for all $x \ge 0$. The survival function takes the form
\[ S(x \mid Z) = \pi \exp\!\big( e^{\theta^{\mathsf T} Z}\, x\, T \big)\, e. \]
Note also that $X \mid Z \stackrel{d}{=} e^{-\theta^{\mathsf T} Z}\, Y$, where $Y \sim \mathrm{PH}(\pi, T)$. In particular, this implies that
\[ \mathbb{E}[X \mid Z] = e^{-\theta^{\mathsf T} Z}\, \pi\, (-T)^{-1}\, e. \]
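The proportionality of means in this example is easy to verify numerically; a small sketch with illustrative values of our own:

```python
import numpy as np

# With lambda = 1, X|Z ~ PH(pi, exp(theta'Z) T), so the mean is
# proportional across subjects: E[X|Z] = exp(-theta'Z) pi (-T)^{-1} e.
pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
e = np.ones(2)

def ph_mean(theta_z):
    # Mean of PH(pi, exp(theta_z) T): pi (-exp(theta_z) T)^{-1} e
    return pi @ np.linalg.inv(-np.exp(theta_z) * T) @ e
```

Evaluating `ph_mean` at two covariate values confirms that the ratio of means equals the exponential of the difference of the linear predictors.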
Example 3.5 (Matrix–Weibull regression).
We consider $\lambda(t; \beta) = \beta\, t^{\beta - 1}$, thus $g^{-1}(x; \beta) = x^{\beta}$ for all $x \ge 0$. The survival function takes the form
\[ S(x \mid Z) = \pi \exp\!\big( e^{\theta^{\mathsf T} Z}\, x^{\beta}\, T \big)\, e. \]
Note also that $X \mid Z \stackrel{d}{=} e^{-\theta^{\mathsf T} Z / \beta}\, Y^{1/\beta}$, where $Y \sim \mathrm{PH}(\pi, T)$. The expression for the expected value is then given by
\[ \mathbb{E}[X \mid Z] = e^{-\theta^{\mathsf T} Z / \beta}\, \Gamma(1 + 1/\beta)\, \pi\, (-T)^{-1/\beta}\, e, \]
cf. [Bladt2017] for the derivation of non-integer moments of phase–type distributions.
Remark 3.4.
Note that for matrices of dimension $p = 1$, one recovers the conventional proportional hazards models. Thus, our model can be considered a generalisation of this type of model. Moreover, both cases have the interpretation that the expected values are proportional among subjects. In other words, for arbitrary matrix dimensions, when the baseline hazard is phase-type or matrix–Weibull, the proportional intensities model is equivalent to the accelerated failure time model. It is also not hard to see that homogeneous functions $g$ simultaneously satisfy both models. This feature is in agreement with the fact that, within the proportional hazards framework, the exponential and Weibull distributions also satisfy the accelerated failure time model, as we will discuss in Section 4.
3.1. Goodness of fit
As in the conventional proportional hazards model, the quality of the fit can often be checked using graphical methods. For this purpose, nonparametric estimates of the survival and cumulative hazard functions are the primary tools. The goodness of fit of the proportional intensities model can be assessed by considering a generalisation of the Cox–Snell residuals, as we now explain.
Define the residuals of a proportional intensities model by
\[ r_i = H(x_i \mid Z_i) = -\log\!\Big( \pi \exp\!\big( e^{\theta^{\mathsf T} Z_i}\, g^{-1}(x_i; \beta)\, T \big)\, e \Big), \quad i = 1, \dots, N. \]
Under right-censoring completely at random and the joint null hypothesis of a correctly specified model, the sample $r_1, \dots, r_N$ constitutes a right-censored unit exponential dataset. Hence, in practice, we may consider the estimated versions $\hat r_1, \dots, \hat r_N$ and their Kaplan–Meier survival curve
\[ \hat S(t) = \prod_{i :\, \hat r_{(i)} \le t} \left( 1 - \frac{d_i}{n_i} \right), \]
where $d_i$ is the number of residuals tied at the value $\hat r_{(i)}$ and $n_i$ is the number of residuals at risk (neither observed nor censored) just before $\hat r_{(i)}$. The function $t \mapsto \exp(-t)$ should fall within the corresponding confidence bounds, using that the variance of $\hat S(t)$ can be estimated, for instance, by Greenwood's formula
\[ \widehat{\operatorname{Var}}\big( \hat S(t) \big) = \hat S(t)^2 \sum_{i :\, \hat r_{(i)} \le t} \frac{d_i}{n_i (n_i - d_i)}. \]
Alternatively, the plot of $-\log \hat S(t)$ versus $t$ should align with a line of intercept zero and slope one. The Nelson–Aalen estimator can be used as an alternative to the Kaplan–Meier estimator (they are asymptotically equivalent).
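These estimators are straightforward to implement; the following sketch (our own helper, not from the paper) computes the Kaplan–Meier curve of a residual sample together with Greenwood's variance estimate.

```python
import numpy as np

def kaplan_meier(r, delta):
    """Kaplan-Meier curve and Greenwood variances for (possibly right-
    censored) Cox-Snell-type residuals.
    r: residual values; delta: 1 = fully observed, 0 = right-censored.
    Returns event times, the survival estimate, and Greenwood variances."""
    order = np.argsort(r)
    r, delta = r[order], delta[order]
    times, surv, var = [], [], []
    s, gw = 1.0, 0.0
    for u in np.unique(r[delta == 1]):
        at_risk = np.sum(r >= u)          # n_i: still at risk just before u
        d = np.sum((r == u) & (delta == 1))  # d_i: events tied at u
        s *= 1.0 - d / at_risk
        if at_risk > d:                    # guard against 0/0 at the last event
            gw += d / (at_risk * (at_risk - d))
        times.append(u)
        surv.append(s)
        var.append(s ** 2 * gw)
    return np.array(times), np.array(surv), np.array(var)
```

Under the null, the resulting step function should track $e^{-t}$ within the Greenwood-based confidence bounds.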
3.2. Covariate-dependent inhomogeneity functions
Within the above setup, it is possible to add dependence of $\beta$ on the covariates. That is, we may consider the model
(3.3) \[ \lambda(t \mid Z) = \lambda\big( t;\, \beta(Z) \big)\, \exp(\theta^{\mathsf T} Z), \]
where $\lambda(\cdot\,; \beta)$ is any uniparametric form of the intensity and $\beta(Z)$ is a parametric function of the covariates. The residuals for assessing goodness of fit then become
\[ r_i = -\log\!\Big( \pi \exp\!\big( e^{\theta^{\mathsf T} Z_i}\, g^{-1}\big( x_i; \beta(Z_i) \big)\, T \big)\, e \Big). \]
In this general setting, the replacement of $\exp(\theta^{\mathsf T} Z)$ by $m(\theta^{\mathsf T} Z)$, for any non-negative function $m$, is also possible without any increase in computational complexity. Extensions to multiparameter specifications are straightforward to define.
A modified EM algorithm akin to Algorithm 3.2 applies to this model, where we also optimise with respect to the parameters of $\beta(\cdot)$ in step 2. We omit the details for brevity.
3.3. An example
In this subsection we provide a simulated example which illustrates the practical feasibility of the proportional intensities model. A second, well-known dataset from the Veterans' Administration lung cancer study (publicly available in the R package survival) will be analysed in the next section, in conjunction with the accelerated failure time model.
Thus, consider the following simple two-group setup, with two groups of equal size and a different IPH specification for each group. We subsequently censor the observations at random by independent exponential variables, leading to a moderate proportion of right-censored observations. Notice that in this setting the inhomogeneity function also changes with the covariates, so that we need to implement the fitting procedure associated with model (3.3). Additionally, for comparison, we fitted two misspecified models: a standard Weibull proportional hazards model, and the scalar version of model (3.3). The resulting empirical versus fitted survival curves for each group, as well as the hazard ratio between the two groups, are given in Figure 3.2. We observe that only the correctly specified model recovers the wavy shape of the theoretical hazard ratio, and that within each group the survival curve is better estimated in the tails of the distribution. The goodness of fit is formally evaluated in Figure 3.3. In the latter plot, we use the Kaplan–Meier and Nelson–Aalen nonparametric estimators of the survival curve of the residuals to visually assess standard exponential behaviour. We observe that only the correctly specified model provides an adequate fit, in the sense of staying within the confidence bounds implied by Greenwood's formula for the first approach, and of aligning itself with the identity line for the second one.
4. Accelerated failure time models
The accelerated failure time (AFT) model specifies that the predictors act multiplicatively on the failure time, or additively on the log failure time. The effect of a predictor is to alter the rate at which a subject proceeds along the time axis, i.e., to accelerate the time to failure. Specifically, letting $X_0$ be a random variable with the baseline distribution (corresponding to $Z = 0$), the model states that
(4.1) \[ X = e^{-\theta^{\mathsf T} Z}\, X_0. \]
Thus, the survival function of $X$ is given by
\[ S(x \mid Z) = S_0\!\big( x\, e^{\theta^{\mathsf T} Z} \big). \]
Note that (4.1) is equivalent to
(4.2) \[ \log X = -\theta^{\mathsf T} Z + \log X_0. \]
Thus, an alternative way to represent $X$ is through the random variable $\log X_0$. Moreover, denoting by $h_0$ the hazard function of $X_0$, we have that
\[ h(x \mid Z) = e^{\theta^{\mathsf T} Z}\, h_0\!\big( x\, e^{\theta^{\mathsf T} Z} \big). \]
Whereas in a proportional hazards model the covariates act multiplicatively on the hazard, in an AFT model the covariates act multiplicatively on time. Common distributions used in the context of AFT modelling include the Weibull, log-logistic and lognormal distributions. Moreover, the Weibull (and exponential) distribution satisfies both the proportional hazards and the AFT model. Estimation of AFT models is typically done via maximum likelihood.
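The hazard relation above is easy to verify numerically; a toy sketch for a scalar Weibull baseline $S_0(x) = e^{-x^{k}}$, with illustrative values of our own:

```python
import numpy as np

# Numerical check of h(x|Z) = e^{theta'Z} h0(x e^{theta'Z}) under the AFT
# model, for a scalar Weibull baseline (toy parameters).
k = 1.5
theta_z = 0.4                       # value of theta'Z for one subject

def S0(x):
    return np.exp(-x ** k)          # baseline survival

def h0(x):
    return k * x ** (k - 1.0)       # baseline hazard

def S(x):
    return S0(x * np.exp(theta_z))  # AFT survival S(x|Z) = S0(x e^{theta'Z})

def hazard(x, h=1e-6):
    # h(x|Z) = -d/dx log S(x|Z), by forward difference
    return (S(x) - S(x + h)) / (h * S(x))
```

The finite-difference hazard of `S` matches the closed-form right-hand side to numerical precision.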
We now specify IPH distributions in the AFT model. Thus, we assume that $X_0$ is IPH distributed with representation $\mathrm{IPH}(\pi, T, \lambda(\cdot\,; \beta))$, where $\lambda(\cdot\,; \beta)$ is a parametric non–negative function depending on the vector $\beta$. Then $X = e^{-\theta^{\mathsf T} Z}\, g(Y; \beta)$, where $Y \sim \mathrm{PH}(\pi, T)$ and $g$ is defined in terms of its inverse function $g^{-1}(x; \beta) = \int_0^x \lambda(s; \beta)\,\mathrm{d}s$. In particular,
\[ S(x \mid Z) = \pi \exp\!\Big( g^{-1}\big( x\, e^{\theta^{\mathsf T} Z}; \beta \big)\, T \Big)\, e \]
and
\[ f(x \mid Z) = e^{\theta^{\mathsf T} Z}\, \lambda\big( x\, e^{\theta^{\mathsf T} Z}; \beta \big)\, \pi \exp\!\Big( g^{-1}\big( x\, e^{\theta^{\mathsf T} Z}; \beta \big)\, T \Big)\, t. \]
Note that if we define $\tilde g(y \mid Z) = e^{-\theta^{\mathsf T} Z}\, g(y; \beta)$, then $X = \tilde g(Y \mid Z)$. For our purposes, it is convenient to work with this last representation, i.e., we consider $X \sim \mathrm{IPH}(\pi, T, \tilde\lambda(\cdot \mid Z))$, where $\tilde\lambda$ is such that $\tilde g^{-1}(x \mid Z) = g^{-1}\big( x\, e^{\theta^{\mathsf T} Z}; \beta \big)$, thus making our model an IPH regression model.
Example 4.1.
(Matrix–Weibull) We know that the matrix–Weibull distribution with parameters $\pi$, $T$ and $\beta$ has representation $X_0 = Y^{1/\beta}$, where $Y \sim \mathrm{PH}(\pi, T)$. Note also that $\log X_0$ has a distribution of the same type as the matrix–Gumbel (see [albrecher2020iphfit]). Now consider the AFT model
\[ X = e^{-\theta^{\mathsf T} Z}\, Y^{1/\beta}. \]
Then
\[ S(x \mid Z) = \pi \exp\!\big( e^{\beta\, \theta^{\mathsf T} Z}\, x^{\beta}\, T \big)\, e, \]
which corresponds to a reparametrisation of the proportional intensities model. Thus, the matrix–Weibull satisfies both the proportional intensities and the AFT model, in agreement with its scalar counterpart.
More generally, if $g^{-1}$ is homogeneous of degree $k$, i.e., $g^{-1}(c\, x) = c^{k}\, g^{-1}(x)$, then
\[ g^{-1}\big( x\, e^{\theta^{\mathsf T} Z} \big) = e^{\tilde\theta^{\mathsf T} Z}\, g^{-1}(x), \]
where $\tilde\theta = k\, \theta$. This implies (see (3.2)) that $X$ satisfies both the proportional intensities and the AFT model.
We now introduce matrix versions of the logistic and loglogistic distributions.
Example 4.2.
(Matrix–logistic) For $X = \mu + \sigma \log\!\big( e^{Y} - 1 \big)$ with $Y \sim \mathrm{PH}(\pi, T)$, $\mu \in \mathbb{R}$ and $\sigma > 0$, we obtain the survival function
\[ S(x) = \pi \big( 1 + e^{(x - \mu)/\sigma} \big)^{T}\, e, \quad x \in \mathbb{R}. \]
This distribution can be interpreted as a matrix version of the logistic distribution. Note that it is not IPH in the strict sense, but it is defined as the distribution of a transformation of a phase-type distributed random variable.
Example 4.3.
(Matrix–Loglogistic) For with .
We also compare the estimated models against accelerated failure time specifications, which (except for the Weibull case) also account for non-proportional hazards.
The data come from a randomised trial where two groups were given different treatment regimens for lung cancer. The variables of interest for our analysis are survival time, censoring status (binary), treatment (binary), prior therapy (binary), and the Karnofsky performance score (from 0 to 100). Previous studies (cf. for instance the vignette https://cran.rstudio.com/web/packages/survival/vignettes/timedep.pdf, pp. 15–22) have shown that only the Karnofsky score violates the proportional hazards assumption. Common ways to address this issue are stratifying the dataset into time groups and considering continuous time-dependent regression coefficients. The upshot is that the coefficient for treatment becomes significant once the model's fit is improved. Consequently, we presently focus only on the goodness of fit of the four competing models introduced in the text, measured by their log-likelihood and residuals.
We consider two classical models: a) the Weibull proportional hazards model; b) the Log-logistic accelerated failure time model. The distributions were selected according to performance. Given the modest size of the dataset, we will avoid, in the name of parsimony, large matrices for the IPH-based models. Thus, we present two models based on matrices with a Coxian structure: c) the Matrix–Weibull proportional intensities model; d) the Matrix–Lognormal accelerated failure time model. We also fitted a Matrix–Loglogistic distribution, but it performed better than its scalar counterpart only for large matrices and is thus omitted.
The results are summarized in the table below and in Figure 4.1.
Model  Log-likelihood  Nb. parameters  AIC     BIC
a      −136.21         5               282.42  297.02
b      −130.54         6               273.09  290.61
c      −127.74         7               269.48  289.92
d      −127.81         7               269.63  290.07
We observe that models b), c) and d) all improve the fit to the point where the residuals are within bounds. Visually, there appears to be little difference between the last three models, and the BIC reflects this as well. The log-likelihood and AIC are best for model c), which is both a proportional intensities and an AFT model. The takeaway is that, when faced with a dataset where the proportional hazards assumption does not seem to hold, it is worthwhile exploring an enlargement of the underlying parameter space to the matrix domain. In this enlarged space, the proportional intensities model allows for a more flexible specification of the hazard, with proportionality holding only asymptotically.
The AIC and BIC computed here for the matrix distributions work well for comparison purposes, given the low dimensions of the distributions. However, in general one should proceed with caution (especially in high dimensions) due to the well-known identifiability issue for PH distributions, which carries over to IPH survival models: different dimension and parameter configurations may lead to very similar density shapes.
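For concreteness, the information criteria reported above follow directly from the log-likelihoods in the table; a minimal sketch, where the sample size $n = 137$ is our assumption based on the standard version of this lung-cancer trial dataset:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*loglik."""
    return k * math.log(n) - 2 * loglik

# Model a) from the table (log-likelihood -136.21, 5 parameters):
# aic(-136.21, 5) gives 282.42, matching the table;
# bic(-136.21, 5, 137) gives roughly 297.02 under the assumed n = 137.
```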
5. Extensions and future research
The proportional intensities model can be modified to allow for more flexible specifications. We now show how such extensions can be carried out, in analogy with the proportional hazards model. For instance, we propose two immediate extensions: a stratified version and an additive model. We also discuss other possible lines of research, such as time-dependent covariates.
5.1. Stratified model
For a stratified version, we assume $K$ categories and include a different baseline intensity for each category. For illustration purposes, we assume that the effect of the covariates is the same multiplicative constant on the intensities across all groups; however, different constants among the groups can be employed as well. More specifically, the intensities for the $k$-th group are
\[
\lambda_k(t \mid \mathbf{x}) = \lambda(t ; \boldsymbol{\theta}_k) \exp\big(\mathbf{x}^{\top} \boldsymbol{\beta}\big), \quad k = 1, \dots, K,
\]
where $\lambda(\,\cdot\, ; \boldsymbol{\theta}_k)$ is the baseline intensity for the group $k$, $k = 1, \dots, K$. To describe how maximum likelihood estimation via an EM algorithm similar to Algorithm 3.2 can be performed in this case, we consider a sample $y_1^{(k)}, \dots, y_{N_k}^{(k)}$, $k = 1, \dots, K$, where the observations $y_i^{(k)}$ correspond to the $k$-th group, such that $N = N_1 + \cdots + N_K$. Thus, the main changes are that the data should be transformed into $g^{-1}\big(y_i^{(k)}; \boldsymbol{\theta}_k\big) \exp\big(\mathbf{x}_i^{(k)\top} \boldsymbol{\beta}\big)$, $i = 1, \dots, N_k$, $k = 1, \dots, K$, where $g^{-1}(t; \boldsymbol{\theta}_k) = \int_0^t \lambda(s; \boldsymbol{\theta}_k)\, ds$, and that the densities and survival functions should be adapted to match the separation of the data into groups.
5.2. Parametric additive intensity models
We consider a parametric additive model, similar to the one proposed by Aalen [aalen1989linear], employing IPH distributions. More specifically, we let
\[
\lambda(t \mid \mathbf{x}) = \lambda_0(t; \boldsymbol{\theta}_0) + x_1 \lambda_1(t; \boldsymbol{\theta}_1) + \cdots + x_p \lambda_p(t; \boldsymbol{\theta}_p),
\]
where $\mathbf{x} = (x_1, \dots, x_p)^{\top}$ and $\lambda_j(\,\cdot\,; \boldsymbol{\theta}_j)$ is a parametric function depending on $\boldsymbol{\theta}_j$, $j = 0, 1, \dots, p$. Here, $\lambda_0$ represents the baseline intensity, which corresponds to a subject with all covariates equal to zero. Note that
\[
\int_0^t \lambda(s \mid \mathbf{x}) \, ds = \int_0^t \lambda_0(s; \boldsymbol{\theta}_0) \, ds + \sum_{j=1}^{p} x_j \int_0^t \lambda_j(s; \boldsymbol{\theta}_j) \, ds .
\]
Thus, denoting $\Lambda_j(t) = \int_0^t \lambda_j(s; \boldsymbol{\theta}_j) \, ds$, $j = 0, 1, \dots, p$, we have that
\[
S(t \mid \mathbf{x}) = \boldsymbol{\pi} \exp\!\Big( \Big( \Lambda_0(t) + \sum_{j=1}^{p} x_j \Lambda_j(t) \Big) \boldsymbol{T} \Big) \boldsymbol{e} .
\]
Moreover, maximum likelihood estimation can be performed by employing a modified Algorithm 3.2, the main difference being that the data are transformed into $\Lambda_0(y_i) + \sum_{j=1}^{p} x_{ij} \Lambda_j(y_i)$, $i = 1, \dots, N$, with $\Lambda_j$ as above. The density and survival function should be modified accordingly as well.
5.3. Time-dependent covariates
We allow the covariates to depend on time, i.e., we consider $\mathbf{x}(t) = (x_1(t), \dots, x_p(t))^{\top}$. Thus, a proportional intensities model that allows for time-dependent covariates would assume that
\[
\lambda(t \mid \mathbf{x}) = \lambda(t; \boldsymbol{\theta}) \exp\big(\mathbf{x}(t)^{\top} \boldsymbol{\beta}\big) .
\]
The main difficulty of this type of model is that
\[
\int_0^t \lambda(s; \boldsymbol{\theta}) \exp\big(\mathbf{x}(s)^{\top} \boldsymbol{\beta}\big) \, ds
\]
in general does not have an explicit solution. This is relevant since the estimation methods introduced in this paper make use of data transformed by precisely such cumulative intensities. Possible solutions are:
- Look for specific structures that allow for explicit representations.
- Approximate the integral numerically. This approach may come at a high computational cost, especially considering that the transformation must be applied in each iteration of the EM algorithm.
An in-depth exploration of these approaches represents a future line of research.
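The second route can be sketched with a simple trapezoidal rule; a single scalar covariate path and all function choices below are illustrative assumptions:

```python
import math

def cum_intensity_timedep(t, lam, x, beta, n=2000):
    """Trapezoidal approximation of int_0^t lam(s) * exp(x(s) * beta) ds
    for a scalar covariate path x(s); a numerical stand-in for the
    generally unavailable closed form."""
    h = t / n
    def f(s):
        return lam(s) * math.exp(x(s) * beta)
    total = 0.5 * (f(0.0) + f(t)) + sum(f(i * h) for i in range(1, n))
    return h * total
```

Against a case with an explicit solution (constant $x(s) \equiv 1$ and $\lambda \equiv 1$, so the integral equals $t e^{\beta}$), the rule is exact; for a genuinely time-varying path such as $x(s) = s$, the error is of order $n^{-2}$.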
6. Conclusion
We have introduced several survival models based on absorption times of inhomogeneous Markov processes. Their interpretation in terms of underlying phase-switching is very natural for time-to-event data, and their denseness makes them exceptional candidates for real-life applications. We have shown, using recent and new matrix-based distributions, that misspecified survival models can sometimes be just one matrix away from providing a good fit. The practical implementation of the methods in this paper relies on (a modified version of) the EM algorithm, which for large matrices and datasets can become considerably slow. It is our experience that the use of an inhomogeneous intensity function in the underlying stochastic jump process can alleviate the need for a large number of phases, thus making estimation feasible. Finally, we have outlined some extensions which further refine the regression models of this paper; their systematic study and application to large-scale data is a challenging and promising direction.
Acknowledgement. MB would like to acknowledge financial support from the Swiss National Science Foundation Project 200021_191984.