# Analysis of sojourn time distributions for semi-Markov models

This report characterises certain sojourn time distributions that arise naturally from semi-Markov models. To this end, it describes a family of discrete distributions that extends the geometric distribution over both finite and infinite time. We derive formulae for the moment generating functions, means and variances, and give specific examples. We consider two parametrised subfamilies: the linear factor model and the simple polynomial factor models. We numerically simulate drawing from these distributions and solve for the maximum likelihood estimators (MLEs) of the parameters of each subfamily, including for very small sample sizes. The report then determines the bias and variance of the MLEs, and shows how they relate to the Fisher information, exhibiting the appropriate concentration effects as the sample size increases. Finally, the report applies these methods to experimental data, which shows a strong fit with the simple polynomial factor model.


## 1 Introduction

This report describes and analyses sojourn time distributions which can then be applied to parametrise the semi-Markov model (SMM). Sojourn time distributions are the distributions that govern how long a semi-Markov model remains in each state. Unlike usual Markov models, where the sojourn time distribution is geometric, semi-Markov models allow any distribution to be used.

Such sojourn time distributions have been modelled before using known distributions such as the gamma, Poisson, multinomial, more general exponential family members and Coxian phase-type distributions, particularly in the context of hidden semi-Markov models (HSMMs) [14, 19, 17, 2, 11, 7, 1]. An R package is available for modelling some of these distributions [3]. Non-parametric versions have also been studied in the context of expectation maximisation for HSMMs [2, 19]; however, the underlying properties of these distributions have not been studied in the literature, and we seek to do this in Section 2. Further, the two subfamilies of the distributions we study have not previously appeared in the literature.

In general, the sojourn time distributions we study can be considered discrete generalisations of the geometric distribution in finite or infinite time. There are many approaches to generalising the geometric distribution [15, 18, 9], although they usually only consider infinite time. For example, considering the exponential as the continuous analogue of the geometric distribution and the gamma distribution as generalising the exponential distribution, discretisations of the gamma distribution can be considered generalisations of the geometric distribution as in [4].

Unlike other geometric generalisations, ours will involve the parameters $\rho(k)$. These parameters can be considered as the marginal probabilities that a semi-Markov chain stays in the same state for the next time step, as in Section 2.4. For a Markov chain, the geometric sojourn time distribution corresponds to a constant $\rho$ for each state, and here we extend this to consider linear and simple polynomial factor models for $\rho$. We are not aware of any other research on generalisations of the geometric distribution that takes such an approach.

Markov models are often applied to model behaviours where only small amounts of data may be available, so we require parsimonious models which nevertheless have sufficient generality to capture different kinds of qualitative sojourn time behaviours. We also require models for which statistically efficient estimation methods can be defined. At this stage, we make the obvious point that whilst maximum likelihood estimators (MLEs) are asymptotically unbiased and efficient for large sample sizes [6], this may not be the case for small sample sizes such as in our experimental data. So the study of the performance of MLEs forms an important part of this study, which we undertake in Section 3.

Statistical inference methods for hidden SMMs (HSMMs) are typically more computationally demanding than the equivalents for HMMs, so it is also of interest to determine whether an observed process has Markovian state transitions or the more general semi-Markov ones. We show in Section 4 how we approached this for our experimental data, which we can use as a methodology for testing this kind of hypothesis in general. We also see a strong fit between this experimental data and the simple polynomial factor model, suggesting the theory performs well in experimental settings.

Overall, this report gives a novel approach to modelling sojourn time distributions that will be of use for (hidden) semi-Markov models, and which already holds appeal for modelling experimental data. The results concerning the sojourn time distributions may be of independent interest for the discrete distribution literature.

### 1.1 Layout of the report

The contents of this report are as follows. In Section 2.1 we define the family of distributions we aim to study, which generalises the geometric distribution. We give general properties including the mean, variance, moment and probability generating functions, as well as results that determine when the moments exist in general. These results are new. In Section 2.2 and Section 2.3 we define the subfamilies of interest, the linear factor model and simple polynomial factor models, and describe their distributions, which is also new research. In Section 2.4 we show the relationship to semi-Markov models and discuss further results in this area.

In Section 3 we study how to determine the MLEs of the subfamilies of interest, studying the linear factor model in Section 3.1 and the polynomial factor models in Section 3.2. In each case, we determine numerically the MLEs from various samples and show how the variance of the MLEs closely follows the inverse Fisher information which is related to the Cramér-Rao lower bound [6] suggesting our numerical methods are behaving efficiently. We study how this changes with different sample sizes. We also discuss methods to determine the size of the support of the distribution, i.e. the maximum sojourn time.

In Section 4 we analyse our experimental data. We show how the data has sojourn times which do not appear to follow a geometric distribution, and why a simple polynomial factor model is a better fit. We then find the MLEs for such a model for each of the three tasks, including determining the maximum sojourn time, and discuss how well these models fit. We are satisfied with the model for tasks 1 and 3, and suggest some improvements for task 2. In Section 5 we conclude and discuss future work.

In Appendix A the second author expands on the work for the linear factor model. He confirms the results in Section 3.1 with separate simulations, as well as shows how the expected log-likelihood explains some of the behaviour of the MLEs and how to further understand estimation when the maximum sojourn time is unknown. In Appendix B we include the experimental data analysed in Section 4.

## 2 Generalising the geometric distribution

Here we define a family of discrete distributions that generalise the geometric distribution in both the finite and infinite support cases. We give properties of this family, including formulae for the moments, and prove results relating to their existence in the infinite support case. We note that this family is particularly general, as it can be used to characterise all discrete distributions supported on $\{1,2,\dots,T\}$ with $T$ a positive integer, possibly infinite.

We then focus on two specific subfamilies, the linear factor model in Section 2.2 and some simple polynomial factor models in Section 2.3. Finally we detail the relationship to semi-Markov models in Section 2.4 and the relationship to matrix analytic methods.

As far as we are aware, all results in sections Section 2.1, Section 2.2 and Section 2.3 are new contributions to the literature on discrete distributions, which we expect will also lead to new contributions in Markov processes research.

### 2.1 Definition and properties

We define the family of discrete distributions that we intend to study, and then study properties of these distributions.

###### Definition 2.1.

Take any $\rho(1),\dots,\rho(T-1)\in[0,1]$ with each $\rho(k)>0$, and set $\rho(T)=0$, where $T$ is a positive integer (which we may allow to go to $\infty$). Then we can define a discrete distribution with support contained in $\{1,2,\dots,T\}$ having probability mass function (PMF)

$$f(k)=(1-\rho(k))\,\rho(k-1)\rho(k-2)\cdots\rho(1)=(1-\rho(k))\prod_{t=1}^{k-1}\rho(t)\quad\text{for }k=1,2,\dots,T,\tag{2.1}$$

where we use the convention that the empty product is equal to 1.

Note that we could allow some $\rho(k)=0$ for $k<T$, but then $\prod_{t=1}^{k'-1}\rho(t)=0$ and hence $f(k')=0$ for all $k'>k$, reducing the support of $f$. Also, we allow $\rho(k)=1$, in which case $f(k)=0$, so as we see below this is sufficiently general to capture all discrete distributions with support contained in $\{1,2,\dots,T\}$. We can also shift a random variable $X$ with this distribution to a distribution with support contained within $\{m+1,m+2,\dots,m+T\}$ for any integer $m$ by setting $Y=X+m$. We will consider this in Section 4.

It is clear that each $f(k)\in[0,1]$, as a product of numbers in $[0,1]$ remains in $[0,1]$. We have

$$\begin{aligned}\sum_{k=1}^{T}f(k)&=\sum_{k=1}^{T}(1-\rho(k))\rho(k-1)\rho(k-2)\cdots\rho(1)\\&=1+\sum_{k=2}^{T}\prod_{t=1}^{k-1}\rho(t)-\sum_{k=1}^{T}\prod_{t=1}^{k}\rho(t)\\&=1+\sum_{k=1}^{T-1}\prod_{t=1}^{k}\rho(t)-\sum_{k=1}^{T}\prod_{t=1}^{k}\rho(t)\\&=1-\prod_{t=1}^{T}\rho(t)=1,\end{aligned}$$

as $\rho(T)=0$. So $f$ gives a valid PMF.

If we let $\rho(k)=p$ for some fixed $p\in[0,1)$ and take $T\to\infty$, then this gives the geometric distribution with PMF, for $k=1,2,\dots$,

$$f(k)=p^{k-1}(1-p).\tag{2.2}$$

In general, we can interpret $\rho(k)$ as the probability of an event occurring at time $k+1$ given it has also occurred at times $1,\dots,k$. Then $1-\rho(k)$ is the probability of the event not occurring at time $k+1$ given it occurred at time $k$. In the geometric case, the time steps are independent and $\rho$ does not change; however, the general $\rho$ described above allows for dependence between each time. We expand on this in Section 2.4, where we show how this relates to a semi-Markov model.
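As a quick numerical illustration (ours, not part of the report; `pmf_from_rho` is an illustrative helper name and NumPy is assumed), the PMF of Definition 2.1 can be evaluated from a vector of factors, and the constant-$\rho$ case recovers the geometric PMF of Equation 2.2:

```python
import numpy as np

def pmf_from_rho(rho):
    """PMF f(k) = (1 - rho(k)) * prod_{t<k} rho(t) for k = 1..T (Definition 2.1).

    `rho` is a length-T array with values in [0, 1] and rho[-1] == 0;
    rho[k-1] stores rho(k)."""
    rho = np.asarray(rho, dtype=float)
    # cum[k-1] = prod_{t=1}^{k-1} rho(t), with the empty product equal to 1
    cum = np.concatenate(([1.0], np.cumprod(rho)[:-1]))
    return (1.0 - rho) * cum

# Geometric special case: rho(k) = p constant, truncated at a large T (rho(T) = 0)
p, T = 0.6, 60
rho = np.full(T, p)
rho[-1] = 0.0
f = pmf_from_rho(rho)
k = np.arange(1, T + 1)
geom = p ** (k - 1) * (1 - p)          # Equation 2.2
assert abs(f.sum() - 1.0) < 1e-12      # telescoping sum gives exactly 1
assert np.allclose(f[:-1], geom[:-1])  # matches the geometric PMF away from truncation
```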

###### Lemma 2.2.

We have the relationship

$$\sum_{t=k}^{T}f(t)=1-\sum_{t=1}^{k-1}f(t)=\frac{f(k)}{1-\rho(k)}=\prod_{t=1}^{k-1}\rho(t).\tag{2.3}$$
###### Proof.

By induction, we can show

$$\rho(k)=\frac{1-f(1)-f(2)-\dots-f(k)}{1-f(1)-f(2)-\dots-f(k-1)}=1-\frac{f(k)}{\sum_{t=k}^{T}f(t)}=1-\frac{f(k)}{1-\sum_{t=1}^{k-1}f(t)},\tag{2.4}$$

and the result follows. ∎

Note that Equation 2.4 shows that any discrete PMF supported on $\{1,2,\dots,T\}$ (with $T$ possibly infinite) can be defined in terms of the $\rho(k)$, so in fact Definition 2.1 is an alternative way to describe all such distributions. If $F$ is the corresponding cumulative distribution function to $f$, then

$$\rho(k)=\frac{1-F(k)}{1-F(k-1)}$$

and also

$$F(k)=\sum_{t=1}^{k}f(t)=\sum_{t=1}^{k}(1-\rho(t))\prod_{s=1}^{t-1}\rho(s)=\sum_{t=1}^{k}\prod_{s=1}^{t-1}\rho(s)-\sum_{t=1}^{k}\prod_{s=1}^{t}\rho(s)=1-\prod_{t=1}^{k}\rho(t).\tag{2.5}$$

Note that if $\rho(k)=0$ then $F(k)=1$, and this means the distribution has reached the end of its support. We use this in Section 4. The next proposition characterises all moments of the distribution in terms of the $\rho(k)$.
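The identities above are easy to check numerically. The following sketch (ours; helper names are illustrative, NumPy assumed) computes $F$ via Equation 2.5 and recovers the $\rho(k)$ from $F$ via $\rho(k)=(1-F(k))/(1-F(k-1))$:

```python
import numpy as np

def cdf_from_rho(rho):
    """F(k) = 1 - prod_{t=1}^{k} rho(t)  (Equation 2.5)."""
    return 1.0 - np.cumprod(np.asarray(rho, dtype=float))

def rho_from_cdf(F):
    """Recover rho(k) = (1 - F(k)) / (1 - F(k-1)), with the convention F(0) = 0."""
    Fprev = np.concatenate(([0.0], F[:-1]))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Once F(k-1) = 1 the support has ended and rho is set to 0
        r = np.where(Fprev < 1.0, (1.0 - F) / (1.0 - Fprev), 0.0)
    return r

rho = np.array([0.9, 0.7, 0.4, 0.0])   # arbitrary factors with rho(T) = 0
F = cdf_from_rho(rho)
assert np.isclose(F[-1], 1.0)          # rho(T) = 0 forces F(T) = 1
assert np.allclose(rho_from_cdf(F), rho)
```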

###### Proposition 2.3.

For a random variable $X$ with PMF as in Equation 2.1, we have moment generating function (MGF)

$$M_X(t)=\sum_{k=1}^{T-1}\left(e^{t(k+1)}-e^{tk}\right)\prod_{s=1}^{k}\rho(s)+e^{t}.$$

The $n$-th derivatives are

$$M_X^{(n)}(t)=\sum_{k=1}^{T-1}\left((k+1)^{n}e^{t(k+1)}-k^{n}e^{tk}\right)\prod_{s=1}^{k}\rho(s)+e^{t}.$$

So we have

$$\mathbb{E}(X^{n})=M_X^{(n)}(0)=\sum_{k=1}^{T-1}\left((k+1)^{n}-k^{n}\right)\prod_{s=1}^{k}\rho(s)+1,$$

and then

$$\mathbb{E}(X)=\sum_{k=1}^{T-1}\prod_{s=1}^{k}\rho(s)+1,\qquad \mathrm{Var}(X)=\mathbb{E}(X)+2\sum_{k=1}^{T-1}k\prod_{s=1}^{k}\rho(s)-\mathbb{E}(X)^{2}.$$

The probability generating function is

$$P_X(z)=\sum_{k=1}^{T-1}\left(z^{k+1}-z^{k}\right)\prod_{s=1}^{k}\rho(s)+z$$

with $n$-th derivatives

$$P_X^{(n)}(z)=\sum_{k=n}^{T-1}\left(\frac{(k+1)!\,z^{k+1-n}}{(k+1-n)!}-\frac{k!\,z^{k-n}}{(k-n)!}\right)\prod_{s=1}^{k}\rho(s)+\frac{d^{n}z}{dz^{n}}.$$

The factorial moments are

$$\mathbb{E}\left(X(X-1)\cdots(X-n+1)\right)=P_X^{(n)}(1)=\sum_{k=n}^{T-1}\frac{k!\,n}{(k+1-n)!}\prod_{s=1}^{k}\rho(s)+\frac{d^{n}z}{dz^{n}}.$$
###### Proof.

The moment generating function is

$$\begin{aligned}M_X(t)&=\sum_{k=1}^{T}e^{kt}f(k)=\sum_{k=1}^{T}e^{kt}(1-\rho(k))\prod_{s=1}^{k-1}\rho(s)\\&=\sum_{k=1}^{T}e^{kt}\prod_{s=1}^{k-1}\rho(s)-\sum_{k=1}^{T}e^{kt}\prod_{s=1}^{k}\rho(s)\\&=e^{t}+\sum_{k=2}^{T}e^{kt}\prod_{s=1}^{k-1}\rho(s)-\sum_{k=1}^{T}e^{kt}\prod_{s=1}^{k}\rho(s)\\&=e^{t}+\sum_{k=1}^{T-1}e^{(k+1)t}\prod_{s=1}^{k}\rho(s)-\sum_{k=1}^{T-1}e^{kt}\prod_{s=1}^{k}\rho(s)\\&=\sum_{k=1}^{T-1}\left(e^{t(k+1)}-e^{tk}\right)\prod_{s=1}^{k}\rho(s)+e^{t},\end{aligned}$$

where the $k=T$ term of the second sum vanishes as $\rho(T)=0$. The formulas for the derivatives and moments follow directly. We then have the variance

$$\mathrm{Var}(X)=\mathbb{E}(X^{2})-\mathbb{E}(X)^{2}=\sum_{k=1}^{T-1}(2k+1)\prod_{s=1}^{k}\rho(s)+1-\mathbb{E}(X)^{2}=\mathbb{E}(X)+2\sum_{k=1}^{T-1}k\prod_{s=1}^{k}\rho(s)-\mathbb{E}(X)^{2}.$$

The probability generating function is

$$P_X(z)=\sum_{k=1}^{T}z^{k}f(k)=M_X(\log(z))=\sum_{k=1}^{T-1}\left(z^{k+1}-z^{k}\right)\prod_{s=1}^{k}\rho(s)+z.$$

The $n$-th derivatives are straightforward to calculate, so the factorial moments are

$$\mathbb{E}\left(X(X-1)\cdots(X-n+1)\right)=P_X^{(n)}(1)=\sum_{k=n}^{T-1}\left(\frac{(k+1)!}{(k+1-n)!}-\frac{k!}{(k-n)!}\right)\prod_{s=1}^{k}\rho(s)+\frac{d^{n}z}{dz^{n}}=\sum_{k=n}^{T-1}\frac{k!\,n}{(k+1-n)!}\prod_{s=1}^{k}\rho(s)+\frac{d^{n}z}{dz^{n}}.$$

∎

If $T$ is finite, it is clear that all moments are finite. Below we consider the case when $T$ is infinite.
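Before turning to the infinite case, the finite-$T$ mean and variance formulae of Proposition 2.3 can be sanity-checked directly (our sketch, NumPy assumed; the helper name is illustrative):

```python
import numpy as np

def mean_var_from_rho(rho):
    """E(X) and Var(X) from Proposition 2.3:
       E(X)   = 1 + sum_{k=1}^{T-1} prod_{t<=k} rho(t)
       Var(X) = E(X) + 2 * sum_{k=1}^{T-1} k * prod_{t<=k} rho(t) - E(X)^2."""
    P = np.cumprod(np.asarray(rho, dtype=float))[:-1]  # prod_{t<=k} rho(t), k = 1..T-1
    k = np.arange(1, len(P) + 1)
    mean = 1.0 + P.sum()
    var = mean + 2.0 * (k * P).sum() - mean ** 2
    return mean, var

# Cross-check against the direct definitions E(X) = sum k f(k), Var = E(X^2) - E(X)^2
rho = np.array([0.8, 0.6, 0.5, 0.3, 0.0])
f = (1.0 - rho) * np.concatenate(([1.0], np.cumprod(rho)[:-1]))
ks = np.arange(1, len(rho) + 1)
m_direct = (ks * f).sum()
v_direct = (ks ** 2 * f).sum() - m_direct ** 2
m, v = mean_var_from_rho(rho)
assert np.isclose(m, m_direct) and np.isclose(v, v_direct)
```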

###### Proposition 2.4.

Let $X$ be a random variable with PMF as in Equation 2.1 with $T=\infty$, and say

$$\lim_{k\to\infty}\rho(k)=r,$$

where we must have $r\in[0,1]$ as each $\rho(k)\in[0,1]$. Then all moments exist whenever $r<1$. If $r=1$ then some moments may exist.

###### Proof.

The ratio test says that a series $\sum_{k}a_{k}$ with $a_{k}>0$ converges whenever

$$\lim_{k\to\infty}\left|\frac{a_{k+1}}{a_{k}}\right|<1.$$

It diverges if the limit is greater than $1$, and may or may not converge if this limit is $1$. Say we have $\lim_{k\to\infty}\rho(k)=r$; then we can consider the series

$$\sum_{k=1}^{\infty}k^{n}\prod_{t=1}^{k}\rho(t)$$

for some $n\ge0$, which appears in the formula for the moments in Proposition 2.3. Then

$$\lim_{k\to\infty}\left|\frac{a_{k+1}}{a_{k}}\right|=\lim_{k\to\infty}\left|\frac{(k+1)^{n}\prod_{t=1}^{k+1}\rho(t)}{k^{n}\prod_{t=1}^{k}\rho(t)}\right|=\lim_{k\to\infty}\left(1+\frac{1}{k}\right)^{n}\rho(k+1)=1\cdot r=r,$$

therefore all moments centred at $0$ converge when $r<1$ by the ratio test. As all moments of order $n$ can be written in terms of moments centred at $0$ of up to order $n$, all moments converge when $r<1$. Similarly, they diverge if $r>1$ (which is not possible, as we have $\rho(k)\le1$ for all $k$) and may or may not converge if $r=1$. ∎

Note that we do not necessarily have that $\rho(k)$ converges, so a more general statement would be that any subsequence of the $\rho(k)$ that converges must converge to something less than $1$ to guarantee existence of the moments. This is equivalent to the following stronger corollary of Proposition 2.4.

###### Corollary 2.5.

Let $X$ be a random variable with PMF as in Equation 2.1 with $T=\infty$, and say

$$\limsup_{k\to\infty}\rho(k)=r,$$

where we must have $r\in[0,1]$ as each $\rho(k)\in[0,1]$. Then all moments exist whenever $r<1$. If $r=1$ then some moments may exist.

If $r=1$, that is, there are (sub)sequences of the $\rho(k)$ which converge to $1$, more complicated arguments must be considered, as in the following example.

###### Example 2.6.

If $\rho(k)=\frac{k}{k+1}$ for $k=1,2,\dots$ then $\lim_{k\to\infty}\rho(k)=1$ and we have

$$\sum_{k=1}^{\infty}\prod_{t=1}^{k}\rho(t)=\sum_{k=1}^{\infty}\frac{1}{k+1},$$

which diverges, and no moments exist. This corresponds to the PMF

$$f(k)=\frac{1}{k}-\frac{1}{1+k}=\frac{1}{k(k+1)}.$$

More generally, if $\rho(k)=\left(\frac{k}{k+1}\right)^{m}$ for some positive integer $m$ then still $\lim_{k\to\infty}\rho(k)=1$, however we have

$$\sum_{k=1}^{\infty}k^{n}\prod_{t=1}^{k}\rho(t)=\sum_{k=1}^{\infty}\frac{k^{n}}{(k+1)^{m}}.$$

When $m\ge n+2$, we have

$$\sum_{k=1}^{\infty}\frac{k^{n}}{(k+1)^{m}}=\sum_{k=1}^{\infty}\frac{1}{(1+\frac{1}{k})^{n}}\cdot\frac{1}{(1+k)^{2}}\cdot\frac{1}{(1+k)^{m-n-2}}\le\sum_{k=1}^{\infty}\frac{1}{(1+k)^{2}}=\frac{\pi^{2}}{6}-1,$$

so this sum converges, and this means moments up to order $m-2$ exist. When $n=m-1$ we have

$$\sum_{k=1}^{\infty}\frac{k^{n}}{(k+1)^{m}}=\sum_{k=1}^{\infty}\frac{1}{(1+\frac{1}{k})^{m-1}}\cdot\frac{1}{1+k}\ge\frac{1}{2^{m-1}}\sum_{k=1}^{\infty}\frac{1}{1+k}\to\infty,$$

so this diverges, and all moments of order $m-1$ or higher do not exist. These correspond to the PMF

$$f(k)=\frac{1}{k^{m}}-\frac{1}{(1+k)^{m}}.$$
###### Example 2.7.

For a random variable $X$ with the geometric distribution, as in Equation 2.2, using Proposition 2.3 we have mean

$$\mathbb{E}(X)=1+\sum_{j=1}^{\infty}p^{j}=1+\frac{1}{1-p}-1=\frac{1}{1-p},$$

and variance

$$\begin{aligned}\mathrm{Var}(X)&=\mathbb{E}(X)+2\lim_{T\to\infty}\sum_{k=1}^{T-1}k\prod_{t=1}^{k}\rho(t)-\mathbb{E}(X)^{2}\\&=\frac{1}{1-p}+2\lim_{T\to\infty}\sum_{k=1}^{T-1}kp^{k}-\frac{1}{(1-p)^{2}}\\&=\frac{1}{1-p}+2\lim_{T\to\infty}p\frac{d}{dp}\sum_{k=1}^{T-1}p^{k}-\frac{1}{(1-p)^{2}}\\&=\frac{1}{1-p}+2\lim_{T\to\infty}p\frac{d}{dp}\left(\frac{1-p^{T}}{1-p}-1\right)-\frac{1}{(1-p)^{2}}\\&=\frac{1}{1-p}+2p\frac{d}{dp}\left(\frac{1}{1-p}-1\right)-\frac{1}{(1-p)^{2}}\\&=\frac{1}{1-p}+\frac{2p}{(1-p)^{2}}-\frac{1}{(1-p)^{2}}\\&=\frac{1-p+2p-1}{(1-p)^{2}}=\frac{p}{(1-p)^{2}}.\end{aligned}$$

Both of these results are as expected for the geometric distribution.
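Both limits can be confirmed numerically by truncating the infinite sums at a large cut-off (our sketch, NumPy assumed; the tail is negligible for $p$ bounded away from $1$):

```python
import numpy as np

p, K = 0.7, 2000
P = np.cumprod(np.full(K, p))             # prod_{t<=k} rho(t) = p^k for k = 1..K
k = np.arange(1, K + 1)
mean = 1.0 + P.sum()                      # Proposition 2.3 mean formula
var = mean + 2.0 * (k * P).sum() - mean ** 2

assert np.isclose(mean, 1.0 / (1.0 - p))        # E(X)   = 1/(1-p)
assert np.isclose(var, p / (1.0 - p) ** 2)      # Var(X) = p/(1-p)^2
```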

### 2.2 Linear factor model

In this section we consider the case where we have a linear $\rho$, that is,

$$\rho(k)=ak+b\quad\text{for all }k=1,2,\dots,T.$$

For this to be valid, we require that $\rho(k)\in[0,1]$ for all $k=1,2,\dots,T-1$. This then means that if we send $T\to\infty$ we must make $a=0$, resulting in the geometric distribution. If we consider finite $T$ then, by linearity, we require the endpoints to lie in $[0,1]$: we need $0\le a+b\le1$, so $-b\le a\le1-b$, and $0\le a(T-1)+b\le1$, so $\frac{-b}{T-1}\le a\le\frac{1-b}{T-1}$. Combining these we have

$$\max\left\{-b,\frac{-b}{T-1}\right\}\le a\le\min\left\{1-b,\frac{1-b}{T-1}\right\}.$$

For these bounds to be consistent, we then also require that $0\le b\le1$.

A natural choice in this case would be to set $a$ such that $\rho(T)=0$, which gives $a=-\frac{b}{T}$, with $0\le b\le1$. Then

$$\rho(k)=b\left(1-\frac{k}{T}\right)$$

and the corresponding $f$ is as follows:

$$\begin{aligned}f(k)&=(1-\rho(k))\,\rho(k-1)\rho(k-2)\cdots\rho(1)\\&=\left(1+b\left(\tfrac{k}{T}-1\right)\right)\left(b\left(1-\tfrac{k-1}{T}\right)\right)\left(b\left(1-\tfrac{k-2}{T}\right)\right)\cdots\left(b\left(1-\tfrac{1}{T}\right)\right)\\&=\left(1+b\left(\tfrac{k}{T}-1\right)\right)\left(\tfrac{b}{T}\right)^{k-1}(T-k+1)(T-k+2)\cdots(T-1)\\&=\left(1+b\left(\tfrac{k}{T}-1\right)\right)\left(\tfrac{b}{T}\right)^{k-1}\frac{(T-1)!}{(T-k)!}.\end{aligned}$$

This is a natural extension of the geometric distribution over a finite domain.
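This closed form can be checked against the product definition of Equation 2.1 (our sketch, NumPy assumed; `f_linear` is an illustrative helper name):

```python
import numpy as np
from math import factorial

def f_linear(k, b, T):
    """Closed form for the linear factor model with rho(k) = b * (1 - k/T):
       f(k) = (1 + b*(k/T - 1)) * (b/T)^(k-1) * (T-1)! / (T-k)!."""
    return (1 + b * (k / T - 1)) * (b / T) ** (k - 1) * factorial(T - 1) / factorial(T - k)

b, T = 0.9, 10
k = np.arange(1, T + 1)
rho = b * (1 - k / T)                      # rho(T) = 0 automatically
f_prod = (1 - rho) * np.concatenate(([1.0], np.cumprod(rho)[:-1]))
f_closed = np.array([f_linear(int(j), b, T) for j in k])
assert np.allclose(f_prod, f_closed)       # closed form matches the product definition
assert np.isclose(f_closed.sum(), 1.0)
```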

Note that we can similarly parametrise by $a$ instead of $b$, where we have

$$\rho(k)=a(k-T),$$

with $-\frac{1}{T-1}\le a\le0$ and $b=-aT$.

We can graph $f$ for certain values of $b$ (and corresponding $a$) and $T$, as in Figure 2.1.

We see quite different behaviours depending on the value of $b$. The larger values exhibit a non-monotonic, unimodal distribution with maximum near the centre of the support, while the smaller values exhibit more geometric-like qualities, with maximum at $k=1$ and decreasing $f(k)$ for larger $k$. Varying $T$ gives similar results.

### 2.3 Simple polynomial factor models

Here we concentrate on models of the form

$$\rho(k)=a(k-c)^{n}+b$$

with parameters $a$, $b$ and $c$. We require $n$ to be a fixed positive integer.

We again set $\rho(T)=0$, which gives $b=-a(T-c)^{n}$, so we must have $a=\frac{-b}{(T-c)^{n}}$ provided we assume $c\neq T$. Note that if $b=0$ then either $c=T$, where for $n=1$ we are considering the linear cases from the previous section, or $a=0$ and we have $\rho\equiv0$.

Given this, we assume $c\neq T$; then, writing $a$ in terms of $b$, we have

$$\rho(k)=b\left(1-\frac{(k-c)^{n}}{(T-c)^{n}}\right),\qquad f(k)=\left(1+b\left(\frac{(k-c)^{n}}{(T-c)^{n}}-1\right)\right)\left(\frac{b}{(T-c)^{n}}\right)^{k-1}\prod_{t=1}^{k-1}\left((T-c)^{n}-(t-c)^{n}\right).$$

Similarly, writing $b$ in terms of $a$, we have

$$\rho(k)=a\left((k-c)^{n}-(T-c)^{n}\right),\qquad f(k)=\left(1+a\left((T-c)^{n}-(k-c)^{n}\right)\right)a^{k-1}\prod_{t=1}^{k-1}\left((t-c)^{n}-(T-c)^{n}\right).$$

We require $\rho(k)\in[0,1]$ for all $k=1,2,\dots,T$. Whether $n$ is even or odd changes the requirements on $a$, $b$ and $c$ needed to ensure this. We get conditions of the following form, which must hold for all $k=1,2,\dots,T-1$:

$$\begin{aligned}n\text{ even},\ c\le\tfrac{T+1}{2}:&\quad 0\le b\le\frac{(T-c)^{n}}{(T-c)^{n}-(k-c)^{n}},&\quad\frac{1}{(k-c)^{n}-(T-c)^{n}}\le a\le0,\\ n\text{ even},\ c>T:&\quad\frac{(T-c)^{n}}{(T-c)^{n}-(k-c)^{n}}\le b\le0,&\quad 0\le a\le\frac{1}{(k-c)^{n}-(T-c)^{n}},\\ n\text{ odd},\ c<1:&\quad 0\le b\le\frac{(T-c)^{n}}{(T-c)^{n}-(k-c)^{n}},&\quad\frac{1}{(k-c)^{n}-(T-c)^{n}}\le a\le0,\\ n\text{ odd},\ c>T:&\quad\frac{(T-c)^{n}}{(T-c)^{n}-(k-c)^{n}}\le b\le0,&\quad\frac{1}{(k-c)^{n}-(T-c)^{n}}\le a\le0.\end{aligned}$$

Note that $n$ even and $c=\frac{T+1}{2}$ implies $\rho(1)=0$, which we will again exclude here. With certain applications in mind as in Section 4, we are most interested in the odd cases; however, we will consider all cases in generality when possible.
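The parametrisation in $b$ can be sketched as follows (ours, NumPy assumed; helper names are illustrative), including a check that the chosen parameters respect $\rho(k)\in[0,1]$:

```python
import numpy as np

def rho_poly(k, b, c, n, T):
    """Simple polynomial factor rho(k) = b * (1 - (k-c)^n / (T-c)^n), so rho(T) = 0."""
    return b * (1 - (k - c) ** n / (T - c) ** n)

def pmf_poly(b, c, n, T):
    k = np.arange(1, T + 1)
    rho = rho_poly(k.astype(float), b, c, n, T)
    # The feasibility conditions above are equivalent to rho(k) in [0, 1] for all k
    assert np.all((0 <= rho) & (rho <= 1)), "parameters violate rho(k) in [0,1]"
    return (1 - rho) * np.concatenate(([1.0], np.cumprod(rho)[:-1]))

# An n odd, c < 1 choice satisfying the constraints
f = pmf_poly(b=0.95, c=0.0, n=3, T=12)
assert np.isclose(f.sum(), 1.0)
assert np.all(f >= 0)
```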

In Figure 2.2 we graph $f$ for certain values of $b$ (and corresponding $a$) with fixed $c$, $n$ and $T$.

Note that here we see quite different behaviours depending on the value of $b$. The larger values exhibit non-monotonic, unimodal or even bimodal distributions with maxima towards the end of the support, while the smaller values exhibit more geometric-like qualities, with maximum at $k=1$ and decreasing $f(k)$ for larger $k$. Varying $T$ gives similar results.

We also graph $f$ while varying $c$ with fixed $b$, $n$ and $T$, as in Figure 2.3.

Note that here we see quite different behaviours depending on the value of $c$. The larger values exhibit non-monotonic, unimodal or even bimodal distributions with maxima towards the end of the support, while the smaller values exhibit more geometric-like qualities, with maximum at $k=1$ and decreasing $f(k)$ for larger $k$. That such varied behaviour is achieved through only two parameters will be helpful for fitting to real data as in Section 4. Varying $T$ gives similar results.

In Figure 2.4 we further separate out the different behaviours for fixed $b$ and changes in $c$.

### 2.4 Relationship to SMM

In this section we describe the relationship between the family of distributions in Definition 2.1 and semi-Markov models, which is one of the motivations for this paper. This section also contains details on how semi-Markov models can be considered using matrix analytic methods. This will be important in future applications of the sojourn time distributions studied so far.

#### 2.4.1 Sojourn time distributions arising from SMMs

In a semi-Markov model, we have a random process $(X_n,\tau_n)_{n\ge1}$. Here $X_n$ is a random variable for each $n$ taking values in a state space, and $\tau_n$ is a random variable for each $n$ recording a sojourn time. The state space may be continuous or discrete, possibly infinite; here we will only consider discrete state spaces.

Note that we differ from some traditional presentations of semi-Markov models, as in [13, §1.9], where often $X_n$ records only the transitions of the process and $\tau_n$ the times at which these transitions occur, assumed to be continuous, giving a Markov renewal process $(X_n,\tau_n)$. Instead, here our $X_n$ will be a discrete random process that may remain in the same state as $n$ increments, and the $\tau_n$ will record the number of time steps the process has remained in the same state continuously.

We require the independence condition that

$$P\left((X_{n+1},\tau_{n+1})=(j,t)\mid(X_n,\tau_n),(X_{n-1},\tau_{n-1}),\dots,(X_1,\tau_1)\right)=P\left((X_{n+1},\tau_{n+1})=(j,t)\mid(X_n,\tau_n)\right),$$

and the time-homogeneous condition that

$$P\left((X_{n+m+1},\tau_{n+m+1})\mid(X_{n+m},\tau_{n+m})\right)=P\left((X_{n+1},\tau_{n+1})\mid(X_n,\tau_n)\right)$$

for all $n,m\ge1$.

This means that $(X_n,\tau_n)$ is a Markov chain. We require that $\tau_n$ counts the number of consecutive time steps the process has spent in state $X_n$, so that if we observe that both $X_{n+1}$ and $X_n$ are equal to $j$, then

$$P\left((X_{n+1},\tau_{n+1})=(j,t)\mid X_n=j,\tau_n\right)=0\quad\text{if }\tau_n\neq t-1,\ t\ge2,$$

and otherwise if $X_n=i\neq j$ then

$$P\left((X_{n+1},\tau_{n+1})=(j,t)\mid X_n=i,\tau_n\right)=0\quad\text{unless }t=1.$$

Then $(X_n,\tau_n)$ is specified by probabilities of the form

$$\rho_i(t)=P\left(X_{n+1}=i,\tau_{n+1}=t+1\mid X_n=i,\tau_n=t\right),$$

together with the probabilities $P(X_{n+1}=j,\tau_{n+1}=1\mid X_n=i,\tau_n=t)$ for $j\neq i$. Here the $\rho_i(t)$ can be considered as the marginal probability that the chain stays in the same state for the next time step. With this set-up, we call $(X_n,\tau_n)$ a semi-Markov process. Figure 2.5 shows the allowable state transitions for the process $(X_n,\tau_n)$.

###### Example 2.8.

Note that semi-Markov processes are an extension of Markov processes. Given a time-homogeneous Markov process $X_n$ specified by transition probabilities $A_{i,j}=P(X_{n+1}=j\mid X_n=i)$, we can define $\tau_n$ as the consecutive time $X_n$ has spent in the current observed state since it entered that state. We then get that

$$P\left(X_{n+1}=i,\tau_{n+1}=t+1\mid X_n=i,\tau_n=t\right)=P\left(X_{n+1}=i\mid X_n=i\right)=A_{i,i}=\rho_i(t)$$

and for $j\neq i$ we get that

$$P\left(X_{n+1}=j,\tau_{n+1}=1\mid X_n=i,\tau_n=t\right)=P\left(X_{n+1}=j\mid X_n=i\right)=A_{i,j}.$$

Note that neither of these probabilities depends on the time $t$.
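Example 2.8 can be illustrated by simulation (our sketch, NumPy assumed): the sojourn times of a Markov chain in state $i$ are geometric with success probability $A_{i,i}$, so their sample mean should be close to $1/(1-A_{i,i})$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain; A[i, j] = P(X_{n+1} = j | X_n = i)
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# Simulate the chain and collect completed sojourn times in state 0
x, sojourns, run = 0, [], 0
for _ in range(200_000):
    nxt = rng.choice(2, p=A[x])
    if x == 0:
        run += 1
        if nxt != 0:        # the run in state 0 has just ended
            sojourns.append(run)
            run = 0
    x = nxt

# Geometric sojourn times: E = 1 / (1 - A[0, 0]) = 5
assert abs(np.mean(sojourns) - 5.0) < 0.2
```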

We have suggestively used $\rho_i$ in the notation above. This is because the $\rho_i(t)$ satisfy the conditions in Definition 2.1 for each state $i$. We have that

$$\begin{aligned}P(&X_n\text{ enters state }i\text{ then stays at state }i\text{ for exactly }k\text{ consecutive steps})\\&=\sum_{j_1\neq i,\,j_2\neq i}P\left(X_{n+k}=j_1,\ X_n=X_{n+1}=\dots=X_{n+k-1}=i,\ X_{n-1}=j_2\right)\\&=\sum_{j_1\neq i,\,j_2\neq i}P\left(X_{n+k}=j_1\mid X_n=\dots=X_{n+k-1}=i,X_{n-1}=j_2\right)P\left(X_n=\dots=X_{n+k-1}=i,X_{n-1}=j_2\right)\\&=(1-\rho_i(k))\sum_{j_2\neq i}P\left(X_{n+k-1}=i\mid X_n=\dots=X_{n+k-2}=i,X_{n-1}=j_2\right)P\left(X_n=\dots=X_{n+k-2}=i,X_{n-1}=j_2\right)\\&=(1-\rho_i(k))\,\rho_i(k-1)\sum_{j_2\neq i}P\left(X_n=\dots=X_{n+k-2}=i,X_{n-1}=j_2\right)\\&\ \ \vdots\\&=(1-\rho_i(k))\,\rho_i(k-1)\rho_i(k-2)\cdots\rho_i(1)\sum_{j_2\neq i}P\left(X_n=i,X_{n-1}=j_2\right).\end{aligned}$$

Then we have that

$$P\left(X_{n+k+1}\neq i,\ X_{n+k}=X_{n+k-1}=\dots=X_{n+1}=i\mid X_n=i,X_{n-1}\neq i\right)$$