One of the earliest works in compartmental disease modeling is the seminal 1927 paper by Kermack and McKendrick . It assumes the population is segregated into susceptible (S), infected (I), and recovered or removed (R) compartments. Kermack and McKendrick proposed the following well known system of Ordinary Differential Equations to describe the time evolution of the population proportions in each compartment, denoted by and respectively:
Here and are the infection and recovery rates, respectively. Solutions to Equation 1.1 are often called the SIR curves (see Figure 1). In the absence of any specific contact structure, the law of mass action has been implicitly assumed, so an infectious individual can potentially infect any susceptible individual. Despite its popularity and widespread use over decades, the ODE model in Equation 1.1 averages out individual dynamics and, therefore, does not capture the stochastic fluctuation of epidemic processes in real life. In particular, the practical problems of applying Equation 1.1 to data are:
Population size Since the quantities in the SIR equations are proportions, it is not immediately clear how to apply them to real epidemics, which occur in finite susceptible populations. Moreover, the size of the population is often unknown.
Likelihood Since the SIR equations are deterministic, we cannot write a likelihood for epidemic data without further, often ad-hoc, statistical assumptions.
Aggregation over individuals The SIR model represents the mean-field equations for (scaled) population counts, aggregating out individual characteristics.
Aggregation over time The real data are typically aggregated not just over the population but also over observed time periods, leading to interval censoring that cannot be easily incorporated into the SIR equations.
The objective of this paper is to introduce a new way of interpreting SIR Equation 1.1 in terms of a survival function instead of population counts. This will address the first two problems directly, and it will also give us a theoretical foundation for addressing the remaining two problems. Our approach will be applicable not only to mass-acton based SIR-type models but also to a broad class of network-based epidemic models.
The SIR Equation 1.1 is the simplest example of an epidemiological Compartmental Model (CoM). For the purpose of this paper, we understand a CoM as a discrete set of states (compartments) paired with a set of continuous or discrete transition rules between them. Continuous deterministic CoM are often considered to describe the macro (population) level, as in Equation 1.1. Discrete stochastic CoMs are often considered to describe the micro (individual) level dynamics of an epidemic. As we outline below, there exists an interesting and so far largely unexplored connection between the macro- and micro- level descriptions of CoMs (including the SIR Equation 1.1). This connection is based on the notion of mass transfer defined below, and it gives us new insights into how to address the practical problems of statistical inference listed above. The following definition will be useful: [Mass Transfer Model] Any compartmental model with mass conservation (i.e., constant total count or density of individuals) is termed a Mass Transfer Model (MTM). In particular, when there exists a partial ordering on the set of compartments, we call it a unidirectional MTM.
Following the above definition, it is clear that the SIR model described by Equation 1.1 can be thought of as a unidirectional MTM. In this paper, we shall show that this MTM interpretation of the SIR Equation 1.1
can be used to describe the fate of an individual that starts in the first (susceptible) compartment and moves (with certain probability) to subsequent compartments555Recall that there is an inherent partial ordering on the set of compartments in a unidirectional MTM.. In other words, simple algebraic manipulation of the MTM uncovers a precise description of the survival dynamics of an individual (see Figure 1). To emphasize this connection, we shall often refer to a dynamical system describing unidirectional MTM as a Survival Dynamical System (SDS).
Our contributions in this paper can be summarized as follows: We propose a new way of describing unidirectional MTMs such as the SIR Equation 1.1 in terms of a population survival function instead of population counts. This new interpretation will not only allow us to apply all of the standard survival analysis tools to typical epidemic data, but it will also address all four of the practical problems listed earlier. In particular, based on the SDS interpretation of unidirectional MTMs, we propose a simplified likelihood, called the SDS-likelihood, for the purpose of statistical inference. We then numerically verify on simulation data examples that our new inference method compares favorably with standard approaches based on posterior likelihood and Markov Chain Monte Carlo (MCMC) schemes.
The rest of the paper is structured as follows. Section 2 briefly reviews the relevant background on mathematical modeling in epidemiological literature. In Section 3 we make the MTM interpretation of the SIR Equation 1.1 precise whereas in Section 4
we compare the standard as well as our proposed parameter estimation methods followed by numerical results inSection 5. Finally, we conclude the paper with a brief discussion in Section 6. Additional mathematical preliminaries, statistical inference results and other supplementary material are provided in Appendices C, B and A.
Suppose we have susceptible and infectious individuals. Infectious individuals infect susceptible individuals, who change state from susceptible to infected. Infected individuals recover after an exponential infectious period. For the -th individual, define the process such that if he or she is in the susceptible compartment at time and otherwise. Similarly, define the processes for the infected compartment and for the recovered compartment. Naturally, . We assume Markovianness throughout the course of the paper. That is, we assume , for some , is a Continuous Time Markov Chain (CTMC). For the sake of notational convenience, we have labeled the initially susceptible individuals and the initially infectious individuals . Then, following the random time change representation of a CTMC (see [15, Chapter 6], [4, Chapter 5]), we can write, for ,
where are independent unit-rate Poisson processes. Models of this form are often called agent-based models in the literature [28, 7] and if required, may be explicitly simulated by means of the so-called Doob-Gillespie algorithm [19, 3, 48].
2.1 Sellke construction
An alternative construction of the micro model from a survival analysis perspective was proposed by Sellke  as outlined below. Note that conditionally on the history of the infection process (population count of infected) up to time , the infection time of a susceptible individual is given by
Once a susceptible individual gets infected, he/she recovers after an infectious period that follows an exponential distribution with rate. If we denote the recovery time of the -th individual by , it follows immediately from Equation 2.1 that and are independent and follows an exponential distributions with rate . Symbolically,
Note that the fate of an individual is entirely described by the statistical distributions given in Equations 2.3 and 2.2. It is also interesting to note that an individual’s fate depends on the process history only through the variable akin to an improper cumulative hazard function (improper, since with probability one). These considerations lead to Algorithm 1 for simulating the process in Equation 2.1. This is known as the Sellke construction [4, 17, 1] in the literature. It can be easily verified that Algorithm 1 is equivalent to simulating the system in Equation 2.1 using the Doob-Gillespie algorithm. As we describe below, the Sellke construction plays a central role in developing survival representations of the SIR Equation 1.1. Moreover, it also turns out to be equivalent to a statistical representation of micro models under the law of mass action based on contact intervals [24, 25].
2.2 Mean-field limit of Sir
The simplest way to derive a macro model from the micro description is via lumping or aggregation of states. When the aggregation of states is strongly lumpable [23, 40, 41, 8], the resultant aggregated process remains Markovian for any choice of the initial distribution. Now, for the SIR process, let denote the possible statuses of the individuals. Then, is the state space of the ensemble of individual-based processes. Define the macro-level processes
which keep track of the total counts of susceptible, infected and recovered individuals. Let . Partition into such that any two states in each produce the same counts for , for
. It is easy to see that the Markov chain described by the ensemble inEquation 2.1 is (strongly) lumpable with respect to the partition (see [23, 28, 43]). That is, the lumped process is also Markovian for any choice of the initial distribution. Therefore, we can write
where and are independent unit rate Poisson processes. As before, the simulation of the above system can be done using the Doob-Gillespie algorithm. For the sake of completeness, we present a pseudocode in Algorithm 2.
The macro model is particularly convenient in that it is amenable to asymptotic analysis. Indeed, for very large populations, we can approximate the stochasticSIR dynamics by a system of ODEs. The rationale behind this approximation is that pure jump Markov processes approach solutions of a certain ODE in the limit, when scaled appropriately [31, 32]. This is sometimes called mean-field or fluid limit of the Markov jump process.
By virtue of the Poisson Law of Large Numbers (LLN) , which asserts that for large and a unit rate Poisson process , we see that the processes in Equation 2.6 converge to the solution of the following system of ODEs as and :
which are the same as the Kermack and McKendrick ODEs in Equation 1.1. The introduction of is convenient because it sets , and . The rate of convergence to this LLN ODE limit can be computed using sample path Large Deviations Principle (LDP) of the Markov process in Equation 2.6. Standard tools from [16, 11, 14] as well as related results from [13, 39, 12] can be borrowed for this purpose. However, our main motivation here is to interpret Equation 2.7 as describing an MTM. We make this point precise in the following.
where is the basic reproduction number. Here, the first two equations are obtained by partially solving the ODE system using the integrating factor (first equation) and the variation of parameter method (second equation).
We will now interpret Equation 3.1 as describing the mass (probability) transfer model in an infinite population where a randomly selected unit transfers over time from the initial susceptible (S) compartment first to the infected (I) and then to the removed (R) compartment. This idea is depicted in Figure 2. The transfer process is described by the state (compartment allocation) of a randomly selected unit (say, ) at where with .
According to the mass transfer interpretation of the Equation 3.1, the time of infection (transfer from to ) of
is given by the (improper) random variablewith its distribution determined by the function , that is
Note that this is a direct analogue of Equation 2.2 in our aggregated macro model where the stochastic quantity is replaced by its deterministic limit from Equation 3.1. It is important to note that in the limit, the units become independent. This phenomenon is also known in the literature as mean-field independence or propagation of chaos [34, 6, 36].
None also that although by our assumption and is non-increasing, it is nevertheless an improper survival function since where satisfies the deterministic final size equation
which is a contraction map and therefore, numerically amenable to efficient fixed-point iteration schemes. Since , we may interpret as the probability of unit ever transferring out of the compartment (ever being infected). Consequently, may be thought of as the (improper) cumulative hazard for and as the (improper) hazard function of the improper random variable . This hazard is sometimes called the force of infection
. By the law of total probability. Here is the proper conditional survival function, conditioned on , that is, on an event that individual ever gets infected (i.e., transfers out of ). Note that according to Equation 3.1 the density for is
where . Since is a density function, the right hand side above is a convolution of the density of and the (exponential) density of . It thus follows that the right hand side quantity
is itself a density of the variable , which is the sum of two independent random variables and (that is ). Note the analogy of this result with Equation 2.3. These considerations give us Algorithm 1 for simulating the individual histories in the MTM SIR model. See also Figure 2 for a pictorial representation of the idea.
Note that analyzing timepoints according to the Algorithm 1 addresses all four issues of macro SIR model in Equation 1.1 described in Section 1. Indeed, Algorithm 1 no longer requires the population size (problem 1). Direct generation of individual trajectories according to Algorithm 1 also allows us to specify a likelihood function (problem 2), account for differences in individual characteristics (problem 3), and overcome issues with censoring or interval-based data (problem 4).
4 Parameter inference
, the vector of parameters of interest iswith , since the parameter is expressible in terms of via Equation 3.3. The size of the initial susceptible population () is usually unknown and may be considered a nuisance parameter. The estimation of this nuisance parameter is often problematic, and popular methods such as profile likelihoods do not always yield good estimates. In order to address this problem, we propose the SDS likelihood, which is based on the SDS interpretation of the SIR Equation 1.1 and does not require . Before going into the details of SDS likelihood, we describe the exact likelihood based on the Doob-Gillespie Algorithm 2. To emphasize the strength of our SDS likelihood and compare its performance against the exact likelihood, we assume that the value of is available for the exact likelihood.
4.1 Exact (Doob-Gillespie) likelihood
Assume that there were total of events up to time of which are infections and are removals at times , where denotes the type of the event. Put . Then, following Algorithm 2, the exact log-likelihood for is
Because we assume we know the population size and the trajectory for the exact likelihood, the parameter is known exactly.
4.2 Sds likelihood
Following the discussion in Section 3, an approximation of the exact likelihood function in Section 4.1 can be obtained from Equation 2.2 by replacing the process with its limit (as ) and considering the individual trajectories as independent. Since we let , the exact value of the initial size of the susceptible population is no longer needed.
Assume we randomly sample individuals of whom are found susceptible and , infected. We observe those individuals up to the cut-off time and record their infection or recovery times. Suppose out of the initially susceptible individuals get infected at infection times and of them recover by time . Pair each infection time with the corresponding duration of infectious period if the individual recovers by time . If the individual does not recover by time , pair with the censored information . Among the initially infected individuals, suppose individuals recover by the cutoff at times . Then, following Algorithm 1, we have the following SDS likelihood
where, as described in Section 3,
5 Numerical examples
5.1 Bayesian estimation using Mcmc
In this section, we present numerical examples to illustrate how one can use the SDS-likelihood in Section 4.2 to infer the unknown parameter using MCMC methods. In order to construct a posterior distribution for , we assign gamma priors to the parameters , and :
The positive quantities , and are appropriately chosen hyper-parameters. The posterior distribution of is obtained by Bayes’ rule: It is proportional to the product of the likelihood function given in Section 4.2
and above three priors. However, the posterior distribution cannot be written in closed-form. Even if a conditional posterior distribution is obtained, we can not find any closed-form expression for the probability density function because we need to have solutions, , to Equation 3.1, which are also functions of . Thus, we cannot immediately employ a generic Gibbs sampler method [44, 10]. Therefore, we need a more efficient updating algorithm than the standard Metropolis-Hastings algorithm. In this paper, we adopt the Robust Adaptive Metropolis (RAM) algorithm666The RAM method generalizes the Adaptive Scaling Metropolis (ASM) algorithm  by updating the tuning constant appropriately to achieve optimal acceptance ratio. [46, 35]
, which adjusts the tuning constant and the variance-covariance matrix of the proposal distribution adaptively to maintain a high acceptance ratio in the Metropolis steps. The variance-covariance matrix is updated during theMCMC iterations.
5.2 Simulation study
In order to compare the accuracy of the inference based on the SDS-likelihood against the exact (Doob-Gillespie) likelihood, we performed simulation studies under various sets of parameters and size of the susceptible population. We also consider the impact of various truncation times. The data used for parameter inference are generated according to Algorithm 1.
We compare four different inference methods. We list them below:
Method 4 When is large, one can perform a diffusion approximation of the process (2.5) and replace the likelihood in Section 4.1 with the Gaussian likelihood. Our fourth method uses the Gaussian likelihood in Section A.2 for implementing an MCMC scheme. See Section A.2 for more details. As the likelihood of and has the Gaussian form, we assign conjugate normal prior to and with same mean and variance as the gamma priors mentioned in Equation 5.1, i.e.,
The conditional posterior distributions of , and satisfy
However, the conditional posterior distribution of , which is given by
does not assume a simplified form. Note that with independent priors, the posterior distributions of and are independent conditionally on and that the conditional posterior distribution of depends only on the prior parameters and the solution of Equation 2.7. However, in order to draw posterior samples of conditional on and , we need to apply the Metropolis algorithm .
For all MCMC-based methods, we add constraints on the proposed values of in the MCMC iteration steps so that remains within and satisfies Equation 3.3. We have a total of 18 simulation scenarios based on the combinations of the following:
Three values of : , , and yielding the basic reproduction number equal to 4, 2, and 1.5 respectively.
Two cutoff times : One cutoff time is chosen around the half-time of the epidemic duration (at the peak of the infection process) and another one towards the end. Therefore, the chosen values of are 3 and 9 for , 3 and 7 for , and 3 and 6 for . See Figure 3 in Appendix B for the SIR curves for different parameter values and cutoff times. The vertical line in the plot represents the cutoff time.
Three values of the size of the susceptible population : , and .
For each of the 18 scenarios, we generate sets of synthetic epidemic data using Algorithm 1. Each generated data set has rows and two columns. Each row corresponds to an individual in the epidemic and two columns are the individual’s infection and removal times
. In order to ensure the prior distributions in our Bayesian inference are not too informative, we setand for , and . We iterated the MCMC procedures 11,000 times for both Method 3 and Method 4. The first 1,000 iterations are removed as burn-in. After burn-in, every 10th iteration is stored as a posterior sample. In total, 1,000 posterior samples are used for estimation. For Method 2, we generate 1,000 samples without any burn-in phase or thinning because Monte Carlo simulations are sufficient. For the Bayesian methods (i.e., Method 2, Method 3, and Method 4), we estimate the parameters , , and by taking the means of 1,000 posterior samples.
Since is a -valued random variable, assigning as prior is natural. However, in this simulation study, we assign a slightly more informative gamma prior . The reason for this choice is that the conditional posterior of depends only on the solution of Equation 2.7 when we implement the MCMC procedure for Method 4 using a Gaussian likelihood. As a consequence, the estimates for using Method 4 are very poor and with prior , the posterior means are always around , the prior mean. In order to circumvent this limitation of Method 4, we assign a prior, which is slightly more informative. It is important to note that the performance of our Method 3 based on the SDS-likelihood remains unaffected even if an uninformative prior is chosen for . The choice of a prior for is made only to ensure fair comparisons of the different inference methods.
|Method 1||Method 2||Method 3||Method 4||Method 1||Method 2||Method 3||Method 4||Method 3||Method 4|
|Method 1||Method 2||Method 3||Method 4||Method 1||Method 2||Method 3||Method 4||Method 3||Method 4|
For the longer cutoff times, Table 1 provides a summary of the simulation study for the three parameter sets and different initial number of susceptibles . Here, the values of are respectively 9 for , 6 for , and 7 for , that is, in each case the epidemic is almost at its end by time (see Figure 3). The first four columns show the estimates of according to the Methods 1, 2, 3 and 4. Similarly, the next four columns show estimates of . The last two columns are reserved for Method 3 and Method 4 estimates of . Recall that is known exactly for Method 1 and Method 2. The rows of the table are divided into three parts corresponding to the three settings of the parameter values , and . Each of three parts is further segregated into three different classes corresponding to the three different susceptible population sizes , and . Finally, in each cell, we show the average of 100 posterior means and Mean Squared Error (MSE) of parameter estimators. As we can see, the Method 3 based on the SDS-likelihood yields accurate estimates for all three parameters , and even for relatively small values of (see the results for ). As one would expect, the MSEs decrease with increase in across the four different methods. In particular, our Method 3 has a slightly higher variance than the other methods. However, it is important to note that Method 3 does not require knowledge of whereas the other methods do. Most notable is Method 3’s ability to estimate accurately, specially when pitted against the poor performance of Method 4 based on the Gaussian likelihood.
The only difference between results in Tables 2 and 1 are in cutoff times. Whereas in Table 1 we consider data collected for most of the epidemic duration (cutoff is close to the end of the epidemic), in Table 2 we consider data with short cutoff that is close to epidemic peak times. See Figure 3 in Appendix B for a visualization of the SIR curves corresponding to these three parameter settings truncated at by a vertical line. Since the inference is based on the heavily truncated data, the MSEs in Table 2 are expectedly worse than those in Table 1. Also, the sharp decrease in MSEs with increasing in Table 1 is less pronounced in Table 2. Nevertheless, the estimates obtained are still quite accurate. Also, the MSEs for Method 3 are slightly better than those of Method 1 or 2. Interestingly, the parameter is almost always better estimated by Method 3.
Further supplementary numerical results and explanations are provided in Appendix B.
In this paper, we presented a new way of looking at the classical SIR-type epidemic models. Our method addresses all the four problems of the classical SIR model identified in Section 1. Parameter estimation based on the SDS-likelihood (described in Section 4) does not require the effective population size , addressing problem 1. The SDS-likelihood approach, being a direct consequence of the SDS interpretation of the SIR Equation 1.1, provides a principled way of specifying the likelihood from epidemiological field data where the effective population size is unknown but large, addressing problem 2. Although in the current work we do not explicitly illustrate this, it should be clear that the independence of the individuals’ contributions to SDS-likelihood addresses also the problem of aggregation over individuals (problem 3) and over time (problem 4).
It is worth mentioning that, under the SDS-likelihood approach, it typically suffices to have much smaller sample of transition data than other inference methods, such as the exact likelihood or the Gaussian likelihood methods. Due to the asymptotic independence of infection and recovery times of individuals (see Section 3), the SDS-likelihood takes a particularly simple form facilitating a convenient implementation of a suitable MCMC scheme. For ready usage of our method, we have made our code implementation publicly available .
The proposed method can be readily extended to accommodate a wide class of MTMs. The classical SIR model has been chosen here merely as an example to illustrate the ideas underpinning the SDS interpretation of unidirectional MTMs. Indeed, the machinery developed in the present paper goes beyond SIR models, and it can be immediately applied to more general epidemic processes as well as general MTMs arising in physics and chemistry. In particular, we believe the SDS tools can be applied to certain subclasses of Chemical Reaction Networks models in which the individual species molecules can be tracked as they undergo chemical reactions.
In many studies of epidemiological field data, the effective population size is assumed to be very large. For instance, a total population size of was assumed in [2, 18]. Our method is particularly appropriate for such settings. However, since our method hinges on a Law of Large Numbers (see Section 2), the rate of convergence of the scaled processes to the LLN limit, which coincides with the SIR Equation 1.1, is crucial for the quality of inference based on the MTM-likelihood. Therefore, we need to establish a Large Deviations Principle for the scaled processes. This is particularly important for small-scale epidemics. Even though our numerical results are encouraging for values of as small as 100, quantifying the rate of convergence will be useful. We did not consider an LDP in this paper, but we believe standard techniques [11, 16, 39, 13, 14, 38] can be used for this purpose. Another direction of future investigation will be to consider non-mass-action systems and, eventually, non-Markovian systems with non-exponential holding times. Note that the original Sellke construction does not assume Markovianness of the stochastic system. The Markovian version presented in Section 2 has been adapted to our context.
For many epidemiological scenarios, the mass-action assumption is untenable. Several network-based models have been proposed in the recent times [47, 26, 37]. Asymptotic study of those models in the form of various large-graph limits has also been done [9, 21, 30]. Therefore, extending our method to network-based models appears to be a natural next step that we hope to take in the near future.
Appendix A Mathematical background
a.1 Lumpability of a Markov chain
The (strong) lumpability777There is also a notion of weak lumpability in the theory of Markov processes. of a CTMC can be described in terms of lumpability of a linear system of ODEs. Consider the linear system , where is a matrix (representing the transition rate or the infinitesimal generator matrix of the corresponding CTMC on state space ). [Lumpability of a linear system [28, 43]] The linear system is said to be lumpable with respect to a partition of , if there exists an matrix satisfying Dynkin’s criterion (i.e., if for all ). The matrix is often called a lumping of . The following is immediate: If is a lumping of , then there exists an matrix such that .
a.2 Gaussian likelihood
For large and we can replace the exact likelihood Section 4.1 with the approximate Gaussian one because the score processes and are asymptotically () independent and Gaussian. This gives the log-likelihood formula
where are the trajectories of the ODE system Equation 2.7. Note that maximization of also leads to MLEs Equation 4.2. However, maximization with respect to the parameter needs to be done implicitly by adjusting the deterministic trajectory . Essentially, this boils down to maximizing the third term in Section A.2 because the MLEs of and are found by setting the squared terms (the first two terms in Section A.2) to zero. A drawback of the Gaussian likelihood is that the accuracy of the estimate for may be poor, particularly when is not large, because the third term depends only on the solution of the ODE system Equation 2.7 and not on sample data.
Appendix B Additional numerical results
Here, we provide additional numerical results. In particular, we show the posterior plots and crucial diagnostic statistics for the MCMC methods.
The cutoff times are decided based on Figure 3. The idea is to study the impact of censoring on the quality of the inference procedure. Therefore, for each parameter setting, we choose two cutoff times: one near the peak of the epidemic and one near the end of the epidemic. The vertical lines in Figure 3 demarcates the smaller cutoff times for each of the three settings of the parameter values.
In Figures 5 and 4, we show the posterior distributions of the Method 3 estimators of , and based on the SDS-likelihood. To avoid repetition, we show only two posterior plots: one for the parameter setting for the smaller cutoff time case in Figure 4 and one for the parameter setting for the larger cutoff time case in Figure 5. As shown in Tables 1 and 2, the variance of the posterior distributions shrink drastically as we increase from to . We do not show the posterior distributions for the case because it does not provide any additional insights into the quality of the inference procedure except for the fact that the posterior variance further reduces. Finally, we provide additional diagnostic statistics for the MCMC implementation of Method 3 in Figure 6. We show the (thinned) trace of a single Markov chain for two different values of , namely , and . As Figure 6 shows, the chain mixes faster when than when . This is expected because Method 3 is essentially based on an LLN of the scaled Poisson processes keeping track of the population counts. As before, we omit the trace plots corresponding to the case. For completeness, we consider the third parameter setting in Figure 6. To avoid repetition, we do not show trace plots for the other parameter settings. Nevertheless, the Markov chains converge for the other parameter settings as well.