1.1 Introduction
In this chapter we describe how iterated filtering methods (Ionides et al., 2006, 2015) can be used to analyze infectious disease outbreak data in the form of time series. The centerpiece of these methods is the assumption that the outbreak data can be modeled as a noisy and only partially observed realization of a disease transmission process, which is assumed to be a Markov process (King et al., 2016). The general inference approach is to (i) formulate a suitable Markovian transmission process; (ii) connect the data to the transmission process via a suitable observation process; and (iii) use iterated filtering to perform inference for the model parameters. The inference method presented here is likelihood-based. It is designed for models where it is relatively easy to draw samples from the Markov process compared to evaluating its transition probabilities. The iterated filtering algorithm is, among others, implemented in the R package pomp (King et al., 2016), which spans a wide collection of simulation, inference and model selection methods for partially observed Markov processes (POMPs). Other simulation-based inference methods for this model class are simulated moments (Kendall et al., 1999), synthetic likelihood (Wood, 2010), non-linear forecasting (Sugihara and May, 1990) and Bayesian approaches such as approximate Bayesian computation (Toni et al., 2009; Liu and West, 2001) and particle MCMC (Andrieu et al., 2010). However, at present iterated filtering methods are the only currently available frequentist, full-information, simulation-based methods for POMP models (Ionides et al., 2015). In this chapter we focus on the "simplest" Markovian SIR transmission model and describe in some detail how to fit this model to outbreak data consisting of the number of newly reported cases aggregated over time intervals, e.g. weeks. However, the methods can be easily extended to more complicated settings, several of which will be discussed here.
The chapter is structured as follows: Section 1.2 gives a short overview of likelihood-based inference and describes some of its general challenges. In Section 1.3 we introduce the model class of partially observed Markov processes and explain how to formulate and evaluate the likelihood of such models with particle filters. In Section 1.4 we present the iterated filtering algorithm and demonstrate its use by a simple example in Section 1.5. In Section 1.6 we discuss possible extensions of this example and, in Section 1.7, illustrate how the method can be applied to a real-world problem which accommodates most of the complications mentioned. Both examples are accompanied by source code and instructions which can be found online at Stocks (2017). We finish the chapter by outlining advantages and disadvantages of the method presented and point the interested reader to further literature, see Section 1.8.
1.2 Likelihood-based inference
We start by outlining the key aspects of likelihood-based inference that will be relevant to us. This is an extensive topic, and for readers new to this area we refer to e.g. Pawitan (2001) or Held and Sabanés Bové (2013) for a comprehensive overview of likelihood-based inference.
1.2.1 The likelihood approach
The idea behind likelihood-based inference is the following. Suppose we have data in the form of a sequence of observations $y_1^*, \ldots, y_N^*$ at times $t_1, \ldots, t_N$ and a model $f(y_1, \ldots, y_N; \theta)$ for the data, where $f$ is typically a probability mass or density function parametrized by a vector of parameters denoted by $\theta$. In our context, $y_1^*, \ldots, y_N^*$ might, for example, represent the weekly numbers of newly reported cases over a certain period of time, and $\theta$ might contain the parameters of a suitable Markovian epidemic transmission model. In order to calibrate the model to our observations we would like to find the elements of the parameter vector $\theta$ for which our observations are most likely under the chosen probability model. In other words, we would like to maximize $f$ with respect to $\theta$, evaluated at the data. This translates to optimizing the function
$$\mathcal{L}(\theta; y_1^*, \ldots, y_N^*) = f(y_1^*, \ldots, y_N^*; \theta).$$
The function $\mathcal{L}$ is called the likelihood function; in the following we suppress its dependence on the data and simply write $\mathcal{L}(\theta)$ for convenience. The parameter vector which maximizes this function is called the maximum likelihood estimate (MLE) and is given as
$$\hat{\theta} = \operatorname*{arg\,max}_{\theta \in \Theta} \mathcal{L}(\theta), \qquad (1)$$
where $\Theta$ is the parameter space containing all possible sets of parameters.
For many applications it is often more convenient to work with the log-likelihood function
$$\ell(\theta) = \log \mathcal{L}(\theta).$$
This transformation often simplifies optimization, but does not change the location of the MLE since the natural logarithm is a monotonically increasing function.
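As a tiny numerical illustration of this invariance (a hypothetical Python sketch; the data, the Poisson model and the grid are all made up for illustration), a grid search over candidate Poisson means finds the same maximizer whether we score with the likelihood or the log-likelihood:

```python
import math

# Hypothetical example: observations assumed i.i.d. Poisson with unknown mean lam.
data = [3, 5, 4, 6, 2]

def likelihood(lam):
    # Product of Poisson pmfs evaluated at the data.
    return math.prod(math.exp(-lam) * lam**y / math.factorial(y) for y in data)

def log_likelihood(lam):
    # Sum of log Poisson pmfs; same maximizer as `likelihood`.
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in data)

grid = [0.5 + 0.1 * i for i in range(100)]  # candidate values for lam
mle_L = max(grid, key=likelihood)
mle_l = max(grid, key=log_likelihood)
assert mle_L == mle_l  # the log transform does not move the maximizer
print(mle_L)           # the grid point closest to the sample mean 4.0
```

For longer time series the product of many small densities underflows floating-point arithmetic, which is a second, purely practical reason to work on the log scale.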
1.2.2 Practical challenges
In principle, the optimization problem in (1) looks rather straightforward; however, there are a number of challenges in practice. Firstly, the evaluation of $\mathcal{L}(\theta)$ can be difficult because the function might not be available in closed form. Secondly, even if evaluation is possible, it might be very hard to derive the first- and higher-order derivatives of $\mathcal{L}(\theta)$ analytically or even numerically, which are needed for many numerical optimization methods, cf. Nocedal and Wright (1999). In this case we need derivative-free optimizers. Those optimizers might pose other problems, for example when the likelihood can only be approximated stochastically, e.g. by Monte Carlo methods (Robert and Casella, 2004). In that case standard deterministic derivative-free optimizers fail. All the problems mentioned above occur for the model class at hand because the likelihood is a complex integral. In the following sections we give more rigorous details about the problem and introduce a method which gets around these challenges.
The problems mentioned arise in the specific setting of our model formulation. In addition, there are some other general challenges with likelihood-based inference in a statistical context. Usually, the point estimate we obtain from maximum likelihood estimation is not very meaningful by itself unless we also quantify the uncertainty of the estimate. One way to solve this problem in a likelihood setting is to construct confidence intervals for the parameters, e.g. by calculating the profile log-likelihood for each parameter of interest and inverting Wilks' likelihood ratio test to get the desired intervals (Wilks, 1938). Another very common challenge is that there often exist multiple local maxima, and the optimization algorithm can get stuck in one of these and not return the global maximum. It is therefore important to use a wide range of starting values for the numerical algorithm in order to improve the chances that the global maximum is reached. Moreover, it might very well happen that the maximum is not unique because the surface of the likelihood function has ridges. In that case confidence intervals are a good way to quantify the parameter range. In the following, we formulate the likelihood of partially observed Markov processes and explain how to tackle all of the issues mentioned above.

1.3 Inference for partially observed Markov processes (POMP)
In the literature, partially observed Markov processes are also known as hidden state-space models or stochastic dynamical systems. The main assumption is that at discrete points in time we observe some noisy aspect of the true underlying Markov process, which is itself often continuous in time. Below we formulate the likelihood of a POMP, give an idea why standard methods to find the MLE do not apply here, and finally describe how iterated filtering methods overcome these problems. The exposition and notation are adopted from King et al. (2016) and the materials and tutorials in King (2017).
1.3.1 Likelihood of a POMP
A POMP consists of two model components: (i) an unobserved Markov process $\{X(t), t \geq t_0\}$, which can be discrete or continuous in time, and (ii) an observation model which describes how the data collected at discrete points in time $t_1 < t_2 < \cdots < t_N$ are connected to the transmission model. For notational convenience, we write $X_n = X(t_n)$ and $X_{0:N} = (X_0, \ldots, X_N)$. In our application, the process $X(t)$ describes the dynamics of the disease spread; e.g. in the case of the simple Markovian SIR model, $X(t) = (S(t), I(t), R(t))$ counts the numbers of susceptible, infectious and removed individuals at time $t$. Let $Y_n$ denote the random variable describing the observations at time $t_n$, which depend on the state $X_n$ of the transmission process at that time, cf. Figure 1.1. Our data, $y_{1:N}^* = (y_1^*, \ldots, y_N^*)$, are then modeled as a realization of this observation process. Depending on how many aspects of the disease dynamics we observe, $Y_n$ can be either a univariate or a multivariate time series. Assuming that the observable random variable $Y_n$ is independent of all other variables given the state $X_n$ of the transmission process, the joint density of the states and the observations is defined as the product of the one-step transition density $f_{X_n|X_{n-1}}(x_n|x_{n-1}; \theta)$, the observation density $f_{Y_n|X_n}(y_n|x_n; \theta)$ and the initial density $f_{X_0}(x_0; \theta)$ as
$$f_{X_{0:N}, Y_{1:N}}(x_{0:N}, y_{1:N}; \theta) = f_{X_0}(x_0; \theta) \prod_{n=1}^{N} f_{X_n|X_{n-1}}(x_n|x_{n-1}; \theta) \, f_{Y_n|X_n}(y_n|x_n; \theta).$$
Hence, the likelihood of the parameter vector $\theta$ can be written as the marginal density for the sequence of observations evaluated at the data:
$$\mathcal{L}(\theta) = f_{Y_{1:N}}(y_{1:N}^*; \theta) = \int f_{X_0}(x_0; \theta) \prod_{n=1}^{N} f_{X_n|X_{n-1}}(x_n|x_{n-1}; \theta) \, f_{Y_n|X_n}(y_n^*|x_n; \theta) \, dx_{0:N}. \qquad (2)$$
The dimension of the integral in (2) depends on the number of compartments of the Markov process and the number of observations, so this integral is usually high-dimensional and, except in the simplest situations, cannot be solved analytically. The method we present in the following sections uses the property that it is fairly easy to draw samples from the density $f_{X_n|X_{n-1}}$ rather than evaluate it. Because the distribution of the initial state $X_0$ of the Markov process is usually not known, one practical way to deal with this is to fix the initial values of the system at some reasonable values, or to treat them as parameters to be estimated; see Section 1.6 for further discussion.
1.3.2 Evaluation of the likelihood
Before addressing the question of how to maximize the likelihood in (2) in an efficient way, we first have to think about how to actually evaluate $\mathcal{L}(\theta)$ for a given parameter vector $\theta$. In the special case that the underlying transmission model is deterministic and the initial values are given, the state $x_n = x_n(\theta)$ is a non-random function of $\theta$ for each $n$; hence $f_{X_n|X_{n-1}}$ and $f_{X_0}$ are point masses. The likelihood equation (2) then reduces to
$$\mathcal{L}(\theta) = \prod_{n=1}^{N} f_{Y_n|X_n}(y_n^*|x_n(\theta); \theta). \qquad (3)$$
So, given that we know the distribution of our observation model, evaluation of the log-likelihood corresponds to summing the logarithm of the observation density given the states of the deterministic "Markov process" at each point in time. Maximum likelihood estimation then reduces to a classical numerical optimization problem for a non-linear function of the parameter vector, and for optimization we can use a derivative-free optimizer, e.g. the Nelder-Mead method (Nelder and Mead, 1965). In general, however, evaluating (and optimizing) the function is unfortunately not as straightforward because the transmission model is not deterministic. Rather, it is a non-trivial problem because of the high dimension of the integral. In the following we describe how to evaluate Equation (2) in a general way with particle filters.
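The deterministic special case can be sketched as follows (a hypothetical, stdlib-only Python example, not the chapter's R/pomp code; the Euler discretization, parameter values and weekly case counts are all made up):

```python
import math

def weekly_incidence(beta, gamma, n_pop=1000, i0=5, weeks=10, dt=0.01):
    # Euler-discretized deterministic SIR skeleton; returns weekly new infections.
    s, i = float(n_pop - i0), float(i0)
    means = []
    for _ in range(weeks):
        new_cases = 0.0
        for _ in range(int(1 / dt)):
            inf = beta * s * i / n_pop * dt  # new infections in this Euler step
            rec = gamma * i * dt             # recoveries in this Euler step
            s, i = s - inf, i + inf - rec
            new_cases += inf
        means.append(new_cases)
    return means

def log_lik(beta, gamma, data):
    # Equation (3): sum of log Poisson observation densities along the trajectory.
    means = weekly_incidence(beta, gamma, weeks=len(data))
    return sum(-m + y * math.log(m) - math.lgamma(y + 1)
               for m, y in zip(means, data))

data = [7, 12, 21, 30, 45, 60, 70, 72, 65, 50]  # made-up weekly case counts

# log_lik is now an ordinary deterministic function of (beta, gamma) that could
# be handed to a derivative-free optimizer such as Nelder-Mead.
print(log_lik(1.5, 0.5, data))
```

Because the trajectory is a fixed function of the parameters, every evaluation with the same $(\beta, \gamma)$ returns exactly the same number, which is what standard deterministic optimizers assume.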
1.3.3 Particle filter
One standard approach to solving a high-dimensional integral as in Equation (2) is to approximate the integral by Monte Carlo methods. However, as it turns out, for the model class at hand this approach is highly inefficient in practice. Nevertheless, we demonstrate this approach first in order to introduce the reader to the problem. Readers unfamiliar with Monte Carlo methods might want to consult the relevant literature, e.g. Robert and Casella (2004); Doucet et al. (2001).
The basic idea is that the likelihood in Equation (2), which can also be written as the integral
$$\mathcal{L}(\theta) = \int f_{Y_{1:N}|X_{0:N}}(y_{1:N}^*|x_{0:N}; \theta) \, f_{X_{0:N}}(x_{0:N}; \theta) \, dx_{0:N}, \qquad (4)$$
can be approximated using the Monte Carlo principle as
$$\mathcal{L}(\theta) \approx \frac{1}{J} \sum_{j=1}^{J} f_{Y_{1:N}|X_{0:N}}(y_{1:N}^*|X_{0:N}^{(j)}; \theta),$$
where $X_{0:N}^{(1)}, \ldots, X_{0:N}^{(J)}$ is a sample drawn from $f_{X_{0:N}}(x_{0:N}; \theta)$.
This means that if we generate $J$ trajectories by simulating from the Markov process, we just have to evaluate the density of the data given each realization and then average in order to obtain a Monte Carlo approximation of the likelihood.
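This unconditional Monte Carlo estimator can be sketched as follows (a hypothetical, stdlib-only Python example, not the chapter's R/pomp code; the chain-binomial simulator, the parameter values and the data are all made up):

```python
import math
import random
import statistics

random.seed(1)

def simulate_weekly_incidence(beta, gamma, n_pop=200, i0=3, weeks=6):
    # Discrete-time chain-binomial SIR used as a stand-in stochastic simulator.
    s, i = n_pop - i0, i0
    out = []
    for _ in range(weeks):
        p_inf = 1 - math.exp(-beta * i / n_pop)  # per-susceptible infection prob.
        p_rec = 1 - math.exp(-gamma)             # per-infective recovery prob.
        new_inf = sum(random.random() < p_inf for _ in range(s))
        new_rec = sum(random.random() < p_rec for _ in range(i))
        s, i = s - new_inf, i + new_inf - new_rec
        out.append(new_inf)
    return out

def poisson_logpmf(y, mu):
    if mu <= 0:
        return 0.0 if y == 0 else -math.inf
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

data = [4, 9, 15, 22, 25, 18]  # made-up observed weekly counts

def naive_mc_loglik(beta, gamma, n_sim=500):
    # Average f(y* | X^(j)) over unconditional simulations X^(j), then take logs.
    vals = [math.exp(sum(poisson_logpmf(y, c)
                         for y, c in zip(data, simulate_weekly_incidence(beta, gamma))))
            for _ in range(n_sim)]
    m = statistics.mean(vals)
    return math.log(m) if m > 0 else -math.inf
```

Repeated calls return noticeably different values, because most unconditional trajectories miss the data entirely and only a handful of simulations dominate the average; this is precisely the inefficiency discussed next.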
It turns out, however, that with this approach the variability in the approximation of the likelihood is very high, because the trajectories are proposed entirely unconditionally on the data. This means that in practice the method is very inefficient, because a large number of simulations is necessary in order to obtain an estimate of the likelihood that is precise enough to be useful for parameter estimation or model selection. The longer the time series, the worse the problem gets (King, 2017). As it turns out, a better idea is to factorize the likelihood in (2) as
$$\mathcal{L}(\theta) = \prod_{n=1}^{N} \mathcal{L}_n(\theta), \qquad (5)$$
where $\mathcal{L}_n(\theta) = f_{Y_n|Y_{1:n-1}}(y_n^*|y_{1:n-1}^*; \theta)$ is the conditional likelihood, given as
$$\mathcal{L}_n(\theta) = \int f_{Y_n|X_n}(y_n^*|x_n; \theta) \, f_{X_n|Y_{1:n-1}}(x_n|y_{1:n-1}^*; \theta) \, dx_n, \qquad (6)$$
with the convention that $f_{X_1|Y_{1:0}} = f_{X_1}$.
In the following we explain how this integral can be efficiently approximated with resampling techniques which are the basis of particle filter methods.
Since $\{X_n\}$ is a Markov chain, it follows from the Chapman-Kolmogorov equation that
$$f_{X_n|Y_{1:n-1}}(x_n|y_{1:n-1}^*; \theta) = \int f_{X_n|X_{n-1}}(x_n|x_{n-1}; \theta) \, f_{X_{n-1}|Y_{1:n-1}}(x_{n-1}|y_{1:n-1}^*; \theta) \, dx_{n-1}. \qquad (7)$$
From Bayes' theorem it also follows that
$$f_{X_n|Y_{1:n}}(x_n|y_{1:n}^*; \theta) = \frac{f_{Y_n|X_n}(y_n^*|x_n; \theta) \, f_{X_n|Y_{1:n-1}}(x_n|y_{1:n-1}^*; \theta)}{\int f_{Y_n|X_n}(y_n^*|u_n; \theta) \, f_{X_n|Y_{1:n-1}}(u_n|y_{1:n-1}^*; \theta) \, du_n}, \qquad (8)$$
where we use the fact that $Y_n$ only depends on $X_n$. The distribution $f_{X_n|Y_{1:n-1}}$ is called the prediction distribution and $f_{X_n|Y_{1:n}}$ is called the filtering distribution at time $t_n$. The idea now works as follows. Assume we have a set of $J$ points $X_{n-1,j}^F$, $j = 1, \ldots, J$, in the following referred to as particles, from the filtering distribution at time $t_{n-1}$. Equation (7) then implies that we obtain a sample from the prediction distribution at time $t_n$ if we just simulate from the Markov model,
$$X_{n,j}^P \sim f_{X_n|X_{n-1}}(\,\cdot\,|X_{n-1,j}^F; \theta), \quad j = 1, \ldots, J. \qquad (9)$$
Equation (8) in turn tells us that resampling from $X_{n,j}^P$ with weights proportional to
$$w_{n,j} = f_{Y_n|X_n}(y_n^*|X_{n,j}^P; \theta) \qquad (10)$$
gives us a sample from the filtering distribution at time $t_n$. By the Monte Carlo principle it follows from (6) that
$$\mathcal{L}_n(\theta) \approx \frac{1}{J} \sum_{j=1}^{J} f_{Y_n|X_n}(y_n^*|X_{n,j}^P; \theta),$$
where the $X_{n,j}^P$ are approximately drawn from $f_{X_n|Y_{1:n-1}}(x_n|y_{1:n-1}^*; \theta)$. In order to obtain the full likelihood we have to iterate through the data, alternating between simulating (9) and resampling (10) in every time step, until we reach $n = N$, so that
$$\hat{\ell}(\theta) = \sum_{n=1}^{N} \log \hat{\mathcal{L}}_n(\theta) = \sum_{n=1}^{N} \log \Big( \frac{1}{J} \sum_{j=1}^{J} f_{Y_n|X_n}(y_n^*|X_{n,j}^P; \theta) \Big). \qquad (11)$$
This method to evaluate the likelihood is called a sequential Monte Carlo algorithm or particle filter (Kitagawa, 1987; Doucet et al., 2001; Arulampalam et al., 2002). The method is implemented as the pfilter function in the pomp package (King et al., 2016) and returns a stochastic estimate of the likelihood which can be shown to be unbiased (Del Moral, 1996).
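The alternation between simulating (9), weighting (10) and resampling can be sketched on a toy one-dimensional state-space model (a hypothetical Python example with a Gaussian random-walk state and Gaussian observations, not the SIR model or pomp's pfilter; all values are made up):

```python
import math
import random

random.seed(42)

# Toy POMP: latent random walk X_n = X_{n-1} + N(0, Q_SD^2),
# observations Y_n = X_n + N(0, R_SD^2). All choices are hypothetical.
Q_SD, R_SD = 0.3, 0.5

def normal_logpdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def particle_filter_loglik(data, n_particles=500):
    particles = [0.0] * n_particles  # a (degenerate) draw from the initial density
    loglik = 0.0
    for y in data:
        # prediction: push every particle through the Markov model, as in (9)
        particles = [x + random.gauss(0.0, Q_SD) for x in particles]
        # weights: conditional likelihood of the observation per particle, as in (10)
        w = [math.exp(normal_logpdf(y, x, R_SD)) for x in particles]
        loglik += math.log(sum(w) / n_particles)  # conditional log-likelihood term
        # resampling: new swarm drawn proportional to the weights (filtering dist.)
        particles = random.choices(particles, weights=w, k=n_particles)
    return loglik

data = [0.1, 0.4, 0.3, 0.8, 1.1, 0.9]  # made-up observations
print(round(particle_filter_loglik(data), 2))
```

The running sum over the log of the averaged weights is exactly the estimator in Equation (11); repeated runs with different seeds illustrate the residual Monte Carlo variability discussed below.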
It can happen that at some time point a very unlikely particle is suggested. In practice, if the conditional likelihood of this particle is below a certain tolerance value, then that particle is considered to be uninformative.
If, at some time point, the conditional likelihood of every particle is below this chosen tolerance, a filtering failure occurs. When a failure occurs, resampling is omitted, the conditional likelihood at that time point is set equal to the tolerance, and the iteration through the data set continues; hence the time point at which the filtering failure occurred is taken to contain no information. In general, filtering failures are an indication that the model and data might not be consistent (King, 2017).
The variability of the approximation in Equation (11) can be reduced by increasing the number of particles, however, the variability will usually not vanish completely.
This might create problems for standard optimizers which assume that the likelihood is evaluated deterministically. A better choice in this case is to use stochastic optimizers such as the iterated filtering method that we present in the section below.
1.4 Iterated filtering methods
Iterated filtering is a simulation-based method to find the MLE which takes advantage of the structure of POMP models and particle filters. It was first introduced by Ionides et al. (2006) and further improved in Ionides et al. (2015). The iterated filtering method uses the property that it is easy to simulate from $f_{X_n|X_{n-1}}$ while the likelihood is not directly tractable. The basic idea is that a particle filter is applied to a model in which the parameter vector for each particle follows a random walk. As iterations progress, the intensity of the perturbations is successively reduced ("cooling"). It can be shown that the algorithm converges towards the MLE (Ionides et al., 2015). At present this method is the only simulation-based frequentist approach that uses the full information contained in the data. Moreover, iterated filtering methods have been able to solve likelihood-based inference problems for infectious disease related questions which were computationally intractable for available Bayesian methods (Ionides et al., 2015).
1.4.1 Algorithm
In the following we present the pseudocode for iterated filtering as implemented in the mif2 function in the R package pomp and explain how to draw samples from $f_{X_n|X_{n-1}}$.
Pseudocode: iterated filtering (mif2), cf. Ionides et al. (2015)
Input:
Simulators for $f_{X_0}(x_0; \theta)$ and $f_{X_n|X_{n-1}}(x_n|x_{n-1}; \theta)$;
evaluator for $f_{Y_n|X_n}(y_n|x_n; \theta)$;
data $y_{1:N}^*$
Algorithmic parameters:
Number of iterations $M$;
number of particles $J$;
initial parameter swarm $\{\Theta_j^0, j = 1, \ldots, J\}$;
perturbation density $h_n(\theta|\varphi; \sigma)$;
perturbation scale $\sigma_{1:M}$
Procedure:
1. For $m$ in $1{:}M$
2. $\Theta_{0,j}^{F,m} \sim h_0(\,\cdot\,|\Theta_j^{m-1}; \sigma_m)$ for $j$ in $1{:}J$
3. $X_{0,j}^{F,m} \sim f_{X_0}(\,\cdot\,; \Theta_{0,j}^{F,m})$ for $j$ in $1{:}J$
4. For $n$ in $1{:}N$
5. $\Theta_{n,j}^{P,m} \sim h_n(\,\cdot\,|\Theta_{n-1,j}^{F,m}; \sigma_m)$ for $j$ in $1{:}J$
6. $X_{n,j}^{P,m} \sim f_{X_n|X_{n-1}}(\,\cdot\,|X_{n-1,j}^{F,m}; \Theta_{n,j}^{P,m})$ for $j$ in $1{:}J$
7. $w_{n,j}^m = f_{Y_n|X_n}(y_n^*|X_{n,j}^{P,m}; \Theta_{n,j}^{P,m})$ for $j$ in $1{:}J$
8. Draw $k_{1:J}$ with $P(k_j = i) = w_{n,i}^m \big/ \sum_{u=1}^{J} w_{n,u}^m$
9. $\Theta_{n,j}^{F,m} = \Theta_{n,k_j}^{P,m}$ and $X_{n,j}^{F,m} = X_{n,k_j}^{P,m}$ for $j$ in $1{:}J$
10. End For
11. Set $\Theta_j^m = \Theta_{N,j}^{F,m}$ for $j$ in $1{:}J$
12. End For
Output:
Final parameter swarm $\{\Theta_j^M, j = 1, \ldots, J\}$
In the algorithm the initial parameter values are perturbed (line 2) by a perturbation density $h$ whose standard deviation $\sigma_m$ is a decreasing function of $m$. The Markov process is initialized (line 3) as a draw from the initial density dependent on the proposed parameter vector. What follows in lines 4-10 is a particle filter as described in the section above, with the only difference being that the parameter vector is stochastically perturbed in every iteration through the data. The loop repeats the particle filter with decreasing perturbations, and throughout the algorithm the superscripts $F$ and $P$ denote the filtering and prediction distributions, respectively. The algorithm returns the best guess of the parameter swarm after $M$ iterations. In the R package pomp the point estimate that the function mif2 returns is the mean of the parameter swarm.
To generate realizations from $f_{X_n|X_{n-1}}$ we can use the Gillespie algorithm (Gillespie, 1977). Given the current state of the system, the algorithm simulates the waiting time until the next event, updates the number of individuals in each compartment and increments the overall time accordingly. The whole procedure is repeated until a predefined stopping time is reached. In the case of constant per capita transition rates, the simulation of every individual event gives us a complete and detailed history of the process; however, it is usually a very time-consuming task for systems with large population and state space, because of the enormous number of events that can take place. As a way to speed up such simulations, an approximate simulation method can be used, the so-called τ-leap algorithm, which is based on the Gillespie algorithm (Gillespie, 2001). The τ-leap algorithm holds all rates constant in a small time interval and simulates the numbers of events that will occur in this interval, then updates all state variables, computes the transition rates again, and repeats the procedure until the stopping time is reached (Erhard et al., 2010; King et al., 2015b).
Given the total number of jumps, the number of individuals leaving any of the states by any available route during such a time interval is then multinomially distributed. Note that the simulation time step is not identical to the time scale on which the observation process evolves. Both simulation algorithms are conveniently implemented in the pomp package as the functions gillespie.sim and euler.sim.
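The two simulation schemes can be sketched as follows (a hypothetical, stdlib-only Python example with made-up parameter values; the implementations actually used in the chapter are pomp's gillespie.sim and euler.sim in R):

```python
import math
import random

random.seed(7)

def poisson(mu):
    # Small-mean Poisson sampler (Knuth's method), to keep the sketch stdlib-only.
    l, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= l:
            return k
        k += 1

def gillespie_sir(beta, gamma, n_pop, i0, t_end):
    # Exact simulation: draw the waiting time to the next event, then pick
    # which event happened in proportion to its rate.
    t, s, i, r = 0.0, n_pop - i0, i0, 0
    while t < t_end and i > 0:
        rate_inf = beta * s * i / n_pop   # S -> I
        rate_rec = gamma * i              # I -> R
        t += random.expovariate(rate_inf + rate_rec)
        if t >= t_end:
            break
        if random.random() < rate_inf / (rate_inf + rate_rec):
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
    return s, i, r

def tau_leap_sir(beta, gamma, n_pop, i0, t_end, tau=0.05):
    # Approximate simulation: hold the rates constant over each interval of
    # length tau and draw Poisson event counts (clipped so states stay >= 0).
    s, i, r = n_pop - i0, i0, 0
    for _ in range(int(t_end / tau)):
        new_inf = min(s, poisson(beta * s * i / n_pop * tau))
        new_rec = min(i + new_inf, poisson(gamma * i * tau))
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

print(gillespie_sir(1.5, 0.5, 500, 5, 10))  # (S, I, R) at t_end; sums to n_pop
print(tau_leap_sir(1.5, 0.5, 500, 5, 10))
```

The τ-leap version trades exactness for speed: one Poisson draw per event type per interval replaces one exponential draw per individual event.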
1.5 Iterated filtering for SIR model given incidence data
1.5.1 The problem of interest
Typical infectious disease data collected by public health authorities often consist of reported incidences in given time periods. Other important characteristics of an epidemic, such as individual recovery times or the underlying contact network, are not observed. The method presented here is a useful tool for analyzing routinely collected surveillance data because it gives insight into the mechanisms of disease spread, which is crucial if one wants to, e.g., assess the risk of emerging pathogens or evaluate the impact of control measures such as vaccination.
In this section we describe how to carry out inference for the parameters of a POMP where the underlying disease transmission model is assumed to be a simple Markovian SIR model, given that we observe the number of newly reported cases aggregated by week.
We use the algorithm presented in Section 1.4 for a simulated data set and illustrate how the algorithm performs.
In order to keep things simple, in the specific example of this section we assume that the time of reporting coincides with the time of infection. Of course, this assumption does not hold for all diseases, but the method presented here can easily be adapted to other settings, e.g. when the time of reporting coincides with the time of removal.
The implementation of the following example is made available at Stocks (2017).
1.5.2 Formulation of a POMP model
We first formulate a Markov transmission model and in a second step relate the data to the transmission model via an observation model, which together give us a complete POMP model.
Transmission model
As transmission model we choose a stochastic SIR model in a closed population of $N$ individuals, where $S(t)$, $I(t)$ and $R(t)$ denote the numbers of susceptible, infectious and recovered individuals at time $t$. Individuals mix homogeneously, and $\beta$ is the average number of infectious contacts an infectious individual has per time unit. Furthermore, we assume that the time an individual is infectious is exponentially distributed with mean $1/\gamma$. We now formulate this in a way which makes the code in Stocks (2017) easier to understand. Let $N_{AB}(t)$ denote a stochastic process which counts the number of individuals who have transitioned from compartment $A$ to compartment $B$ during the time interval $[t_0, t)$, with $A, B \in \mathcal{X}$, where $\mathcal{X} = \{S, I, R\}$ contains all model compartments. The infinitesimal increment probabilities of a jump between compartments fully specify the continuous-time Markov process describing the transmission dynamics. With the notation above, $\Delta N_{AB}(t) = N_{AB}(t + \Delta t) - N_{AB}(t)$ counts the number of individuals changing compartment in a small time interval $[t, t + \Delta t)$. Thus,
$$P\big(\Delta N_{SI}(t) = 1 \,\big|\, S(t), I(t)\big) = \beta \, \frac{S(t) I(t)}{N} \, \Delta t + o(\Delta t), \qquad P\big(\Delta N_{IR}(t) = 1 \,\big|\, I(t)\big) = \gamma \, I(t) \, \Delta t + o(\Delta t). \qquad (12)$$
Moreover, the state variables and the transition probabilities (12) are related in the following way:
$$S(t) = S(0) - N_{SI}(t), \qquad I(t) = I(0) + N_{SI}(t) - N_{IR}(t), \qquad R(t) = R(0) + N_{IR}(t). \qquad (13)$$
Observation model  We now relate the transmission model to our observations of the weekly numbers of newly reported cases. With the notation above, the true number of newly infected individuals accumulated in the $n$-th observation time period $(t_{n-1}, t_n]$ is given as
$$C_n = N_{SI}(t_n) - N_{SI}(t_{n-1}). \qquad (14)$$
To incorporate the count nature of the observations, a natural assumption is to model the actual reported cases as realizations of a Poisson-distributed random variable with a given time-dependent mean. The number of recorded cases $y_n$ within a given reporting interval is then
$$y_n \sim \text{Pois}(C_n). \qquad (15)$$
This distribution then corresponds to $f_{Y_n|X_n}$, which is easy to evaluate. One interpretation of this choice of observation noise is that it accounts for uncertainty in the classification of cases, including false positives.
1.5.3 Inference
In the following we perform inference for the two epidemiological parameters $\beta$ and $\gamma$ from a set of data simulated from the model, applying the iterated filtering algorithm presented in Section 1.4.
Data  We consider a closed population of $N$ individuals, of whom one is initially infectious and the rest are susceptible. A realization of the model presented above was simulated for a fixed number of weeks with known values of $\beta$ and $\gamma$. Figure 1.2 shows the weekly numbers of reported cases, where it is assumed that all cases are reported.
Implementation details  We use the iterated filtering algorithm as implemented in the function mif2 in the R package pomp to infer the epidemic parameters $\beta$ and $\gamma$, assuming the initial values and the total population size are known. We run the algorithm for a fixed number of iterations and particles. As starting values for the fitting algorithm we use 10 parameter constellations drawn uniformly from a hypercube which contains biologically meaningful parameter values for $\beta$ and $\gamma$. With this procedure we address the potential problem of local maxima in the optimization. In general, if all searches converge to the same MLE for starting values drawn at random from a hypercube, this increases the chances that a global maximum has been found (King, 2017). For the perturbation of the parameters we choose a random walk with a given initial standard deviation for both parameters. As cooling scheme we use geometric cooling, which means that on the $n$-th mif2 iteration the relative perturbation intensity is $c^{n/50}$, where we chose $c = 0.5$. That is, after 50 iterations the perturbation has only half the intensity of the first iteration. In each iteration the variance of the random walk is successively reduced, so the log-likelihood of the perturbed model gradually approaches the log-likelihood of the model we are interested in. However, after a finite number of iteration steps the log-likelihoods of the two models, the one with and the one without perturbed parameters, are not identical. To get around this issue, a particle filter evaluation of the mif2 model output using Equation (11) is necessary. Hence, for each of the 10 mif2 outputs we run 10 particle filters, each with 1000 particles. From this multiple particle filter evaluation we can then calculate the average log-likelihood and the standard error of the Monte Carlo approximation for every parameter set. We then choose, out of the 10 candidates, the parameter constellation with the highest average log-likelihood as the maximum likelihood estimate. In order to quantify the uncertainty of the parameter estimates based on our observations, we calculate 95% confidence intervals for each parameter. For this we construct the profile likelihood of each parameter and apply the Monte Carlo adjusted profile (MCAP) algorithm (Ionides et al., 2017). This recently developed methodology accounts for the presence of Monte Carlo error in the likelihood evaluation and adjusts the width of the confidence interval accordingly. This procedure might seem overly complicated at first sight because, intuitively, the parameter swarm we obtain as output from the iterated filtering algorithm should contain some measure of uncertainty. However, due to particle depletion, which is the situation when only very few distinct particles have significant weight (Doucet and Johansen, 2011), the information about the local shape of the likelihood surface contained in the particles is not very reliable. The profile likelihood is much more robust in this case since it relies on multiple independent mif2 searches (King, 2017).
Results  The inference results are visualized in Figures 1.3 and 1.4. As shown in Figure 1.3, the randomly drawn starting values for both parameters converge towards the true values while the likelihood increases and the filtering failures vanish. Both 95% confidence intervals constructed from the profile log-likelihoods cover the true parameters, as shown in Figure 1.4.
1.5.4 Practical advice
In the R package pomp, a partially observed Markov process together with observations in the form of a uni- or multivariate time series is represented by a so-called pomp object. The whole package's functionality, such as simulation and inference, is centered around this object. Depending on which functions one would like to use, the model can be implemented by specifying some or all model components. In order to help identify possible error sources that can occur while applying the algorithm, we now give a few practical hints.
The first step after constructing the pomp object is to simulate realizations from the model in order to check that the code is working and that the model gives the expected output. A good idea at this stage is to also construct the corresponding deterministic skeleton of the stochastic Markov transmission model in order to have a simplified version of the model, to generate trajectories from it, and to check whether the mean of the stochastic model corresponds to the deterministic version. Another benefit of simulating from the model is that it helps to identify a range of likely parameters which generate realizations that resemble the data.
Before the data come into play and the actual fitting starts, we find it very helpful to first test the algorithm on simulated data. For this, a realization of the model is generated and the known set of parameters used to generate it is evaluated with the particle filter. This way we make sure that the core function of the iterated filtering method is working. Then the algorithm can be applied to the simulated data set, and it should give back the values used for simulation. This procedure also helps in understanding how the parameter estimates are correlated and whether they are identifiable.
Now, let us move to the data.
As a last step before applying the iterated filtering algorithm to our observations, we recommend first fitting the simpler, deterministic model to the data in order to get a feeling for how the parameters behave and to identify possible problems with the likelihood surface. Then, when mif2 is finally used, it makes sense to first choose likely values as starting values for the parameters we want to estimate. In a last step, however, a global search should be carried out, which means that random starting values are chosen and, ideally, all searches converge to the same MLE regardless of their starting values. In practice, the likelihood surface is often ridged, so some searches can get stuck in local maxima. That is why it is important to launch the search from many different starting values. Another common problem in practice is that very unlikely parameter values are suggested, which in turn leads to filtering failures. In that case it can be useful to add stochasticity to the model to account for model misspecifications or unmodelled disease characteristics.
Working with the mif2 algorithm can be a time-consuming task because the method is simulation-based. This is the price we pay for not needing to evaluate the density function of the Markov model directly. That is why we also want to give some hints on how to speed up the procedure. The first idea is to parallelize the code so that multiple searches can be carried out at the same time. This, in turn, poses some challenges in preventing the repetition of expensive computations and in ensuring reproducibility; we present one way of doing this in the manual Stocks (2017), and more can be found in the tutorials at King (2017).
In order to ensure that the information about the likelihood surface is
continually improved it is useful to keep a record of all parameter vectors estimated with the mif2 function, together with the computed likelihood for each parameter set,
e.g. by saving the obtained results in a CSV file (King, 2017).
Moreover, it is sometimes worth accepting approximation errors by choosing the simulation time step in the τ-leap algorithm to be not too small, which can lead to significant gains in computational speed; for a short discussion of numerical solutions based on discretizations in the context of POMP models, cf. Bretó et al. (2009).
1.6 Extensions
In the previous section we used a simple model to explain the concept of iterated filtering in order to facilitate understanding. However, reality is often far more complicated than this toy example. In the following we discuss some important complications of infectious disease data and how they can be incorporated into POMP models.
Underreporting  Underreporting is a common problem in surveillance data and is usually more pronounced the less severe the disease is. It arises for different reasons, e.g. it can result from asymptomatic cases, cases where individuals do not consult a doctor, cases that are not identified as such (misdiagnosis), or cases which simply get lost in the reporting system (Gibbons et al., 2014). One way to include underreporting in a POMP model is to change the observation model to
$$y_n \sim \text{Pois}(\kappa \cdot C_n),$$
where $C_n$ is the true number of newly infected individuals in the $n$-th reporting interval and $0 < \kappa \leq 1$ is the reporting rate, which can then also be estimated.
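The effect of a reporting rate on the observed counts can be sketched as follows (a hypothetical Python example; the true case counts and the reporting rate are made up, and binomial thinning is used here as a simple stand-in for the reporting process):

```python
import random

random.seed(3)

kappa = 0.3                               # made-up reporting rate
true_incidence = [10, 40, 90, 120, 80, 30]  # made-up true weekly case counts C_n

# Binomial thinning: each true case is reported independently with prob. kappa.
reported = [sum(random.random() < kappa for _ in range(c)) for c in true_incidence]

print(sum(reported) / sum(true_incidence))  # empirical reporting fraction, near 0.3
```

Note that thinning a Poisson-distributed count with probability kappa again yields a Poisson count with the mean scaled by kappa, which is why the modified observation model above simply multiplies the mean by the reporting rate.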
Seasonality
For many diseases, such as childhood diseases or vector-borne diseases, the transmission rate is not constant but varies through time. This might be due to social aggregation of the host, e.g. in daycare institutions and schools which are closed during summer, changes in the environment that influence the biology of vector populations, or changes in weather such as temperature or precipitation. One way to introduce this complexity into our models is through seasonal forcing of the transmission rate, so that the number of infectious contacts changes through time as $\beta = \beta(t)$. Following Keeling and Rohani (2008), one possible choice is

$$\beta(t) = \bar{\beta}\left(1 + \rho \cos\left(\frac{2\pi t}{\omega} + \phi\right)\right), \qquad (16)$$

where $\rho$ is the amplitude of the forcing, $\omega$ is the period of the forcing and $\phi$ is the phase shift parameter.
With this choice of forcing function the parameter $\bar{\beta}$ can be interpreted as the average transmission rate of an individual, which varies between $\bar{\beta}(1-\rho)$ and $\bar{\beta}(1+\rho)$ during the forcing period.
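A short sketch of this cosine forcing function (with illustrative parameter values, not taken from the chapter) confirms the stated range of the transmission rate:

```python
import numpy as np

def beta_seasonal(t, beta_bar, rho, omega, phi):
    """Seasonally forced transmission rate: beta_bar * (1 + rho*cos(2*pi*t/omega + phi))."""
    return beta_bar * (1.0 + rho * np.cos(2.0 * np.pi * t / omega + phi))

# illustrative values: average rate 10, 20% seasonal variation, period one year
t = np.linspace(0.0, 1.0, 10_001)
b = beta_seasonal(t, beta_bar=10.0, rho=0.2, omega=1.0, phi=0.0)
print(b.min(), b.max())   # close to beta_bar*(1-rho) = 8 and beta_bar*(1+rho) = 12
```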
A more flexible way to choose the forcing function is through splines. If $s_i(t)$, $i = 1, \ldots, k$, is a periodic B-spline basis, then we can for example define

$$\beta(t) = \exp\left\{\sum_{i=1}^{k} b_i s_i(t)\right\}.$$

Variation of the coefficients $b_1, \ldots, b_k$ then gives a wide variety of shapes for $\beta(t)$. For the reader new to the theory of splines we recommend Stoer and Bulirsch (2002). A comprehensive overview of the problems of seasonal forcing can be found in Keeling and Rohani (2008).
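The following sketch implements one concrete choice, assumed here for illustration: a log-linear combination, beta(t) = exp{sum_i b_i s_i(t)}, over a periodic degree-one B-spline (hat function) basis. Higher-order periodic B-spline bases (as provided, e.g., by the R package pomp) work analogously.

```python
import numpy as np

def periodic_hat_basis(t, k):
    """Periodic degree-1 B-spline (hat function) basis with k equally
    spaced knots on [0, 1); the basis functions sum to one for every t."""
    t = np.mod(t, 1.0)
    centers = np.arange(k) / k
    d = np.abs(t - centers)
    d = np.minimum(d, 1.0 - d)          # circular distance to each knot
    return np.maximum(1.0 - k * d, 0.0)

def beta_spline(t, b):
    """beta(t) = exp( sum_i b_i * s_i(t) ) for a coefficient vector b."""
    return np.exp(np.dot(b, periodic_hat_basis(t, len(b))))

coef = np.array([0.2, -0.1, 0.3, 0.0])  # illustrative coefficients
print(beta_spline(0.0, coef))           # equals exp(0.2) at the first knot
```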
Overdispersion A common phenomenon when calibrating models to data is the presence of greater variability in the observations than would be expected under the assumed model, cf. e.g. McCullagh and Nelder (1989); Bretó et al. (2009). Reasons for this include heterogeneities in the population, model misspecification or unmodelled processes which we either cannot quantify or which are very hard to measure. This phenomenon is called overdispersion, and we can account for it in two different ways. On the one hand, we can introduce overdispersion into the observation model by changing the observation distribution from the Poisson distribution to a distribution whose variance can be adjusted separately from the mean. In the case of count data a natural choice is the negative binomial distribution, so

$$Y_t \mid H_t \sim \text{NegBin}(H_t, \psi), \qquad (17)$$

with $H_t$ being the true number of accumulated incidences per time unit and where $\text{NegBin}(\mu, \psi)$ denotes the negative binomial distribution with mean $\mu$ and variance $\mu + \mu^2/\psi$. Another choice could be the truncated normal distribution, with the results rounded to integers.
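If one parametrizes the negative binomial by its mean and a dispersion parameter, so that the variance is mean + mean²/dispersion (the mean/dispersion parametrization assumed here), samples can be drawn by matching moments to numpy's (n, p) parametrization:

```python
import numpy as np

def rnegbin(rng, mu, psi, size=None):
    """Draw from a negative binomial with mean mu and variance mu + mu**2 / psi.
    numpy uses the (n, p) parametrization; matching moments gives
    n = psi and p = psi / (psi + mu)."""
    return rng.negative_binomial(psi, psi / (psi + mu), size=size)

rng = np.random.default_rng(2)
y = rnegbin(rng, mu=50.0, psi=5.0, size=200_000)
print(y.mean(), y.var())   # roughly 50 and 50 + 50**2/5 = 550
```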
However, in outbreak data, we often see fluctuations which are clearly not only due to variation in data collection, but due to phenomena not captured by the model. In the toy example in Section 1.5, we have accounted for stochasticity in the underlying system by assuming that individuals move between classes at random times. However, for large population sizes the stochastic system approaches the deterministic system and, hence, the role of randomness decreases as the population size increases; for details cf. e.g. Fuchs (2013)
. One way to account for overdispersion in the transmission model is to stochastically perturb the transmission rate by multiplying it with a continuous-time white noise process $\xi(t)$ which fluctuates around the value one. In Bretó et al. (2009) it is shown that if the corresponding integrated noise process $\Gamma(t)$ is chosen such that its increments are independent, stationary, non-negative and unbiased, the Markov property is retained. One suitable example of such a process is a Lévy process with gamma-distributed increments,

$$\Gamma(t + \Delta t) - \Gamma(t) \sim \text{Gamma}\left(\frac{\Delta t}{\sigma^2}, \sigma^2\right),$$

where $\Delta t/\sigma^2$ denotes the shape and $\sigma^2$ the age-independent scale parameter, with corresponding mean $\Delta t$ and variance $\sigma^2 \Delta t$. The parameter $\sigma^2$ is called the infinitesimal variance parameter (Karlin and Taylor, 1981). We can include this into the model by letting the transmission rate be

$$\tilde{\beta}(t) = \xi(t)\,\beta(t), \qquad \xi(t) = \frac{d\Gamma(t)}{dt}. \qquad (18)$$
More details about this kind of overdispersion can be found in Bretó et al. (2009); Bretó and Ionides (2011).
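This gamma-noise construction can be sketched on an Euler time grid as follows (the step size and infinitesimal variance are illustrative values): the increments of the integrated process have mean equal to the step size and variance proportional to it, so the multiplicative noise fluctuates around one.

```python
import numpy as np

rng = np.random.default_rng(3)

dt = 0.01          # Euler time step (illustrative)
sigma2 = 0.05      # infinitesimal variance parameter (illustrative)
n_steps = 200_000

# increments of the integrated noise process Gamma(t):
# shape dt/sigma2, scale sigma2  =>  mean dt, variance sigma2 * dt
d_gamma = rng.gamma(shape=dt / sigma2, scale=sigma2, size=n_steps)
xi = d_gamma / dt              # multiplicative white noise, fluctuating around one

beta_t = 10.0                  # unperturbed transmission rate at some time t
beta_perturbed = xi * beta_t   # perturbed rate used in each Euler step

print(xi.mean(), d_gamma.var() / dt)   # roughly 1 and sigma2
```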
Structured populations To make mathematical models for disease spread more realistic, different kinds of host heterogeneities can be introduced (Keeling and Rohani, 2008). For example, age structure, spatial heterogeneity or varying risk structures of the individuals might play an important role in disease transmission and can be easily accommodated in a POMP.
Covariates It can be interesting to investigate what impact a vector-valued covariate process $Z(t)$ has on the disease dynamics.
These might, e.g., be environmental covariates such as temperature or precipitation, or demographic information such as the numbers of births and deaths over time. Inclusion of such a covariate process is unproblematic in a POMP setting because the transition, observation and initial densities can all depend on the observed process $Z(t)$ (King et al., 2016). If one is, e.g., interested in investigating the influence of a specific covariate on the transmission rate, one could check the model's capability to fit the data with and without this covariate and compare the results.
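As a minimal illustration, one common (here assumed) choice is to let a covariate act log-linearly on the transmission rate; setting the coefficient to zero recovers the model without the covariate, which is what the suggested model comparison exploits.

```python
import numpy as np

def beta_with_covariate(beta0, eta, z):
    """Log-linear covariate effect on the transmission rate:
    beta(t) = beta0 * exp(eta * Z(t)).  With eta = 0 this reduces to
    the covariate-free model, so the two fits can be compared directly."""
    return beta0 * np.exp(eta * np.asarray(z))

z = np.array([-1.0, 0.0, 2.0])   # e.g. a standardized temperature series
print(beta_with_covariate(10.0, 0.1, z))
```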
Initial values
In our toy example of Section 1.5 we assumed that the distribution of the initial values of the Markov process, $X(0)$, is known. Apart from some special cases that could arise, e.g., in experimental situations, this distribution is, however, not known or observable. If the disease we observe is endemic in the population and does not vary significantly over time, it might make sense to assume that the initial values originate from the stationary distribution of the system. If this is not the case, we can treat the value of $X(0)$ as a parameter to be estimated; the initial distribution is then a point mass, the location of which depends on the parameter vector (King et al., 2016).
Missing data Missing values in outbreak data are a common complication. When using the R package pomp, the pomp object can handle missing values if the observation model probability density function is implemented so as to deal with them appropriately. One way is to construct the observation density such that it returns a log-likelihood of 0 for the missing data entries; for implementation details cf. the FAQ in King (2017).

1.7 Rotavirus example
After presenting a simple toy example as a proof of concept, we now give a short illustration of a real-world problem which accommodates some of the complications presented in the previous section and was analyzed with iterated filtering. A detailed description of the model and inference results can be found in Stocks et al. (2018).
The data which were analyzed in that paper are the weekly reported numbers of new laboratory-confirmed rotavirus cases among children, adults and elderly from 2001 until 2008 in Germany, scaled up by an underreporting rate (cf. Section 1.6) as inferred and described in more detail in Weidemann et al. (2013). Surveillance of rotavirus infection is relevant to public health authorities because it is the primary cause of severe gastroenteritis in infants and young children and causes significant morbidity, mortality and financial burdens worldwide (Atkins et al., 2012). Individuals can contract the disease more than once, but the highest reported incidence is observed in children under the age of 5, and incidence rises again later in life. Parameter estimation from surveillance data is crucial if one wants to, e.g., assess the effect of vaccination campaigns or other public health control measures.
In Stocks et al. (2018) the disease transmission was modeled as an age-stratified SIRS Markov process with overdispersion in the observation (cf. Equation (17)) as well as in the transmission model (cf. Equation (18)).
As rotavirus incidence in Germany varies seasonally, peaking around March, a forcing function (cf. Equation (16)) was used.
Here, we will present a simplified version of the model which ignores age structure and perform inference for one realization of this model; the simulated case report data (black line) are shown in Figure 1.5. The initial values were fixed at the stationary distribution of the system and the waning of immunity rate, $\alpha$, was also assumed to be fixed and known. With the notation as in Section 1.5, this translates mathematically into
$$
\begin{aligned}
S(t) &= S(0) + B(t) + N_{RS}(t) - N_{SI}(t) - D_S(t),\\
I(t) &= I(0) + N_{SI}(t) - N_{IR}(t) - D_I(t),\\
R(t) &= R(0) + N_{IR}(t) - N_{RS}(t) - D_R(t),
\end{aligned} \qquad (19)
$$

where $\mu$ denotes the birth and death rate, which was assumed to be constant and known. Moreover, $B(t)$ counts the number of births and $D_S(t)$, $D_I(t)$, $D_R(t)$ count the number of deaths in the respective compartment up until time $t$. The seasonally forced transmission rate was defined as

$$\tilde{\beta}(t) = \xi(t)\,\beta\left(1 + \rho \cos\left(\frac{2\pi t}{\omega} + \phi\right)\right),$$

where $\xi(t)$ is defined as in Equation (18). The transition probabilities from (19) are related to the state variables in the following way:

$$
\begin{aligned}
\mathbb{P}\big(N_{SI}(t+h) - N_{SI}(t) = 1 \mid S(t), I(t), R(t)\big) &= \tilde{\beta}(t)\,S(t)\,I(t)/N\,h + o(h),\\
\mathbb{P}\big(N_{IR}(t+h) - N_{IR}(t) = 1 \mid S(t), I(t), R(t)\big) &= \gamma\,I(t)\,h + o(h),\\
\mathbb{P}\big(N_{RS}(t+h) - N_{RS}(t) = 1 \mid S(t), I(t), R(t)\big) &= \alpha\,R(t)\,h + o(h).
\end{aligned}
$$

With this notation the true number of newly infected individuals accumulated in each observation time period $(t_{n-1}, t_n]$ is given as

$$H_n = N_{SI}(t_n) - N_{SI}(t_{n-1}).$$

The number of recorded cases within this time period was modeled as

$$Y_n \mid H_n \sim \text{NegBin}(H_n, \psi),$$
compare Equation (17).
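To see how these pieces fit together, the following self-contained sketch simulates a simplified seasonally forced SIRS model with demography, gamma process noise and negative binomial reporting on an Euler grid. The independent binomial event draws below are a simplification (pomp's Euler-multinomial sampler draws competing events jointly), and all parameter values are illustrative, not the estimates of Stocks et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(42)

# illustrative parameters (rates per year); NOT the estimates of Stocks et al. (2018)
N = 1_000_000
beta_bar, rho, phi, omega = 100.0, 0.15, 0.1, 1.0   # seasonal forcing, cf. Eq. (16)
gamma_rec, alpha, mu = 52.0, 1.0, 1.0 / 80.0        # recovery, waning, demography
sigma2, psi = 0.02, 5.0                             # overdispersion, cf. Eqs. (17)-(18)
dt = 1.0 / 52.0 / 10.0                              # ten Euler steps per week

def euler_step(S, I, R, t):
    xi = rng.gamma(dt / sigma2, sigma2) / dt        # gamma white noise, cf. Eq. (18)
    beta = xi * beta_bar * (1.0 + rho * np.cos(2.0 * np.pi * t / omega + phi))
    p = lambda rate: -np.expm1(-rate * dt)          # rate -> one-step probability
    n_SI = rng.binomial(S, p(beta * I / N))         # new infections
    n_IR = rng.binomial(I, p(gamma_rec))            # recoveries
    n_RS = rng.binomial(R, p(alpha))                # waning of immunity
    births = rng.poisson(mu * N * dt)
    d_S, d_I, d_R = (rng.binomial(x, p(mu)) for x in (S, I, R))
    # max(0, ...) guards against the rare event that the independent
    # draws overshoot a compartment within one step
    return (max(0, S + births + n_RS - n_SI - d_S),
            max(0, I + n_SI - n_IR - d_I),
            max(0, R + n_IR - n_RS - d_R), n_SI)

S, I, R = 600_000, 2_000, 398_000
cases = []
for week in range(52):
    H = 0                                           # accumulated true incidence H_n
    for sub in range(10):
        S, I, R, n_SI = euler_step(S, I, R, (week * 10 + sub) * dt)
        H += n_SI
    # reported cases, cf. Eq. (17): NegBin with mean H and variance H + H**2/psi
    y = int(rng.negative_binomial(psi, psi / (psi + H))) if H > 0 else 0
    cases.append(y)

print(sum(cases))   # total reported cases over one simulated year
```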
In this example we inferred the susceptibility parameter $\beta$, the seasonal forcing parameters $\rho$ and $\phi$, and the overdispersion parameters $\sigma$ and $\psi$; hence $\theta = (\beta, \rho, \phi, \sigma, \psi)$. A diagnostic plot similar to the one in Figure 1.3 (available online) shows that the iterated filtering algorithm performs well, as the parameter estimates converge towards the parameters used for simulation.
Figure 1.5 shows the model evaluated at the MLE obtained from the iterated filtering. Further implementation details for this example are available at Stocks (2017).
The aim of the analysis in Stocks et al. (2018) was to compare the impact of different kinds of variabilities in the transmission as well as the observation model. The paper concluded that a model which accounts for overdispersion in both model components (dark and light shading in Figure 1.5) is best suited to explain the analyzed data.
The strength of using iterated filtering in this context was that the method allowed for efficient maximum likelihood estimation for a POMP where the underlying transmission model was stochastic. Hence, it facilitated model comparison with respect to different choices of variability in both model components.
1.8 Concluding comments
1.8.1 Pros and cons of iterated filtering methods for outbreak data
The method we have described has many positive aspects. First of all, as illustrated in this chapter, the method is highly flexible and can be easily adapted to many different situations of interest.
The great strength of the method is that one only needs to be able to simulate from the transmission model; the complete likelihood need not be directly tractable. Facilitated by the R package pomp, the method is very straightforward to implement and can be easily adapted to a wide range of settings. The package is optimized for computational efficiency; in particular, the simulation functions (such as euler.sim or gillespie.sim) facilitate implementation considerably and give rise to substantial gains in speed by accessing C code. Secondly, the method facilitates dealing with unobserved variables and has no problem with the fact that transmission is continuous in time, while observations are made at discrete points in time. Furthermore, it is easy to combine transmission noise and observation noise as long as the Markov property is retained.
The method also allows the inclusion of covariates in mechanistically reasonable ways.
As mentioned earlier, to date this is the only simulation-based frequentist approach for POMP models that does not rely on summary statistics and can perform efficient inference for POMP models in situations where available Bayesian methods cannot (Ionides et al., 2015).
However, there is one main difficulty with the method. Since it is purely simulation-based, it comes at the cost of being computationally very expensive, in particular when the time series becomes long and the state space increases. In Section 1.5.4 we gave some practical hints on how to mitigate this problem. Moreover, the method requires the transmission model to be Markovian, which can be problematic because in practice the stages of diseases are rarely well described by exponential distributions. An easy way to mitigate this problem is to subdivide the respective compartments so that the waiting times individuals spend in each compartment change from being exponentially distributed to gamma distributed, i.e. having a more pronounced mode. Sometimes it is also possible to formulate non-Markovian processes as alternative Markov processes with additional state variables (e.g., if $X_{t+1}$ depends on both $X_t$ and $X_{t-1}$, i.e., $\{X_t\}$ is not Markov, one could define a new, bivariate Markov process $\{(X_t, X_{t-1})\}$). However, these strategies increase the dimension of the unobserved state space of the transmission model, which can result in an increase of Monte Carlo noise in the particle filter. Hence, those strategies are only effective up to the point where it becomes computationally too expensive to handle this increase of noise.

1.8.2 Further reading
For the reader interested in more details about the theoretical background of iterated filtering methods we recommend Ionides et al. (2006) and Ionides et al. (2015) for further reading. For the reader interested in learning how to work with the R package pomp we recommend the study of King et al. (2016) and following some of the very comprehensive online tutorials available through King (2017). The package has already been applied to answer many different epidemiological questions for different infectious diseases, e.g. for cholera (King et al., 2008), measles (He et al., 2010), malaria (Bhadra et al., 2011), polio (Martinez-Bakker et al., 2015), Ebola (King et al., 2015a), and rotavirus (Martinez et al., 2016; Stocks et al., 2018). Finally, iterated filtering has facilitated the development and analysis of a diverse array of mechanistic models, a recent and comprehensive review of which can be found in Bretó (2017).
Acknowledgments
I would like to thank Aaron A. King for making the tutorials and, especially, the codes in King (2017) available. Furthermore, I would like to thank Carles Bretó and Philip O’Neill for their constructive comments and suggestions. TS was supported by the Swedish research council, grant number 2015_05182_VR.
References
 Andrieu et al. (2010) Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 72(3):269–342.
 Arulampalam et al. (2002) Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188.
 Atkins et al. (2012) Atkins, K. E., Shim, E., Pitzer, V. E., and Galvani, A. P. (2012). Impact of rotavirus vaccination on epidemiological dynamics in England and Wales. Vaccine, 30(3):552–564.
 Bhadra et al. (2011) Bhadra, A., Ionides, E. L., Laneri, K., Pascual, M., Bouma, M., and Dhiman, R. (2011). Malaria in northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise. Journal of the American Statistical Association, 106(494):440–451.
 Bretó (2017) Bretó, C. (2017). Modeling and inference for infectious disease dynamics: a likelihoodbased approach. Statistical Science, (In press, http://www.imstat.org/sts/STS636.pdf).
 Bretó et al. (2009) Bretó, C., He, D., Ionides, E. L., and King, A. A. (2009). Time series analysis via mechanistic models. The Annals of Applied Statistics, 3(1):319–348.
 Bretó and Ionides (2011) Bretó, C. and Ionides, E. L. (2011). Compound Markov counting processes and their applications to modeling infinitesimally overdispersed systems. Stochastic Processes and Their Applications, 121(11):2571–2591.
 Del Moral (1996) Del Moral, P. (1996). Non linear filtering: Interacting particle solution. Markov Processes and Related Fields, 2(4):555–580.
 Doucet et al. (2001) Doucet, A., de Freitas, N., and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.
 Doucet and Johansen (2011) Doucet, A. and Johansen, A. M. (2011). A tutorial on particle filtering and smoothing: fifteen years later. Oxford Handbook of Nonlinear Filtering.
 Erhard et al. (2010) Erhard, F., Friedel, C. C., and Zimmer, R. (2010). FERN – Stochastic simulation and evaluation of reaction networks. Systems Biology for Signaling Networks.
 Fuchs (2013) Fuchs, C. (2013). Inference for Diffusion Processes With Applications in Life Sciences. Springer-Verlag, Berlin Heidelberg.
 Gibbons et al. (2014) Gibbons, C. L., Mangen, M. J., Plass, D., Havelaar, A. H., Brooke, R. J., Kramarz, P., Peterson, K. L., Stuurman, A. L., Cassini, A., Fèvre, E. M., and Kretzschmar, M. E. (2014). Measuring underreporting and underascertainment in infectious disease datasets: a comparison of methods. BMC Public Health, 14(147).
 Gillespie (1977) Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361.
 Gillespie (2001) Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics, 115(4):1716–1733.
 He et al. (2010) He, D., Ionides, E. L., and King, A. A. (2010). Plugandplay inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society Interface, 7(43):271–283.
 Held and Sabanés Bové (2013) Held, L. and Sabanés Bové, D. (2013). Applied Statistical Inference. Springer.
 Ionides et al. (2006) Ionides, E. L., Bretó, C., and King, A. A. (2006). Inference for nonlinear dynamical systems. PNAS, 103(49):18438–18443.
 Ionides et al. (2017) Ionides, E. L., Bretó, C., Park, J., Smith, R. A., and King, A. A. (2017). Monte Carlo profile confidence intervals for dynamic systems. Journal of The Royal Society Interface, 14(132).
 Ionides et al. (2015) Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S., and King, A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. PNAS, 112(3):719–724.
 Karlin and Taylor (1981) Karlin, S. and Taylor, H. M. (1981). A Second Course in Stochastic Processes. Academic Press.
 Keeling and Rohani (2008) Keeling, M. J. and Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
 Kendall et al. (1999) Kendall, B. E., Briggs, C. J., Murdoch, W. W., Turchin, P., Ellner, S. P., McCauley, E., Nisbet, R. M., and Wood, S. N. (1999). Why do populations cycle? A synthesis of statistical and mechanistic modeling approaches. Ecology, 80(6):1789–1805.
 King (2017) King, A. A. (2017). Project pomp on github. http://kingaa.github.io/pomp/, [Accessed September 15, 2017].
 King et al. (2015a) King, A. A., Domenech de Cellès, M., Magpantay, F. M. G., and Rohani, P. (2015a). Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola. Proceedings of the Royal Society of London B, 282:20150347.
 King et al. (2015b) King, A. A., Ionides, E. L., Bretó, C., Ellner, S., Kendall, B., Ferrari, M., Lavine, M., and Reuman, D. (2015b). Introduction to pomp: Inference for partially observed Markov processes. R vignette.
 King et al. (2008) King, A. A., Ionides, E. L., Pascual, M., and Bouma, M. J. (2008). Inapparent infections and cholera dynamics. Nature, 454:877–888.
 King et al. (2016) King, A. A., Nguyen, D., and Ionides, E. L. (2016). Statistical inference for partially observed Markov processes via the R package pomp. Journal of Statistical Software, 69(12):1–43.
 Kitagawa (1987) Kitagawa, G. (1987). Nongaussian statespace modeling of nonstationary time series. Journal of the American Statistical Association, 82(400):1032–1041.
 Liu and West (2001) Liu, J. and West, M. (2001). Combining parameter and state estimation in simulation-based filtering. Sequential Monte Carlo Methods in Practice, pages 197–223.
 Martinez et al. (2016) Martinez, P. P., King, A. A., Yunus, M., Faruque, A. S. G., and Pascual, M. (2016). Differential and enhanced response to climate forcing in diarrheal disease due to rotavirus across a megacity of the developing world. PNAS, 113(15):4092–4097.
 MartinezBakker et al. (2015) MartinezBakker, M., King, A. A., and Rohani, P. (2015). Unraveling the transmission ecology of polio. PLOS Biology, 13(6):e1002172.
 McCullagh and Nelder (1989) McCullagh, R. and Nelder, J. (1989). Generalized Linear Models. 2nd ed. Chapman and Hall, London.
 Nelder and Mead (1965) Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.
 Nocedal and Wright (1999) Nocedal, J. and Wright, S. J., editors (1999). Numerical Optimization. Springer Series in Operations Research and Financial Engineering.
 Pawitan (2001) Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.
 Robert and Casella (2004) Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. SpringerVerlag.
 Stocks (2017) Stocks, T. (2017). Chapter on iterated filtering methods on github. https://github.com/theresasophia/chapter_IFmethods, [Accessed October 27, 2018].
 Stocks et al. (2018) Stocks, T., Britton, T., and Höhle, M. (2018). Model selection and parameter estimation for dynamic epidemic models via iterated filtering: application to rotavirus in Germany. Biostatistics, page kxy057.
 Stoer and Bulirsch (2002) Stoer, J. and Bulirsch, R. (2002). Introduction to Numerical Analysis. Springer.
 Sugihara and May (1990) Sugihara, G. and May, R. M. (1990). Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344(6268):734–741.
 Toni et al. (2009) Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M. P. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6(31).
 Weidemann et al. (2013) Weidemann, F., Dehnert, M., Koch, J., Wichmann, O., and Höhle, M. (2013). Bayesian parameter inference for dynamic infectious disease modeling: rotavirus in Germany. Statistics in Medicine, 33(9):1580–1599.
 Wilks (1938) Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1):60–62.
 Wood (2010) Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104.