Island filters for partially observed spatiotemporal systems

02/12/2020 ∙ by Edward L. Ionides, et al. ∙ University of Michigan 0

Statistical inference for high-dimensional partially observed, nonlinear, stochastic processes is a methodological challenge with applications including spatiotemporal analysis of epidemiological and ecological systems. Standard particle filter algorithms, which provide an effective approach for general low-dimensional partially observed Markov processes, suffer from a curse of dimensionality that limits their applicability beyond low-dimensional systems. We show that many independent Monte Carlo calculations, none of which, by itself, attempts to solve the filtering problem, can be combined to give a global filtering solution that theoretically beats the curse of dimensionality under weak coupling conditions. The independent Monte Carlo calculations are called islands, and the corresponding algorithm is called a basic island filter (BIF). Carrying out an operation called adapted simulation on each island results in a variant called an adapted simulation island filter (ASIF). Adapted simulation can be implemented using a Monte Carlo technique called intermediate resampling to give the ASIF-IR algorithm which has improved theoretical and empirical scaling. Our focus is on evaluation of the likelihood function, but our algorithms and theory are also pertinent to latent state estimation. We demonstrate our methodology on coupled population dynamics arising in the epidemiology of infectious disease. In this context, our methods retain the generality of particle filter algorithms while scaling to systems an order of magnitude larger.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 40

page 42

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Introduction

Particle filter (PF) algorithms are a fundamental tool for inference on nonlinear partially observed dynamic systems (Doucet et al.; 2001; Doucet and Johansen; 2011; Bretó et al.; 2009; Ionides et al.; 2015). Standard PF methods suffer from a curse of dimensionality (COD), defined as an exponential increase in computational demands as the problem size grows Bengtsson et al. (2008); Snyder et al. (2015); Rebeschini and van Handel (2015). This limits the applicability of PF-based statistical methodology for high-dimensional systems which arise in spatiotemporal data analysis and other situations. In PF algorithms, particles are representations of latent variables. Groups of particles which have limited interaction within the algorithm are called islands. We say an independent island particle filter is a PF algorithm that constructs groups of non-interacting particles and then combines the results. The independence is understood in a Monte Carlo sense, with each island having access to the entire dataset. This geographic island metaphor describing the Monte Carlo computation is separate from any geographic structure that may, or may not, be present in a spatiotemporal model. A previous motivation for dividing particles into weakly interacting islands has been to minimize communication for parallelized PF algorithms (Vergé et al.; 2013; Del Moral et al.; 2017). We have a different motivation: to show that many independent PF calculations, each one of which is too weak to be useful by itself, can be combined to solve high-dimensional problems not amenable to solution by a single filter. Indeed, the calculation carried out on each island will not even attempt to directly solve the filtering problem. The task of each island is to provide a representation of the latent process that provides some relevant information at some points in space and time. The algorithms then proceed to combine these islands, assessing which islands are good representations at a given spatiotemporal location.

Independent island filters may be motivated by the empirical success of poor man’s ensemble forecasting methodology in which a collection of independently constructed forecasts is generated using different models and methods (Ebert; 2001). Poor man’s ensembles have sometimes been found to have greater forecasting skill than any one forecast (Leutbecher and Palmer; 2008; Palmer; 2002). One explanation for this phenomenon is that even a hypothetically perfect model cannot filter and forecast well using methodology afflicted by the COD. We show that independent island methodology can relieve this limitation. From this perspective, the independence of the forecasts in the poor man’s ensemble, rather than the diversity of model structures, may be the key to its success.

Our idea is most clearly seen in a relatively simple algorithm which we call the basic island filter (BIF). Each island in BIF simply simulates a realization of the latent process model. While formally beating the COD under a weak mixing assumption, BIF could have poor numerical behavior if a very large number of islands were needed to reach this happy asymptotic limit. Our empirical results show that BIF may indeed be a useful algorithm in some situations, however it also provides a route toward two other algorithms which we call the adapted sampling island filter (ASIF) and adapted sampling island filter with intermediate resampling (ASIF-IR). ASIF and ASIF-IR carry out some resampling on each island and therefore enable each island to track the data, in a weak sense, while resisting the COD.

Notation

To set up notation, we suppose the latent process is comprised of a collection of units, labeled which we write as . An unobserved, latent Markov process at a discrete sequence of observation times is written as , with taking values in . The initial value may be stochastic or deterministic. Observations are made on each unit, modeled by an observable process which takes values in . Observations are modeled as being conditionally independent given the latent process. The conditional independence of measurements applies over both space and time, so is conditionally independent of given whenever . This unit structure for the observation process is not necessary for all that follows (see Sec. S1). We suppose the existence of a joint density of and with respect to some appropriate measure. We use the same subscript notation to denote other marginal and conditional densities. The data are for spatial unit at time . For a fixed value of , this model fits into the partially observed Markov process (POMP) framework (Bretó et al.; 2009)

, also known as a state space model or hidden Markov model. Spatiotemporal POMP models are characterized by a high dimensionality, proportional to

, for the latent Markov process. We use the term SpatPOMP to describe POMP models indexed by as well as . In accordance with the spatial interpretation of the units , we may suppose that units sufficiently distant in some sense approach independence, a phenomenon called weak coupling. Inference techniques for SpatPOMPs may be relevant to high-dimensional stochastic dynamic systems where does not index spatial locations. The units will sometimes be considered in sequence within each time point, and so we write the set of points preceding by

(1)

An ordering of the spatial locations, required by [1], might seem artificial. Indeed, densities such as could potentially be hard to compute or simulate from. The algorithms we study do not carry out such computations, but only evaluate the measurement model on these neighborhoods. To represent weak coupling, we suppose there is a neighborhood such that the latent process on is approximately conditionally independent of given data on .

Island filter methodology

The inputs for the BIF, ASIF and ASIF-IR are listed in Table 1. All three algorithms are plug-and-play

, meaning that they require as input a simulator for the latent dynamic process but not an evaluator of transition probabilities

(Bretó et al.; 2009; He et al.; 2010). The primary output under consideration is an estimate of the log likelihood for the data given the model. The log likelihood

is a fundamental quantity for both Bayesian and non-Bayesian statistical inference. Monte Carlo estimates of the log likelihood underpin modern methods for full-information statisical inference on partially observed dynamic systems

(Andrieu et al.; 2010; Ionides et al.; 2015, 2017). Filtering theory ties together the three tasks of likelihood evaluation, forecasting, and inferring latent dynamic variables Crisan and Rozovskii (2011). Successful likelihood evaluation is therefore closely related to success at forecasting and latent process reconstruction.

The pseudocodes for the algorithms that follow will adopt a convention that implicit loops are carried out over all free indices. For example, the construction of in BIF has an implicit loop over , and . The construction of in ASIF has an implit loop over , and but not over since is already defined in an explicit loop and is therefore not a free index. Table 1 lists the ranges of the implicit loops for all following algorithms.

Table 1. Island filter inputs, outputs and implicit loops.
input:
simulator for and
evaluator for
number of islands,
neighborhood structure,
data,
ASIF and ASIF-IR: particles per island,
ASIF-IR: number of intermediate timesteps,

ASIF-IR: measurement variance parameterizations,

and
ASIF-IR: approximate process and observation mean functions, and
output:
Log likelihood estimate,
implicit loops:
, , ,

The BIF algorithm that follows amounts to an island filter with a single particle per island, with no weighting or resampling within each island.

BIF. Basic island filter.
Simulate
Measurement weights,
Prediction weights,

Theorem 1 shows that BIF can beat COD. However, an algorithm such as BIF, which doesn’t carry out resampling, would require quick spatiotemporal mixing for the simulated particles to remain relevant for predictions throughout a long time series. It may in practice be necessary to select simulations consistent with the data, much as standard PF algorithms do. We look for approaches that build on the basic insight of BIF while having superior practical performance.

Whereas the full global filtering problem may be intractable via importance sampling methods, a version of this problem local in space and time may nevertheless be feasible. Drawing a sample from the conditional density, , is distinct from the filtering problem: we call this operation adapted simulation. For models where is close to , adapted simulation requires a local calculation that may be much easier than the full filtering calculation. The following adapted simulation island filter (ASIF) is constructed under a hypothesis that the adapted simulation problem is tractable. The adapted simulations are then reweighted in a neighborhood of each point in space and time to construct a local approximation to the filtering problem which leads to an estimate of the likelihood.

ASIF. Adapted simulation island filter.
Initialize adapted simulation:
For
    Proposals:
    Measurement weights:
    Adapted resampling weights:
    Resampling:
    
    
End for

In addition, we consider an algorithm which adds intermediate resampling to the adapted simulation islands, and is therefore termed ASIF-IR. Intermediate resampling involves assessing the satisfactory progress of particles toward the subsequent observation at a collection of times between observations. This is well defined when the latent system is a continuous time process with obsevation times such that . For a continuous time latent process model, intermediate resampling can have favorable scaling properties when the intermediate times

scale with Park and Ionides (2019). In the case , ASIF-IR reduces to ASIF. Intermediate resampling was developed in the context of sequential Monte Carlo Del Moral and Murray (2015); Park and Ionides (2019); however, the same theory and methodology can be applied to the simpler and easier problem of adapted simulation. ASIF-IR requires a guide function which assess the capability of each particle to explain future data. The implementation in this pseudocode constructs the guide used by the guided intermediate resampling filter Park and Ionides (2019) which generalizes the popular auxiliary particle filter (Pitt and Shepard; 1999). The guide function affects numerical performance of the algorithm but not its correctness: it enables a computationally convenient approximation to improve performance on the intractable target problem. Our guide function supposes the availability of a deterministic function approximating evolution of the mean of the latent process, and uses simulations to assess variability around this mean. We write this as

Further, the guide requires that the measurement model has known conditional mean and variance as a function of the model parameter vector

, written as

Also required for ASIF-IR is an inverse function such that

This guide function is applicable to spatiotemporal versions of a broad range of population and compartment models used to model dynamic systems in ecology, epidemiology, and elsewhere. Other guide functions could be developed and inserted into the ASIF-IR algorithm. The number of particles for the guide simulations in ASIF-IR does not necessarily have to be set equal to the number of intermediate proposals. We make that simplification here since there is no compelling reason against it.

ASIF-IR. Adapted simulation island filter with
             intermediate resampling.
Initialize adapted simulation:
For
    Guide simulations:
    Guide variance:
     and
    For
        Intermediate proposals:
        
        
        
        
        
        Guide weights:
        Resampling:
         and
    End For
    Set
    Measurement weights:
    
End for

One might wonder why it is appropriate to keep many particle representations at intermediate timesteps while resampling down to a single representative at each observation time. Part of the answer is that adaptive simulation can fail in a limit with high frequency observations (Sec. S2). This implies that adapted simulation should not be relied upon more than necessary to ameliorate the curse of dimensionality: once proper importance sampling for filtering problem becomes tractable in a sufficiently small spatiotemporal neighborhood, one should maintain weighted particles on this spatiotemporal scale rather than resorting to adapted simulation.

Likelihood factorizations and their approximations

The approximation error for the algorithms we study here can be divided into two sources: a localization bias due to conditioning on a finite neighborhood, and Monte Carlo error. The localization bias does not disappear in the limit as Monte Carlo effort increases. It does become small as the conditioning neighborhood increases, but the Monte Carlo effort grows exponentially in the size of this neighborhood. The class of problems for which these algorithms are useful are ones where a relatively small neighborhood is adequate. Although the filtering inference is carried out using localization, the simulation of the process is carried out globally which avoids the introduction of additional boundary effects and ensures that the simulations comply with any constraint properties of the global process simulator.

In this subsection, we consider the deterministic limit of each algorithm for infinite Monte Carlo effort. We explain why the BIF, ASIF and ASIF-IR algorithms approximately target the likelihood function, subject to suitable mixing behavior. Subsequently, we consider the scaling properties as Monte Carlo effort increases. We adopt a convention that densities involving are implicitly evaluated at the data, , and densities involving are implicitly evaluated at unless otherwise specified. We write and . All the algorithms localize the sequential factorization of the likelihood,

by employing approximations that assume a neighborhood is sufficient in place of the full history . BIF approximates by

For , define . ASIF and ASIF-IR build on the following identity,

where is called the adapted transition density. The adapted process (i.e., a stochastic process following the adapted transition density) can be interpreted as a one-step greedy procedure using the data to guide the latent process. Let be the joint density of the adapted process and the proposal process,

(2)

For , define and set

using the convention that an empty density evaluates to 1. Denoting for expectation for having density , we have

Estimating this ratio by Monte Carlo sampling from is problematic due to the growing size of . Thus, ASIF and ASIF-IR make a localized approximation,

(3)

The conditional log likelihood estimate in ASIF and ASIF-IR come from replacing the expectations on the right hand side of [3] with averages over Monte Carlo replicates of simulations from the adapted process. To see that we expect the approximation in [3] to hold when dependence decays across spatiotemporal distance, we can write

where is the complement of in . As long as is approximately independent of under , this term approximately cancels in the numerator and denominator of the right hand side of [3].

BIF theory

For each pair we suppose there is a dataset and a model . We are interested in results that hold for all , but we do not require that the models for are nested in any way. The following conditions impose requirements that distant regions of space-time behave similarly and have only weak dependence. The conditions are written non-asymptotically in terms of and which are used to bound the asymptotic bias and variance in Theorem 1. Stronger bounds are obtained when the conditions hold for small , and .

Assumption A1.

There is an and a collection of neighborhoods such that, for all , , , , any bounded real-valued function , and any value of ,

Assumption A2.

For the collection of neighborhoods in Assumption A1, with , there is a constant , depending on but not on and , such that

Assumption A3.

There is a constant such that, for all and , and ,

Assumption A4.

There exists such that the following holds. For each , a set exists such that implies and

Further, there is a uniform bound .

The two mixing conditions in Assumptions A1 and A4 are subtly different. Assumption A1 is a conditional mixing property, dependent on the data, whereas A4 asserts a form of unconditional mixing. Although both capture a similar concept of weak coupling, conditional and unconditional mixing properties do not readily imply one another. Assumption A3 is a compactness condition of a type that has proved useful in the theory of particle filters despite the rarity of its holding exactly. Theorem 1 shows that these conditions let BIF compute the likelihood with a Monte Carlo variance of order with a bias of order .

Theorem 1.

Let denote the Monte Carlo likelihood approximation constructed by BIF. Consider a limit with a growing number of islands, , and suppose assumptions A1A2 and A3. There are quantities and , with bounds and , such that

(4)

where denotes convergence in distribution and

is the normal distribution with mean

and variance . If additionally Assumption A4 holds, we obtain an improved variance bound

(5)
Proof.

A complete proof is given in Sec. S3. Briefly, the assumptions imply a multivariate central limit theorem for

as . The limiting variances and covariances are uniformly bounded, using Assumptions A2 and A3. Assumption A1 provides a uniform bound on the discrepancy between and mean of the Gaussian limit. This is enough to derive [4]. Assumption A4 gives a stronger bound on covariances between sufficiently distant units, leading to [5]. ∎

ASIF-IR theory

Since ASIF is ASIF-IR with , we focus attention on ASIF-IR. At a conceptual level, the localized likelihood estimate in ASIF-IR has the same structure as its BIF counterpart. However, ASIF-IR additionally requires the capability to satisfactorily implement adapted simulation. Adapted simulation is a local calculation, making it an easier task than the global operation of filtering. Nevertheless, adapted simulation via importance sampling is vulnerable to COD for sufficiently large values of . For a continuous time model, the use of is motivated by a result that guided intermediate resampling can reduce, or even remove, the COD in the context of a particle filtering algorithm Park and Ionides (2019). Assumptions B1B4 are analogous to  A1A4 and are non-asymptotic assumptions involving , and which are required to hold uniformly over space and time. Assumption B5B7 control the Monte Carlo error arising from adapted simulation.  B5 is a stability property which ensures the Monte Carlo error cannot grow with time. Assumption B6 is a non-asymptotic bound on Monte Carlo error for a single step of adapted simulation. Assumption B7

can be guaranteed by the contruction of the algorithm, if independently generated Monte Carlo random variables are used for building the guide function and the one-step prediction particles. The asymptotic limit in Theorem 

2 arises as the number of islands increases.

Assumption B1.

There is an and a collection of neighborhoods such that the following holds for all , , , , and any bounded real-valued function : if we write , , , and ,

Assumption B2.

The bound in Assumption A2 applies for the neighborhoods defined in Assumption B1. This also implies there is a finite maximum temporal depth for the collection of neighborhoods, defined as

Assumption B3.

Identically to Assumption A3, .

Assumption B4.

Define via [2]. Suppose there is an such that the following holds. For each , a set exists such that implies and

Further, there is a uniform bound .

Assumption B5.

There is a constant such that, for any set and for all , , , and all ,

Assumption B6.

Let be a bounded function with . Let be the Monte Carlo quantity constructed in ASIF-IR, conditional on . There is a constant such that, for all and , whenever the number of particles satisfies ,

Assumption B7.

For , the Monte Carlo random variable is independent of conditional on .

Theorem 2.

Let denote the Monte Carlo likelihood approximation constructed by ASIF-IR, or by ASIF since this is the special case of ASIF-IR with . Consider a limit with a growing number of islands, , and suppose assumptions B1B2, B3, B5, B6 and B7. Suppose the number of particles exceeds the requirement for B6. There are quantities and with and such that

If additionally Assumption B4 holds, we obtain an improved rate of

Proof.

A full proof is provided in Sec. S4. The extra work to prove Theorem 2 beyond the argument for Theorem 1 is to bound the error arising from the importance sampling approximation to a draw from the adapted transition density. This bound is constructed using Assumptions B5, B6 and B7. The scaling of the constant with , and in Assumption B6 has been studied by Park and Ionides (2019), where it was established that setting can lead to being constant or slowly growing with . The remainder of the proof follows the same approach as Theorem 1, with the adapted process replacing the unconditional latent process. ∎

The theoretical results foreshadow our empirical observations that the relative performance of BIF, ASIF and ASIF-IR is situation-dependent. Assumption A4 is a mixing assumption for the unconditional latent process, whereas Assumption B4 replaces this with a mixing assumption for the adapted process conditional on the data. For a non-stationary process, Assumption A4 may fail to hold uniformly in whereas the adapted process may provide stable tracking of the latent process (Sec. S2). When Assumption A4 is satisfactory, BIF can benefit from not requiring Assumptions B5, B6 and B7. Adapted simulation is an easier problem than filtering, but nevertheless can become difficult in high dimensions, with the consequence that Assumption B6 could require large . The tradeoff between ASIF and ASIF-IR depends on the effectiveness of the guide function for the problem at hand. Intermediate resampling and guide function calculation require additional computational resources, which will necessitate smaller values of and . In some situations, the improved scaling properties of ASIF-IR compared to ASIF, corresponding to a lower value of , will outweigh this cost.

Examples

Likelihood evaluation via filtering does not by itself enable parameter estimation for POMP models, however it provides a foundation for Bayesian Andrieu et al. (2010) and likelihood-based Ionides et al. (2015)

inference. We therefore investigate likelihood evaluation with the expectation that improved methods for likelihood evaluation will lead to improved statistical methodology. Monte Carlo methods for computing the log likelihood generally suffer from both bias and variance, both of which can be considerable for large datasets and complex models. Appropriate inference methodology, such as Monte Carlo adjusted profile (MCAP) confidence intervals, can accommodate substantial Monte Carlo variance so long as the bias is slowly varying across the statistically plausible region of the parameter space

Ionides et al. (2017).

We first compare the performance of the three island filters (BIF, ASIF and ASIF-IR) and alternative approaches, on a spatiotemporal Gaussian process for which the exact likelihood is available via a Kalman filter. We see in Fig. 

1 that ASIF-IR can have a considerable advantages over BIF and ASIF for problems with an intermediate level of coupling. We then develop a model for measles transmission within and between cities. The model is weakly coupled, leading to successful performance for all three island filters. This class of metapopulation models was the primary motivation for the development of these methodologies. However, for a highly coupled system of moderate dimensions, a global particle filter such as GIRF may outperform our island methods; an example is given in Sec. S9.

Correlated Brownian motion

Suppose where comprises independent standard Brownian motions, and with being the circle distance,

Set for with initial value and suppose measurement errors are independent and normally distributed, with . The parameter determines the strength of the spatial coupling.

Figure 1: log likelihood estimates for a correlated Brownian motion model of various dimensions. BIF, ASIF and ASIF-IR are compared with a guided intermediate resampling filter (GIRF) and basic particle filter (PF). The exact likelihood was computed via a Kalman filter (KF).

Fig. 1 shows how the Monte Carlo error of the island filters compares to a standard particle filter (PF) and a guided intermediate resampling filter (GIRF) Park and Ionides (2019). For our numerical results, we use , and . The algorithmic parameters and run times are listed in Sec. S5, together with a plot of the simulated data and supplementary discussion. The GIRF framework encompasses lookahead particle filter techniques, such as the auxiliary particle filter Pitt and Shepard (1999), and intermediate resampling techniques Del Moral et al. (2017). GIRF methods combining these techiques were found to perform better than either of these component techniques alone Park and Ionides (2019). Thus, GIRF here represents a state-of-the-art auxiliary particle filter that targets the complete joint filter density for all units. We use the general-purpose, plug-and-play implementation of GIRF provided by the spatPomp R package (Asfaw et al.; 2019); this implementation of GIRF is thus not tailored in any way to exploit special features of this toy model. The true log likelihood can be computed analytically via the Kalman Filter (KF) in this simple Gaussian case. PF works well for small values of in Fig. 1, but rapidly starts struggling as increases. GIRF behaves comparably to PF for small but its performance is maintained for larger . ASIF and ASIF-IR have a small efficiency loss, for small , relative to PF due to the localization involved in the filter weighting, but for large this cost is more than paid back by the benefit of the reduced Monte Carlo variability. BIF has a larger efficiency loss for small , but its favorable scaling properties lead it to overtake all but ASIF-IR for larger . This linear Gaussian SpatPOMP model provides a simple scenario to demonstrate scaling behavior. We now move on to a more complex SpatPOMP exemplifying the nonlinear models motivating our new filtering approach.

Spatiotemporal measles epidemics

Data analysis for spatiotemporal systems featuring nonlinear, nonstationary mechanisms and partial observability has been a longstanding open challenge for ecological and epidemiological analysis Bjørnstad and Grenfell (2001). A compartment modeling framework for spatiotemporal population dynamics divides the population at each spatial location into categories (called compartments) which are modeled as homogeneous. Spatiotemporal compartment models can be called patch models or metapopulation models in an ecological context. Particle filter methods based on targeted proposal distributions have been developed for spatiotemporal applications, but such methods suffer from the curse of dimensionality Snyder et al. (2015). The ensemble Kalman filter provides a state-of-the-art approach to high-dimensional spatiotemporal filtering, but the approximations inherent in the EnKF can be problematic for models that are highly nonlinear or non-Gaussian Ades and Van Leeuwen (2015). Our island filter methodologies have theoretical guarantees for arbitrarily nonlinear and non-Gaussian models, while having improved scaling properties compared to particle filters.

We consider a spatiotemporal model for disease transmission dynamics of measles within and between multiple cities, based on the model of Park and Ionides (2019) which adds spatial interaction to the compartment model presented by He et al. (2010). The model compartmentalizes the population of each city into susceptible (), exposed (), infectious (), and recovered/removed () categories. The number of individuals in each compartment city at time are denoted by , , , and . The population dynamics are written in terms of non-decreasing processes counting the cumulative transitions in city , up to time , between compartments identified by the subscripts. Our model is described by the following system of stochastic differential equations, for ,

Here, models recruitment into the susceptible population, and models emigration and death. When is integer-valued, the system has step function solutions. The total population is calculated by smoothing census data and is treated as known. The number of recovered individuals in city is therefore defined implicitly. is modeled as negative binomial death processes Bretó et al. (2009); Bretó and Ionides (2011) with over-dispersion parameter , and rate given by

(6)

where models seasonality driven by high contact rates between children at school, described by

with being the proportion of the year taken up by the school terms, is the mean transmission rate, and measures the reduction of transmission during school holidays. In [6], is a mixing exponent modeling inhomogeneous contact rates within a city, and models immigration of infected individuals which is appropriate when analyzing a subset of cities that cannot be treated as a closed system. The number of travelers from city to is denoted by . Here, is constructed using the gravity model of Xia et al. (2004),

where denotes the distance between city and city , is the average population for city across time, is the average population across cities, and is the average distance between a randomly chosen pair of cities. Here, we model as fixed through time and symmetric between any two arbitrary cities, though a natural extension would allow for temporal variation and asymmetric movement between two cities. The transition processes , and are modeled as conditional Poisson processes with per-capita rates , and respectively, and we fix . The birth process is an inhomogeneous Poisson processes with rate

, given by interpolated census data.

To complete the model specification, we must describe the measurement process. Let be the number of removed infected individuals in the th reporting interval. Suppose that cases are quarantined once they are identified, so that reported cases comprise a fraction of these removal events. The case report is modeled as a realization of a discretized conditionally Gaussian random variable , defined for via

(7)

where is the cumulative distribution function, and

models overdispersion relative to the binomial distribution. For

, we replace by in [7].

Figure 2: log likelihood estimates for simulated data from the measles model of various dimensions. BIF, ASIF and ASIF-IR are compared with a guided intermediate resampling filter (GIRF) and a basic particle filter (PF).

This model includes many features that have been proposed to be relevant for understanding measles transmission dynamics He et al. (2010). Our plug-and-play methodology permits consideration of all these features, and readily extends to the investigation of further variations. Likelihood-based inference via plug-and-play methodology therefore provides a framework for evaluating which features of a dynamical model are critical for explaining the data King et al. (2008). By contrast, Xia et al. (2004) developed a linearization for a specific spatiotemporal measles model which is numerically convenient but not readily adaptable to assess alternative model choices.

We first assess the scaling properties of the filters on the measles model by evaluating the likelihood over varying numbers of units, , for fixed parameters. The results are given in Fig. 2, with additional information about timing, algorithmic choices, parameter values and a plot of the data provided in Sec. S6. In Fig. 2, the log likelihood per unit per time increases with because city size decreases with

. Smaller cities have fewer measles cases, resulting in a narrower and taller probability density function. Fig. 

2 shows all three island filters performing well, compared to a rapid decline in performance of the particle filter (PF) starting at , and a slower decline for GIRF. The bias in the island filters is small in this example, as evidenced by their close match to PF for small numbers of units. This is likely because the coupling in this model is weak: a high proportion of measles transmission occurs within cities rather than between cities. The neighborhood used here for city at time was two previous lags, . This choice of neighborhood is explored in Sec. S7. This local neighborhood still involves other cities in forecast for city , since the forecast on each island involves a full coupled model.

Figure 3: Likelihood slice varying the coupling parameter, computed via ASIF with cities. The solid lines construct a 95% Monte Carlo adjusted confidence interval Ionides et al. (2017). For this model realization, this interval includes the true parameter value identified by a dashed line.

Fig. 3 evaluates the filter methods on the task of computing a slice of the likelihood function over the coupling parameter for simulated data. This slice varies while fixing the other parameters at the values used for the simulation. Scientifically, this gives us an upper bound on the identifiability of from such data, since the likelihood slice provides statistically efficient inference when all other parameters are known. In this case, the Monte Carlo error is comparable to the change in the log likelihood over a wide range of values of . Therefore, Monte Carlo error has substantial impact on inference even given considerable computational resources: Fig. 3 took 7 days to compute on 40 cores. Most transmission of measles is local, so evidently there is limited evidence available about strength of transmission between cities even from 40 jointly observed epidemic time series. By contrast, parameters of within-city transmission that are shared between cities should be well identified by carrying out inference on a collection of cities. As an example of this, Sec. S8 presents a likelihood slice for the recovery rate, .

Discussion

The pseudocode presented for the island filters describes how the outputs are calculated given the inputs, but does not prescribe details of how these quantities are calculated. There is scope for implementations to trade off memory, computation and communication by making differing decisions on how the various loops defined in the pseudocode are coded, such as decisions on memory over-writing and parallelization. This article focuses on the logical structure of the algorithms, leaving room for future research on implementation-specific considerations, though some supplementary discussion of memory-efficient implementation is given in Sec. S10.

This paper has focused on likelihood evaluation, a critical quantity for testing scientific hypotheses about model structure and the values of unknown parameters. However, our island filters also provide localized solutions to the filtering problem, using the local weights constructed for the adapted simulations constructed on each island. A solution to the filtering problem, in turn, provides a foundation for forecasting.

Plug-and-play inference algorithms based on sequential Monte Carlo likelihood evaluation has proved successful for investigating highly nonlinear partially observed dynamic systems of low dimension arising in analysis of epidemiological and ecological time series data Bretó (2018); Pons-Salort and Grassly (2018); de Cellès et al. (2018); Marino et al. (2018). A benefit of the plug-and-play property is that it facilitates development of broadly applicable software, which in turn promotes creative scientific model development Bretó et al. (2009); He et al. (2010).

Geophysical data assimilation has similarities to inference on spatiotemporal biological systems. Relative to biological systems, geophysical applications are characterized by a greater number of spatial locations, better mathematical understanding of the underlying processes, and lower stochasticity. The locally weighted particle filter of Poterjoy (2016)

shares some features with our approach, but it builds on a moment approximation designed for geophysical models.

The algorithms BIF, ASIF, ASIF-IR and GIRF used for this article all enjoy the plug-and-play property. The numerical results for this paper use the asif, asifir and girf

functions via the open-source R package

spatPomp Asfaw et al. (2019) that provides a spatiotemporal extension of the R package pomp King et al. (2016). BIF was implemented using asif with particles per island. The pfilter and enkf functions were used from pomp. The source code for this paper will be contributed to an open-source scientific archive upon acceptance for publication.

Acknowledgements: This work was supported by National Science Foundation grant DMS-1761603 and National Institutes of Health grants 1-U54-GM111274 and 1-U01-GM110712.

References

  • (1)
  • Ades and Van Leeuwen (2015) Ades, M. and Van Leeuwen, P. J. (2015). The equivalent-weights particle filter in a high-dimensional system, Quarterly Journal of the Royal Meteorological Society 141(687): 484–503.
  • Andrieu et al. (2010) Andrieu, C., Doucet, A. and Holenstein, R. (2010).

    Particle Markov chain Monte Carlo methods,

    Journal of the Royal Statistical Society, Series B (Statistical Methodology) 72: 269–342.
  • Asfaw et al. (2019) Asfaw, K., Ionides, E. L. and King, A. A. (2019). spatPomp: R package for statistical inference for spatiotemporal partially observed Markov processes, https://github.com/kidusasfaw/spatPomp .
  • Bengtsson et al. (2008) Bengtsson, T., Bickel, P. and Li, B. (2008). Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems, in T. Speed and D. Nolan (eds), Probability and Statistics: Essays in Honor of David A. Freedman, Institute of Mathematical Statistics, Beachwood, OH, pp. 316–334.
  • Bjørnstad and Grenfell (2001) Bjørnstad, O. N. and Grenfell, B. T. (2001). Noisy clockwork: Time series analysis of population fluctuations in animals, Science 293: 638–643.
  • Bretó (2018) Bretó, C. (2018). Modeling and inference for infectious disease dynamics: A likelihood-based approach, Statistical Science 33(1): 57–69.
  • Bretó et al. (2009) Bretó, C., He, D., Ionides, E. L. and King, A. A. (2009). Time series analysis via mechanistic models, Annals of Applied Statistics 3: 319–348.
  • Bretó and Ionides (2011) Bretó, C. and Ionides, E. L. (2011). Compound Markov counting processes and their applications to modeling infinitesimally over-dispersed systems, Stochastic Processes and their Applications 121: 2571–2591.
  • Crisan and Rozovskii (2011) Crisan, D. and Rozovskii, B. (2011).

    The Oxford handbook of nonlinear filtering

    , Oxford University Press.
  • de Cellès et al. (2018) de Cellès, M. D., Magpantay, F. M., King, A. A. and Rohani, P. (2018). The impact of past vaccination coverage and immunity on pertussis resurgence, Science Translational Medicine 10(434): eaaj1748.
  • Del Moral et al. (2017) Del Moral, P., Moulines, E., Olsson, J. and Vergé, C. (2017). Convergence properties of weighted particle islands with application to the double bootstrap algorithm, Stochastic Systems 6(2): 367–419.
  • Del Moral and Murray (2015) Del Moral, P. and Murray, L. M. (2015). Sequential Monte Carlo with highly informative observations, Journal on Uncertainty Quantification 3: 969–997.
  • Doucet et al. (2001) Doucet, A., de Freitas, N. and Gordon, N. J. (2001). Sequential Monte Carlo Methods in Practice, Springer, New York.
  • Doucet and Johansen (2011) Doucet, A. and Johansen, A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later, in D. Crisan and B. Rozovsky (eds), Oxford Handbook of Nonlinear Filtering, Oxford University Press.
  • Ebert (2001) Ebert, E. E. (2001). Ability of a poor man’s ensemble to predict the probability and distribution of precipitation, Monthly Weather Review 129(10): 2461–2480.
  • He et al. (2010) He, D., Ionides, E. L. and King, A. A. (2010). Plug-and-play inference for disease dynamics: Measles in large and small towns as a case study, Journal of the Royal Society Interface 7: 271–283.
  • Ionides et al. (2017) Ionides, E. L., Breto, C., Park, J., Smith, R. A. and King, A. A. (2017). Monte Carlo profile confidence intervals for dynamic systems, Journal of the Royal Society Interface 14: 1–10.
  • Ionides et al. (2015) Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S. and King, A. A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps, Proceedings of the National Academy of Sciences of the USA 112(3): 719––724.
  • King et al. (2008) King, A. A., Ionides, E. L., Pascual, M. and Bouma, M. J. (2008). Inapparent infections and cholera dynamics, Nature 454: 877–880.
  • King et al. (2016) King, A. A., Nguyen, D. and Ionides, E. L. (2016). Statistical inference for partially observed Markov processes via the R package pomp, Journal of Statistical Software 69: 1–43.
  • Leutbecher and Palmer (2008) Leutbecher, M. and Palmer, T. N. (2008). Ensemble forecasting, Journal of Computational Physics 227(7): 3515–3539.
  • Marino et al. (2018) Marino, J. A., Peacor, S. D., Bunnell, D. B., Vanderploeg, H. A., Pothoven, S. A., Elgin, A. K., Bence, J. R., Jiao, J. and Ionides, E. L. (2018). Evaluating consumptive and nonconsumptive predator effects on prey density using field times series data, Ecology .
  • Palmer (2002) Palmer, T. N. (2002). The economic value of ensemble forecasts as a tool for risk assessment: From days to decades, Quarterly Journal of the Royal Meteorological Society 128(581): 747–774.
  • Park and Ionides (2019) Park, J. and Ionides, E. L. (2019). Inference on high-dimensional implicit dynamic models using a guided intermediate resampling filter, Arxiv:1708.08543v3 .
  • Pitt and Shepard (1999) Pitt, M. K. and Shepard, N. (1999). Filtering via simulation: Auxillary particle filters, Journal of the American Statistical Association 94: 590–599.
  • Pons-Salort and Grassly (2018) Pons-Salort, M. and Grassly, N. C. (2018). Serotype-specific immunity explains the incidence of diseases caused by human enteroviruses, Science 361(6404): 800–803.
  • Poterjoy (2016) Poterjoy, J. (2016). A localized particle filter for high-dimensional nonlinear systems, Monthly Weather Review 144(1): 59–76.
  • Rebeschini and van Handel (2015) Rebeschini, P. and van Handel, R. (2015). Can local particle filters beat the curse of dimensionality?, The Annals of Applied Probability 25(5): 2809–2866.
  • Snyder et al. (2015) Snyder, C., Bengtsson, T. and Morzfeld, M. (2015). Performance bounds for particle filters using the optimal proposal, Monthly Weather Review 143(11): 4750–4761.
  • Vergé et al. (2013) Vergé, C., Dubarry, C., Del Moral, P. and Moulines, E. (2013). On parallel implementation of sequential Monte Carlo methods: The island particle model, Statistics and Computing 25(2): 243–260.
  • Xia et al. (2004) Xia, Y., Bjørnstad, O. N. and Grenfell, B. T. (2004). Measles metapopulation dynamics: A gravity model for epidemiological coupling and dynamics, American Naturalist 164(2): 267–281.