Introduction
Particle filter (PF) algorithms are a fundamental tool for inference on nonlinear partially observed dynamic systems (Doucet et al., 2001; Doucet and Johansen, 2011; Bretó et al., 2009; Ionides et al., 2015). Standard PF methods suffer from a curse of dimensionality (COD), defined as an exponential increase in computational demands as the problem size grows (Bengtsson et al., 2008; Snyder et al., 2015; Rebeschini and van Handel, 2015). This limits the applicability of PF-based statistical methodology for high-dimensional systems which arise in spatiotemporal data analysis and other situations. In PF algorithms, particles are representations of latent variables. Groups of particles which have limited interaction within the algorithm are called islands. We say an independent island particle filter is a PF algorithm that constructs groups of non-interacting particles and then combines the results. The independence is understood in a Monte Carlo sense, with each island having access to the entire dataset. This geographic island metaphor describing the Monte Carlo computation is separate from any geographic structure that may, or may not, be present in a spatiotemporal model. A previous motivation for dividing particles into weakly interacting islands has been to minimize communication for parallelized PF algorithms (Vergé et al., 2013; Del Moral et al., 2017). We have a different motivation: to show that many independent PF calculations, each one of which is too weak to be useful by itself, can be combined to solve high-dimensional problems not amenable to solution by a single filter. Indeed, the calculation carried out on each island will not even attempt to directly solve the filtering problem. The task of each island is to provide a representation of the latent process that supplies relevant information at some points in space and time.
The algorithms then proceed to combine these islands, assessing which islands are good representations at a given spatiotemporal location.
Independent island filters may be motivated by the empirical success of poor man’s ensemble forecasting methodology, in which a collection of independently constructed forecasts is generated using different models and methods (Ebert, 2001). Poor man’s ensembles have sometimes been found to have greater forecasting skill than any one forecast (Leutbecher and Palmer, 2008; Palmer, 2002). One explanation for this phenomenon is that even a hypothetically perfect model cannot filter and forecast well using methodology afflicted by the COD. We show that independent island methodology can relieve this limitation. From this perspective, the independence of the forecasts in the poor man’s ensemble, rather than the diversity of model structures, may be the key to its success.
Our idea is most clearly seen in a relatively simple algorithm which we call the basic island filter (BIF). Each island in BIF simply simulates a realization of the latent process model. While formally beating the COD under a weak mixing assumption, BIF could have poor numerical behavior if a very large number of islands were needed to reach this happy asymptotic limit. Our empirical results show that BIF may indeed be a useful algorithm in some situations; however, it also provides a route toward two other algorithms, which we call the adapted simulation island filter (ASIF) and the adapted simulation island filter with intermediate resampling (ASIFIR). ASIF and ASIFIR carry out some resampling on each island and therefore enable each island to track the data, in a weak sense, while resisting the COD.
Notation
To set up notation, we suppose the latent process is comprised of a collection of units, labeled , which we write as . An unobserved, latent Markov process at a discrete sequence of observation times is written as , with taking values in . The initial value may be stochastic or deterministic. Observations are made on each unit, modeled by an observable process which takes values in . Observations are modeled as being conditionally independent given the latent process. The conditional independence of measurements applies over both space and time, so is conditionally independent of given whenever . This unit structure for the observation process is not necessary for all that follows (see Sec. S1). We suppose the existence of a joint density of and with respect to some appropriate measure. We use the same subscript notation to denote other marginal and conditional densities. The data are for spatial unit at time . For a fixed value of , this model fits into the partially observed Markov process (POMP) framework (Bretó et al., 2009), also known as a state space model or hidden Markov model. Spatiotemporal POMP models are characterized by a high dimensionality, proportional to , for the latent Markov process. We use the term SpatPOMP to describe POMP models indexed by as well as . In accordance with the spatial interpretation of the units, we may suppose that units sufficiently distant in some sense approach independence, a phenomenon called weak coupling. Inference techniques for SpatPOMPs may be relevant to high-dimensional stochastic dynamic systems where does not index spatial locations. The units will sometimes be considered in sequence within each time point, and so we write the set of points preceding by (1)
An ordering of the spatial locations, required by [1], might seem artificial. Indeed, densities such as could potentially be hard to compute or simulate from. The algorithms we study do not carry out such computations, but only evaluate the measurement model on these neighborhoods. To represent weak coupling, we suppose there is a neighborhood such that the latent process on is approximately conditionally independent of given data on .
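As a concrete illustration, a neighborhood of the kind used later for the measles example (the same unit at a small number of previous observation times) can be constructed as follows. The function name, interface, and 1-based indexing are our own choices for illustration, not part of the paper's notation.

```python
def neighborhood(u, n, lags=2):
    """Return a list of space-time points conditioning unit u at time n.

    A simple choice, used later for the measles example: the same unit at
    the previous `lags` observation times. Units and times are 1-indexed,
    so early time points get truncated neighborhoods.
    """
    return [(u, n - k) for k in range(1, lags + 1) if n - k >= 1]
```

For example, `neighborhood(3, 5)` gives `[(3, 4), (3, 3)]`, while `neighborhood(1, 1)` is empty since no points precede the first observation time.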
Island filter methodology
The inputs for BIF, ASIF and ASIFIR are listed in Table 1. All three algorithms are plug-and-play, meaning that they require as input a simulator for the latent dynamic process but not an evaluator of transition probabilities (Bretó et al., 2009; He et al., 2010). The primary output under consideration is an estimate of the log likelihood for the data given the model. The log likelihood is a fundamental quantity for both Bayesian and non-Bayesian statistical inference. Monte Carlo estimates of the log likelihood underpin modern methods for full-information statistical inference on partially observed dynamic systems (Andrieu et al., 2010; Ionides et al., 2015, 2017). Filtering theory ties together the three tasks of likelihood evaluation, forecasting, and inferring latent dynamic variables (Crisan and Rozovskii, 2011). Successful likelihood evaluation is therefore closely related to success at forecasting and latent process reconstruction. The pseudocodes for the algorithms that follow adopt a convention that implicit loops are carried out over all free indices. For example, the construction of in BIF has an implicit loop over , and . The construction of in ASIF has an implicit loop over , and , but not over since is already defined in an explicit loop and is therefore not a free index. Table 1 lists the ranges of the implicit loops for all following algorithms.
Table 1. Island filter inputs, outputs and implicit loops. 
input: 
simulator for and 
evaluator for 
number of islands, 
neighborhood structure, 
data, 
ASIF and ASIFIR: particles per island, 
ASIFIR: number of intermediate timesteps, 
ASIFIR: measurement variance parameterizations, and 
ASIFIR: approximate process and observation mean functions, and 
output: 
Log likelihood estimate, 
implicit loops: 
, , , 
The BIF algorithm that follows amounts to an island filter with a single particle per island, with no weighting or resampling within each island.
BIF. Basic island filter. 

Simulate 
Measurement weights, 
Prediction weights, 
Theorem 1 shows that BIF can beat the COD. However, an algorithm such as BIF, which does not carry out resampling, would require quick spatiotemporal mixing for the simulated particles to remain relevant for predictions throughout a long time series. It may in practice be necessary to select simulations consistent with the data, much as standard PF algorithms do. We look for approaches that build on the basic insight of BIF while having superior practical performance.
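To make the structure of BIF concrete, here is a minimal sketch in Python. It assumes a user-supplied full-trajectory simulator, per-unit measurement density, and neighborhood function; the localized conditional likelihood for each unit and time is a ratio of island averages, with prediction weights taken as products of measurement weights over the neighborhood. This is our own illustrative rendering under these interface assumptions, not the authors' reference implementation.

```python
import numpy as np

def bif(simulate, dmeasure, ys, neighborhood, n_islands):
    """Basic island filter sketch.

    simulate(rng)  -> latent trajectory, array of shape (N, U)
    dmeasure(y, x) -> per-unit measurement density f(y | x)
    ys: data array, shape (N, U)
    neighborhood(u, n) -> list of 1-indexed (unit, time) pairs
    Returns the BIF log likelihood estimate.
    """
    rng = np.random.default_rng(0)
    N, U = ys.shape
    # One unweighted simulation of the latent process per island.
    xs = [simulate(rng) for _ in range(n_islands)]
    # Measurement weights for every island, time, and unit.
    w = np.array([[[dmeasure(ys[n, u], x[n, u]) for u in range(U)]
                   for n in range(N)] for x in xs])  # shape (I, N, U)
    loglik = 0.0
    for n in range(N):
        for u in range(U):
            # Prediction weight: product of measurement weights over the
            # neighborhood of (u, n).
            pred = np.ones(n_islands)
            for (ub, nb) in neighborhood(u + 1, n + 1):  # text is 1-indexed
                pred *= w[:, nb - 1, ub - 1]
            # Localized conditional likelihood as a ratio of island averages.
            loglik += np.log(np.sum(w[:, n, u] * pred) / np.sum(pred))
    return loglik
```

With a deterministic simulator every island coincides and the estimate reduces to the exact log likelihood of the data under the measurement model, which gives a simple sanity check.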
Whereas the full global filtering problem may be intractable via importance sampling methods, a version of this problem local in space and time may nevertheless be feasible. Drawing a sample from the conditional density, , is distinct from the filtering problem: we call this operation adapted simulation. For models where is close to , adapted simulation requires a local calculation that may be much easier than the full filtering calculation. The following adapted simulation island filter (ASIF) is constructed under a hypothesis that the adapted simulation problem is tractable. The adapted simulations are then reweighted in a neighborhood of each point in space and time to construct a local approximation to the filtering problem which leads to an estimate of the likelihood.
ASIF. Adapted simulation island filter. 
Initialize adapted simulation: 
For 
Proposals: 
Measurement weights: 
Adapted resampling weights: 
Resampling: 
End for 
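The adapted simulation carried out on a single ASIF island can be sketched as follows. Each step proposes several one-step simulations, weights them by the measurement density for the current observation, and resamples a single representative to continue; the retained per-unit measurement weights feed the localized likelihood combination across islands. This is an illustrative simplification under our own interface assumptions, not the paper's pseudocode.

```python
import numpy as np

def asif_island(step, dmeasure, ys, x0, n_particles, rng):
    """One ASIF island: adapted simulation, sketched.

    step(x, n, rng)   -> one-step simulation of the latent process from x
    dmeasure(y, x, u) -> per-unit measurement density for unit u
    ys: data, shape (N, U); x0: initial latent state.
    Returns the per-unit measurement weights of the retained particle
    at every time, for later localized weighting across islands.
    """
    N, U = ys.shape
    x = x0
    meas = np.empty((N, U))
    for n in range(N):
        # Propose several one-step simulations from the current adapted state.
        props = [step(x, n, rng) for _ in range(n_particles)]
        w = np.array([[dmeasure(ys[n, u], xp, u) for u in range(U)]
                      for xp in props])        # (J, U) measurement weights
        g = w.prod(axis=1)                     # adapted resampling weights
        j = rng.choice(n_particles, p=g / g.sum())
        x = props[j]                           # resample to one representative
        meas[n] = w[j]                         # kept for localized weighting
    return meas
```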
In addition, we consider an algorithm which adds intermediate resampling to the adapted simulation islands, and is therefore termed ASIFIR. Intermediate resampling involves assessing the satisfactory progress of particles toward the subsequent observation at a collection of times between observations. This is well defined when the latent system is a continuous time process with observation times such that . For a continuous time latent process model, intermediate resampling can have favorable scaling properties when the intermediate times
scale with (Park and Ionides, 2019). In the case , ASIFIR reduces to ASIF. Intermediate resampling was developed in the context of sequential Monte Carlo (Del Moral and Murray, 2015; Park and Ionides, 2019); however, the same theory and methodology can be applied to the simpler problem of adapted simulation. ASIFIR requires a guide function which assesses the capability of each particle to explain future data. The implementation in this pseudocode constructs the guide used by the guided intermediate resampling filter (Park and Ionides, 2019), which generalizes the popular auxiliary particle filter (Pitt and Shephard, 1999). The guide function affects the numerical performance of the algorithm but not its correctness: it enables a computationally convenient approximation to improve performance on the intractable target problem. Our guide function supposes the availability of a deterministic function approximating the evolution of the mean of the latent process, and uses simulations to assess variability around this mean. We write this as
Further, the guide requires that the measurement model has known conditional mean and variance as a function of the model parameter vector, written as . Also required for ASIFIR is an inverse function such that
This guide function is applicable to spatiotemporal versions of a broad range of population and compartment models used to model dynamic systems in ecology, epidemiology, and elsewhere. Other guide functions could be developed and inserted into the ASIFIR algorithm. The number of particles for the guide simulations in ASIFIR does not necessarily have to be set equal to the number of intermediate proposals. We make that simplification here since there is no compelling reason against it.
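A minimal version of such a guide function, under an assumed Gaussian form, combines a deterministic mean forecast with a simulation-based estimate of process spread and the known measurement variance. The interface below (skeleton function, forecast sample, variance function) is our own simplification for illustration, not the exact construction in the pseudocode.

```python
import math
import numpy as np

def guide_weight(x, t, t_obs, y, skeleton, forecast_sample, meas_var):
    """Evaluate an assumed-Gaussian guide weight for one unit.

    skeleton(x, t0, t1): deterministic approximation to the mean of the
        latent process at time t1, started from state x at time t0.
    forecast_sample: simulated forecasts used to estimate the process
        variance about the skeleton mean.
    meas_var(m): measurement variance as a function of the conditional mean.
    Returns an approximation to the density of the future observation y.
    """
    mean = skeleton(x, t, t_obs)
    var = np.var(forecast_sample) + meas_var(mean)  # process + measurement
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
```

Particles whose forecast mean sits far from the upcoming observation, relative to the combined process and measurement spread, receive small guide weights and are resampled away at the intermediate times.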
ASIFIR. Adapted simulation island filter with 
intermediate resampling. 
Initialize adapted simulation: 
For 
Guide simulations: 
Guide variance: 
and 
For 
Intermediate proposals: 
Guide weights: 
Resampling: 
and 
End For 
Set 
Measurement weights: 
End for 
One might wonder why it is appropriate to keep many particle representations at intermediate timesteps while resampling down to a single representative at each observation time. Part of the answer is that adapted simulation can fail in a limit with high frequency observations (Sec. S2). This implies that adapted simulation should not be relied upon more than necessary to ameliorate the curse of dimensionality: once proper importance sampling for the filtering problem becomes tractable in a sufficiently small spatiotemporal neighborhood, one should maintain weighted particles on this spatiotemporal scale rather than resorting to adapted simulation.
Likelihood factorizations and their approximations
The approximation error for the algorithms we study here can be divided into two sources: a localization bias due to conditioning on a finite neighborhood, and Monte Carlo error. The localization bias does not disappear in the limit as Monte Carlo effort increases. It does become small as the conditioning neighborhood increases, but the Monte Carlo effort grows exponentially in the size of this neighborhood. The class of problems for which these algorithms are useful is therefore one where a relatively small neighborhood is adequate. Although the filtering inference is carried out using localization, the simulation of the process is carried out globally, which avoids the introduction of additional boundary effects and ensures that the simulations comply with any constraint properties of the global process simulator.
In this subsection, we consider the deterministic limit of each algorithm for infinite Monte Carlo effort. We explain why the BIF, ASIF and ASIFIR algorithms approximately target the likelihood function, subject to suitable mixing behavior. Subsequently, we consider the scaling properties as Monte Carlo effort increases. We adopt a convention that densities involving are implicitly evaluated at the data, , and densities involving are implicitly evaluated at unless otherwise specified. We write and . All the algorithms localize the sequential factorization of the likelihood,
by employing approximations that assume a neighborhood is sufficient in place of the full history . BIF approximates by
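In notation we introduce here for concreteness (the paper's own symbols did not survive extraction), writing $y^*_{u,n}$ for the data, $A_{u,n}$ for the set of preceding space-time points in [1], and $B_{u,n}$ for the conditioning neighborhood, the sequential factorization and the localized BIF approximation take the form

```latex
\ell = \log f_{Y_{1:N}}\big(y^*_{1:N}\big)
     = \sum_{n=1}^{N} \sum_{u=1}^{U}
       \log f_{Y_{u,n}\mid Y_{A_{u,n}}}\big(y^*_{u,n}\mid y^*_{A_{u,n}}\big),
\qquad
f_{Y_{u,n}\mid Y_{A_{u,n}}}\big(y^*_{u,n}\mid y^*_{A_{u,n}}\big)
  \approx
  \frac{\mathbb{E}\!\left[ f_{Y_{u,n}\mid X_{u,n}}\big(y^*_{u,n}\mid X_{u,n}\big)\,
        f_{Y_{B_{u,n}}\mid X_{B_{u,n}}}\big(y^*_{B_{u,n}}\mid X_{B_{u,n}}\big) \right]}
       {\mathbb{E}\!\left[ f_{Y_{B_{u,n}}\mid X_{B_{u,n}}}\big(y^*_{B_{u,n}}\mid X_{B_{u,n}}\big) \right]},
```

with the expectations taken over the unconditional latent process; BIF estimates this ratio by averaging the corresponding measurement and prediction weights across islands.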
For , define . ASIF and ASIFIR build on the following identity,
where is called the adapted transition density. The adapted process (i.e., a stochastic process following the adapted transition density) can be interpreted as a one-step greedy procedure using the data to guide the latent process. Let be the joint density of the adapted process and the proposal process,
(2)  
For , define and set
using the convention that an empty density evaluates to 1. Writing for the expectation when has density , we have
Estimating this ratio by Monte Carlo sampling from is problematic due to the growing size of . Thus, ASIF and ASIFIR make a localized approximation,
(3) 
The conditional log likelihood estimates in ASIF and ASIFIR come from replacing the expectations on the right hand side of [3] with averages over Monte Carlo replicates of simulations from the adapted process. To see why we expect the approximation in [3] to hold when dependence decays across spatiotemporal distance, we can write
where is the complement of in . As long as is approximately independent of under , this term approximately cancels in the numerator and denominator of the right hand side of [3].
BIF theory
For each pair we suppose there is a dataset and a model . We are interested in results that hold for all , but we do not require that the models for are nested in any way. The following conditions impose requirements that distant regions of spacetime behave similarly and have only weak dependence. The conditions are written nonasymptotically in terms of and which are used to bound the asymptotic bias and variance in Theorem 1. Stronger bounds are obtained when the conditions hold for small , and .
Assumption A1.
There is an and a collection of neighborhoods such that, for all , , , , any bounded realvalued function , and any value of ,
Assumption A2.
For the collection of neighborhoods in Assumption A1, with , there is a constant , depending on but not on and , such that
Assumption A3.
There is a constant such that, for all and , and ,
Assumption A4.
There exists such that the following holds. For each , a set exists such that implies and
Further, there is a uniform bound .
The two mixing conditions in Assumptions A1 and A4 are subtly different. Assumption A1 is a conditional mixing property, dependent on the data, whereas A4 asserts a form of unconditional mixing. Although both capture a similar concept of weak coupling, conditional and unconditional mixing properties do not readily imply one another. Assumption A3 is a compactness condition of a type that has proved useful in the theory of particle filters despite the rarity of its holding exactly. Theorem 1 shows that these conditions let BIF compute the likelihood with a Monte Carlo variance of order and a bias of order .
Theorem 1.
Let denote the Monte Carlo likelihood approximation constructed by BIF. Consider a limit with a growing number of islands, , and suppose assumptions A1, A2 and A3. There are quantities and , with bounds and , such that
(4) 
where denotes convergence in distribution and
is the normal distribution with mean
and variance . If additionally Assumption A4 holds, we obtain an improved variance bound (5)
Proof.
A complete proof is given in Sec. S3. Briefly, the assumptions imply a multivariate central limit theorem for
as . The limiting variances and covariances are uniformly bounded, using Assumptions A2 and A3. Assumption A1 provides a uniform bound on the discrepancy between and the mean of the Gaussian limit. This is enough to derive [4]. Assumption A4 gives a stronger bound on covariances between sufficiently distant units, leading to [5]. ∎
ASIFIR theory
Since ASIF is ASIFIR with , we focus attention on ASIFIR. At a conceptual level, the localized likelihood estimate in ASIFIR has the same structure as its BIF counterpart. However, ASIFIR additionally requires the capability to satisfactorily implement adapted simulation. Adapted simulation is a local calculation, making it an easier task than the global operation of filtering. Nevertheless, adapted simulation via importance sampling is vulnerable to the COD for sufficiently large values of . For a continuous time model, the use of is motivated by a result that guided intermediate resampling can reduce, or even remove, the COD in the context of a particle filtering algorithm (Park and Ionides, 2019). Assumptions B1–B4 are analogous to A1–A4 and are nonasymptotic assumptions involving , and which are required to hold uniformly over space and time. Assumptions B5–B7 control the Monte Carlo error arising from adapted simulation. B5 is a stability property which ensures that the Monte Carlo error cannot grow with time. Assumption B6 is a nonasymptotic bound on Monte Carlo error for a single step of adapted simulation. Assumption B7 can be guaranteed by the construction of the algorithm, if independently generated Monte Carlo random variables are used for building the guide function and the one-step prediction particles. The asymptotic limit in Theorem 2 arises as the number of islands increases.
Assumption B1.
There is an and a collection of neighborhoods such that the following holds for all , , , , and any bounded realvalued function : if we write , , , and ,
Assumption B2.
Assumption B3.
Identically to Assumption A3, .
Assumption B4.
Define via [2]. Suppose there is an such that the following holds. For each , a set exists such that implies and
Further, there is a uniform bound .
Assumption B5.
There is a constant such that, for any set and for all , , , and all ,
Assumption B6.
Let be a bounded function with . Let be the Monte Carlo quantity constructed in ASIFIR, conditional on . There is a constant such that, for all and , whenever the number of particles satisfies ,
Assumption B7.
For , the Monte Carlo random variable is independent of conditional on .
Theorem 2.
Let denote the Monte Carlo likelihood approximation constructed by ASIFIR, or by ASIF since this is the special case of ASIFIR with . Consider a limit with a growing number of islands, , and suppose assumptions B1, B2, B3, B5, B6 and B7. Suppose the number of particles exceeds the requirement for B6. There are quantities and with and such that
If additionally Assumption B4 holds, we obtain an improved rate of
Proof.
A full proof is provided in Sec. S4. The extra work to prove Theorem 2 beyond the argument for Theorem 1 is to bound the error arising from the importance sampling approximation to a draw from the adapted transition density. This bound is constructed using Assumptions B5, B6 and B7. The scaling of the constant with , and in Assumption B6 has been studied by Park and Ionides (2019), where it was established that setting can lead to being constant or slowly growing with . The remainder of the proof follows the same approach as Theorem 1, with the adapted process replacing the unconditional latent process. ∎
The theoretical results foreshadow our empirical observations that the relative performance of BIF, ASIF and ASIFIR is situationdependent. Assumption A4 is a mixing assumption for the unconditional latent process, whereas Assumption B4 replaces this with a mixing assumption for the adapted process conditional on the data. For a nonstationary process, Assumption A4 may fail to hold uniformly in whereas the adapted process may provide stable tracking of the latent process (Sec. S2). When Assumption A4 is satisfactory, BIF can benefit from not requiring Assumptions B5, B6 and B7. Adapted simulation is an easier problem than filtering, but nevertheless can become difficult in high dimensions, with the consequence that Assumption B6 could require large . The tradeoff between ASIF and ASIFIR depends on the effectiveness of the guide function for the problem at hand. Intermediate resampling and guide function calculation require additional computational resources, which will necessitate smaller values of and . In some situations, the improved scaling properties of ASIFIR compared to ASIF, corresponding to a lower value of , will outweigh this cost.
Examples
Likelihood evaluation via filtering does not by itself enable parameter estimation for POMP models, but it provides a foundation for Bayesian (Andrieu et al., 2010) and likelihood-based (Ionides et al., 2015) inference. We therefore investigate likelihood evaluation with the expectation that improved methods for likelihood evaluation will lead to improved statistical methodology. Monte Carlo methods for computing the log likelihood generally suffer from both bias and variance, both of which can be considerable for large datasets and complex models. Appropriate inference methodology, such as Monte Carlo adjusted profile (MCAP) confidence intervals, can accommodate substantial Monte Carlo variance so long as the bias is slowly varying across the statistically plausible region of the parameter space (Ionides et al., 2017).
We first compare the performance of the three island filters (BIF, ASIF and ASIFIR) and alternative approaches on a spatiotemporal Gaussian process for which the exact likelihood is available via a Kalman filter. We see in Fig. 1 that ASIFIR can have a considerable advantage over BIF and ASIF for problems with an intermediate level of coupling. We then develop a model for measles transmission within and between cities. The model is weakly coupled, leading to successful performance for all three island filters. This class of metapopulation models was the primary motivation for the development of these methodologies. However, for a highly coupled system of moderate dimensions, a global particle filter such as GIRF may outperform our island methods; an example is given in Sec. S9.
Correlated Brownian motion
Suppose where comprises independent standard Brownian motions, and with being the circle distance,
Set for with initial value and suppose measurement errors are independent and normally distributed, with . The parameter determines the strength of the spatial coupling.
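The toy model can be simulated along the following lines. The precise coupling matrix, parameter values, and discretization used in the paper are not recoverable from the text here, so the decay form and its parameterization by `tau` below are assumptions for illustration: `tau` near 0 gives nearly independent units, while larger `tau` couples them more strongly through the circle distance.

```python
import numpy as np

def simulate_cbm(U, N, tau, sigma_meas, rng):
    """Simulate a correlated-Brownian-motion SpatPOMP (illustrative sketch).

    Increments are A @ dB, where dB has independent standard normal entries
    and A[u, v] decays in the circle distance between units u and v (an
    assumed form). Rows of A are normalized so each unit's increment has
    unit variance. Observations add independent N(0, sigma_meas^2) noise.
    Returns latent states xs and observations ys, each of shape (N, U).
    """
    idx = np.arange(U)
    gap = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(gap, U - gap)            # circle distance between units
    A = tau ** dist                            # assumed decay with distance
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    x = np.zeros(U)
    xs, ys = [], []
    for _ in range(N):
        x = x + A @ rng.standard_normal(U)     # correlated unit-time increment
        xs.append(x.copy())
        ys.append(x + sigma_meas * rng.standard_normal(U))
    return np.array(xs), np.array(ys)
```

Because the model is linear and Gaussian, the exact likelihood of the simulated data can be computed with a Kalman filter, which is what makes this example a useful benchmark.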
Fig. 1 shows how the Monte Carlo error of the island filters compares to a standard particle filter (PF) and a guided intermediate resampling filter (GIRF) (Park and Ionides, 2019). For our numerical results, we use , and . The algorithmic parameters and run times are listed in Sec. S5, together with a plot of the simulated data and supplementary discussion. The GIRF framework encompasses look-ahead particle filter techniques, such as the auxiliary particle filter (Pitt and Shephard, 1999), and intermediate resampling techniques (Del Moral et al., 2017). GIRF methods combining these techniques were found to perform better than either component technique alone (Park and Ionides, 2019). Thus, GIRF here represents a state-of-the-art auxiliary particle filter that targets the complete joint filter density for all units. We use the general-purpose, plug-and-play implementation of GIRF provided by the spatPomp R package (Asfaw et al., 2019); this implementation of GIRF is thus not tailored in any way to exploit special features of this toy model. The true log likelihood can be computed analytically via the Kalman filter (KF) in this simple Gaussian case. PF works well for small values of in Fig. 1, but degrades rapidly as increases. GIRF behaves comparably to PF for small but its performance is maintained for larger . ASIF and ASIFIR have a small efficiency loss, for small , relative to PF due to the localization involved in the filter weighting, but for large this cost is more than repaid by the reduced Monte Carlo variability. BIF has a larger efficiency loss for small , but its favorable scaling properties lead it to overtake all but ASIFIR for larger . This linear Gaussian SpatPOMP model provides a simple scenario to demonstrate scaling behavior. We now move on to a more complex SpatPOMP exemplifying the nonlinear models motivating our new filtering approach.
Spatiotemporal measles epidemics
Data analysis for spatiotemporal systems featuring nonlinear, nonstationary mechanisms and partial observability has been a longstanding open challenge for ecological and epidemiological analysis (Bjørnstad and Grenfell, 2001). A compartment modeling framework for spatiotemporal population dynamics divides the population at each spatial location into categories (called compartments) which are modeled as homogeneous. Spatiotemporal compartment models can be called patch models or metapopulation models in an ecological context. Particle filter methods based on targeted proposal distributions have been developed for spatiotemporal applications, but such methods suffer from the curse of dimensionality (Snyder et al., 2015). The ensemble Kalman filter (EnKF) provides a state-of-the-art approach to high-dimensional spatiotemporal filtering, but the approximations inherent in the EnKF can be problematic for models that are highly nonlinear or non-Gaussian (Ades and Van Leeuwen, 2015). Our island filter methodologies have theoretical guarantees for arbitrarily nonlinear and non-Gaussian models, while having improved scaling properties compared to particle filters.
We consider a spatiotemporal model for disease transmission dynamics of measles within and between multiple cities, based on the model of Park and Ionides (2019), which adds spatial interaction to the compartment model presented by He et al. (2010). The model compartmentalizes the population of each city into susceptible (), exposed (), infectious (), and recovered/removed () categories. The number of individuals in each compartment of city at time are denoted by , , , and . The population dynamics are written in terms of nondecreasing processes counting the cumulative transitions in city , up to time , between compartments identified by the subscripts. Our model is described by the following system of stochastic differential equations, for ,
Here, models recruitment into the susceptible population, and models emigration and death. When is integer-valued, the system has step function solutions. The total population is calculated by smoothing census data and is treated as known. The number of recovered individuals in city is therefore defined implicitly. is modeled as negative binomial death processes (Bretó et al., 2009; Bretó and Ionides, 2011) with overdispersion parameter , and rate given by
(6)  
where models seasonality driven by high contact rates between children at school, described by
with being the proportion of the year taken up by the school terms, is the mean transmission rate, and measures the reduction of transmission during school holidays. In [6], is a mixing exponent modeling inhomogeneous contact rates within a city, and models immigration of infected individuals which is appropriate when analyzing a subset of cities that cannot be treated as a closed system. The number of travelers from city to is denoted by . Here, is constructed using the gravity model of Xia et al. (2004),
where denotes the distance between city and city , is the average population for city across time, is the average population across cities, and is the average distance between a randomly chosen pair of cities. Here, we model as fixed through time and symmetric between any two arbitrary cities, though a natural extension would allow for temporal variation and asymmetric movement between two cities. The transition processes , and are modeled as conditional Poisson processes with per-capita rates , and respectively, and we fix . The birth process is an inhomogeneous Poisson process with rate , given by interpolated census data.
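The gravity coupling just described can be sketched as follows. The normalization by mean population and mean distance matches the verbal description above, but since the paper's own equation was lost in extraction, the exact functional form used here is an assumption.

```python
import numpy as np

def gravity_travel(populations, distances, G):
    """Gravity-model travel matrix (assumed normalized form).

    v[u, v] = G * (dbar / Pbar**2) * P_u * P_v / d(u, v)  for u != v,
    where Pbar is the mean city population and dbar the mean pairwise
    distance, so that v is dimensionless up to the gravitation constant G.
    The diagonal (within-city travel) is set to zero.
    """
    P = np.asarray(populations, dtype=float)
    d = np.asarray(distances, dtype=float)
    U = len(P)
    off = ~np.eye(U, dtype=bool)               # off-diagonal mask
    Pbar = P.mean()
    dbar = d[off].mean()
    v = np.zeros((U, U))
    v[off] = G * (dbar / Pbar**2) * np.outer(P, P)[off] / d[off]
    return v
```

With equal populations and equal pairwise distances, every off-diagonal entry reduces to G, which makes the normalization easy to check.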
To complete the model specification, we must describe the measurement process. Let be the number of removed infected individuals in the th reporting interval. Suppose that cases are quarantined once they are identified, so that reported cases comprise a fraction of these removal events. The case report is modeled as a realization of a discretized conditionally Gaussian random variable , defined for via
(7)  
where is the cumulative distribution function, and
models overdispersion relative to the binomial distribution. For , we replace by in [7]. This model includes many features that have been proposed to be relevant for understanding measles transmission dynamics (He et al., 2010). Our plug-and-play methodology permits consideration of all these features, and readily extends to the investigation of further variations. Likelihood-based inference via plug-and-play methodology therefore provides a framework for evaluating which features of a dynamical model are critical for explaining the data (King et al., 2008). By contrast, Xia et al. (2004) developed a linearization for a specific spatiotemporal measles model which is numerically convenient but not readily adaptable to assess alternative model choices.
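The discretized Gaussian reporting model can be evaluated along the following lines. The conditional mean rho*C and the variance form below (overdispersed binomial moments) follow He et al. (2010) and are an assumption here, since the paper's own equation [7] was lost in extraction; the boundary treatment at y = 0 is likewise assumed.

```python
from math import erf, sqrt

def dmeasure_cases(y, C, rho, psi):
    """Density of a discretized-Gaussian case report (illustrative sketch).

    Reports Y given removals C are taken to have conditional mean rho*C and
    variance rho*(1-rho)*C + (psi*rho*C)**2, discretized by integrating the
    normal density over (y - 0.5, y + 0.5].
    """
    mean = rho * C
    var = rho * (1 - rho) * C + (psi * rho * C) ** 2

    def Phi(x):  # normal CDF with the mean and variance above
        return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

    upper = Phi(y + 0.5)
    lower = Phi(y - 0.5) if y > 0 else 0.0  # assumed boundary handling at y = 0
    return upper - lower
```

Because consecutive cells share a CDF boundary, the probabilities telescope and sum to one over nonnegative counts, which provides a direct check of the discretization.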
We first assess the scaling properties of the filters on the measles model by evaluating the likelihood over varying numbers of units, , for fixed parameters. The results are given in Fig. 2, with additional information about timing, algorithmic choices, parameter values and a plot of the data provided in Sec. S6. In Fig. 2, the log likelihood per unit per time increases with because city size decreases with . Smaller cities have fewer measles cases, resulting in a narrower and taller probability density function. Fig. 2 shows all three island filters performing well, compared to a rapid decline in performance of the particle filter (PF) starting at , and a slower decline for GIRF. The bias in the island filters is small in this example, as evidenced by their close match to PF for small numbers of units. This is likely because the coupling in this model is weak: a high proportion of measles transmission occurs within cities rather than between cities. The neighborhood used here for city at time was the two previous lags, . This choice of neighborhood is explored in Sec. S7. This local neighborhood still involves other cities in the forecast for city , since the forecast on each island involves a fully coupled model.
Fig. 3 evaluates the filter methods on the task of computing a slice of the likelihood function over the coupling parameter for simulated data. This slice varies while fixing the other parameters at the values used for the simulation. Scientifically, this gives us an upper bound on the identifiability of from such data, since the likelihood slice provides statistically efficient inference when all other parameters are known. In this case, the Monte Carlo error is comparable to the change in the log likelihood over a wide range of values of . Therefore, Monte Carlo error has substantial impact on inference even given considerable computational resources: Fig. 3 took 7 days to compute on 40 cores. Most transmission of measles is local, so evidently there is limited evidence available about the strength of transmission between cities even from 40 jointly observed epidemic time series. By contrast, parameters of within-city transmission that are shared between cities should be well identified by carrying out inference on a collection of cities. As an example of this, Sec. S8 presents a likelihood slice for the recovery rate, .
Discussion
The pseudocode presented for the island filters describes how the outputs are calculated given the inputs, but does not prescribe details of how these quantities are calculated. There is scope for implementations to trade off memory, computation and communication by making differing decisions on how the various loops defined in the pseudocode are coded, such as decisions on memory overwriting and parallelization. This article focuses on the logical structure of the algorithms, leaving room for future research on implementation-specific considerations, though some supplementary discussion of memory-efficient implementation is given in Sec. S10.
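As one illustration of such a trade-off (a hedged sketch in Python, not the paper's pseudocode), per-island log-likelihood contributions can be combined either by storing every island's estimate or by overwriting a small running state, with identical results but very different memory footprints:

```python
import math

def combine_stored(island_loglik):
    """Combine per-island log-likelihood estimates after storing them all:
    simple, but memory grows with the number of islands."""
    m = max(island_loglik)
    return m + math.log(sum(math.exp(x - m) for x in island_loglik) / len(island_loglik))

def combine_streaming(island_iter):
    """Combine islands one at a time, overwriting a running state
    (current max, rescaled sum, count): constant memory, same answer."""
    m, s, n = -math.inf, 0.0, 0
    for x in island_iter:
        if x > m:
            s = s * math.exp(m - x) + 1.0  # rescale running sum to the new max
            m = x
        else:
            s += math.exp(x - m)
        n += 1
    return m + math.log(s / n)
```

The streaming version also parallelizes naturally, since partial states from disjoint groups of islands can be merged.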
This paper has focused on likelihood evaluation, a critical quantity for testing scientific hypotheses about model structure and the values of unknown parameters. However, our island filters also provide localized solutions to the filtering problem, using the local weights constructed for the adapted simulations on each island. A solution to the filtering problem, in turn, provides a foundation for forecasting.
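A minimal sketch of this idea (in Python, with hypothetical inputs; the function name is illustrative and not part of any package): given each island's simulated latent value at one spatiotemporal location, together with that island's local log weight, a localized filtering estimate is a self-normalized weighted average across islands:

```python
import math

def local_filter_mean(island_states, island_logw):
    """Self-normalized weighted average of island simulations at one
    spatiotemporal location, using each island's local log weight.
    Weights are exponentiated relative to their max for numerical stability."""
    m = max(island_logw)
    w = [math.exp(lw - m) for lw in island_logw]
    return sum(wi * xi for wi, xi in zip(w, island_states)) / sum(w)
```

Islands whose simulations poorly match the data near that location receive small local weights and contribute little to the estimate there, while possibly contributing heavily at other locations.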
Plug-and-play inference algorithms based on sequential Monte Carlo likelihood evaluation have proved successful for investigating highly nonlinear partially observed dynamic systems of low dimension arising in the analysis of epidemiological and ecological time series data Bretó (2018); Pons-Salort and Grassly (2018); de Cellès et al. (2018); Marino et al. (2018). A benefit of the plug-and-play property is that it facilitates development of broadly applicable software, which in turn promotes creative scientific model development Bretó et al. (2009); He et al. (2010).
Geophysical data assimilation has similarities to inference on spatiotemporal biological systems. Relative to biological systems, geophysical applications are characterized by a greater number of spatial locations, better mathematical understanding of the underlying processes, and lower stochasticity. The locally weighted particle filter of Poterjoy (2016)
shares some features with our approach, but it builds on a moment approximation designed for geophysical models.
The algorithms BIF, ASIF, ASIF-IR and GIRF used for this article all enjoy the plug-and-play property. The numerical results for this paper use the asif, asifir and girf functions via the open-source R package spatPomp (Asfaw et al.; 2019), which provides a spatiotemporal extension of the R package pomp (King et al.; 2016). BIF was implemented using asif with particles per island. The pfilter and enkf functions were used from pomp. The source code for this paper will be contributed to an open-source scientific archive upon acceptance for publication.
Acknowledgements: This work was supported by National Science Foundation grant DMS-1761603 and National Institutes of Health grants 1-U54-GM111274 and 1-U01-GM110712.
References
 Ades and Van Leeuwen (2015) Ades, M. and Van Leeuwen, P. J. (2015). The equivalent-weights particle filter in a high-dimensional system, Quarterly Journal of the Royal Meteorological Society 141(687): 484–503.

 Andrieu et al. (2010) Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology) 72: 269–342.
 Asfaw et al. (2019) Asfaw, K., Ionides, E. L. and King, A. A. (2019). spatPomp: R package for statistical inference for spatiotemporal partially observed Markov processes, https://github.com/kidusasfaw/spatPomp .
 Bengtsson et al. (2008) Bengtsson, T., Bickel, P. and Li, B. (2008). Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems, in T. Speed and D. Nolan (eds), Probability and Statistics: Essays in Honor of David A. Freedman, Institute of Mathematical Statistics, Beachwood, OH, pp. 316–334.
 Bjørnstad and Grenfell (2001) Bjørnstad, O. N. and Grenfell, B. T. (2001). Noisy clockwork: Time series analysis of population fluctuations in animals, Science 293: 638–643.
 Bretó (2018) Bretó, C. (2018). Modeling and inference for infectious disease dynamics: A likelihood-based approach, Statistical Science 33(1): 57–69.
 Bretó et al. (2009) Bretó, C., He, D., Ionides, E. L. and King, A. A. (2009). Time series analysis via mechanistic models, Annals of Applied Statistics 3: 319–348.
 Bretó and Ionides (2011) Bretó, C. and Ionides, E. L. (2011). Compound Markov counting processes and their applications to modeling infinitesimally overdispersed systems, Stochastic Processes and their Applications 121: 2571–2591.

 Crisan and Rozovskii (2011) Crisan, D. and Rozovskii, B. (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press.
 de Cellès et al. (2018) de Cellès, M. D., Magpantay, F. M., King, A. A. and Rohani, P. (2018). The impact of past vaccination coverage and immunity on pertussis resurgence, Science Translational Medicine 10(434): eaaj1748.
 Del Moral et al. (2017) Del Moral, P., Moulines, E., Olsson, J. and Vergé, C. (2017). Convergence properties of weighted particle islands with application to the double bootstrap algorithm, Stochastic Systems 6(2): 367–419.
 Del Moral and Murray (2015) Del Moral, P. and Murray, L. M. (2015). Sequential Monte Carlo with highly informative observations, SIAM/ASA Journal on Uncertainty Quantification 3: 969–997.
 Doucet et al. (2001) Doucet, A., de Freitas, N. and Gordon, N. J. (2001). Sequential Monte Carlo Methods in Practice, Springer, New York.
 Doucet and Johansen (2011) Doucet, A. and Johansen, A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later, in D. Crisan and B. Rozovsky (eds), Oxford Handbook of Nonlinear Filtering, Oxford University Press.
 Ebert (2001) Ebert, E. E. (2001). Ability of a poor man’s ensemble to predict the probability and distribution of precipitation, Monthly Weather Review 129(10): 2461–2480.
 He et al. (2010) He, D., Ionides, E. L. and King, A. A. (2010). Plugandplay inference for disease dynamics: Measles in large and small towns as a case study, Journal of the Royal Society Interface 7: 271–283.
 Ionides et al. (2017) Ionides, E. L., Bretó, C., Park, J., Smith, R. A. and King, A. A. (2017). Monte Carlo profile confidence intervals for dynamic systems, Journal of the Royal Society Interface 14: 1–10.
 Ionides et al. (2015) Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S. and King, A. A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps, Proceedings of the National Academy of Sciences of the USA 112(3): 719–724.
 King et al. (2008) King, A. A., Ionides, E. L., Pascual, M. and Bouma, M. J. (2008). Inapparent infections and cholera dynamics, Nature 454: 877–880.
 King et al. (2016) King, A. A., Nguyen, D. and Ionides, E. L. (2016). Statistical inference for partially observed Markov processes via the R package pomp, Journal of Statistical Software 69: 1–43.
 Leutbecher and Palmer (2008) Leutbecher, M. and Palmer, T. N. (2008). Ensemble forecasting, Journal of Computational Physics 227(7): 3515–3539.
 Marino et al. (2018) Marino, J. A., Peacor, S. D., Bunnell, D. B., Vanderploeg, H. A., Pothoven, S. A., Elgin, A. K., Bence, J. R., Jiao, J. and Ionides, E. L. (2018). Evaluating consumptive and nonconsumptive predator effects on prey density using field time series data, Ecology .
 Palmer (2002) Palmer, T. N. (2002). The economic value of ensemble forecasts as a tool for risk assessment: From days to decades, Quarterly Journal of the Royal Meteorological Society 128(581): 747–774.
 Park and Ionides (2019) Park, J. and Ionides, E. L. (2019). Inference on high-dimensional implicit dynamic models using a guided intermediate resampling filter, arXiv:1708.08543v3 .
 Pitt and Shephard (1999) Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters, Journal of the American Statistical Association 94: 590–599.
 Pons-Salort and Grassly (2018) Pons-Salort, M. and Grassly, N. C. (2018). Serotype-specific immunity explains the incidence of diseases caused by human enteroviruses, Science 361(6404): 800–803.
 Poterjoy (2016) Poterjoy, J. (2016). A localized particle filter for highdimensional nonlinear systems, Monthly Weather Review 144(1): 59–76.
 Rebeschini and van Handel (2015) Rebeschini, P. and van Handel, R. (2015). Can local particle filters beat the curse of dimensionality?, The Annals of Applied Probability 25(5): 2809–2866.
 Snyder et al. (2015) Snyder, C., Bengtsson, T. and Morzfeld, M. (2015). Performance bounds for particle filters using the optimal proposal, Monthly Weather Review 143(11): 4750–4761.
 Vergé et al. (2013) Vergé, C., Dubarry, C., Del Moral, P. and Moulines, E. (2013). On parallel implementation of sequential Monte Carlo methods: The island particle model, Statistics and Computing 25(2): 243–260.
 Xia et al. (2004) Xia, Y., Bjørnstad, O. N. and Grenfell, B. T. (2004). Measles metapopulation dynamics: A gravity model for epidemiological coupling and dynamics, American Naturalist 164(2): 267–281.