# Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo

We consider weighted particle systems of fixed size, in which a new generation of particles is formed by resampling particles from the current generation with probabilities proportional to their weights. This set-up covers a broad class of sequential Monte Carlo (SMC) methods, which are widely used in applied statistics and cognate disciplines across a range of domains. We consider the genealogical tree embedded into such particle systems by the resampling operation, and identify conditions, as well as an appropriate time-scaling, under which it converges to the Kingman coalescent in the infinite system size limit in the sense of finite-dimensional distributions. This makes the plethora of distributional results known for the Kingman coalescent available for analysis of SMC algorithms, which we illustrate by characterising the limiting mean and variance of the tree height, as well as of the total branch length of the tree. It also greatly improves the tractability of genealogies of SMC methods, which are known to be closely connected to the performance of these algorithms. The conditions which we require to prove convergence are strong, but we demonstrate by simulation that they do not appear to be necessary.


## 1 Introduction

Interacting particle systems (IPSs) are a broad class of stochastic models for phenomena in disciplines including physics, engineering, biology and finance. Prominent examples include particle filters, particle methods and sequential Monte Carlo (SMC), which feature in numerical approximation schemes for non-linear filtering, as well as in mean field approximation of Feynman-Kac flows. For additional background we direct readers to [Del04] and [DJ11].

Central to these methods are discrete-time, evolving weighted particle systems. Correlations between particles arise out of resampling: a stochastic selection mechanism in which particles with high weight are typically replicated while those with low weight vanish, giving rise to an embedded genealogy. The contribution of this paper is to identify conditions under which these genealogies converge to the Kingman $n$-coalescent [Kin82] in the infinite system size limit under an appropriate rescaling of time, as well as to describe ways in which information about the limiting genealogy can be used to characterize particle systems in applications.

Consider a sequence of measurable spaces $(E_t)_{t \geq 0}$, each associated to a Markov transition kernel $K_t$ and a non-negative potential $g_t$. These correspond to the state spaces, transition kernels, and importance weight functions of our IPS, respectively.

Let $\zeta_t^{(N)} := \{(w_t^{(i)}, X_t^{(i)})\}_{i=1}^N$ be a weighted $N$-particle system at time $t$, where each $X_t^{(i)} \in E_t$, and the weights $w_t^{(i)}$ are non-negative and satisfy $\sum_{i=1}^N w_t^{(i)} = 1$. Let $S$ be a resampling operator which acts on $\zeta_t^{(N)}$ by assigning to each particle a random number of offspring. The total number of offspring is fixed at $N$, and the mean number of offspring of particle $i$ is $N w_t^{(i)}$. All offspring are assigned an equal weight $N^{-1}$. More concretely,

$$S\zeta_t^{(N)} = \left\{ \left( N^{-1}, X_t^{(a_t^{(i)})} \right) \right\}_{i=1}^N,$$

where $a_t^{(i)} = j$ if particle $j$ in $\zeta_t^{(N)}$ is the parent of particle $i$ in $S\zeta_t^{(N)}$. Particles with low weight are randomly removed by having no offspring, while particles with high weight tend to have many offspring. Algorithm 2 gives an example.

The step from time $t$ to time $t+1$ is completed by propagating each particle in $S\zeta_t^{(N)}$ independently through the transition kernel $K_{t+1}$ to obtain particle locations $X_{t+1}^{(1)}, \ldots, X_{t+1}^{(N)}$. Finally, each particle is assigned a weight proportional to the potential $g_{t+1}$ evaluated at the locations of the particle and its parent, so that the full update is

$$\zeta_t^{(N)} \overset{S}{\mapsto} \left\{ N^{-1}, X_t^{(a_t^{(i)})} \right\}_{i=1}^N \overset{K_{t+1}}{\mapsto} \left\{ \frac{g_{t+1}(X_t^{(a_t^{(i)})}, X_{t+1}^{(i)})}{\sum_{j=1}^N g_{t+1}(X_t^{(a_t^{(j)})}, X_{t+1}^{(j)})}, \, X_{t+1}^{(i)} \right\}_{i=1}^N,$$

where $X_{t+1}^{(i)} \sim K_{t+1}(X_t^{(a_t^{(i)})}, \cdot)$ independently for each $i \in [N]$. A specification is given in Algorithm 1.

There are many options for the resampling step Resample in line 6 of Algorithm 1. The simplest is multinomial resampling, given in Algorithm 2. It is analytically tractable, but suboptimal in terms of Monte Carlo variance. Popular alternatives include residual, stratified, and systematic resampling, which yield algorithms with lower variance [DCM05]. We prove our main theorem under multinomial resampling, and show by simulation that its conclusions also hold for these three alternatives, at least in our simple example. Other schemes which have been studied theoretically, but seem less widely used in applications, include the tree-based branching approximation of [CL97] and the McKean interpretation approach by [Del04, Section 2.5.3].
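As a concrete reference point, multinomial resampling (the scheme of Algorithm 2, which is not reproduced here) can be sketched in a few lines of Python. The function name and interface below are ours, not the paper's: each of the $N$ offspring slots draws its parent i.i.d. from the categorical distribution given by the normalised weights, so the offspring counts are jointly multinomial with mean $N w^{(i)}$.

```python
import bisect
import random

def multinomial_resample(weights, rng=None):
    """Draw N parent indices i.i.d. from the categorical distribution
    defined by the (unnormalised) weights. Offspring counts are then
    Multinomial(N, w), so particle i has N*w[i] offspring on average."""
    rng = rng or random.Random(0)
    n = len(weights)
    total = sum(weights)
    # build the cumulative distribution function of the normalised weights
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    # one inverse-CDF categorical draw per offspring slot
    return [min(bisect.bisect_left(cdf, rng.random()), n - 1) for _ in range(n)]
```

By construction the returned vector has length $N$ (one parent per offspring slot), and a particle with zero weight is almost surely never selected, i.e. it has no offspring.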

An example of a system of this form is the particle filter: a class of IPSs for approximating integrals. For an overview, see e.g. [DJ11] and references therein. Particle filters are suited to settings in which expectations are computed with respect to the law of a latent, discrete-time Markov process conditioned on a sequence of observations, e.g. state space models. A state space model is a Markov chain $(X_t, Y_t)_{t \geq 0}$ with transition density $p(x_{t-1}, x_t)$ and emission density $\psi(x_t, y_t)$, both with respect to dominating measures abusively denoted $dx$ and $dy$, respectively. The state and observation spaces could be replaced with more general Polish spaces, but we focus on (subsets of) Euclidean spaces for concreteness. Typically, $X_t$ is an unobserved state at time $t$, and $Y_t$ is a noisy or incomplete observation of the state.

Let $y := (y_0, \ldots, y_T)$ be a vector of noisy, conditionally independent observations from the emission density $\psi$ given an unobserved state trajectory $x := (x_0, \ldots, x_T)$. Functionals of the smoothing distribution,

$$P(x \mid y) \, dx := P(X \in dx \mid Y = y) \propto \prod_{t=0}^{T} p(x_{t-1}, x_t) \, \psi(x_t, y_t) \, dx_{0:T},$$

where we abuse notation and denote the law of $X := (X_0, \ldots, X_T)$ by $P$, are typically intractable, and often approximated by particle filters. The simplest is the algorithm introduced by [GSS93], usually known as the bootstrap particle filter, although the term “(interacting) particle filter” seems first to have been used by [Del96]. It fits into our IPS framework by taking

$$K_{t+1}(x_t, dx_{t+1}) = p(x_t, x_{t+1}) \, dx_{t+1}, \qquad g_{t+1}(x_t, x_{t+1}) = \psi(x_{t+1}, y_{t+1}),$$

for a fixed observation sequence $y$ in Algorithm 1. Other particle filters can be considered by introducing a proposal density $q_{t+1}$, with $q_{t+1}(x_t, x_{t+1}) > 0$ whenever $p(x_t, x_{t+1}) \psi(x_{t+1}, y_{t+1}) > 0$ for every $x_t$ and $x_{t+1}$, and setting

$$K_{t+1}(x_t, dx_{t+1}) = q_{t+1}(x_t, x_{t+1}) \, dx_{t+1}, \qquad g_{t+1}(x_t, x_{t+1}) = \frac{p(x_t, x_{t+1}) \, \psi(x_{t+1}, y_{t+1})}{q_{t+1}(x_t, x_{t+1})}.$$
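As an illustration only (the model, parameter values, and function names below are our own assumptions, not taken from the paper), a bootstrap particle filter for a toy one-dimensional linear-Gaussian state space model might look as follows. The mutate-weight-resample scheduling shown here is one common choice.

```python
import math
import random

def bootstrap_filter(ys, n_particles=100, rng=None):
    """Bootstrap particle filter for the toy model
        X_t = 0.9 X_{t-1} + N(0, 1),   Y_t = X_t + N(0, 1).
    Returns, for each observation, the normalised weights and the
    ancestor indices produced by multinomial resampling."""
    rng = rng or random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    history = []
    for y in ys:
        # mutate: propagate through the transition kernel K (prior dynamics)
        xs = [0.9 * x + rng.gauss(0.0, 1.0) for x in xs]
        # weight: potential g is the emission density psi(x, y) up to a constant
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]
        total = sum(ws)
        ws = [w / total for w in ws]
        # resample: draw parents multinomially, proportional to weights
        ancestors = rng.choices(range(n_particles), weights=ws, k=n_particles)
        xs = [xs[a] for a in ancestors]
        history.append((ws, ancestors))
    return history
```

Tracing the `ancestors` vectors backwards in time recovers exactly the genealogical forest studied in this paper.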

Our IPS framework also includes more general SMC algorithms, such as SMC samplers [DDJ06] among others, see e.g. [Del04, Chapter 12].

There is a genealogy embedded in Algorithm 1. Consider the particle system at a fixed terminal time $T$. Tracing the ancestor indices $a_t^{(i)}$ backwards in time results in a coalescing forest of lineages. The forest forms a tree once the most recent common ancestor (MRCA) of all terminal particles is reached, provided that happens before reaching the initial time 0. Our main result (Theorem 1) shows that, under certain conditions and an appropriate time-rescaling, functionals of this genealogy depending upon finite numbers of leaves converge to corresponding functionals of the Kingman $n$-coalescent [Kin82] as $N \to \infty$, in the sense of finite-dimensional distributions. Hence, the tractable Kingman $n$-coalescent can be used to approximate functionals of SMC genealogies off-line, before the algorithm has been run. In particular, we show that the expected number of time steps from the leaves to the MRCA scales linearly in $N$ for any finite number of leaves. This compares with the $O(N \log N)$ upper bound of [JMR15] for the genealogy of all $N$ leaves. We also provide scaling expressions for the variance of the number of time steps to the MRCA (see Corollary 2). Our result applies to the marginal genealogical tree structure, marginalised over the locations and weights of particles.
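For concreteness, the distributional results alluded to here are explicit for the Kingman $n$-coalescent: writing $T_n$ for the tree height (time to the MRCA) and $L_n$ for the total branch length, and using the standard fact that the block-counting process waits an independent $\mathrm{Exp}\big(\binom{k}{2}\big)$ time while $k$ blocks remain, elementary calculations give

```latex
E[T_n] = \sum_{k=2}^{n} \binom{k}{2}^{-1} = 2\left(1 - \frac{1}{n}\right), \qquad
\operatorname{Var}(T_n) = \sum_{k=2}^{n} \frac{4}{k^2 (k-1)^2},

E[L_n] = \sum_{k=2}^{n} k \binom{k}{2}^{-1} = 2 \sum_{k=1}^{n-1} \frac{1}{k}, \qquad
\operatorname{Var}(L_n) = \sum_{k=1}^{n-1} \frac{4}{k^2}.
```

In particular $E[T_n] < 2$ for all $n$, while $E[L_n]$ grows logarithmically in the number of leaves.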

The conditions for convergence to the Kingman $n$-coalescent can be satisfied by SMC algorithms under strong but standard conditions (18) and (19), which are used to control the fluctuations of family sizes. We expect that they can be relaxed, and the simulations in Section 3 also suggest they are not necessary.

Mergers of lineages into common ancestors result in path degeneracy in SMC algorithms: quantities of interest depending on times well before the terminal time will be estimated using fewer than $N$ distinct particles, resulting in high variance. Path degeneracy can be reduced by resampling less frequently, but this increases variance across time. The extreme case of no resampling results in no path degeneracy, but yields estimators whose variances increase exponentially in $T$ in most cases: a phenomenon known as weight degeneracy [KLW94]. Intermediate strategies balance these two difficulties [LC95, Section 4.2]. Our limiting genealogies are the only method of which we are aware that can provide a priori estimates of path degeneracy, and thus represent an important tool in minimising both degeneracies simultaneously. This is particularly important in the conditional SMC update in the particle Gibbs algorithm [ADH10], where retaining multiple trajectories across a fixed time window is essential.
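Path degeneracy is easy to observe by simulation. The following minimal sketch (our own illustration, using neutral, i.e. equal, weights) tracks how many distinct time-0 ancestors the current generation retains under repeated multinomial resampling; the count is non-increasing and collapses on an $O(N)$ timescale.

```python
import random

def distinct_ancestors(n_particles=64, n_steps=100, rng=None):
    """After each resampling step, record how many distinct time-0
    ancestors the current generation of particles retains
    (neutral weights: every particle is equally likely to be a parent)."""
    rng = rng or random.Random(2)
    lineage = list(range(n_particles))  # time-0 ancestor of each particle
    counts = []
    for _ in range(n_steps):
        parents = [rng.randrange(n_particles) for _ in range(n_particles)]
        lineage = [lineage[p] for p in parents]
        counts.append(len(set(lineage)))
    return counts
```

Since each generation's ancestor labels are a subset of the previous generation's, `counts` is monotone non-increasing; once it hits 1, all particles share a single time-0 ancestor and earlier marginals are represented by one trajectory.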

Genealogies of SMC algorithms have found numerous applications, for example [CDG11, CL13, DM01, DG05, DM07, DMPR09, LW18, OD17]; see also [Del04, Section 3.4] and [Del13, Chapter 15]. Our explicit description of the limiting genealogical process is new for SMC, and has the potential to build on any of these works, as well as generate new ones in the IPS and SMC fields. We also demonstrate that non-Markovian prelimiting genealogies lie in the domain of attraction of Kingman's $n$-coalescent, in contrast with earlier results which assume Markovianity [Möh98, Theorem 1]. This extension is significant since reverse-time genealogies of Markov IPSs are not Markov processes in general.

The rest of this paper is structured as follows. In Section 2 we state and prove the convergence of genealogies of IPSs to the Kingman $n$-coalescent. Section 3 presents a simulation showing that the scalings predicted by our convergence result hold for an example outside our technical assumptions, and Section 4 concludes with a discussion. An appendix contains some technical calculations. We conclude this section by summarising notation.

Let $(N)_j := N (N-1) \cdots (N-j+1)$ be the falling factorial. We adopt the conventions $(N)_0 := 1$ and $\binom{j}{2} := 0$ for $j < 2$. When a sum is written over a vector of indices, and that vector is of length 0, it should be interpreted as the identity operator; where this convention might hold, we emphasize it by writing $\sum^{\star}$. The statement $x_N = O(y_N)$ (resp. $x_N = o(y_N)$) means $\limsup_{N \to \infty} |x_N / y_N| < \infty$ (resp. $\lim_{N \to \infty} x_N / y_N = 0$), and thus corresponds to the usual Landau big-O (resp. little-o) notation. For an integer $n$, we define $[n] := \{1, \ldots, n\}$ with $[0] := \emptyset$, and for a finite set $A$, we let $\Pi_k(A)$ denote the set of partitions of $A$ into at most $k$ non-empty blocks, with $\Pi_k(\emptyset) := \emptyset$. For a partition $\xi$, $|\xi|$ denotes the number of blocks in $\xi$, and $\mathbf{1} := (1, \ldots, 1)$, where the length of the vector will be clear from context. For a vector $x$, $\|x\|_p$ denotes the $L_p$-norm.

## 2 The convergence theorem

It will be convenient to express our IPS in reverse time by denoting the initial time in Algorithm 1 by $T$ and the terminal time by 0, and to describe the genealogy in terms of a partition-valued family of processes $(G_t^{(n,N)})_{t \geq 0}$ indexed by $n \leq N$, where $n$ denotes the number of observed leaves (time 0 particles) in a system with $N$ particles. The process is defined in terms of the underlying IPS via its initial condition $G_0^{(n,N)} = \{\{1\}, \ldots, \{n\}\}$, and its dynamics, which are driven by the requirement that $i$ and $j$ belong to the same block in $G_t^{(n,N)}$ if leaves $i$ and $j$ have a common ancestor at time $t$. Boundary problems will be avoided by ensuring that the rescaled process does not run past the initial time $T$ as $N \to \infty$.

###### Remark 1.

Our genealogical process evolves on a space which tracks the ancestral relationships of the observed particles but not their states. The process is a projection of the time reversal of the historical process of [DM01], in which particle locations have been marginalised over. A consequence of this is that $G^{(n,N)}$ is not Markovian in general. Indeed, Markovianity fails even for the forward-time evolution of lineages after removing locations. Genealogical processes typically do track locations in the SMC literature, whereas our marginal formulation is standard in genetics [Möh98]. Our $G^{(n,N)}$ also coincides with the random ancestral forest of [DMPR09], who showed that the combinatorial structure of the reverse-time genealogy decouples from the particle locations in the case of neutral models (constant potentials). Our contribution is to prove the same decoupling for suitably rescaled non-neutral models in the $N \to \infty$ limit, and to identify the limiting process.

Genealogical processes are a powerful tool in population genetics, where the genealogical tree is viewed as missing data to be imputed, or an object of inference in its own right. A common large population limit is given by the Kingman $n$-coalescent [Kin82], which is also a partition-valued stochastic process evolving in reverse time. The initial condition is the singleton partition $\{\{1\}, \ldots, \{n\}\}$. Each pair of blocks then merges to a common ancestor independently at rate 1, forming a death process on the total number of blocks with rate $\binom{k}{2}$ when there are $k$ blocks. More formally, the generator of the Kingman $n$-coalescent, $Q$, is the square matrix with a row and column corresponding to each partition of $[n]$, and

$$q_{\xi\eta} = \begin{cases} 1 & \text{if } \eta \text{ can be obtained from } \xi \text{ by merging two blocks}, \\ -\binom{|\xi|}{2} & \text{if } \eta = \xi, \\ 0 & \text{otherwise}. \end{cases}$$

The dynamics terminate once the process hits the MRCA, i.e. the trivial partition $\{[n]\}$. See [Wak09] for an introduction to coalescent processes and their use in population genetics.
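A minimal simulator of the block-counting dynamics just described (our own sketch, not code from the paper) makes the tractability of the limit concrete: while $k$ blocks remain, the next merger occurs after an independent exponential holding time of rate $\binom{k}{2}$, and the tree height and total branch length are sums of such holding times.

```python
import random

def kingman_heights(n, rng=None):
    """Simulate the block-counting process of the Kingman n-coalescent.
    Returns (height, length): the time to the MRCA and the total
    branch length of the resulting tree."""
    rng = rng or random.Random(4)
    height, length = 0.0, 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0          # k choose 2 pairs, each at rate 1
        hold = rng.expovariate(rate)      # holding time with k blocks
        height += hold
        length += k * hold                # each of the k lineages accrues length
        k -= 1
    return height, length
```

Averaging over many runs recovers the classical means, e.g. $E[T_n] = 2(1 - 1/n)$ for the height.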

Let $\nu_t^{(i)}$ denote the number of offspring that particle $i$ at time $t$ has in the following generation. The following standing assumption will be central to our results.

Standing Assumption: The conditional distribution of parental indices given offspring counts, $(a_t^{(1)}, \ldots, a_t^{(N)}) \mid (\nu_t^{(1)}, \ldots, \nu_t^{(N)})$, is uniform over all vectors which satisfy $\#\{j : a_t^{(j)} = i\} = \nu_t^{(i)}$ for each $i \in [N]$.

###### Remark 2.

The Standing Assumption concerns the marginal distribution of parental assignments without particle locations. Conditionally uniform assignment given locations would be much stronger. A sufficient condition for the Standing Assumption is exchangeability of the Resample mechanism in line 6 of Algorithm 1. However, [Möh98, page 446] provides an example of a non-exchangeable particle system which still satisfies the Standing Assumption. For SMC, multinomial and residual resampling can be implemented in ways which satisfy the Standing Assumption, and any resampling scheme can be made to satisfy it by applying a uniformly sampled permutation to each realisation of ancestor indices. This technique was suggested in [ADH10, page 290].
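The permutation device mentioned at the end of the remark can be sketched as follows (function name ours): post-compose any resampler's ancestor vector with a uniformly random permutation of its positions. This leaves the offspring counts untouched while making the assignment of parents to offspring slots exchangeable, which suffices for the Standing Assumption.

```python
import random

def symmetrise_ancestors(ancestors, rng=None):
    """Apply a uniformly random permutation to the positions of an
    ancestor-index vector. The multiset of parents (hence every
    offspring count) is unchanged, but the parent-to-slot assignment
    becomes exchangeable."""
    rng = rng or random.Random(3)
    out = list(ancestors)  # leave the caller's vector intact
    rng.shuffle(out)
    return out
```

For example, applying it to the output of systematic resampling removes the deterministic ordering that scheme imposes on the offspring slots.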

Let $\xi$ and $\eta$ be partitions of $[n]$ with blocks ordered by the least element in each block, and with $\eta$ obtained from $\xi$ by merging some subsets of blocks. For $j \in \{1, \ldots, |\eta|\}$, let $b_j$ be the number of blocks of $\xi$ that have been merged to form block $j$ in $\eta$, so that $b_1 + \cdots + b_{|\eta|} = |\xi|$, and define

$$p_{\xi\eta}(t) := \frac{1}{(N)_{|\xi|}} \sum_{\substack{i_1 \neq \ldots \neq i_{|\eta|} = 1 \\ \text{all distinct}}}^{N} (\nu_t^{(i_1)})_{b_1} \cdots (\nu_t^{(i_{|\eta|})})_{b_{|\eta|}} \qquad (1)$$

as the conditional transition probability of the genealogical process from state $\xi$ at time $t-1$ to state $\eta$ at time $t$, given $\nu_t := (\nu_t^{(1)}, \ldots, \nu_t^{(N)})$ (suppressed from the notation). The interpretation of (1) as a conditional transition probability of $G^{(n,N)}$ is justified by the Standing Assumption by associating offspring with balls, parents with boxes, and merger events with two or more balls occupying the same box. Finally, let $P_N(t)$ denote the corresponding conditional transition probability matrix.

Now let the conditional probability (resp. upper bound on the conditional probability) given $\nu_t$ of two (resp. more than two) lineages coalescing at time $t$ be denoted by

$$c_N(t) := \frac{1}{(N)_2} \sum_{i=1}^{N} (\nu_t^{(i)})_2, \qquad D_N(t) := \frac{1}{N (N)_2} \sum_{i=1}^{N} (\nu_t^{(i)})_2 \left( \nu_t^{(i)} + \frac{1}{N} \sum_{j \neq i}^{N} (\nu_t^{(j)})_2 \right).$$

The interpretations as (upper bounds on) conditional probabilities are justified by the Standing Assumption, and Lemma 1 below. For $t \geq 0$, let $\tau_N(t)$ be the time change driven by the random offspring counts:

$$\tau_N(t) := \min\left\{ s \geq 1 : \sum_{r=1}^{s} c_N(r) \geq t \right\}.$$
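These quantities are elementary functions of the realised offspring counts; a direct transcription into Python (function names ours) reads:

```python
def falling2(v):
    """Falling factorial (v)_2 = v * (v - 1)."""
    return v * (v - 1)

def c_N(nu):
    """Pair-coalescence probability: sum_i (nu_i)_2 / (N)_2."""
    n = sum(nu)  # offspring counts sum to the population size N
    return sum(falling2(v) for v in nu) / falling2(n)

def D_N(nu):
    """Upper bound on the probability of a merger of more than two lineages."""
    n = sum(nu)
    s2 = [falling2(v) for v in nu]
    tot2 = sum(s2)
    return sum(
        s2[i] * (nu[i] + (tot2 - s2[i]) / n) for i in range(len(nu))
    ) / (n * falling2(n))

def tau_N(c_values, t):
    """Time change: smallest s >= 1 with c_N(1) + ... + c_N(s) >= t."""
    acc = 0.0
    for s, c in enumerate(c_values, start=1):
        acc += c
        if acc >= t:
            return s
    raise ValueError("time horizon exceeded")
```

The two extreme cases are instructive: equal offspring counts give `c_N == 0` (no coalescence), while one particle producing all $N$ offspring gives `c_N == 1`.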
###### Remark 3.

The quantity $c_N(t)$ will play the role of a merger rate between pairs of particles at time $t$. For multinomial resampling,

$$E[c_N(t) \mid w_t] = \sum_{i=1}^{N} (w_t^{(i)})^2 = \frac{1}{\mathrm{ESS}(t)} \approx \frac{1}{N} \left( 1 + \mathrm{Var}\left( \frac{g_t(X_{t+1}, X_t)}{E[g_t(X_{t+1}, X_t)]} \right) \right),$$

where $\mathrm{ESS}(t)$ is the effective sample size of [KLW94], who also justify the approximation of the random left hand side by the deterministic right hand side. The time indices of the arguments of $g_t$ are flipped due to the time reversal described at the top of this section. Thus the coalescence rate is high when the unnormalised importance weights have high relative variance, and vice versa.
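The first two identities in the display are straightforward to compute from normalised weights; a minimal sketch (function name ours):

```python
def ess(weights):
    """Effective sample size of [KLW94]: the reciprocal of the sum of
    squared normalised weights, so that under multinomial resampling
    E[c_N(t) | w_t] = 1 / ESS(t)."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)
```

Uniform weights give the maximal value $\mathrm{ESS} = N$ (coalescence rate $1/N$), while a single dominant weight drives $\mathrm{ESS}$ towards 1 and the pairwise coalescence probability towards 1.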

###### Lemma 1.

Suppose that the Standing Assumption holds.

Case 1: For any partition $\xi$ we have the equality

$$p_{\xi\xi}(t) = 1 - \binom{|\xi|}{2} \frac{1 + O(N^{-1})}{(N)_2} - \binom{|\xi|}{2} \{1 + O(N^{-1})\} \, c_N(t).$$

Case 2: Let $\eta$ be obtained from $\xi$ by merging exactly two blocks. Then

$$c_N(t) - \binom{|\xi| - 2}{2} \{1 + O(N^{-1})\} \, D_N(t) \leq p_{\xi\eta}(t) \leq \{1 + O(N^{-1})\} \, c_N(t).$$

Case 3: For any $\eta$ obtained from $\xi$ by one or more mergers involving more than two blocks in total, we have

$$p_{\xi\eta}(t) \leq \binom{|\xi| - 2}{2} \{1 + O(N^{-1})\} \, D_N(t).$$
###### Proof.

We begin by proving Case 1 by setting $\eta = \xi$, so that $b_1 = \cdots = b_{|\xi|} = 1$, in (1). A multinomial expansion in reverse yields

$$\begin{aligned} p_{\xi\xi}(t) &= \frac{1}{(N)_{|\xi|}} \left\{ \left( \sum_{i=1}^{N} \nu_t^{(i)} \right)^{|\xi|} - \binom{|\xi|}{2} \sum_{i=1}^{N} (\nu_t^{(i)})^2 \left( \sum_{j=1}^{N} \nu_t^{(j)} \right)^{|\xi| - 2} \right\} \\ &= 1 + \binom{|\xi|}{2} \frac{1 + O(N^{-1})}{N} - \binom{|\xi|}{2} \frac{1 + O(N^{-1})}{(N)_2} \sum_{i=1}^{N} (\nu_t^{(i)})^2 \\ &= 1 - \binom{|\xi|}{2} \frac{1 + O(N^{-1})}{(N)_2} - \binom{|\xi|}{2} \frac{1 + O(N^{-1})}{(N)_2} \sum_{i=1}^{N} (\nu_t^{(i)})_2, \end{aligned}$$

where the second and third equalities follow from $\sum_{i=1}^{N} \nu_t^{(i)} = N$, and from

$$\frac{N^j}{(N)_j} = 1 + \binom{j}{2} \frac{1}{N} + O(N^{-2}). \qquad (2)$$

The proof of Case 2 is essentially identical to calculations in [Möh98, pages 442–443], and is omitted.

For Case 3 we have

$$\begin{aligned} p_{\xi\eta}(t) &= \frac{1}{(N)_{|\xi|}} \sum_{i=1}^{N} (\nu_t^{(i)})_2 \left\{ \left( \sum_{j=1}^{N} \nu_t^{(j)} \right)^{|\xi| - 2} - \sum_{\substack{j_1 \neq \ldots \neq j_{|\xi| - 2} \neq i \\ \text{all distinct}}}^{N} \nu_t^{(j_1)} \cdots \nu_t^{(j_{|\xi| - 2})} \right\} \\ &\leq \frac{1}{(N)_{|\xi|}} \sum_{i=1}^{N} (\nu_t^{(i)})_2 \left\{ N^{|\xi| - 2} - \left( \sum_{j \neq i}^{N} \nu_t^{(j)} \right)^{|\xi| - 2} + \binom{|\xi| - 2}{2} \sum_{j \neq i}^{N} (\nu_t^{(j)})^2 \left( \sum_{k \neq i} \nu_t^{(k)} \right)^{|\xi| - 4} \right\}, \end{aligned}$$

where we have used the Bernoulli inequality in the last step. The result follows from (2). ∎

###### Theorem 1.

Fix $n$ as the observed number of particles from the output of an IPS with $N \geq n$ particles, and suppose that the Standing Assumption holds. Suppose also that for any $0 \leq s < t$, we have

$$\lim_{N \to \infty} E\left[ \sum_{r = \tau_N(s) + 1}^{\tau_N(t)} D_N(r) \right] = 0, \qquad (3)$$

$$\lim_{N \to \infty} E\left[ c_N(\tau_N(t)) \right] = 0, \qquad (4)$$

$$\lim_{N \to \infty} E\left[ \sum_{r = \tau_N(s) + 1}^{\tau_N(t)} c_N(r)^2 \right] = 0, \qquad (5)$$

$$\text{and} \qquad E[\tau_N(t) - \tau_N(s)] \leq C_{t,s} N, \qquad (6)$$

for some constant $C_{t,s}$ that is independent of $N$. Then $(G_{\tau_N(t)}^{(n,N)})_{t \geq 0}$ converges to the Kingman $n$-coalescent in the sense of finite-dimensional distributions as $N \to \infty$.

###### Remark 4.

While unbiasedness of resampling, i.e. $E[\nu_t^{(i)} \mid w_t] = N w_t^{(i)}$, is part of the definition of our IPS, it is not required for Theorem 1. The key ingredients are the symmetry in the Standing Assumption, and the control over moments of orders up to four implicit in (3)–(6). Loosely, condition (4) implies that $(G_{\tau_N(t)}^{(n,N)})_{t \geq 0}$ converges to a continuous-time process, and together with (5) ensures that binary mergers happen at the required unit rate. Condition (3) ensures that mergers involving more than two lineages happen on a slower timescale than binary mergers, and (6) controls the speed with which the convergence in (4) takes place.

###### Proof of Theorem 1.

For $0 =: t_0 < t_1 < \cdots < t_k$ and $n \leq N$, the finite-dimensional distributions of the process have the form

$$P\left( G_{\tau_N(t_1)}^{(n,N)} = \eta_1, \ldots, G_{\tau_N(t_k)}^{(n,N)} = \eta_k \,\Big|\, G_{\tau_N(t_0)}^{(n,N)} = \eta_0 \right) = E\left[ \prod_{d=1}^{k} \left\{ \prod_{t = \tau_N(t_{d-1}) + 1}^{\tau_N(t_d)} P_N(t) \right\}_{\eta_{d-1}, \eta_d} \right] =: E\left[ \prod_{d=1}^{k} \chi_d^* \right],$$

where $\eta_0, \ldots, \eta_k$ is a sequence of partitions in which the blocks of $\eta_d$ are obtained by merging some subsets of blocks of $\eta_{d-1}$, or $\eta_d = \eta_{d-1}$, and where $P_N(t)$ is the conditional transition matrix given family sizes defined below (1). The probability associated with any other sequence of partitions is zero. We will prove the result by bounding these finite-dimensional distributions both above and below by those of a Kingman $n$-coalescent.

Consider a transition between $\eta_{d-1}$ and $\eta_d$ at respective times $\tau_N(t_{d-1})$ and $\tau_N(t_d)$. The corresponding conditional transition probability given offspring counts can be written

$$\chi_d^* = \sum_{\xi \in \eta_{d-1} \rightsquigarrow \eta_d} \prod_{t = \tau_N(t_{d-1}) + 1}^{\tau_N(t_d)} p_{\xi_{t-1} \xi_t}(t),$$

where the sum on the right hand side is over all paths $\xi = (\xi_{\tau_N(t_{d-1})}, \ldots, \xi_{\tau_N(t_d)})$ from $\eta_{d-1}$ to $\eta_d$ of the requisite length, in which each successive element either equals its predecessor, or is obtained from its predecessor by merging some subsets of blocks. By decomposing based on $\alpha$ (the number of times between $\tau_N(t_{d-1})$ and $\tau_N(t_d)$ at which mergers occur), as well as based on whether each merger involves exactly two lineages or more than two lineages, we can use Lemma 1, Cases 2 and 3 to upper bound the conditional transition probability given family sizes by

$$\chi_d^* \leq \sum_{\alpha=1}^{|\eta_{d-1}| - |\eta_d|} (1 + O(N^{-1}))^{\alpha} \sum_{(\lambda, \mu) \in \Pi_2([\alpha])} \cdots,$$

and similarly if $\eta_d = \eta_{d-1}$, where the omitted combinatorial factor is an upper bound on the number of arrangements of at most $n$ lineages into at most $n$ mergers. A further expansion shows that the product of transition probabilities has the following bound when $\alpha_d \geq 1$ for each $d$:

$$\lim_{N \to \infty} E\left[ \prod_{d=1}^{k} \chi_d^* \right] \leq n^{n2k} \binom{n}{2}^{nk} \sum_{\alpha_1 = 1}^{|\eta_0| - |\eta_1|} \cdots \sum_{\alpha_k = 1}^{|\eta_{k-1}| - |\eta_k|} \; \sum_{(\lambda_1, \mu_1) \in \Pi_2([\alpha_1])} \cdots \sum_{(\lambda_k, \mu_k) \in \Pi_2([\alpha_k])} \lim_{N \to \infty} E\left[ \prod_{d=1}^{k} \sum^{\star}_{\tau_N(t_{d-1}) < s_1^{(d)} < \cdots < s_{\alpha_d}^{(d)} \leq \tau_N(t_d)} \prod_{r \in \lambda_d} c_N(s_r^{(d)}) \prod_{r \in \mu_d} D_N(s_r^{(d)}) \right]; \qquad (7)$$

here, the prefactor $n^{n2k} \binom{n}{2}^{nk}$ bounds the number of paths through partitions compatible with each pattern of mergers, using $|\eta_{d-1}| - |\eta_d| \leq n$. Transitions in which any $\alpha_d = 0$ result in a similar bound in which the corresponding factors on the right-hand side of (7) are replaced by 1. Next, a multinomial expansion in reverse gives

$$\sum^{\star}_{\tau_N(t_{d-1}) < s_1 < \cdots < s_{\alpha} \leq \tau_N(t_d)} \prod_{r=1}^{\alpha} c_N(s_r) = \frac{1}{\alpha!} \sum^{\star}_{\substack{s_1 \neq \ldots \neq s_{\alpha} = \tau_N(t_{d-1}) + 1 \\ \text{all distinct}}}^{\tau_N(t_d)} \prod_{r=1}^{\alpha} c_N(s_r) \qquad (8)$$

(where we take $\tau_N(t_0) := 0$), so that by definition of $\tau_N$,

$$\frac{1}{\alpha!} \sum^{\star}_{\substack{s_1 \neq \ldots \neq s_{\alpha} = \tau_N(t_{d-1}) + 1 \\ \text{all distinct}}}^{\tau_N(t_d)} \prod_{r=1}^{\alpha} c_N(s_r) \leq \frac{1}{\alpha!} \sum^{\star}_{s_1, \ldots, s_{\alpha} = \tau_N(t_{d-1}) + 1}^{\tau_N(t_d)} \prod_{r=1}^{\alpha} c_N(s_r) \leq \frac{\left[ t_d - t_{d-1} + c_N(\tau_N(t_d)) \right]^{\alpha}}{\alpha!} \leq \frac{(t_d - t_{d-1} + 1)^{\alpha}}{\alpha!}, \qquad (9)$$

because $c_N(r) \leq 1$ for all $r$. Suppose that $\mu_d \neq \emptyset$ in (7) for some $d =: i$. Using $D_N(s) \leq c_N(s)$, which is clear from

$$\frac{1}{N} \left( \nu_t^{(i)} + \frac{1}{N} \sum_{j \neq i}^{N} (\nu_t^{(j)})_2 \right) \leq 1,$$

for all but one $D_N$-factor in (7), and substituting in (9), gives

$$\lim_{N \to \infty} E\left[ \prod_{d=1}^{k} \chi_d^* \right] \leq n^{n2k} \binom{n}{2}^{nk} \sum_{\alpha_1 = 1}^{|\eta_0| - |\eta_1|} \cdots \sum_{\alpha_k = 1}^{|\eta_{k-1}| - |\eta_k|} \; \sum_{(\lambda_1, \mu_1) \in \Pi_2([\alpha_1])} \cdots \sum_{(\lambda_k, \mu_k) \in \Pi_2([\alpha_k])} \left\{ \prod_{d=1}^{k} \frac{(t_d - t_{d-1} + 1)^{\alpha_d - \delta_{di}}}{(\alpha_d - \delta_{di})!} \right\} \lim_{N \to \infty} E\left[ \sum_{s = \tau_N(t_{i-1}) + 1}^{\tau_N(t_i)} D_N(s) \right],$$

where $\delta_{di}$ is the Kronecker delta. As for (7), cases in which some $\alpha_d$ are 0 can be handled by a straightforward modification. All sums outside the expectation consist of a number of terms that remains bounded as $N \to \infty$, so (3) guarantees that the contribution of paths with $\mu_d \neq \emptyset$ vanishes in the limit and only isolated, binary mergers can take place.

To describe transitions in which the only mergers that occur are isolated, binary mergers, we define $\tilde{Q}$ to be the matrix obtained from the Kingman generator $Q$ by setting its diagonal entries to 0. Note that $(\tilde{Q}^{\alpha})_{\xi\eta}$ is precisely the number of ways of going from $\xi$ to $\eta$ in exactly $\alpha$ steps, where a step consists of merging a pair of blocks.

Now consider a transition from $\eta_{d-1}$ to $\eta_d$ at respective times $\tau_N(t_{d-1})$ and $\tau_N(t_d)$ via binary mergers, i.e. with $\alpha := |\eta_{d-1}| - |\eta_d|$ binary merger events and $\mu_d = \emptyset$. By Lemma 1, Cases 1 and 2, its conditional probability is bounded by

$$\chi_d \leq (\tilde{Q}^{\alpha})_{\eta_{d-1} \eta_d} (1 + O(N^{-1}))^{\alpha} \sum^{\star}_{\tau_N(t_{d-1}) < s_1 < \cdots < s_{\alpha} \leq \tau_N(t_d)} \cdots,$$

where $\chi_d$ is the restriction of $\chi_d^*$ to trajectories involving only isolated binary mergers. An expansion of the product on the second line gives

$$\prod_{\substack{r = \tau_N(t_{d-1}) + 1 \\ r \neq s_1, \ldots, s_{\alpha}}}^{\tau_N(t_d)} \left\{ 1 - \binom{|\eta_{d-1}| - |\{i : s_i < r\}|}{2} \{1 + O(N^{-1})\} \, c_N(r) \right\} \cdots$$