# Coupled conditional backward sampling particle filter

We consider the coupled conditional backward sampling particle filter (CCBPF) algorithm, which is a practically implementable coupling of two conditional backward sampling particle filter (CBPF) updates with different reference trajectories. We find that the algorithm is stable, in the sense that with a fixed number of particles, the coupling time in terms of iterations increases only linearly with respect to the time horizon under a general (strong mixing) condition. This result implies a convergence bound for the iterated CBPF without requiring the number of particles to grow as a function of the time horizon. This complements earlier findings in the literature for conditional particle filters, which assume the number of particles to grow (super)linearly with the time horizon. We then consider unbiased estimators of smoothing functionals using the CCBPF, and also the coupled conditional particle filter without backward sampling (CCPF), as suggested by Jacob, Lindsten and Schon [arXiv:1701.02002]. In addition to our results on the CCBPF, we provide quantitative bounds on the (one-shot) coupling of the CCPF, which is shown to be well-behaved with a finite time horizon and bounded potentials when the number of particles is increased.


## 1. Introduction

The conditional particle filter (CPF) introduced by Andrieu et al. (2010) is a Markov chain Monte Carlo method that produces asymptotically unbiased samples from the posterior distribution of the states of a hidden Markov model. The CPF can be made significantly more efficient by the inclusion of backward sampling (Whiteley, 2010) (or, equivalently, ancestor sampling (Lindsten et al., 2014)) steps; we refer to the resulting algorithm as the conditional backward sampling particle filter (CBPF). While there are many empirical studies reporting on the effectiveness of the CBPF for Bayesian inference and on its superiority over the CPF (see, e.g., Fearnhead and Künsch, 2018, Section 7.2.2), quantitative theoretical guarantees for the CBPF are still missing. In contrast, the theoretical properties of the CPF are much better understood (Chopin and Singh, 2015; Lindsten et al., 2015; Andrieu et al., 2018).

Chopin and Singh (2015) introduced a coupling construction, called the coupled CPF (CCPF), to prove the uniform ergodicity of the CPF. Recently, Jacob et al. (to appear) identified the potential of the CCPF to produce unbiased estimators by exploiting a de-biasing technique due to Glynn and Rhee (2014) (see also Jacob et al., 2017). This is an important algorithmic advance in particle filtering methodology, since unbiased estimation is useful for estimating confidence intervals, allows straightforward parallelisation, and, when used within a stochastic approximation context such as the stochastic approximation expectation maximisation (SAEM) scheme, ensures martingale noise, which has good supporting theory.
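The Glynn–Rhee de-biasing idea can be illustrated on a toy example. The sketch below is not the CCPF: it applies the same telescoping identity to a simple two-state Markov chain whose two copies are coupled through common random numbers. All numerical values (transition probabilities, burn-in) are hypothetical illustrations.

```python
import random

def step(x, u):
    # Two-state Markov kernel driven by a shared uniform u:
    # from 0 move to 1 w.p. 0.3; from 1 stay at 1 w.p. 0.8.
    return (1 if u < 0.3 else 0) if x == 0 else (1 if u < 0.8 else 0)

def glynn_rhee(h, b, rng):
    # Unbiased estimate of E_pi[h] from a lagged coupled pair:
    #   H = h(X_b) + sum_{n=b+1}^{tau-1} [h(X_n) - h(Y_{n-1})],
    # where tau is the first n with X_n = Y_{n-1}.
    x = rng.randint(0, 1)          # X_0 from the initial law
    y = rng.randint(0, 1)          # Y_0, an independent copy
    x = step(x, rng.random())      # advance X one step: state is (X_1, Y_0)
    n, est, met = 1, 0.0, False
    while True:
        met = met or (x == y)      # once met, the common-u coupling is faithful
        if n == b:
            est += h(x)
        elif n > b and not met:
            est += h(x) - h(y)
        if met and n >= b:
            return est
        u = rng.random()           # common uniform drives both chains
        x, y = step(x, u), step(y, u)
        n += 1
```

Averaging independent replications approximates the stationary mean, here pi(1) = 0.3/(0.3 + 0.2) = 0.6; each replication is unbiased by the telescoping-sum argument.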

The main contribution of this paper is a relatively simple yet important algorithmic modification of the CCPF for unbiased estimation: we extend the CCPF to include backward sampling steps through an index-coupled version of Whiteley's (2010) backward sampling CPF. This approach, which we call the coupled conditional backward sampling particle filter (CCBPF), appears to be far more stable than the CCPF (Chopin and Singh, 2015) (and the Lindsten et al. (2014) variant that uses coupled ancestor sampling within the CCPF). Under a general (but strong) mixing condition, we prove (Theorem 7) that the coupling time of the CCBPF grows at most linearly with the length of the data record when a fixed number of particles is used, provided this fixed number is sufficiently large. As an important corollary, we obtain new convergence guarantees for the CBPF (Theorem 4) that verify its superiority over the CPF. This result differs from the time-uniform guarantees for the CPF (Andrieu et al., 2018; Lindsten et al., 2015), which require (super)linear growth of the number of particles. Our result confirms the long-held view, stemming from numerous empirical studies, that the CBPF remains an effective sampler with a fixed number of particles even as the data record length increases. An important consequence of a fixed number of particles is that the space complexity of the algorithm is linear, as opposed to quadratic, in the length of the data record, making it feasible to run on long data records without exhausting the memory available on a computer. We remark that another version of the CPF which is stable with a fixed number of particles is the blocked version introduced in Singh et al. (2017).

We also complement the empirical findings of Jacob et al. (to appear) by showing quantitative bounds on the 'one-shot' coupling probability of the CCPF. These results are noteworthy because we believe that the CCPF's coupling probability does diminish with the length of the time series unless the particle number is also increased. With the minimal assumption of bounded potentials, we prove (Theorem 5) that the coupling probability of the CCPF is at least , similar to what is shown for the CPF (Andrieu et al., 2018; Lindsten et al., 2015). The constants involved grow very rapidly with in the absence of the usual stringent mixing condition on the underlying model. When the stringent mixing conditions do hold, we are able to give a more favourable rate of convergence as increases (Theorem 6), which still requires an increasing number of particles with .

## 2. Notation and preliminaries

Throughout the paper, we assume a general state space , which is typically equipped with the Lebesgue measure. However, our results hold for any measure space equipped with a -finite dominating measure, which is denoted as ‘’. Product spaces are equipped with the related product measures. We use the notation for any integers , and use similar notation in indexing and . We also use combined indexing, such that for instance . We adopt the usual conventions concerning empty products and sums, namely and when . We denote , and .

We use standard notation for the -step transition probability of a Markov kernel by and . If is a probability measure and is a real-valued function, then , and , whenever well-defined. The total variation metric between two probability measures is defined as , and . If two random variables and share a common law, we write .
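For discrete laws, the total variation metric just introduced reduces to half the L1 distance between the mass functions. A minimal self-contained illustration (the example masses are arbitrary):

```python
def tv_distance(p, q):
    # Total variation between discrete laws given as dicts state -> mass:
    # sup_A |p(A) - q(A)| = (1/2) * sum_x |p(x) - q(x)|.
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)

p = {0: 0.5, 1: 0.5}
q = {0: 0.2, 1: 0.6, 2: 0.2}
d = tv_distance(p, q)   # 0.5 * (0.3 + 0.1 + 0.2) = 0.3
```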

We are interested in computing expectations of smoothing functionals, with respect to the probability density on a space with the following unnormalised density (cf. Del Moral, 2004):

$$\gamma_T(x_{1:T}) := M_1(x_1)\,G_1(x_1)\prod_{t=2}^{T} M_t(x_{t-1}, x_t)\,G_t(x_{t-1:t}), \tag{1}$$

where is a probability density, are Markov transition densities, and for are ‘potential functions,’ and is an unknown normalising constant. In the context of hidden Markov models, the potentials are often taken to be of the form

$$G_1(x_1) = \frac{g_1(y_1 \mid x_1)\, f_1(x_1)}{M_1(x_1)} \qquad\text{and}\qquad G_t(x_{t-1}, x_t) = \frac{g_t(y_t \mid x_t)\, f_t(x_t \mid x_{t-1})}{M_t(x_{t-1}, x_t)},$$

in which case corresponds to the smoothing distribution of the hidden Markov model conditional on observations .
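To make (1) concrete, the sketch below evaluates log gamma_T for user-supplied log-densities. The toy model in the usage example (uniform transitions on two states, unit potentials) is a hypothetical stand-in, not a model from the paper.

```python
import math

def log_gamma_T(x, log_M1, log_G1, log_M, log_G):
    # log of the unnormalised density (1):
    # gamma_T(x_{1:T}) = M_1(x_1) G_1(x_1) prod_{t=2}^T M_t(x_{t-1}, x_t) G_t(x_{t-1:t})
    lp = log_M1(x[0]) + log_G1(x[0])
    for t in range(1, len(x)):
        lp += log_M(x[t - 1], x[t]) + log_G(x[t - 1], x[t])
    return lp

# Toy model: uniform transitions on {0, 1} and unit potentials,
# so gamma_T(x_{1:T}) = (1/2)^T for every path.
lg = log_gamma_T([0, 1, 0, 1],
                 log_M1=lambda x1: math.log(0.5),
                 log_G1=lambda x1: 0.0,
                 log_M=lambda xp, xn: math.log(0.5),
                 log_G=lambda xp, xn: 0.0)
```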

We will consider two different conditions for the model. The first is generally regarded as non-restrictive in the particle filtering literature and essentially equivalent with the uniform ergodicity of CPF (Andrieu et al., 2018).

###### Assumption 1.

(Bounded potentials)
There exists such that for all .

The second is a much stronger assumption, again typical in the particle filtering literature when proving time-uniform error bounds for particle filter estimates (cf. Del Moral, 2004; Del Moral et al., 2010).

###### Assumption 2.

(Strong mixing)
and , and for all ,

1. and ,

2. and .

Denote and .

###### Remark 3.

The expression of the constant may be simplified (and improved) in two special cases, as follows:

1. If for all , then may be omitted.

2. If for , then may be omitted.

In particular, if both hold, then .

## 3. Convergence of the conditional backward sampling particle filter

Before proceeding to the construction of the coupled conditional particle filters, we formalise an important implication of our result for the convergence time of the conditional backward sampling particle filter (CBPF) (Whiteley, 2010), or its ancestor sampling implementation (Lindsten et al., 2014), which are probabilistically equivalent and reversible with respect to (Chopin and Singh, 2015).

###### Theorem 4.

Suppose Assumption 2 (strong mixing) holds, and denote by the Markov transition probability of CBPF with and particles. For any , there exists , such that for all :

1. for all and all .

2. For any , as .

###### Proof.

The upper bound (i) follows from Theorem 7 and Lemma 27, and (ii) follows directly from (i). ∎

Theorem 4 indicates that under the strong mixing assumption, the mixing time of the CBPF increases at most linearly in the number of observations . We remark that, unlike existing results on the CPF, we do not derive a one-shot coupling bound (Chopin and Singh, 2015) or a one-step minorisation measure (Andrieu et al., 2018; Lindsten et al., 2015) to prove the uniform ergodicity of the CBPF transition probability . This is because the enhanced stability of the CBPF's Markov kernel over that of the CPF can only be established by considering the behaviour of the iterated kernel of Theorem 4, which has thus far proven elusive to study. Thus, in addition to the result itself, the proof technique is novel and of interest; for this reason we dedicate Section 5 to its exposition. Finally, even though and in Theorem 4 may be taken arbitrarily small and large, respectively, by increasing , the rate at which is required to increase in our results with respect to and is conservative, far faster than what has been observed to suffice in practice.

## 4. Coupled conditional particle filters and unbiased estimators

This section is devoted to the CCPF and CCBPF algorithms (in short CCxPF, where x is a placeholder), and to the construction of unbiased estimators of using them. We start with Algorithm 1, where the CCxPF algorithms are given in pseudo-code. The algorithms differ only in lines 12–17, highlighting the small but important difference between the CCPF and the CCBPF. The CCBPF incorporates index-coupled backward sampling, which is central to our results. Algorithm 2 details the index-coupled resampling (Chopin and Singh, 2015) employed within the methods. Line 7 of Algorithm 1 accommodates any sampling strategy which satisfies and marginally, but which may involve dependence, such as implementations using common random numbers (Jacob et al., to appear).

The CCxPF algorithms define Markov transition probabilities on . It is straightforward to check that the CCPF and CCBPF coincide marginally with the CPF (Andrieu et al., 2010) and CBPF (Whiteley, 2010) algorithms, respectively. That is, if for some , then and , where CxPF stands for either the CPF or the CBPF update with the corresponding reference trajectory. It is also clear that if , then . Because the CPF and CBPF are both -reversible (Chopin and Singh, 2015), it is easy to see that the CCxPF are -reversible, where .
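A core primitive behind coupled resampling is drawing a pair of ancestor indices whose marginals are the two weight distributions while maximising the probability that they agree. The sketch below implements a generic maximal coupling of two categorical laws; Algorithm 2 of the paper (index-coupled resampling) differs in detail, in particular it matches the indices of already-identical particles, so treat this as an illustrative building block only.

```python
import random

def categorical(p, rng):
    # Draw an index from a categorical law p (list of masses summing to 1).
    u, c = rng.random(), 0.0
    for k, pk in enumerate(p):
        c += pk
        if u < c:
            return k
    return len(p) - 1

def coupled_indices(w, w_tilde, rng):
    # Maximal coupling of two categorical laws: P(I = J) = sum_k min(w_k, w~_k).
    overlap = [min(a, b) for a, b in zip(w, w_tilde)]
    alpha = sum(overlap)
    if rng.random() < alpha:
        i = categorical([o / alpha for o in overlap], rng)
        return i, i                     # coupled draw from the overlap measure
    # otherwise draw independently from the normalised residuals
    i = categorical([(a - o) / (1 - alpha) for a, o in zip(w, overlap)], rng)
    j = categorical([(b - o) / (1 - alpha) for b, o in zip(w_tilde, overlap)], rng)
    return i, j
```

When the two weight vectors coincide, the overlap mass is one and the indices always agree; with disjoint supports they never do.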

Let us first state a result that complements the findings of Jacob et al. (to appear). It implies that the CCPF enjoys strong uniform ergodicity similar to the CPF, with the same rate as the number of particles is increased (cf. Andrieu et al., 2018; Lindsten et al., 2015).

###### Theorem 5.

Let , and consider with . If Assumption 1 holds, then there exists a constant such that

$$\mathbb{P}(S = \tilde S) \ge 1 - \frac{c}{N + c}.$$

The proof of Theorem 5 is given in Appendix A.

Theorem 5 is stated with a fixed time horizon , and shows that one-shot coupling occurs from any initial state with positive probability for any . To have a reasonably large probability of one-shot coupling, it is sufficient to choose a large enough value of . The coupling is one-shot since it compares and which are the outputs of a single application of the CCPF algorithm. The concern is that if is large, may have to be taken very large in order to guarantee some desired minimum coupling probability. Indeed, we have only been able to show the following result:

###### Theorem 6.

Under the setting of Theorem 5, but with Assumption 17 in Appendix B,

$$\mathbb{P}(S = \tilde S) \ge 1 - 2^T\, T\, N^{-\frac{1}{2c_* + 1}}.$$

Theorem 6, which follows from Lemma 20, shows that the probability of coupling does not diminish when . That is, roughly doubling the particle number with every unit increase in ensures a non-diminishing coupling probability.

In our experiments, the CCBPF exhibited stable behaviour with a fixed and small number of particles, even for large . Our main result, stated next, corroborates these empirical findings. In contrast to Theorem 5, the statement of the coupling behaviour for the CCBPF is not one-shot in nature. In our analysis we show that the pair of trajectories output by repeated application of the CCBPF kernel couple progressively, starting from their time components, until eventually all their components up to time are coupled. For this reason the result is stated in terms of the law of this coupling time, and not as in Theorem 5.

###### Theorem 7.

Suppose that Assumption 2 holds. Let and let and for . Denote the coupling time .

For any , there exists such that for all ,

$$\mathbb{P}(\tau \ge n) \le \alpha^T \beta^{-n} \qquad\text{for all } n, T \in \mathbb{N}. \tag{2}$$

In particular, for any , as .

The proof of the bound (2) is given in Section 5, and the linear coupling time statement follows trivially. The most striking element of this statement is that the coupling time does not exceed with greater surety as increases.

Let us then turn to the use of CCxPF together with the scheme of Glynn and Rhee (2014), as suggested in (Jacob et al., to appear).

Algorithm 3 has two adjustable parameters, a 'burn-in' and a 'number of particles' , which may be tuned to maximise its efficiency. Algorithm 3 iterates either the coupled conditional particle filter (CCPF) or the coupled conditional backward sampling particle filter (CCBPF) until a perfect coupling of the trajectories and is obtained.

The following result records general conditions under which the scheme above produces unbiased finite variance estimators.

###### Theorem 8.

Suppose Assumption 1 holds and is bounded and measurable. Then, for Algorithm 3 with the CCPF and , denoting by the running time (the number of iterations before producing output):

1. almost surely.

2. and .

3. With the constant of Theorem 5,

$$\mathbb{E}[\tau] \le b + \Big(\frac{c}{N+c}\Big)^{b-1}\frac{N+c}{N}, \qquad \big|\operatorname{var}(Z) - \operatorname{var}_{\pi_T}(h(X))\big| \le 16\,\|\bar h\|_\infty^2 \Big(\frac{N+c}{N}\Big)^{2}\Big(\frac{c}{N+c}\Big)^{b/2},$$

where

###### Proof.

Theorem 5 implies that for all , from which

$$\mathbb{E}[\tau] = \sum_{k \ge 0} \mathbb{P}(\tau > k) \le b + \sum_{k \ge b} \mathbb{P}(\tau > k) \le b + \Big(\frac{c}{N+c}\Big)^{b-1}\frac{N+c}{N} < \infty,$$

and the bound on follows from Lemma 28. Part (ii) follows from Theorem 24 and Lemma 26. ∎

Theorem 8 complements the consistency result of Jacob et al. (to appear) by quantifying the convergence rates. Fix : if is large, then , and if is large, then . As mentioned before, the growth of the constant with respect to can be very rapid. In contrast, in the case of the CCBPF, the results may be refined as follows:

###### Theorem 9.

Suppose Assumption 2 holds, let and let be from Theorem 7. Then, Algorithm 3 satisfies, with :

1. .

2. .
In particular, if with any , as .

###### Proof.

The results follow from Theorem 7 and Lemma 28, similarly as in the proof of Theorem 8. ∎

Note that the latter term in Theorem 9 (i) is at most , showing that the expected coupling time is linear in . Theorem 9 (ii) may be interpreted as saying that the CCBPF algorithm is almost equivalent to perfect sampling from , when increased (super)linearly with respect to .

We conclude the section with a number of remarks regarding Algorithm 3:

1. We follow Jacob et al. (to appear) and suggest an initialisation based on a standard particle filter in line 1. However, this initialisation may be changed to any other scheme which ensures that and have identical distributions. Our results above do not depend on the chosen initialisation strategy.

2. The estimator is constructed for a single function , but several estimators may be constructed simultaneously for a number of functions . In fact, as Glynn and Rhee (2014) note, if we let , we may regard the random signed measure

$$\hat\mu_b(\cdot) := \delta_{S_b}(\cdot) + \sum_{k=b+1}^{\tau} \big[\delta_{S_k}(\cdot) - \delta_{\tilde S_k}(\cdot)\big]$$

as the output, which will satisfy the unbiasedness at least for all bounded measurable .

3. It is also possible to construct a ‘time-averaged’ estimator that corresponds to an average of the estimators over a range of values for (Jacob et al., 2017).

4. We believe that the method is also valid without Assumption 1, but that it may exhibit poor performance, similar to the conditional particle filter, which is sub-geometrically ergodic with unbounded potentials (Andrieu et al., 2018).
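The signed-measure output mentioned in remark 2 above can be stored simply as the trajectories together with ±1 weights; integrating any test function h against it is then a short loop. In the sketch below, the trajectory values are placeholder numbers chosen so that the two chains have coupled by the end.

```python
def integrate_signed_measure(h, S, S_tilde, b, tau):
    # h integrated against
    # mu_hat_b = delta_{S_b} + sum_{k=b+1}^{tau} (delta_{S_k} - delta_{S~_k}).
    total = h(S[b])
    for k in range(b + 1, tau + 1):
        total += h(S[k]) - h(S_tilde[k])
    return total

# Once the two chains have coupled, the correction terms cancel exactly:
S       = [0.0, 1.0, 2.5, 3.0, 3.0]
S_tilde = [9.0, 9.0, 2.5, 3.0, 3.0]
val = integrate_signed_measure(lambda s: s, S, S_tilde, b=1, tau=4)
```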

## 5. Coupling time of CCBPF

Consider now the Markov chain defined by Algorithm 3, with the stopping criterion (line 5) omitted. Define the 'perfect coupling boundary' as

$$\kappa_n := \kappa(S_n, \tilde S_n) := \max\{t \ge 0 : S_{n,1:t} = \tilde S_{n,1:t}\}.$$

We are interested in upper bounding the stopping time .

Since the CCBPF is complicated, in our analysis we instead focus on a simplified Markov chain that considers only the vector of numbers of identical particles at each time . The boundary associated with this simpler chain grows by i.i.d. positive-mean increments, which are stochastically ordered with respect to the increments of the CCBPF boundary, ultimately allowing us to upper bound the stopping time.

We use stochastic ordering of two random variables and , which holds if their distribution functions are ordered for all . Two random vectors and are ordered if for all functions for which the expectations exist, and which are increasing, in the sense that whenever , where ‘’ is the usual partial order if for all . Recall also that if and only if there exists a probability space with random variables and such that and and a.s. (Shaked and Shanthikumar, 2007, Theorem 6.B.1).
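The cited characterisation (Shaked and Shanthikumar, 2007, Theorem 6.B.1) is easy to visualise for real-valued variables: if the distribution functions are ordered, feeding a common uniform through both generalised inverse CDFs produces an almost-surely ordered pair with the correct marginals. A small numerical sketch with two hypothetical discrete laws:

```python
import random

def inv_cdf(p, u):
    # Generalised inverse CDF of a discrete law p on {0, ..., len(p)-1}.
    c = 0.0
    for k, pk in enumerate(p):
        c += pk
        if u < c:
            return k
    return len(p) - 1

# The CDF of p dominates the CDF of q pointwise, so p <=_st q.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
rng = random.Random(0)
pairs = [(inv_cdf(p, u), inv_cdf(q, u)) for u in (rng.random() for _ in range(1000))]
ordered = all(x <= y for x, y in pairs)   # common-u coupling realises X <= Y a.s.
```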

Our bound of is based on an independent random variable , which satisfies , under Assumption 2.

###### Lemma 10.

Suppose Assumption 2 holds, and consider the output of Algorithm 1 (CCBPF). The perfect coupling boundaries satisfy

$$\kappa\big(X_{1:T}^{(J_{1:T})}, \tilde X_{1:T}^{(\tilde J_{1:T})}\big) - \kappa\big(X_{1:T}^{*}, \tilde X_{1:T}^{*}\big) \ \ge_{\mathrm{st}}\ \Delta \wedge \big(T - \kappa(X_{1:T}^{*}, \tilde X_{1:T}^{*})\big),$$

where the random variable is defined through the following procedure:

1. Let for and , and set . While :

• Simulate .

• Let .

2. Set for , and . While or :

• Simulate , where

$$p_t := \begin{cases} p_t^{(0)} := \dfrac{\epsilon\, \hat C_t}{N}, & \xi_{t+1} = 0,\\[4pt] p_t^{(1)} := \dfrac{\hat C_t\, \epsilon}{\hat C_t\, \epsilon + N - \hat C_t}, & \xi_{t+1} = 1. \end{cases}$$
• Let .

3. Set .

###### Proof.

Denote in short , and the indices of the coupled particles

$$C_t := \big\{ j \in \{1{:}N\} : X_t^{(I_t^{(j)})} = \tilde X_t^{(\tilde I_t^{(j)})} \big\} \qquad\text{for } t \in \{1{:}T\}.$$

Then, the sizes of satisfy the following:

$$|C_t| = N \quad\text{for } t = 1{:}\kappa, \qquad |C_t| \,\big|\, C_{1:t-1} \ \ge_{\mathrm{st}}\ \mathrm{Binom}\Big(N-1,\ \frac{\delta |C_{t-1}|}{\delta |C_{t-1}| + N - |C_{t-1}|}\Big) \quad\text{for } t = (\kappa+1){:}T,$$

where the latter follows by Lemma 11 (ii). As the function is increasing in , and for , it follows that (Shaked and Shanthikumar, 2007, Theorem 6.B.3). This means that we may construct (by a suitable coupling) such that for all .

By Lemma 11, the backward sampling indices satisfy:

$$\mathbb{P}(J_T = \tilde J_T \in C_T \mid C_{1:T}) \ge \frac{\delta |C_T|}{\delta |C_T| + N - |C_T|},$$
$$\mathbb{P}(J_t = \tilde J_t \in C_t \mid C_{1:T}, J_{t+1:T}) \ge \begin{cases} \dfrac{\epsilon |C_t|}{\epsilon |C_t| + N - |C_t|}, & J_{t+1} = \tilde J_{t+1} \in C_{t+1},\\[4pt] \dfrac{\epsilon |C_t|}{N}, & \text{otherwise}, \end{cases}$$

for . By definition, , and therefore . This, together with implies that . Similarly, by (Shaked and Shanthikumar, 2007, Theorem 6.B.3), we deduce that

$$\big(\mathbb{I}\{J_1 = \tilde J_1 \in C_1\}, \ldots, \mathbb{I}\{J_T = \tilde J_T \in C_T\}\big) \ \ge_{\mathrm{st}}\ \xi_{(1-\kappa):(T-\kappa)}.$$

Because the functions are increasing, the claim follows. ∎

###### Lemma 11.

Suppose and for . Let

$$\varepsilon := \frac{\omega_*}{\omega^*} \qquad\text{and}\qquad C := \{ j \in \{1{:}N\} : \omega^{(j)} = \tilde\omega^{(j)} \}.$$

Then, satisfy the following for all :

1. for all ,

2. .

###### Proof.

Note that , so the first bound is immediate. For the second, let , and observe that

$$\sum_{j \in C} w^{(j)} \wedge \tilde w^{(j)} \ \ge\ \frac{|C|\,\omega_*}{|C|\,\omega_* + |C^c|\,\omega^*},$$

because is increasing for for any . The last bound equals (ii). ∎

Because , we note that , where

$$\hat\tau := \inf\Big\{ n \ge 0 : \sum_{k=1}^{n} \Delta_k \ge T \Big\}, \tag{3}$$

and are independent realisations of in Lemma 10. The next lemma indicates that if is large enough (given ), the random variables are well-behaved, and ensure good expectation and tail probability bounds for .
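The hitting time (3) is that of an i.i.d. random walk with positive-mean increments crossing level T, so its expectation is roughly T divided by the mean increment (Wald's identity, up to overshoot). The sketch below checks this numerically with a hypothetical geometric increment law standing in for the Delta of Lemma 10; the actual law of Delta depends on the model constants.

```python
import random

def tau_hat(T, draw_delta, rng):
    # Hitting time (3): smallest n with Delta_1 + ... + Delta_n >= T.
    s, n = 0, 0
    while s < T:
        s += draw_delta(rng)
        n += 1
    return n

def draw_delta(rng):
    # Stand-in increment law: geometric on {1, 2, ...} with mean 2.
    n = 1
    while rng.random() >= 0.5:
        n += 1
    return n

rng = random.Random(7)
samples = [tau_hat(100, draw_delta, rng) for _ in range(2000)]
mean_tau = sum(samples) / len(samples)
# Wald's identity: E[tau_hat] is close to T / E[Delta] = 50 (plus a small overshoot effect).
```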

###### Lemma 12.

Given any , consider the random variable defined in Lemma 10. For any and , there exists such that for all