## I Introduction

Effective and efficient on-line learning of high-dimensional states is an important task in many domains where we need to regularly update our knowledge by processing a deluge of streaming data. Relevant applications span from robotic learning to financial modelling, from multi-target tracking to weather forecasting [1, 2, 3, 4]. In the presence of non-linear states space models, particle filters [5, 6] are one of the standard tools for sequential inference. However, it is in general difficult to construct effective proposal distributions to match well with the posterior distribution in high-dimensional spaces. A mismatch between the proposal and the posterior leads to negligible importance weights for most particles. These are normally discarded after resampling. This weight degeneracy issue thus leads to poor particle representation of the posterior and has limited the widespread application of particle filters in high-dimensional filtering scenarios [7, 8, 9].

Advanced particle filtering methods have been proposed to approximate the optimal proposal distribution [10] in order to alleviate the weight degeneracy issue. The multiple particle filter [11] partitions the state space into multiple lower-dimensional spaces where state estimation is performed. The block particle filter takes a similar approach by partitioning the state space and the measurement space into independent blocks in updating the filtering distribution, although this introduces a bias that is difficult to quantify [12]. In [13], Beskos et al. introduced an unbiased space-time particle filter which also relies on factorization of the conditional posterior. These algorithms are promising but are applicable only in scenarios where the factorization can be performed. The equivalent weights particle filter sacrifices the statistical consistency to ensure substantial weights for a large number of particles [14]. Particle flow filters have been proposed to address high-dimensional filtering by transporting particles continuously from the prior to the posterior [15, 16, 17, 18]

. Most particle flow filters do not provide statistically consistent estimates of the posterior. Deterministic particle flow filters such as those we build on in this paper often underestimate the variance. Other versions incorporate approximation errors in the implementation or impose overly strong model assumptions

[19, 20]. Recent stochastic particle flow filters such as those described in [18] address these limitations. As an alternative approach, [21, 22, 23, 24] propose the use of particle flow or transport maps to construct a proposal distribution that approximates the posterior and then use importance sampling to correct for any discrepancy.Another direction to combat weight degeneracy in high-dimensional filtering is to use Markov chain Monte Carlo (MCMC) methods to improve the diversity of samples. MCMC is often considered as the most effective method for sampling from a high-dimensional distribution [25]. More effective MCMC techniques in high-dimensional spaces use Hamiltonian or Langevin dynamics to construct efficient proposals [26, 27, 28, 29]. Although effective, these techniques cannot be directly used in sequential inference tasks, as MCMC typically targets a static distribution. To use MCMC to diversify samples in a sequential setting, the resample-move particle filter incorporates MCMC methods by performing MCMC moves after the resampling step of the particle filter [30]. Unfortunately, the resampling step can lead to degeneracy, so many MCMC moves may be required to diversify particles.

A more general framework called sequential Markov chain Monte Carlo (SMCMC) uses MCMC techniques to sample directly from the approximate target distribution [31, 32, 33, 34]. [31] directly targets the filtering distribution which is computationally expensive. [32, 33, 34] instead propose to sample from the joint state distribution to avoid the numerical integration of the predictive density. In the algorithms presented in [32, 33, 34], a fixed number of samples is used to approximate the empirical posterior distribution for each time step. By contrast, in the sequentially interacting MCMC (SIMCMC) framework described in [35], one can continue to generate interacting non-Markovian samples of the entire state sequence to improve the empirical approximation of joint posterior distribution successively. The resulting samples are asymptotically distributed according to the joint posterior distribution. The fundamental difference between the SMCMC and the SIMCMC techniques is that the SMCMC algorithm consists of sequential implementation of static MCMC schemes (justifying the name “sequential MCMC”), whereas this interpretation does not hold for the SIMCMC algorithm. Hence the analysis of SIMCMC in [35] cannot be applied to SMCMC (and vice-versa) and SMCMC cannot be expressed as a special case of SIMCMC. From a practical viewpoint, if a fixed number of particles is to be used, the effect of error in approximating the posterior distribution at the previous time step might be severe for the SIMCMC algorithm and limit its applicability in high dimensional online filtering problems compared to advanced SMC or SMCMC techniques.

A variety of MCMC kernels developed for sampling in high-dimensional spaces can be used inside the SMCMC framework. [34] and this paper differ from the past literature by using particle flow to construct MCMC kernels in the SMCMC framework.

In this paper, we propose to incorporate invertible particle flow [24] into the SMCMC framework. Our main contributions are three-fold: 1) we exploit the capability of particle flow to migrate samples into high posterior density regions to construct a composite Metropolis-Hasting (MH) kernel that significantly increases the acceptance rate of the joint draw; 2) for multi-modal distributions, we incorporate a Gaussian mixture model-based particle flow to improve sampling efficiency; 3) we assess the performance of the proposed methods through numerical simulation of challenging high-dimensional filtering examples. We presented preliminary results concerning an initial attempt to incorporate SMCMC with invertible particle flow in [34]. This paper provides a more detailed description of the proposed approach, presents more computationally efficient algorithms, and proposes a sequential MCMC method with mixture model-based flow for efficient exploration of multi-modal distributions. We also provide theoretical results regarding asymptotic convergence of the algorithms. These results are not restricted to the invertible particle flow composite kernel case, but hold for SMCMC methods provided the kernel and filtering problem satisfy certain assumptions (see Section IV-C). While similar in spirit to results presented in [35] for the SIMCMC algorithm, the key assumptions are slightly less restrictive and the theorem directly addresses the sequential implementation in [33] and this work. Recently, [36] carries out a rigorous statistical analysis of SMCMC algorithms to establish upper bounds on finite sample filter errors, in addition to providing asymptotic convergence results.

The rest of the paper is organized as follows. Section II provides the problem statement and Section III reviews the sequential MCMC framework and the invertible particle flow. We present the proposed methods in Section IV. Simulation examples and results are presented in Section V. We conclude the paper in Section VI.

## Ii Problem Statement

Many online data analysis applications involve estimating unknown quantities, which we call the “state” , given sequential measurements. Often there is prior knowledge related to the state , where the subscript indicates the time step before any measurement arrives. A dynamic model describes the evolution of state at time step given the past states and a measurement model captures the relationship between observations and the state .

The hidden Markov model (HMM) is a ubiquitous tool for modelling the discrete-time dynamic system and measurement process. It is assumed that the hidden state

follows the Markov property, i.e., it is independent of all states before given the state . The measurement is modelled as independent of all past measurements and past states conditioned on the current state . We can model the state evolution and measurements with the following HMM:(1) | ||||

(2) | ||||

(3) |

Here

is the initial probability density function of the state

, models the dynamics of the unobserved state , and the measurement model describes the relation between the measurement and the state . We assume that is a function, i.e., is a differentiable function whose first derivative is continuous. and are the process noise and the measurement noise, respectively. With these models, the filtering task is to compute the posterior distribution of the state trajectory online, as new data become available. For conciseness, we use to denote the set and to denote the set , where and are integers and . The posterior distribution gives a probabilistic interpretation of the state given all measurements, and can be used for state estimation and detection.## Iii Background Material

### Iii-a Sequential Markov chain Monte Carlo methods

Sequential Markov chain Monte Carlo (SMCMC) methods were proposed as a general MCMC approach for approximating the joint posterior distribution recursively. A unifying framework of the various SMCMC methods was provided in [33]. At time step , can be computed pointwise up to a constant in a recursive manner:

(4) |

As is not analytically tractable, it is impossible to sample from it in a general HMM. In all SMCMC methods, the distribution is replaced by its empirical approximation in (4), which leads to an approximation of as follows:

(5) |

where

(6) |

Here is the Dirac delta function centred at , is the number of samples discarded during a burn-in period, and is the number of retained MCMC samples. are the samples obtained from the Markov chain at time , whose stationary distribution is . At time step , iterations of the Metropolis-Hastings (MH) algorithm with proposal are executed to generate samples from the invariant distribution , and is approximated as:

(7) |

The purpose of the joint draw of is to avoid numerical integration of the predictive density when the target distribution is [32]. Note that if we are only interested in approximating the marginal posterior distribution , only needs to be stored instead of the full past state trajectories . The Metropolis-Hastings (MH) algorithm used within SMCMC to generate one sample is summarized in Algorithm 1.

#### Iii-A1 Composite MH kernel in SMCMC

Different choices of the MCMC kernel for high dimensional SMCMC are discussed in [33]. In most SMCMC algorithms, an independent MH kernel is adopted [33], i.e., , meaning that the proposal is independent of the state of the Markov chain at the previous iteration. The ideal choice is the optimal independent MH kernel, but it is usually impossible to sample from [33]. It is difficult to identify an effective approximation to the optimal independent MH kernel. The choice of independent MH kernel using the prior as the proposal can lead to very low acceptance rates if the state dimension is very high or the measurements are highly informative.

[32, 33] propose the use of a composite MH kernel which is constituted of a joint
proposal (to update ) followed by two individual
state variable refinements using proposals (to update
) and (to update ), based on the *Metropolis within Gibbs* approach,
within a single MCMC iteration. The composite kernel is summarized in
Algorithm 2.

Any of the MCMC kernels mentioned before can be used in the joint draw step of a composite kernel. For example, the independent MH kernel based on the prior as the proposal is used in the joint draw step of the implementation of the sequential manifold Hamiltonian Monte Carlo (SmHMC) algorithm in [33]. For individual refinement of , [33] uses the independent proposal , which leads to the following simplification of MH acceptance rate in Line 5 of Algorithm 1, using Equation (5).

(8) |

The aim of the refinement steps is to explore the neighbourhood of samples generated in the joint draw step. For the MCMC kernel of the individual refinement step of , Langevin diffusion or Hamiltonian dynamics have been proposed to more efficiently traverse a high-dimensional space [33]. The manifold Hamiltonian Monte Carlo (mHMC) kernel of the individual refinement step of the SmHMC algorithm efficiently samples from the target filtering distribution, making the SmHMC algorithm one of the most effective algorithms for filtering in high-dimensional spaces.

### Iii-B Particle flow particle filter

#### Iii-B1 Particle flow filter

In the last decade, a new class of Monte Carlo-based filters has emerged that shows promise in high-dimensional filtering. In a given time step , particle flow algorithms [15, 19] migrate particles from the predictive distribution to the posterior distribution

, via a “flow” that is specified through a partial differential equation (PDE). There is no sampling (or resampling). Thus the weight degeneracy issue is avoided.

A particle flow can be modelled by a background stochastic process in a pseudo-time interval , such that the distribution of is the predictive distribution and the distribution of is the posterior distribution .

In [19], the underlying stochastic process

satisfies an ordinary differential equation (ODE) with zero diffusion:

(9) |

When the predictive distribution and the likelihood are both Gaussian, the exact flow for the linear Gaussian model is:

(10) |

where

(11) | ||||

(12) |

Here is the mean of the predictive distribution, is the covariance matrix of prediction error for the prior distribution, is the new measurement, is the measurement matrix, i.e. , and is the covariance matrix of the measurement noise. We refer to this method as the exact Daum-Huang (EDH) filter, and a detailed description of its implementation is provided in [37]. For nonlinear models, a computationally intensive variation of EDH, that computes a separate flow for each particle by performing linearization at the particle location , was proposed in [16] and is referred to as the localized exact Daum-Huang (LEDH) filter.

Numerical integration is normally used to solve the ODE in Equation (9). The integral between and for , where and , is approximated and the Euler update rule for the EDH flow becomes

(13) |

where the step size and .

#### Iii-B2 Particle flow particle filter

Because of the discretization errors made while numerically solving Eq (9), and the mismatch of modelling assumptions between a general HMM and a linear Gaussian setup (which was assumed in deriving Equations (11) and (12)), the migrated particles after the particle flow process are not exactly distributed according to the posterior. Instead can be viewed as being drawn from a proposal distribution , which is possibly well matched to the posterior, because of the flow procedure. It is shown in [24] that for the EDH, if an auxiliary flow is performed starting from the mean of the predictive distribution , and the generated flow parameters are used to perform particle flow for each particle , then in presence of the assumed smoothness condition on the measurement function and with sufficiently small step sizes , the auxiliary particle flow establishes an invertible mapping . We can straightforwardly compute the proposal density as follows:

(14) |

is the Jacobian determinant of the mapping function and denotes the absolute value. The determinant of is given as:

(15) |

#### Iii-B3 Particle flow particle filter with Gaussian mixture model assumptions

When the state space models involve Gaussian mixture model (GMM) noise, Equations (2) and (3) admit the following structures and .

An alternative representation of the GMM allows an equivalent formulation of the dynamic model by introducing a latent scalar variable with a probability mass function (PMF) . The variable specifies the GMM component that excites the dynamic model, i.e., . The state transition density can then be described as:

(16) |

Similarly, a latent variable with the PMF specifies the GMM component that generates the measurement noise. The likelihood can be described as follows:

(17) |

In order to construct the particle flow particle filter for this model, at time step , first is sampled with probability and is sampled with probability . Conditioned on , an invertible particle flow based on LEDH is constructed using the -th and -th Gaussian components of the dynamic and measurement models respectively, to sample . The importance weights of the joint state can be updated as follows [38]:

(18) |

where the proposal density is computed by:

(19) |

## Iv Methods

In this section, we propose to use the invertible particle flow to approximate the optimal independent MH kernel in the sequential MCMC methods for both uni-modal and multi-modal target distributions.

### Iv-a SMCMC with invertible particle flow

To construct MH kernels based on invertible particle flow, we first develop a new formulation of the invertible mapping with particle flow.

#### Iv-A1 New formulation of the invertible mapping with particle flow

The particle flow particle filters (PF-PFs) construct invertible particle flows in a pseudo-time interval in order to move particles drawn from the prior distribution into regions where the posterior density is high.

Using the Euler update rule specified in Equation (13) recursively over , the invertible mapping for the PF-PF (EDH) can be expressed as:

(20) |

where

(21) |

and

(22) |

In [24] it is shown that the equivalent mapping is invertible with sufficiently small , so the matrix is invertible. The procedure to produce and is summarized in Algorithm 3 and the proposal density becomes

(23) |

Similarly, for the PF-PF (LEDH), the invertible mapping can be expressed as

(24) |

where . In this expression, where is the dynamic model introduced in Equation (2). The proposal density becomes

(25) |

#### Iv-A2 SmHMC with LEDH

One of the composite MH kernels we propose uses the invertible particle flow based on the LEDH flow and is presented in Algorithm 4.

In the -th MCMC iteration at time step , we first sample from the approximate joint posterior distribution at time . Then, we calculate to obtain the auxiliary LEDH flow parameters , and apply the flow to the propagated particle . Thus the proposed particle is generated as: .

For this proposal, the acceptance rate of the joint draw in Algorithm 4 can be derived using Equations (5) and (25):

(26) |

where we use to denote the minimum operator.

When evaluating Equation (26) in Line 5 of Algorithm 4, the values of , and are needed. Since may be generated by the manifold Hamiltonian Monte Carlo kernel as shown in Algorithm 5, the corresponding is not available through Lines 2 and 6 of Algorithm 4. This can be resolved using the invertible mapping property of the invertible particle flow. As is invertible, we can calculate given by solving Equation (24):

(27) |

#### Iv-A3 SmHMC with EDH

Calculation of individual flow parameters at every MCMC iteration in Algorithm 4 can be computationally expensive. Similar to the spirit of the PF-PF (EDH) [24], we can calculate the flow parameters and only once, using an auxiliary state variable derived from the samples, and apply the calculated flow parameters for all MCMC iterations. The resulting procedure is described in Algorithm 6. The flow parameters and are calculated only once in the initialization of each time step , as in Algorithm 7.