Double Bayesian Smoothing as Message Passing

by   Pasquale Di Viesti, et al.

Recently, a novel method for developing filtering algorithms, based on the interconnection of two Bayesian filters and called double Bayesian filtering, has been proposed. In this manuscript we show that the same conceptual approach can be exploited to devise a new smoothing method, called double Bayesian smoothing. A double Bayesian smoother combines a double Bayesian filter, employed in its forward pass, with the interconnection of two backward information filters used in its backward pass. As a specific application of our general method, a detailed derivation of double Bayesian smoothing algorithms for conditionally linear Gaussian systems is illustrated. Numerical results for two specific dynamic systems evidence that these algorithms can achieve a better complexity-accuracy tradeoff and tracking capability than other smoothing techniques recently appeared in the literature.



There are no comments yet.


page 1

page 2

page 3

page 4


A New Smoothing Technique based on the Parallel Concatenation of Forward/Backward Bayesian Filters: Turbo Smoothing

Recently, a novel method for developing filtering algorithms, based on t...

On Approximate Nonlinear Gaussian Message Passing On Factor Graphs

Factor graphs have recently gained increasing attention as a unified fra...

Multiple Bayesian Filtering as Message Passing

In this manuscript, a general method for deriving filtering algorithms t...

Joint Target Detection, Tracking and Classification with Forward-Backward PHD Smoothing

Forward-backward Probability Hypothesis Density (PHD) smoothing is an ef...

LineSmooth: An Analytical Framework for Evaluating the Effectiveness of Smoothing Techniques on Line Charts

We present a comprehensive framework for evaluating line chart smoothing...

Answering Hindsight Queries with Lifted Dynamic Junction Trees

The lifted dynamic junction tree algorithm (LDJT) efficiently answers fi...

Can automated smoothing significantly improve benchmark time series classification algorithms?

tl;dr: no, it cannot, at least not on average on the standard archive pr...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I. Introduction

The problem of Bayesian smoothing for a state space model

(SSM) concerns the development of recursive algorithms able to estimate the

probability density function (pdf) of the model state on a given observation interval, given a batch of noisy measurements acquired over it [1], [2]; the estimated pdf is known as a smoothed or smoothing pdf. Two general methods are available in the literature for recursively calculating smoothing densities; they are known as the forward filtering-backward smoothing recursion (e.g., see [3] and [4]) and the method based on the two-filter smoothing formula (e.g., see [5] and [6]). Both methods are based on the idea that the smoothing densities can be computed by combining the predicted and/or filtered densities generated by a Bayesian filtering method with the statistical information produced in the backward pass by a different filtering method; the latter method is paired with the first one and, in the case of the two-filter smoothing formula, is known as backward information filtering (BIF). Unluckily, closed form solutions for Bayesian smoothing are available for linear Gaussian and linear Gaussian mixture models only [1, 2, 7]

. This has motivated the development of various methods based on approximating smoothing densities in different ways. For instance, the use of Gaussian approximations for the smoothing densities and of sigma points techniques for solving moment matching integrals has been investigated in

[8, 9, 10]. Another class of methods (usually known as particle smoothers) is based on the exploitation of sequential Monte Carlo techniques, i.e. on approximating smoothing densities through a set of weighted particles (e.g., see [3, 11, 5, 12, 13, 14] and references therein). Recently, substantial attention has been also paid to the development of smoothing algorithms for the class of conditionally linear Gaussian SSMs [15, 16, 17, 18, 19]. In this case, the above mentioned approximate methods can benefit from the so called Rao-Blackwellization technique, i.e. from the marginalisation of the linear substructure of any conditionally linear Gaussian model; this can significantly reduce the overall computational complexity of both sigma-point based Gaussian smoothing [15] and particle smoothing [16, 17, 18, 19] (that is usually known as Rao-Blackwellized particle smoothing, RBPS, in this case).

In this manuscript, we propose a novel general method for the development of computationally efficient particle smoothers. Our method exploits the same conceptual approach illustrated in [20] in the context of Bayesian filtering and dubbed multiple Bayesian filtering. That approach is based on the idea of developing new filtering algorithms by: a) interconnecting multiple heterogeneous Bayesian filters; b) representing the processing accomplished by each Bayesian filter and the exchange of statistical information among distinct filters as a message passing over a proper factor graph. In [20] the exploitation of this approach has been investigated in detail for the case in which two Bayesian filters are interconnected, i.e. dual Bayesian filtering (DBF) is employed. Moreover, it has been shown that accurate and computationally efficient DBF algorithms can be devised if the considered SSM is conditionally linear Gaussian. In this manuscript, we show that, if DBF is employed in the forward pass of a smoothing method, a BIF method, paired with DBF and based on the interconnection of two backward information filters can be devised by following some simple rules. Similarly as DBF, our derivation of such a BIF method, called double backward information filtering (DBIF), is based on a graphical model. Such a graphical model allows us to show that: a) the pdfs computed in DBIF can be represented as messages passed on it; b) all the expressions of the passed messages can be derived by applying the same rule, namely the so called sum-product algorithm (SPA) [21], [22], to it; c) iterative algorithms can be developed in a natural fashion once the cycles it contains have been identified and the order according to which messages are passed on them (i.e., the message scheduling) has been established; d) the statistical information generated by a DBIF algorithm in the backward pass can be easily merged with those produced by its paired DBF technique in the forward pass in order to evaluate the required smoothed pdfs. To exemplify the usefulness of the resulting smoothing method, based on the combination of DBF and DBIF, and called double Bayesian smoothing (DBS), the two DBF algorithms proposed in [20] for the class of conditionally linear Gaussian SSMs are taken into consideration, and the BITF algorithm paired with each of them and a simplified version of it are derived. This leads to the development of four new DBS algorithms, two generating an estimate of the joint smoothing density over the whole observation interval, the other two an estimate of the marginal smoothing densities over the same interval. Our computer simulations for two specific conditionally linear Gaussian SSMs evidence that, in the first case, the derived DBS algorithms perform very closely to the RBPS technique proposed in [18] and to the particle smoothers devised in [19], but at lower computational cost and time. In the second case, instead, two of the devised DBS techniques represent the only technically useful options, thanks to their good tracking capability. In fact, such techniques are able to operate reliably even when their competitors diverge in the forward pass.

It is worth stressing that the technical contribution provided by this manuscript represents a significant advancement with respect to the application of factor graph theory to particle smoothing illustrated in [19]. In fact, in that manuscript, we also focus on conditionally linear Gaussian models, but assume that the forward pass is accomplished by marginalized particle filtering (MPF; also known as Rao-Blackwellized particle filtering); in other words, Bayesian filtering is based on the interconnection of a particle filter with a bank of Kalman filters. In this manuscript, instead, the general method we propose applies to a couple of arbitrary interconnected Bayesian filters. Moreover, the specific smoothing algorithms we derive assume that the forward pass is carried out by a filtering algorithm based on the interconnection of a particle filter with a single extended Kalman filter.

The remaining part of this manuscript is organized as follows. In Section II., a general graphical model, on which the processing accomplished in DBF, DBIF, and DBS is based, is illustrated. In Section III., a specific instance of the graphical model illustrated in the previous section is developed under the assumptions that the filters employed in the forward pass are an extended Kalman filter and a particle filter, and that the considered SSM is conditionally linear Gaussian. Then, the scheduling and the computation of the messages passed over this model are analysed in detail and new DBS algorithms are devised. The differences and similarities between these algorithms and other known smoothing techniques are analysed in Section IV.. A comparison, in terms of accuracy, computational complexity, and execution time, between the proposed techniques and three smoothers recently appeared in the literature, is provided in Section V. for two conditionally linear Gaussian SSMs. Finally, some conclusions are offered in Section VI..

Notations: The same notation as refs. [19, 20] and [23] is adopted.

Ii. Graphical Model for a Couple of Interconnected Bayesian Information Filters and Message Passing on it

In this manuscript, we consider a discrete-time SSM whose dimensional hidden state in the th interval is denoted , and whose state update and measurement models are expressed by




respectively, with , , , . Here, () is a time-varying dimensional (dimensional) real function, is the duration of the observation interval and () is the th element of the process (measurement) noise sequence (); this sequence consists of dimensional (dimensional) independent and identically distributed

(iid) Gaussian noise vectors, each characterized by a zero mean and a covariance matrix

(). Moreover, statistical independence between and is assumed.

From a statistical viewpoint, a complete statistical description of the considered SSM is provided by the pdf of its initial state, its Markov model and its observation model for any ; the first pdf is assumed to be known, whereas the last two pdfs can be easily derived from Eq. (1) and Eq. (2), respectively.

In the following, we focus on the problem of developing novel smoothing algorithms and, in particular, algorithms for the estimation of the joint smoothed pdf (problem P.1) and the sequence of marginal smoothed pdfs  (problem P.2); here, is a dimensional vector. Note that, in principle, once problem P.1 is solved, problem P.2 can be easily tackled; in fact, if the joint pdf is known, all the posterior pdfs can be evaluated by marginalization.

The development of our smoothing algorithms is mainly based on the graphical approach illustrated in our previous manuscripts [19, Sec. III], [20, Sec. II] and [23, Sec. III] for Bayesian filtering and smoothing. This approach consists in the following steps:

1. The state vector is partitioned in two substates, denoted and and having sizes and , respectively. Note that, if represents the portion of not included in (with and ), our assumptions entail that and .

2. A sub-graph that allows to represent both Bayesian filtering and BIF for the substate (with and ) as message passing algorithms on it is developed, under the assumption that the complementary substate is statistically known. This means that filtered and predicted densities of are represented as messages passed on the edges of this sub-graph and the rules for computing them result from the application of the SPA to it.

3. The two sub-graphs devised in the previous step (one referring to , the other one to ) are interconnected, so that a single graphical model referring to the whole state is obtained.

4. Algorithms for Bayesian filtering and BIF for the whole state are derived by applying the SPA to the graphical model obtained in the previous step.

Let us analyse now the steps 2.-4. in more detail. As far as step 2. is concerned, the sub-graph devised for the substate is based on the same principles illustrated in our manuscripts cited above (in particular, ref. [20]) and is illustrated in Fig. 1. The th recursion (with ) of Bayesian filtering for the sub-state is represented as a forward message passing on this factor graph, that involves the Markov model and the observation model . This allows to compute the messages , and , that convey the first filtered pdf of , the second filtered pdf of and the predicted pdf of , respectively, on the basis of the messages , and ; the last three messages represent the predicted pdf of evaluated in the previous (i.e., in the th) recursion of Bayesian filtering, and the messages conveying the measurement and the pseudo-measurement information, respectively, available in the recursion. The considered filtering algorithm requires the availability of the messages , , , that are computed on the basis of external statistical information. The presence of the messages and is due the fact that the substate represents a nuisance state for the considered filtering algorithm; in fact, these messages convey filtered (or predicted) pdfs of and are employed to integrate out the dependence of the pdfs and , respectively, on . Note also that these two messages are not necessarily equal, since more refined information about could become available after that the message has been computed. On the other hand, the message conveys the statistical information provided by a pseudo-measurement111Generally speaking, a pseudo-measurement is a fictitious measurement that is computed on the basis of statistical information provided by a filtering algorithm different from the one benefiting from it. about . In Fig. 1, following [20, Sec. II], it is assumed that the pseudo-measurement is available in the estimation of and that represents the pdf of conditioned on , that is


The computation of the messages , and on the basis of the messages , , , and is based on the two simple rules illustrated in [23, Figs. 8-a) and 8-b), p. 1535] and can be summarized as follows. The first and second filtered pdfs (i.e., the first and the second forward estimates) of are evaluated as




respectively, where


and is defined in Eq. (3). Equations (4)-(6) describe the processing accomplished in the measurement update of the considered recursion. This is followed by the time update, in which the new predicted pdf (i.e., the new forward prediction)


is computed. The message passing procedure described above is initialised by setting (where is the pdf resulting from the marginalization of with respect to ) in the first recursion and is run for . Once this procedure is over, BIF is executed for the substate ; its th recursion (with ) can be represented as a backward message passing on the factor graph shown in Fig. 1. In this case, the messages , , , that convey the backward predicted pdf of , the first backward filtered pdf of and the second backward filtered pdf of , respectively, are evaluated on the basis of the messages , and , respectively; note that represents the backward filtered pdf of computed in the previous (i.e., in the th) recursion of BIF. Moreover, the first and second backward filtered pdfs of are evaluated as (see Fig. 1)




respectively, where and are still expressed by Eq. (3) and Eq. (6), respectively. The BIF message passing is initialised by setting in its first recursion and is run for . Once the backward pass is over, a solution to problem P.2 becomes available for the substate , since the marginal smoothed pdf (where is the dimensional vector resulting from the ordered concatenation of the all the observed pseudo-measurements ) can be evaluated as222Note that, similarly as refs. [19] and [23], a joint smoothed pdf is considered here in place of the corresponding posterior pdf.


with . Note that, from a graphical viewpoint, formulas (10)-(12) can be related with the three different partitionings of the graph shown in Fig. 1 (where a specific partitioning is identified by a brown dashed vertical line cutting the graph in two parts).

Figure 1: Factor graph involved in the th (th) recursion of Bayesian filtering (BIF) for the substate and forward (backward) message passing on it. The flow of messages in the forward (backward) pass are indicated by red (blue) arrows, respectively; the brown vertical lines cutting each graph identify the partitioning associated with formulas (10) (left cut), (11) (central cut) and (12) (right cut). The messages , , , , , , , and are denoted , , , , , , , and respectively, to ease reading.

Given the graphical model represented in Fig. 1, step 3. can be accomplished by adopting the same conceptual approach as [19, Sec. III] and [20, Par. II-B], where the factor graphs on which smoothing and filtering, respectively, are based are obtained by merging two sub-graphs, each referring to a distinct substate. For this reason, in this case, the graphical model for the whole state is obtained by interconnecting two distinct factor graphs, each structured like the one shown in Fig. 1. In [20, Par. II-B], message passing on the resulting graph is described in detail for the case of Bayesian filtering. In this manuscript, instead, our analysis of message passing concerns BIF and smoothing only. The devised graph and the messages passed on it are shown in Fig. 2. Note that, in developing our graphical model, it has been assumed that the smoothed pdf referring to (and conveyed by the message ) is computed on the basis of Eq. (10), i.e. by merging the messages and . Moreover, the following elements (identified by brown lines) have been added to its th sub-graph (with and ): a) two equality nodes; b) the block BIFBIF for extracting useful information from the messages computed on the th sub-graph and delivered to the th one. The former elements allow the th backward information filter to generate copies of the messages and , that are made available to the other sub-graphs. In the latter element, instead, the messages (see Eq. (3)) and (with and ; see Eqs. (6) and (7)) are computed; note that this block is connected to oriented edges only, i.e. to edges on which the flow of messages is unidirectional.

Figure 2: Graphical model based on the sub-graph shown in Fig. 1 and referring to the interconnection of two backward information filters. The message computed in the backward (forward) pass are identified by blue (black) arrows. The message is denoted to ease reading.

Given the graphical model represented in Fig. 2, step 4. can be easily accomplished. In fact, recursive BIF and smoothing algorithms can be derived by systematically applying the SPA to it after that a proper scheduling has been established for message passing. In doing so, we must always keep in mind that:
1) Message passing on the th subgraph represents BIF/smoothing for the substate ; the exchange of messages between the sub-graphs, instead, allows us to represent the interaction of two interconnected BIF/smoothing algorithms in a effective and rigorous way.
2) Different approximations can be used for the predicted/filtered/smoothed pdfs computed in the message passing on each of the two sub-graphs and for the involved Markov/observation models. For this reason, generally speaking, the two interconnected filtering/BIF/smoothing algorithms are not required to be of the same type.
3) The th recursion of the overall BIF algorithm is fed by the backward estimates () and (), and generates the new backward predictions () and (), and the two couples of filtered densities , , , (, , , ). Moreover, merging the predicted densities computed in the forward pass (i.e., the messages ) with the second backward filtered densities (i.e., the messages ) allows us to generate the smoothed pdfs for each substate according to Eq. (10). However, a joint filtered/smoothed density for the whole state is unavailable.
4) Specific algorithms are employed to compute the pseudo-measurement and the nuisance substate pdfs in the BIFBIF blocks appearing in Fig. 2. These algorithms depend on the considered SSM and on the selected message scheduling; for this reason, a general description of their structure cannot be provided.
5) The graphical model shown in Fig. 2, unlike the one illustrated in Fig. 1, is not cycle free. The presence of cycles raises the problems of identifying all the messages that can be iteratively refined and establishing the order according to which they are computed. Generally speaking, iterative message passing on the devised graphical model involves both the couple of measurement updates and the backward prediction accomplished in each of the interconnected backward information filters. In fact, this should allow each filter to progressively refine the nuisance substate density employed in its second measurement update and backward prediction, and improve the quality of the pseudo-measurements exploited in its first measurement update. For this reason, if iterations are run, the overall computational complexity of each recursion is multiplied by .

The final important issue about the graphical model devised for both Bayesian filtering and BIF concerns the possible presence of redundancy. In all the considerations illustrated above, disjoint substates and have been assumed. Actually, in ref. [20], it has been shown that our graphical approach can be also employed if the substates and cover , but do not necessarily form a partition of it. In other words, some overlapping between these two substates is admitted. When this occurs, the forward/backward filtering algorithm run over the whole graphical model contains a form of redundancy, since elements of the state vector are independently estimated by the interconnected forward/backward filters. The parameter can be considered as the degree of redundancy characterizing the filtering/smoothing algorithm. Moreover, in ref. [20], it has been shown that the presence of redundancy in a Bayesian filtering algorithm can significantly enhance its tracking capability (i.e., reduce its probability of divergence); however, this result is obtained at the price of an increased complexity with respect to the case in which the interconnected filters are run over disjoint substates.

Iii. Double Backward Information Filtering and Smoothing Algorithms for Conditionally Linear Gaussian State Space Models

In this section we focus on the development of two new DBS algorithms for conditionally linear Gaussian models. We first describe the graphical models on which these algorithms are based; then, we provide a detailed description of the computed messages and their scheduling in a specific case.

A. Graphical Modelling

In this paragraph, we focus on a specific instance of the graphical model illustrated in Fig. 2, since we make the same specific choices as ref. [20] for both the considered SSM and the two Bayesian filters employed in the forward pass. For this reason, we assume that: a) the SSM described by eqs. (1)-(2) is conditionally linear Gaussian [18], [23], [24], so that its state vector can be partitioned into its linear component and its nonlinear component (having sizes and , respectively, with ); b) the dual Bayesian filter employed in the forward pass results from the interconnection of an extended Kalman filter with a particle filter333In particular, a sequential importance resampling filter is employed [25]. (these filters are denoted F and F, respectively), as described in detail in ref. [20]. As far as the last point is concerned, it is also worth mentioning that, on the one hand, filter F estimates the nonlinear state component only (so that and ) and approximates the predicted/filtered densities of this component through a set of weighted particles. On the other hand, filter F employs a Gaussian approximation of all its predicted/filtered densities, and works on the whole system state or on the linear state component. In the first case (denoted C.1 in the following), we have that and is empty, so that both F and F estimate the nonlinear state component (for this reason, the corresponding degree of redundancy in the developed smoothing algorithm is ); in the second case (denoted C.2 in the following), instead, and , so that filters F and F estimate disjoint substates (consequently, ).

Our selection of the forward filtering scheme has the following implications on the developed DBIF scheme. The first backward information filter (denoted BIF) is the backward filter associated with an extended Kalman filter operating over on the whole system state (case C.1) or on the linear state component (case C.2). The second backward filter (denoted BIF), instead, is a backward filter associated with a particle filter operating on the nonlinear state component only. In practice, following [18, 19, 17], BIF is employed to update the weights of all the elements of the particle set generated by filter F in the forward pass. Then, based on the graphical model shown in Fig. 2, the factor graph illustrated in Fig. 3 can be drawn for case C.1. It is important to point out that:

1) The first backward information filter (BIF) is based on linearised (and, consequently, approximate) Markov/measurement models, whereas the second one (BIF) relies on exact models, as explained in more detail below. These models are the same as those employed in ref. [20].

2) Since the nuisance substate is empty, no marginalization is required in BIF; for this reason, the messages ; (i.e., and ) visible in Fig. 2 do not appear in Fig. 3. Moreover, the message is generated on the basis of Eq. (11), instead of Eq. (10).

3) The backward filtered pdf and the smoothed pdf (i.e., the messages and , respectively) feed the BIFBIF block, where they are processed jointly to generate the pseudo-measurement message () feeding filter F. Similarly, the backward filtered pdf () and the smoothed pdf () feed the BIFBIF block, where the pseudo-measurement message () and the messages ; (i.e., and ) are evaluated.

In the remaining part of this paragraph, we first provide various details about the backward filters BIF and BIF, and the way pseudo-measurements are computed for each of them; then, we comment on how the factor graph shown in Fig. 3 should be modified if case C.2 is considered.

BIF - This backward filter is based on the linearized versions of Eqs. (1) and (2), i.e. on the models (e.g., see [1, pp. 194-195] and [20, Par. III-A])




respectively; here, , , , and () is the forward prediction (forward estimate) of computed by F in its th (th) recursion. Consequently, the approximate models




appear in the graphical model shown in Fig. 3.

BIF - In developing this backward filter, the state vector is represented as the ordered concatenation of its linear component , and its nonlinear component . Based on [23, eq. (3)], the Markov model


is adopted for the nonlinear state component (this model corresponds to the last lines of Eq. (1)); here, () is a time-varying dimensional real function ( real matrix) and consists of the last elements of the noise term appearing in Eq. (1) (the covariance matrix of is denoted ). Moreover, it is assumed that the observation model (2) can be put in the form (see [20, eq. (31)] or [23, eq. (4)])


where () is a time-varying dimensional real function ( real matrix). Consequently, the considered backward filter is based on the exact pdfs



both appearing in the graphical model drawn in Fig. 3.

Computation of the pseudo-measurements for the first backward filter - Filter BIF is fed by pseudo-measurement information about the whole state . The method for computing these information is similar to the one illustrated in ref. [19, Sects. III-IV] and can be summarised as follows. The pseudo-measurements about the nonlinear state component are represented by the particles conveyed by the smoothed pdf (). On the other hand, pseudo-measurements about the linear state component are evaluated by means of the same method employed by marginalized particle filtering (MPF) for this task. This method is based on the idea that the random vector (see Eq. (17))


depending on the nonlinear state component only, must equal the sum


that depends on the linear state component. For this reason, realizations of (21) are computed in the BIFBIF block on the basis of the messages () and , and are treated as measurements about .

Computation of the pseudo-measurements for the second backward filter - The messages () and () feeding the BIFBIF block are employed for: a) generating the messages ; required to integrate out the dependence of the state update and measurement models (i.e., of the densities , (LABEL:f_x_N) and (20), respectively) on the substate ; b) generating pseudo-measurement information about . As far as the last point is concerned, the approach we adopt is the same as that developed for dual marginalized particle filtering (dual MPF) in ref. [23, Sec. V] and also adopted in particle smoothing [19, Sects. III-IV]. The approach relies on the Markov model


referring to the linear state component (see [19, eq. (1)] or [23, eq. (3)]); in the last expression, () is a time-varying dimensional real function ( real matrix), and consists of the first elements of the noise term appearing in (1) (the covariance matrix of is denoted , and independence between and is assumed for simplicity). From Eq. (23) it is easily inferred that the random vector


must equal the sum


that depends on only; for this reason, (24) can be interpreted as a pseudo-measurement about . In this case, the pseudo-measurement information is conveyed by the message () that expresses the correlation between the pdf of the random vector (24) (computed on the basis of the statistical information about the linear state component made available by BIF) and the pdf obtained for under the assumption that this vector is expressed by Eq. (25). The message is evaluated for each of the particles representing in BIF; this results in a set of particle weights employed in the first measurement update of BIF and different from those computed on the basis of (18) in its second measurement update.

A graphical model similar to the one shown in Fig. 3 can be easily derived from the general model appearing in Fig. 2 for case C.2 too. The relevant differences with respect to case C.1 can be summarized as follows:

1) The backward filters BIF and BIF estimate and , respectively; consequently, their nuisance substates are and , respectively.

2) The BIFBIF block is fed by the backward predicted/smoothed pdfs computed by BIF; such pdfs are employed for: a) generating the messages () and () required to integrate out the dependence of the Markov model (see Eq. (23))

and of the measurement model (20), respectively, on ; b) generating pseudo-measurement information about the substate only. As far as point a) is concerned, it is also important to point out that the model (