## I. Introduction

The problem of *Bayesian smoothing* for a *state space model* (SSM) concerns the development of recursive algorithms able to estimate the *probability density function* (pdf) of the model state on a given observation interval, given a batch of noisy measurements acquired over it [1], [2]; the estimated pdf is known as a *smoothed* or *smoothing* pdf. Two general methods are available in the literature for recursively calculating smoothing densities; they are known as the *forward filtering-backward smoothing recursion* (e.g., see [3] and [4]) and the method based on the *two-filter smoothing formula* (e.g., see [5] and [6]). Both methods are based on the idea that the smoothing densities can be computed by combining the predicted and/or filtered densities generated by a Bayesian filtering method with the statistical information produced in the backward pass by a different filtering method; the latter method is *paired with the first one* and, in the case of the two-filter smoothing formula, is known as *backward information filtering* (BIF). Unfortunately, closed form solutions for Bayesian smoothing are available for *linear Gaussian* and *linear Gaussian mixture* models only [1, 2, 7]. This has motivated the development of various methods based on approximating smoothing densities in different ways. For instance, the use of Gaussian approximations for the smoothing densities and of sigma-point techniques for solving moment matching integrals has been investigated in [8, 9, 10]. Another class of methods (usually known as *particle smoothers*) is based on the exploitation of *sequential Monte Carlo* techniques, i.e. on approximating smoothing densities through a set of weighted particles (e.g., see [3, 11, 5, 12, 13, 14] and references therein). Recently, substantial attention has also been paid to the development of smoothing algorithms for the class of *conditionally linear Gaussian* SSMs [15, 16, 17, 18, 19]. In this case, the above mentioned approximate methods can benefit from the so-called *Rao-Blackwellization* technique, i.e. from the marginalisation of the linear substructure of any conditionally linear Gaussian model; this can significantly reduce the overall computational complexity of both sigma-point based Gaussian smoothing [15] and particle smoothing [16, 17, 18, 19] (which is usually known as *Rao-Blackwellized particle smoothing*, RBPS, in this case).
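In the linear Gaussian case, the forward filtering-backward smoothing recursion mentioned above reduces to the well known combination of a Kalman filter (forward pass) with the Rauch-Tung-Striebel recursion (backward pass). The following scalar sketch is purely illustrative; the model, its parameters and the function name are assumptions made here, not elements of this manuscript:

```python
import numpy as np

# Forward filtering-backward smoothing for a scalar linear Gaussian SSM
# x_k = F x_{k-1} + w_k,  y_k = H x_k + v_k  (illustrative sketch only).
def kalman_rts(y, F, H, Q, R, m0, P0):
    T = len(y)
    m_f, P_f = np.zeros(T), np.zeros(T)   # filtered moments
    m_p, P_p = np.zeros(T), np.zeros(T)   # predicted moments
    m, P = m0, P0
    for k in range(T):
        m_p[k] = F * m                    # time update (prediction)
        P_p[k] = F * P * F + Q
        S = H * P_p[k] * H + R            # measurement update
        K = P_p[k] * H / S
        m = m_p[k] + K * (y[k] - H * m_p[k])
        P = (1.0 - K * H) * P_p[k]
        m_f[k], P_f[k] = m, P
    m_s, P_s = m_f.copy(), P_f.copy()     # backward (RTS) recursion
    for k in range(T - 2, -1, -1):
        G = P_f[k] * F / P_p[k + 1]       # smoother gain
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G * (G * (P_s[k + 1] - P_p[k + 1]))
    return m_f, P_f, m_s, P_s

m_f, P_f, m_s, P_s = kalman_rts(np.array([0.1, 0.4, 0.2, -0.3, 0.0]),
                                F=0.9, H=1.0, Q=0.2, R=0.5, m0=0.0, P0=1.0)
```

Note that the backward pass only reuses quantities stored during the forward pass; this mirrors the pairing between forward filtering and backward processing discussed above.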

In this manuscript, we propose a novel general method for the development of
*computationally efficient* particle smoothers. Our method exploits the
same conceptual approach illustrated in [20] in the
context of Bayesian filtering and dubbed *multiple Bayesian filtering*.
That approach is based on the idea of developing new filtering algorithms
by: a) interconnecting multiple heterogeneous Bayesian filters; b)
representing the processing accomplished by each Bayesian filter and the
exchange of statistical information among distinct filters as a message
passing over a proper factor graph. In [20] the
exploitation of this approach has been investigated in detail for the case
in which two Bayesian filters are interconnected, i.e. *dual Bayesian
filtering* (DBF) is employed. Moreover, it has been shown that accurate and
computationally efficient DBF algorithms can be devised if the considered
SSM is conditionally linear Gaussian. In this manuscript, we show that, if
DBF is employed in the *forward* pass of a smoothing method, a BIF
method, paired with DBF and based on the interconnection of two backward
information filters can be devised by following some simple rules. As with
DBF, our derivation of such a BIF method, called *double backward
information filtering* (DBIF), is based on a *graphical model*. Such a
graphical model allows us to show that: a) the pdfs computed in DBIF can be
represented as messages passed on it; b) all the expressions of the passed
messages can be derived by applying the same rule, namely the so called
*sum-product algorithm* (SPA) [21], [22], to it; c) iterative algorithms can be developed in a
natural fashion once the cycles it contains have been identified and the
order according to which messages are passed on them (i.e., the *message scheduling*) has been established; d) the statistical information
generated by a DBIF algorithm in the backward pass can be easily merged with
those produced by its paired DBF technique in the forward pass in order to
evaluate the required smoothed pdfs. To exemplify the usefulness of the
resulting smoothing method, based on the combination of DBF and DBIF, and
called *double Bayesian smoothing* (DBS), the two DBF algorithms
proposed in [20] for the class of conditionally
linear Gaussian SSMs are taken into consideration, and the DBIF algorithm
paired with each of them, as well as a simplified version of it, are derived. This
leads to the development of four new DBS algorithms, two generating an
estimate of the joint smoothing density over the whole observation interval,
the other two an estimate of the marginal smoothing densities over the same
interval. Our computer simulations for two specific conditionally linear
Gaussian SSMs show that, in the first case, the derived DBS algorithms
perform very closely to the RBPS technique proposed in [18]
and to the particle smoothers devised in [19], but at lower
computational cost and time. In the second case, instead, two of the devised
DBS techniques represent the only technically useful options, thanks to
their good tracking capability. In fact, such techniques are able to operate
reliably even when their competitors diverge in the forward pass.

It is worth stressing that the technical contribution provided by this
manuscript represents a significant advancement with respect to the
application of factor graph theory to particle smoothing illustrated in [19]. In fact, in that manuscript, we also focus on conditionally
linear Gaussian models, but assume that the forward pass is accomplished by
*marginalized particle filtering* (MPF; also known as *Rao-Blackwellized particle filtering*); in other words, Bayesian filtering
is based on the interconnection of a particle filter with a *bank of
Kalman filters*. In this manuscript, instead, the general method we
propose applies to a couple of arbitrary interconnected Bayesian filters.
Moreover, the specific smoothing algorithms we derive assume that the
forward pass is carried out by a filtering algorithm based on the
interconnection of a particle filter with a *single* extended Kalman
filter.

The remaining part of this manuscript is organized as follows. In Section II, a general graphical model, on which the processing accomplished in DBF, DBIF, and DBS is based, is illustrated. In Section III, a specific instance of the graphical model illustrated in the previous section is developed under the assumptions that the filters employed in the forward pass are an extended Kalman filter and a particle filter, and that the considered SSM is conditionally linear Gaussian. Then, the scheduling and the computation of the messages passed over this model are analysed in detail, and new DBS algorithms are devised. The differences and similarities between these algorithms and other known smoothing techniques are analysed in Section IV. A comparison, in terms of accuracy, computational complexity, and execution time, between the proposed techniques and three smoothers that recently appeared in the literature is provided in Section V for two conditionally linear Gaussian SSMs. Finally, some conclusions are offered in Section VI.

## II. Graphical Model for a Couple of Interconnected Bayesian Information Filters and Message Passing on It

In this manuscript, we consider a discrete-time SSM whose dimensional
*hidden state* in the th interval is denoted , and whose
*state update* and *measurement models* are expressed by

(1)

and

(2)

respectively, with , , , . Here, () is a
time-varying dimensional (dimensional) real function, is the
duration of the observation interval and ()
is the th element of the process (measurement) noise sequence (); this sequence
consists of dimensional (dimensional) *independent and
identically distributed* (iid) Gaussian noise vectors, each characterized by a zero mean and a covariance matrix

(). Moreover, statistical independence between and is assumed. A complete statistical description of the
considered SSM is provided by the pdf of its initial
state, its *Markov model* and its
*observation model* for any ; the
first pdf is assumed to be known, whereas the last two pdfs can be easily
derived from Eq. (1) and Eq. (2), respectively.
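For concreteness, a model of this form can be simulated as follows; the functions f and h, the covariance matrices and the function name `simulate_ssm` are illustrative assumptions, not the specific models considered in this manuscript:

```python
import numpy as np

# Simulation sketch of a discrete-time SSM with additive iid Gaussian
# noises, x_k = f(x_{k-1}) + w_k and y_k = h(x_k) + e_k. The functions
# f and h and the covariances Q and R below are illustrative
# placeholders, not the models considered in this manuscript.
rng = np.random.default_rng(0)

def simulate_ssm(f, h, Q, R, x0, T):
    d_x, d_y = Q.shape[0], R.shape[0]
    x = np.zeros((T, d_x))
    y = np.zeros((T, d_y))
    x_prev = np.asarray(x0, dtype=float)
    for k in range(T):
        x[k] = f(x_prev) + rng.multivariate_normal(np.zeros(d_x), Q)
        y[k] = h(x[k]) + rng.multivariate_normal(np.zeros(d_y), R)
        x_prev = x[k]
    return x, y

# mildly nonlinear scalar dynamics, linear measurement
f = lambda x: 0.9 * x + 0.1 * np.cos(x)
h = lambda x: x
x, y = simulate_ssm(f, h, Q=0.1 * np.eye(1), R=0.5 * np.eye(1),
                    x0=np.zeros(1), T=50)
```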

In the following, we focus on the problem of developing novel smoothing
algorithms and, in particular, algorithms for the estimation of the *joint smoothed pdf* (problem
P.1) and the sequence of *marginal smoothed pdfs* (problem P.2);
here, is a dimensional
vector. Note that, in principle, once problem P.1 is solved,
problem P.2 can be easily tackled; in fact, if the joint pdf is known, all the posterior pdfs can be evaluated by *marginalization*.

The development of our smoothing algorithms is mainly based on the graphical approach illustrated in our previous manuscripts [19, Sec. III], [20, Sec. II] and [23, Sec. III] for Bayesian filtering and smoothing. This approach consists in the following steps:

1. The state vector is partitioned into two substates, denoted and and having sizes and , respectively. Note that, if represents the portion of not included in (with and ), our assumptions entail that and .

2. A *sub-graph* that allows us to represent both Bayesian filtering and
BIF for the substate (with and ) as message
passing algorithms on it is developed, under the assumption that the
complementary substate is statistically known.
This means that filtered and predicted densities of
are represented as messages passed on the edges of this sub-graph and the
rules for computing them result from the application of the SPA to it.

3. The two sub-graphs devised in the previous step (one referring to , the other one to ) are *interconnected*, so that a single graphical model referring to the whole
state is obtained.

4. Algorithms for Bayesian filtering and BIF for the whole state are derived by applying the SPA to the graphical model obtained in the previous step.

Let us now analyse steps 2.-4. in more detail. As far as step 2.
is concerned, the sub-graph devised for the substate
is based on the same principles illustrated in our manuscripts cited above
(in particular, ref. [20]) and is illustrated in
Fig. 1. The th recursion (with ) of Bayesian
filtering for the sub-state is represented as a *forward* message passing on this factor graph, that involves the Markov
model and the observation model . This allows us to compute the messages
, and ,
that convey the *first filtered* pdf of ,
the *second filtered* pdf of and the
*predicted* pdf of , respectively, on the basis
of the messages , and ; the
last three messages represent the predicted pdf of
evaluated in the previous (i.e., in the th) recursion of Bayesian
filtering, and the messages conveying the *measurement* and the *pseudo-measurement* information, respectively, available in the recursion. The considered filtering algorithm requires the availability of
the messages , , , that are computed on the basis of external statistical information. The
presence of the messages and is due to the fact that the
substate represents a *nuisance state* for
the considered filtering algorithm; in fact, these messages convey filtered
(or predicted) pdfs of and are employed to
integrate out the dependence of the pdfs and , respectively, on . Note also that these two messages are not necessarily
equal, since more refined information about
could become available after the message has been computed. On the other hand, the message conveys the statistical information provided by a
pseudo-measurement (generally speaking, a pseudo-measurement is a *fictitious* measurement
that is computed on the basis of statistical information provided by a
filtering algorithm different from the one benefiting from it) about . In Fig. 1, following [20, Sec. II], it is assumed that the pseudo-measurement is available in the estimation of and
that represents the pdf of conditioned on , that is

(3)

The computation of the messages , and on the basis of the messages , , , and is based on the two simple rules illustrated in [23, Figs. 8-a) and 8-b), p. 1535] and can be summarized as follows. The first and second filtered pdfs (i.e., the first and the second forward estimates) of are evaluated as

(4)

and

(5)

respectively, where

(6)

and is defined in Eq. (3). Equations (4)-(6) describe the processing
accomplished in the *measurement update* of the considered recursion.
This is followed by the *time update*, in which the new predicted pdf
(i.e., the new forward prediction)

(7)

is computed. The message passing procedure described above is initialised by
setting
(where is the pdf resulting from the
marginalization of with respect to ) in the first recursion and is run for . Once this
procedure is over, BIF is executed for the substate ;
its th recursion (with ) can be represented as a
*backward* message passing on the factor graph shown in Fig. 1. In this case, the messages , , , that
convey the *backward predicted* pdf of , the
*first backward filtered* pdf of and the
*second backward filtered* pdf of ,
respectively, are evaluated on the basis of the messages , and , respectively; note that represents the backward filtered pdf
of computed in the previous (i.e., in the th) recursion of BIF. Moreover, the first and second backward filtered pdfs
of are evaluated as (see Fig. 1)

(8)

and

(9)

respectively, where and are still expressed by Eq. (3) and
Eq. (6), respectively. The BIF message passing is initialised
by setting in its first recursion and is run for . Once the backward pass is over, a solution to problem
P.2 becomes available for the substate ,
since the marginal smoothed pdf (where is the dimensional vector resulting from the ordered concatenation of all the
observed pseudo-measurements ) can be evaluated as follows (note that, similarly to refs. [19] and [23], a
*joint* smoothed pdf is considered here in place of the corresponding
*posterior* pdf):

(10)

(11)

(12)

with . Note that, from a graphical viewpoint, formulas (10)-(12) can be related to the three different partitionings of the graph shown in Fig. 1 (where a specific partitioning is identified by a brown dashed vertical line cutting the graph in two parts).
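The forward recursion (Eqs. (4)-(7)), the backward recursion (Eqs. (8)-(9)) and the merging step (Eq. (10)) can be mimicked numerically by discretizing a scalar SSM on a grid; the toy model below is an illustrative stand-in for the message passing described above, not an algorithm proposed in this manuscript:

```python
import numpy as np

# Grid-based illustration of forward filtering, backward information
# filtering and their merging into smoothed pdfs for a scalar linear
# Gaussian toy model (illustrative stand-in only).
grid = np.linspace(-5, 5, 201)
dx = grid[1] - grid[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

F, Q, R = 0.9, 0.2, 0.4
trans = gauss(grid[None, :], F * grid[:, None], Q)   # p(x_k | x_{k-1})
y = np.array([0.3, -0.1, 0.8, 0.5])
T = len(y)

# forward pass: alternating measurement and time updates
fwd = np.zeros((T, grid.size))
pred = gauss(grid, 0.0, 1.0)                         # initial prior
for k in range(T):
    filt = pred * gauss(y[k], grid, R)               # measurement update
    filt /= filt.sum() * dx
    fwd[k] = filt
    pred = trans.T @ filt * dx                       # time update

# backward pass: backward messages proportional to p(y_{k+1:T} | x_k)
back = np.ones((T, grid.size))
for k in range(T - 2, -1, -1):
    lik = gauss(y[k + 1], grid, R) * back[k + 1]
    back[k] = trans @ lik * dx

# smoothed pdfs: merge forward and backward messages
smooth = fwd * back
smooth /= smooth.sum(axis=1, keepdims=True) * dx
```

The smoothed densities coincide with the forward filtered ones at the final time instant, since no backward information is available there.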

Given the graphical model represented in Fig. 1, step 3.
can be accomplished by adopting the same conceptual approach as [19, Sec.
III] and [20, Par. II-B], where the
factor graphs on which smoothing and filtering, respectively, are based are
obtained by merging two sub-graphs, each referring to a distinct substate.
For this reason, in this case, the graphical model for the whole state is obtained by interconnecting two distinct factor graphs,
each structured like the one shown in Fig. 1. In [20, Par. II-B], message passing on the resulting graph is described
in detail for the case of *Bayesian filtering*. In this manuscript,
instead, our analysis of message passing concerns *BIF and smoothing
only*. The devised graph and the messages passed on it are shown in Fig. 2. Note that, in developing our graphical model, it has been assumed
that the *smoothed* pdf referring to (and
conveyed by the message ) is computed
on the basis of Eq. (10), i.e. by merging the messages and .
Moreover, the following elements (identified by brown lines) have been added
to its th sub-graph (with and ): a) two equality nodes; b) the
block BIFBIF for extracting useful information
from the messages computed on the th sub-graph and delivered to the th one. The former elements allow the th backward information filter to
generate copies of the messages and , that are made
available to the other sub-graphs. In the latter element, instead, the
messages (see Eq. (3))
and (with and ; see
Eqs. (6) and (7)) are computed; note that this
block is connected to *oriented* edges only, i.e. to edges on which the
flow of messages is unidirectional.

Given the graphical model represented in Fig. 2, step 4.
can be easily accomplished. In fact, recursive BIF and smoothing algorithms
can be derived by systematically applying the SPA to it after a proper
*scheduling* has been established for message passing. In doing so, we
must always keep in mind that:

1) Message passing on the th subgraph represents BIF/smoothing for the
substate ; the exchange of messages between the
sub-graphs, instead, allows us to represent the interaction of two
interconnected BIF/smoothing algorithms in an effective and rigorous way.

2) Different approximations can be used for the predicted/filtered/smoothed
pdfs computed in the message passing on each of the two sub-graphs and for
the involved Markov/observation models. For this reason, generally speaking,
the two interconnected filtering/BIF/smoothing algorithms are not required
to be of the same type.

3) The th recursion of the overall BIF algorithm is fed by the backward
estimates ()
and (), and
generates the new backward predictions () and (), and the two couples of filtered densities , , , (, , , ). Moreover, merging the predicted
densities computed in the forward pass (i.e., the messages ) with
the second backward filtered densities (i.e., the messages )
allows us to generate the smoothed pdfs for each substate according to Eq. (10). However, a *joint* filtered/smoothed density
for the whole state is *unavailable*.

4) Specific algorithms are employed to compute the pseudo-measurement and
the nuisance substate pdfs in the BIFBIF blocks
appearing in Fig. 2. These algorithms depend on the considered SSM
and on the selected message scheduling; for this reason, a general
description of their structure cannot be provided.

5) The graphical model shown in Fig. 2, unlike the one illustrated
in Fig. 1, is *not cycle free*. The presence of cycles raises
the problems of identifying all the messages that can be iteratively refined
and establishing the order according to which they are computed. Generally
speaking, iterative message passing on the devised graphical model involves
both the couple of measurement updates and the backward prediction
accomplished in each of the interconnected backward information filters. In
fact, this should allow each filter to progressively refine the nuisance
substate density employed in its second measurement update and backward
prediction, and improve the quality of the pseudo-measurements exploited in
its first measurement update. For this reason, if iterations are
run, the overall computational complexity of each recursion is multiplied by
.

The final important issue about the graphical model devised for both
Bayesian filtering and BIF concerns the possible presence of *redundancy*. In all the considerations illustrated above, *disjoint* substates and have been assumed.
Actually, in ref. [20], it has been shown that our
graphical approach can be also employed if the substates and cover , but do not
necessarily form a partition of it. In other words, some overlapping between
these two substates is admitted. When this occurs, the forward/backward
filtering algorithm run over the whole graphical model contains a form of
*redundancy*, since elements of the
state vector are independently estimated by the
interconnected forward/backward filters. The parameter can be
considered as the *degree of redundancy* characterizing the
filtering/smoothing algorithm. Moreover, in ref. [20], it has been shown that the presence of redundancy in a Bayesian filtering
algorithm can significantly enhance its tracking capability (i.e., reduce
its probability of divergence); however, this result is obtained at the
price of an increased complexity with respect to the case in which the
interconnected filters are run over disjoint substates.

## III. Double Backward Information Filtering and Smoothing Algorithms for Conditionally Linear Gaussian State Space Models

In this section we focus on the development of two new DBS algorithms for conditionally linear Gaussian models. We first describe the graphical models on which these algorithms are based; then, we provide a detailed description of the computed messages and their scheduling in a specific case.

### A. Graphical Modelling

In this paragraph, we focus on a specific instance of the graphical model
illustrated in Fig. 2, since we make the same specific choices as
ref. [20] for both the considered SSM and the two
Bayesian filters employed in the forward pass. For this reason, we assume
that: a) the SSM described by eqs. (1)-(2) is
conditionally linear Gaussian [18], [23],
[24], so that its state vector can be
partitioned into its *linear component* and its
*nonlinear component* (having sizes and , respectively, with ); b) the dual Bayesian filter
employed in the forward pass results from the interconnection of an *extended Kalman filter* with a *particle filter* (in particular, a *sequential importance resampling* filter is employed
[25]); these filters are denoted F and F,
respectively, as described in detail in ref. [20].
As far as the last point is concerned, it is also worth mentioning that, on
the one hand, filter F estimates the nonlinear state component only
(so that and ) and approximates the predicted/filtered
densities of this component through a set of weighted particles. On
the other hand, filter F employs a Gaussian approximation of all its
predicted/filtered densities, and works on the *whole system state* or
on the *linear state component*. In the first case (denoted C.1
in the following), we have that and is empty, so that both F and F
estimate the nonlinear state component (for this reason, the corresponding
degree of redundancy in the developed smoothing algorithm is );
in the second case (denoted C.2 in the following), instead, and , so that filters F and F estimate *disjoint* substates (consequently, ).
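The structure of a conditionally linear Gaussian SSM, i.e. a model whose dynamics and measurements are linear in the linear component once the nonlinear component is fixed, can be sketched as follows; all functions, matrices and noise levels are illustrative assumptions, not the SSMs considered in this manuscript or in [20]:

```python
import numpy as np

# Sketch of a conditionally linear Gaussian SSM: conditioned on the
# nonlinear component xN, both the dynamics and the measurement are
# linear (Gaussian) in the linear component xL. All numerical values
# are illustrative assumptions.
rng = np.random.default_rng(1)

def simulate_clg(T):
    xN = 0.5                                   # scalar nonlinear component
    xL = np.zeros(2)                           # linear component
    A_N = np.array([0.2, -0.1])                # couples xL into the xN dynamics
    A_L = np.array([[0.9, 0.1], [0.0, 0.8]])   # linear-component transition matrix
    B = np.array([1.0, 0.0])                   # measurement matrix acting on xL
    out = []
    for _ in range(T):
        xN_new = np.sin(xN) + A_N @ xL + 0.1 * rng.standard_normal()
        xL_new = (np.array([np.cos(xN), 0.0]) + A_L @ xL
                  + 0.1 * rng.standard_normal(2))
        xN, xL = xN_new, xL_new
        # measurement: nonlinear in xN, linear in xL given xN
        y = xN ** 2 + B @ xL + 0.2 * rng.standard_normal()
        out.append((xN, xL.copy(), y))
    return out

traj = simulate_clg(30)
```

Marginalizing the linear substructure of such a model (Rao-Blackwellization) is what allows the particle-based part of the processing to work on the nonlinear component only.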

Our selection of the forward filtering scheme has the following implications
on the developed DBIF scheme. The first backward information filter (denoted
BIF) is the backward filter associated with an extended Kalman filter
operating on the *whole system state* (case C.1) or on
the *linear state component* (case C.2). The second backward
filter (denoted BIF), instead, is a backward filter associated with a
particle filter operating on the nonlinear state component only. In
practice, following [18, 19, 17], BIF
is employed to update the weights of all the elements of the particle set
generated by filter F in the forward pass. Then, based on the
graphical model shown in Fig. 2, the factor graph illustrated in
Fig. 3 can be drawn for case C.1. It is important to
point out that:

1) The first backward information filter (BIF) is based on *linearised* (and, consequently, *approximate*) Markov/measurement
models, whereas the second one (BIF) relies on *exact* models, as
explained in more detail below. These models are the same as those employed
in ref. [20].

2) Since the nuisance substate is empty, no marginalization is required in BIF; for this reason, the messages ; (i.e., and ) visible in Fig. 2 do not appear in Fig. 3. Moreover, the message is generated on the basis of Eq. (11), instead of Eq. (10).

3) The backward filtered pdf and the smoothed pdf (i.e., the messages and , respectively) feed the BIFBIF block, where they are processed jointly to generate the pseudo-measurement message () feeding filter F. Similarly, the backward filtered pdf () and the smoothed pdf () feed the BIFBIF block, where the pseudo-measurement message () and the messages ; (i.e., and ) are evaluated.

In the remaining part of this paragraph, we first provide various details about the backward filters BIF and BIF, and the way pseudo-measurements are computed for each of them; then, we comment on how the factor graph shown in Fig. 3 should be modified if case C.2 is considered.

BIF - This backward filter is based on the *linearized* versions
of Eqs. (1) and (2), i.e. on the models (e.g.,
see [1, pp. 194-195] and [20, Par. III-A])

(13)

and

(14)

respectively; here, , , , and () is the *forward prediction* (*forward estimate*) of computed by F in its th
(th) recursion. Consequently, the approximate models

(15)

and

(16)

appear in the graphical model shown in Fig. 3.
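The linearized models (13)-(16) rely on first-order expansions of the state-update and measurement functions around the forward predictions/estimates. A sketch based on finite-difference Jacobians follows; the functions f and h below are illustrative assumptions, and analytic Jacobians can of course be used when available:

```python
import numpy as np

# Numerical linearization of the state-update and measurement functions
# around a forward estimate, as used to build approximate linear Gaussian
# models for an extended-Kalman-type filter. Finite-difference Jacobians
# are an illustrative choice.
def jacobian(fun, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(fun(x))
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (np.asarray(fun(xp)) - f0) / eps
    return J

# illustrative nonlinear models
f = lambda x: np.array([0.9 * x[0] + 0.1 * np.sin(x[1]), 0.8 * x[1]])
h = lambda x: np.array([x[0] ** 2 + x[1]])

x_est = np.array([0.5, -0.2])   # forward estimate (linearization point)
F_k = jacobian(f, x_est)        # Jacobian of the state update
H_k = jacobian(h, x_est)        # Jacobian of the measurement model
# linearized models: f(x) ~= f(x_est) + F_k (x - x_est), and similarly for h
```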

BIF - In developing this backward filter, the state vector is represented as the ordered concatenation of its linear component , and its nonlinear component . Based on [23, eq. (3)], the Markov model

(17)

is adopted for the nonlinear state component (this model corresponds to the last lines of Eq. (1)); here, () is a time-varying dimensional real function ( real matrix) and consists of the last elements of the noise term appearing in Eq. (1) (the covariance matrix of is denoted ). Moreover, it is assumed that the observation model (2) can be put in the form (see [20, eq. (31)] or [23, eq. (4)])

(18)

where () is a time-varying dimensional real function ( real matrix). Consequently, the considered backward filter is based
on the *exact* pdfs

(19)

and

(20)

both appearing in the graphical model drawn in Fig. 3.

*Computation of the pseudo-measurements for the first backward filter*
- Filter BIF is fed by pseudo-measurement information about the *whole state* . The method for computing this information is
similar to the one illustrated in ref. [19, Sects. III-IV]
and can be summarised as follows. The pseudo-measurements about the
nonlinear state component are represented by the particles conveyed
by the smoothed pdf (). On the
other hand, pseudo-measurements about the linear state component are
evaluated by means of the same method employed by *marginalized
particle filtering* (MPF) for this task. This method is based on the idea
that the random vector (see Eq. (17))

(21)

depending on the *nonlinear state component only*, must equal
the sum

(22)

that depends on the *linear state component*. For this reason,
realizations of (21) are computed in the
BIFBIF block on the basis of the messages () and , and are treated as measurements about .
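This mechanism can be sketched as follows: the difference between the new nonlinear-state value and its noiseless propagation acts as a linear measurement of the linear component, and can be absorbed through a standard Kalman measurement update. All numerical values and the function fN are illustrative assumptions:

```python
import numpy as np

# MPF-style pseudo-measurement about the linear state component: from a
# Markov model of the form xN_{k+1} = fN(xN_k) + A_N(xN_k) xL_k + wN_k,
# the quantity z = xN_{k+1} - fN(xN_k) acts as a linear measurement of
# xL_k with matrix A_N and noise covariance C_wN (illustrative sketch).
fN = lambda xN: np.sin(xN)
A_N = np.array([[0.2, -0.1]])          # A_N(xN_k), 1x2
C_wN = np.array([[0.05]])              # covariance of wN_k

xN_k, xN_k1 = 0.4, 0.55                # consecutive nonlinear-state values
z = np.array([xN_k1 - fN(xN_k)])       # pseudo-measurement about xL_k

# Kalman measurement update of the linear component using z
m = np.zeros(2)                        # prior mean of xL_k
P = np.eye(2)                          # prior covariance of xL_k
S = A_N @ P @ A_N.T + C_wN
K = P @ A_N.T @ np.linalg.inv(S)
m_post = m + (K @ (z - A_N @ m)).ravel()
P_post = P - K @ S @ K.T
```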

*Computation of the pseudo-measurements for the second backward filter*
- The messages ()
and () feeding the BIFBIF block are employed for: a) generating the messages ; required to integrate
out the dependence of the state update and measurement models (i.e., of the
densities (19) and (20), respectively) on the substate ; b) generating pseudo-measurement information about . As far as the last point is concerned, the approach we adopt is
the same as that developed for *dual marginalized particle filtering* (dual MPF) in ref. [23, Sec. V] and also adopted in particle
smoothing [19, Sects. III-IV]. The approach relies on the
Markov model

(23)

referring to the *linear* state component (see [19, eq. (1)] or [23, eq. (3)]); in the last expression, () is a time-varying dimensional real function ( real matrix), and consists of the
first elements of the noise term appearing in (1) (the covariance matrix of is denoted , and independence between
and is assumed for simplicity). From Eq. (23) it is easily inferred that the random vector

(24)

must equal the sum

(25)

that depends on *only*; for this reason, (24) can be interpreted as a
pseudo-measurement about . In this case, the
pseudo-measurement information is conveyed by the message () that expresses the *correlation*
between the pdf of the random vector (24)
(computed on the basis of the statistical information about the linear state
component made available by BIF) and the pdf obtained for under the assumption that this vector is expressed by Eq. (25). The message is
evaluated for each of the particles representing in
BIF; this results in a set of particle weights employed in the
first measurement update of BIF and different from those computed on
the basis of (18) in its second measurement
update.
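The correlation between two Gaussian pdfs of the same random vector admits the closed form N(m1; m2, C1 + C2) for the integral of the product N(z; m1, C1) N(z; m2, C2), which makes the per-particle weight evaluation inexpensive. A sketch follows; all means, covariances and the function name are illustrative assumptions:

```python
import numpy as np

# Particle-weight evaluation as the correlation (inner product) of two
# Gaussian pdfs of the same pseudo-measurement vector: the integral of
# N(z; m1, C1) N(z; m2, C2) over z equals N(m1; m2, C1 + C2).
def gaussian_correlation(m1, C1, m2, C2):
    d = m1.size
    C = C1 + C2
    diff = m1 - m2
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.solve(C, diff)) / norm

# weight of each particle: one Gaussian per particle vs. a common Gaussian
m_common, C_common = np.array([0.2]), np.array([[0.3]])
particle_means = [np.array([0.1]), np.array([0.5]), np.array([-0.4])]
C_particle = np.array([[0.1]])
w = np.array([gaussian_correlation(m, C_particle, m_common, C_common)
              for m in particle_means])
w /= w.sum()                            # normalized particle weights
```

As expected, the particle whose mean is closest to that of the common Gaussian receives the largest weight.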

A graphical model similar to the one shown in Fig. 3 can be easily derived from the general model appearing in Fig. 2 for case C.2 too. The relevant differences with respect to case C.1 can be summarized as follows:

1) The backward filters BIF and BIF estimate and , respectively; consequently, their nuisance substates are and , respectively.

2) The BIFBIF block is fed by the backward predicted/smoothed pdfs computed by BIF; such pdfs are employed for: a) generating the messages () and () required to integrate out the dependence of the Markov model (see Eq. (23))

and of the measurement model (20), respectively, on ; b)
generating pseudo-measurement information about the substate *only*. As far as point a) is concerned, it is also
important to point out that the model (
