# Stable approximation schemes for optimal filters

We explore a general truncation scheme for the approximation of (possibly unstable) optimal filters. In particular, let $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ be a state space model defined by a prior distribution $\pi_0$, Markov kernels $\{\kappa_t\}_{t \ge 1}$ and potential functions $\{g_t\}_{t \ge 1}$, and let $\{C_t\}_{t \ge 1}$ be a sequence of compact subsets of the state space. In the first part of the manuscript, we describe a systematic procedure to construct a system $\mathcal{S}^c=(\pi_0,\kappa_t^c,g_t^c)$, where each potential $g_t^c$ is truncated to have null value outside the set $C_t$, such that the optimal filters generated by $\mathcal{S}$ and $\mathcal{S}^c$ can be made arbitrarily close, with approximation errors independent of time $t$. Then, in the second part, we investigate the stability of the approximately optimal filters. Specifically, given a system $\mathcal{S}$ with a prescribed prior $\pi_0$, we seek sufficient conditions to guarantee that the truncated system $\mathcal{S}^c$ (with the same prior $\pi_0$) generates a sequence of optimal filters which are stable and, at the same time, can attain arbitrarily small approximation errors. Besides the design of approximate filters, the methods and results obtained in this paper can be applied to determine whether a prescribed system yields a sequence of stable filters and to investigate topological properties of classes of optimal filters. As an example of the latter, we explicitly construct a metric space of state space systems, endowed with a proper metric $D_q$, which contains a dense subset such that every element of that subset is a state space system yielding a stable sequence of optimal filters.


## 1 Introduction

### 1.1 State space models and optimal filters

In this manuscript we are concerned with stochastic dynamical systems that evolve in discrete time $t$. Such systems consist of two random sequences: a state or signal sequence $\{X_t\}$ and an observation or measurement sequence $\{Y_t\}$. The states cannot be observed directly, and the so-called optimal filtering problem [1] essentially consists in computing, at every time instant $t$, the probability distribution of the state $X_t$ conditional on the observations $Y_{1:t}$ collected up to that time.

Often, the filtering problem is narrowed down to the class of systems for which the signal is Markovian and the observations are conditionally independent given the states. Such systems can be fully characterised by the probability distribution of the state at time $t=0$, denoted $\pi_0$, the Markov kernel that determines the probabilistic dynamics of the state sequence $\{X_t\}$, denoted $\kappa_t$, and a potential function $g_t$ that relates the observation $Y_t$ with the state $X_t$. The latter potential coincides (up to a proportionality constant) with the probability density function (pdf) of $Y_t$ conditional on $X_t$. We refer to the system described by $\pi_0$, $\kappa_t$ and $g_t$ as a state space model.

For a fixed sequence of observations $\{Y_t = y_t\}_{t \ge 1}$, the model yields a deterministic sequence of probability measures $\{\pi_t\}_{t \ge 1}$, where $\pi_t$ describes the probability distribution of $X_t$ conditional on the subsequence $y_{1:t}$ (and the model itself). If the sequence of observations is assumed random, then the system generates an associated sequence of random probability measures. Either deterministic or random, the sequence $\{\pi_t\}_{t \ge 1}$ is the solution to the optimal filtering problem. Hence, the probability measure $\pi_t$ is often referred to as the optimal filter [1].

Optimal filtering algorithms are procedures for the recursive computation, either exact or approximate, of the sequence $\{\pi_t\}_{t \ge 1}$. Well-known examples include the Kalman filter [20] and its many variants [1, 13, 19], the Beneš filter [4] and particle filters [15, 12, 6, 2]. They have found practical applications in a multitude of scientific and engineering problems, including navigation and tracking [16, 24], geophysics [22], biomedical engineering [23] and many others.

### 1.2 Stability of the optimal filter

For a given a priori distribution $\pi_0$, the sequence of optimal filters $\{\pi_t\}_{t \ge 1}$ depends on the Markov kernels $\kappa_t$ and the potential functions $g_t$ of the state space model. Indeed, for a given model and a given sequence of observations, it is possible to describe a filtering operator that maps the prior to the optimal filter [7, 6]. Let us denote this operator as $\Phi_{t|0}$, in such a way that $\pi_t = \Phi_{t|0}(\pi_0)$ is the optimal filter at time $t$ when the initial distribution is $\pi_0$ and $\tilde\pi_t = \Phi_{t|0}(\tilde\pi_0)$ is the optimal filter when the a priori distribution is $\tilde\pi_0$ (with the Markov kernels $\kappa_t$, the potentials $g_t$ and the observations $y_t$ being the same in both cases). It is said that the optimal filter is stable when, for some properly defined metric function $D(\cdot,\cdot)$ (most often the total variation distance [7, 5]),

$$\lim_{t\to\infty} D(\pi_t,\tilde\pi_t) = \lim_{t\to\infty} D\big(\Phi_{t|0}(\pi_0),\Phi_{t|0}(\tilde\pi_0)\big) = 0.$$

Let us note that stability is actually a property of the map $\Phi_{t|0}$, i.e., a property of the combination of the kernels $\kappa_t$ with the potential functions $g_t$ and the observations $y_t$. It would therefore be more accurate to refer to the stability of the filtering operator rather than the stability of the filter itself.

Stability is important both theoretically (as a fundamental property of the system dynamics) and for practical reasons: stable filters can, in principle, be approximated numerically with error rates that hold uniformly over time for a fixed computational effort [7, 18], while unstable filters demand that the computational complexity of the numerical approximation be increased over time in order to prevent the approximation error from growing. The reason is that stable filters forget their initial conditions and their numerical implementations inherit this property and also progressively forget past errors, preventing their accumulation.

The analysis of the stability of a filtering operator is not an easy task. Quoting [5], “stability of the nonlinear filter stems from a delicate interplay of the signal ergodic properties and the observations ‘quality’. If one of these ingredients is removed, the other should be strengthened in order to keep the filter stable”. The authors of [5] use martingale convergence results to prove almost sure stability for sequences of integrals $(f,\pi_t)$, where $f$ is a test function of a particular class whose definition involves both the potentials $g_t$ and the kernels $\kappa_t$ in the model [5]. Other authors resort to the analysis of the total variation distance between optimal filters obtained from different initial distributions [21, 11, 17] and relate stability to other properties of the dynamical system, often related to the ergodicity of the state process [21, 11] or its observability and controllability (see [17] for the analysis of the continuous-time optimal filter). A recent analysis that builds upon [21, 11] but employs a different metric (which enables the inspection of integrals $(f,\pi_t)$ for unbounded $f$) can be found in [14].

The main issue with the methods in [5, 21, 11, 17, 14] is that stability is related to sets of conditions which are often hard to verify from the standard description of the filtering operator in terms of the kernels $\kappa_t$ and the potentials $g_t$. In contrast, the authors of [18] provide a set of relatively simple-to-verify conditions for the stability of $\Phi_{t|0}$; however, their analysis is restricted to a relatively narrow class of state space models (with additive noise and exponential-family pdf’s). A more general study can be found in [7, 6], where Dobrushin contraction coefficients [9, 10] are used as the key tool to obtain conditions on $\kappa_t$ and $g_t$ which are sufficient for stability.

To the best of our knowledge, there has been no attempt to construct a topological characterisation of stable filters. Rather natural questions, such as whether stable filters are “many” or “few” for a given class of state space models, have not been investigated to this day.

### 1.3 Contributions

We propose and investigate a general scheme for the approximation of (possibly unstable) optimal filters that involves the truncation of the potential functions $g_t$ and the “reshaping” of the Markov kernels $\kappa_t$, both related to a prescribed sequence of compact sets $\{C_t\}_{t \ge 1}$. In particular, let $\mathcal{S}$ be the state space model described by the prior distribution $\pi_0$ and an operator $\Phi_{t|0}$. Recall that $\Phi_{t|0}$ depends on $\kappa_t$, $g_t$ and the observations $y_t$. We construct an approximation $\mathcal{S}^c$ which:

• maintains the same prior measure $\pi_0$,

• truncates the potentials $g_t$, to yield new functions $g_t^c$ which are null outside the subset $C_t$, and

• reshapes the Markov kernels $\kappa_t$, in a manner that depends on the consecutive subsets $C_{t-1}$ and $C_t$.

The operator generated by the approximate model (and the same observations) is denoted $\Phi^c_{t|0}$ and the approximate filters are $\pi_t^c = \Phi^c_{t|0}(\pi_0)$. If the compact subsets $C_t$ are sufficiently large (in a manner to be made precise), then we prove that for any bounded real test function $f$ the approximation error can be bounded uniformly over time, i.e., $|(f,\pi_t) - (f,\pi_t^c)| < \|f\|_\infty \epsilon$ for every $t$, for arbitrarily small $\epsilon > 0$.

In the second part of the manuscript we adapt some results from [7, 6] in order to investigate the stability of the sequence of approximate filters $\{\pi_t^c\}_{t \ge 1}$. Specifically, we identify sufficient conditions to guarantee that the truncated operator $\Phi^c_{t|0}$ (generated by the reshaped kernels and truncated potentials) generates a sequence of stable filters while keeping the approximation error bounded for the intended prior distribution $\pi_0$. Let us remark that, if the original model yields unstable sequences of filters, then it is not possible to construct an approximate model that yields time-uniform small errors for every prior and is also stable. If it were possible, then the original model would yield stable filters itself. Our approach, therefore, is to construct a truncated model, with operator $\Phi^c_{t|0}$, such that:

• It generates sufficiently good approximations when the prior is the prescribed one, i.e., $|(f,\Phi_{t|0}(\pi_0)) - (f,\Phi^c_{t|0}(\pi_0))| < \|f\|_\infty \epsilon$ for any real and bounded $f$, although these approximations may deteriorate if we change the prior to some $\tilde\pi_0 \ne \pi_0$.

• The truncated operator is stable, i.e., $\lim_{t\to\infty} |(f,\Phi^c_{t|0}(\pi_0)) - (f,\Phi^c_{t|0}(\tilde\pi_0))| = 0$ even when $\tilde\pi_0 \ne \pi_0$.

Besides the design of approximate filters, the methods and results obtained in this paper can be applied to determine whether a prescribed system yields a sequence of stable filters (or not) and to investigate topological properties of classes of optimal filters. As an example of the latter, we explicitly construct a metric space of state space models, endowed with a proper metric $D_q$, which contains a dense subset such that every element of that subset is a state space model yielding a stable sequence of optimal filters.

### 1.4 Organisation of the paper

We complete this introduction, in Section 1.5, with a brief summary of the notation used throughout the manuscript. Then, Section 2 is devoted to a detailed statement of the optimal filtering problem for state space Markov models and a formal definition of the notion of stability for sequences of optimal filters. In Section 3 we introduce the proposed approximation method. The approximation scheme by itself does not directly guarantee stability. In Section 4 we introduce a probabilistic characterisation of the normalisation constants of the (random) optimal filters that can be used to ease the stability analysis of the filters (both optimal and approximate). Using these new results, in Section 5 we provide different sets of regularity conditions which are sufficient to guarantee the stability of the approximate filters. Section 6 is devoted to some brief concluding remarks.

### 1.5 Notation

We briefly summarise, for reference and roughly organised by topics, the notation used throughout the manuscript.

Sets, measures and integrals:

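• $\mathbb{R}^d$ denotes the set of real vectors of dimension $d \ge 1$.

• $\mathcal{B}(S)$ is the $\sigma$-algebra of Borel subsets of $S \subseteq \mathbb{R}^d$.

• $\mathcal{P}(S)$ is the set of probability measures over $\mathcal{B}(S)$.

• $(f,\mu) := \int f(x)\mu(dx)$ is the integral of a function $f$ with respect to a measure $\mu$.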

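• Given a set $S$, the indicator function on $S$ is

$$1_S(x) = \begin{cases} 1, & \text{if } x \in S, \\ 0, & \text{otherwise.} \end{cases}$$

Given a measure $\mu$ and a measurable set $S$ we equivalently denote $\mu(S) := (1_S,\mu)$.

• Let $S$ be a subset of a reference space $\mathcal{X}$. The complement of $S$ with respect to $\mathcal{X}$ is denoted $\bar{S} := \mathcal{X} \setminus S$.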

Functions and sequences:

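• $B(S)$ is the set of bounded real functions over $S$. Given a sequence $f = \{f_t\}_{t \ge 1}$, with $f_t \in B(S)$, we denote

$$\|f_t\|_\infty := \sup_{s \in S}|f_t(s)| \quad \text{and} \quad \|f\|_\infty := \sup_{t \ge 1}\|f_t\|_\infty.$$

• We use a subscript notation for subsequences, namely $x_{t_1:t_2} := \{x_{t_1}, \ldots, x_{t_2}\}$.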

• Random variables (r.v.’s) are denoted by capital letters (e.g., $X$) and their realisations using lower case letters (e.g., $X(\omega) = x$ or, simply, $x$).

• $\mathbb{E}[\cdot]$ denotes expectation w.r.t. a prescribed probability distribution.

• If $X$ is a r.v. taking values in $\mathbb{R}^d$, with associated probability measure $\mu$, then the $L_p$ norm of $X$ is $\|X\|_p := \mathbb{E}[|X|^p]^{1/p}$, $p \ge 1$.

## 2 State space models and optimal filters

### 2.1 Markov state-space models in discrete time

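Let $(\Omega,\mathcal{F},P_\Omega)$ be a probability space, where $\Omega$ is the sample space, $\mathcal{F}$ is the associated Borel $\sigma$-algebra and $P_\Omega$ is a probability measure. A $d$-dimensional discrete random sequence on this space is a function that maps each outcome $\omega \in \Omega$ to a sequence of $d$-dimensional vectors indexed by the time $t \ge 0$, each of them taking values in some prescribed range.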

We specifically consider two random sequences,

• the signal or state process $\{X_t\}_{t \ge 0}$, taking values on the state space $\mathcal{X}$,

• and the observation or measurement process $\{Y_t\}_{t \ge 0}$, taking values on the observation space $\mathcal{Y}$.

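All probabilities related to the state and observation processes can be constructed from the measure $P_\Omega$. For notational conciseness, however, we introduce the probability measure $P$ defined as

$$P\big(c_B(\{X_t,Y_t\}_{t \ge 0})\big) := P_\Omega\big(\{\omega \in \Omega : c_B(\{X_t(\omega),Y_t(\omega)\}_{t \ge 0}) \text{ is true}\}\big)$$

for any proper Boolean condition $c_B(\cdot)$ on the joint sequence $\{X_t,Y_t\}_{t \ge 0}$. For example, if the condition is $X_k \in A$, for some integer $k \ge 0$ and some Borel set $A \subseteq \mathcal{X}$, then

$$P(X_k \in A) := P_\Omega(\{\omega \in \Omega : X_k(\omega) \in A\}).$$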

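We assume that the state process evolves over time according to the family of Markov kernels

$$\kappa_t(A|x_{t-1}) = P(X_t \in A \mid X_{t-1} = x_{t-1}),$$

where $A$ is a Borel subset of $\mathcal{X}$ and $x_{t-1} \in \mathcal{X}$. The probability distribution of $X_0$ is characterised by a normalised measure that we denote interchangeably as $\pi_0$ or $\kappa_0$.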

The observation process is described by the conditional distribution of the observation $Y_t$ given the state $X_t$. Specifically, we assume that the random variable (r.v.) $Y_t$ taking values in $\mathcal{Y}$ has a conditional probability density function (pdf) $g_t(y_t|x_t)$ w.r.t. a reference measure (usually, but not necessarily, the Lebesgue measure), given the state $X_t = x_t$. The observations are assumed to be conditionally independent given the states and independent of $Y_0$ (i.e., $Y_0$ does not bear any information on the state process or the rest of the observation process, $\{Y_t\}_{t \ge 1}$, and we ignore it in the sequel).

If the sequence $\{Y_t = y_t\}_{t \ge 1}$ is fixed, then we write $g_t(x_t) := g_t(y_t|x_t)$ for conciseness and to emphasise that $g_t$ is a function of the state $x_t$, i.e., we use $g_t(x_t)$ as the likelihood of $x_t$ given the observation $y_t$. When the observation sequence is random, we write $g_t^{Y_t}(x_t) := g_t(Y_t|x_t)$ for the likelihood of $x_t$ (note that $g_t^{Y_t}(x_t)$ is a r.v. itself).

The prior measure $\pi_0$, the family of Markov kernels $\{\kappa_t\}_{t \ge 1}$ and the family of conditional pdf’s (or likelihoods) $\{g_t\}_{t \ge 1}$ describe a Markov state space model.

### 2.2 The optimal filter

The filtering problem consists in the computation of the posterior probability measure of the state $X_t$ given a sequence of observations up to time $t$. Specifically, we aim at computing the sequence of probability measures

$$\pi_t(A) := P(X_t \in A \mid Y_{1:t} = y_{1:t}), \quad t \ge 0,$$

where $A$ is a Borel subset of $\mathcal{X}$ (for $t = 0$ there are no observations and the measure reduces to the prior $\pi_0$). The measure $\pi_t$ is commonly called the optimal filter at time $t$ and we are typically interested in the computation of integrals of the form $(f,\pi_t)$.

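Usually, $\pi_t$ is computed from $\pi_{t-1}$ in two steps. First, we obtain the predictive probability measure

$$\xi_t(A) := P(X_t \in A \mid Y_{1:t-1} = y_{1:t-1})$$

and then we compute $\pi_t$ from $\xi_t$. To be precise, we have $\xi_t = \kappa_t\pi_{t-1}$, meaning that

$$(f,\xi_t) = (f,\kappa_t\pi_{t-1}) = ((f,\kappa_t),\pi_{t-1}),$$

and $\pi_t = g_t \cdot \xi_t$, meaning that

$$(f,g_t \cdot \xi_t) = \frac{(fg_t,\xi_t)}{(g_t,\xi_t)}.$$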

The definitions above are given for a fixed (but arbitrary, unless otherwise stated) sequence of observations $\{y_t\}_{t \ge 1}$. In this case, the state space model described by the triple $(\pi_0,\kappa_t,g_t)$ yields deterministic sequences of filtering, $\{\pi_t\}_{t \ge 1}$, and predictive, $\{\xi_t\}_{t \ge 1}$, probability measures. If the observations are random, then the model yields sequences of random measures $\{\pi_t^{Y_{1:t}}\}_{t \ge 1}$ and $\{\xi_t^{Y_{1:t-1}}\}_{t \ge 1}$ (note the superscript in the notation).

### 2.3 The prediction-update operator

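The transformation of the filter $\pi_{t-1}$ into $\pi_t$ can be represented by the composition of two maps:

• The prediction (P) operator $\Psi_t : \mu \mapsto \kappa_t\mu$, where

$$(f,\Psi_t(\mu)) := ((f,\kappa_t),\mu).$$

• The update (U) operator $\Upsilon_t : \mu \mapsto g_t \cdot \mu$, i.e.,

$$(f,\Upsilon_t(\mu)) := \frac{(fg_t,\mu)}{(g_t,\mu)}.$$

By composing the maps $\Psi_t$ and $\Upsilon_t$ we obtain the prediction-update (PU) operator

$$\Phi_t(\mu) := (\Upsilon_t \circ \Psi_t)(\mu) \tag{1}$$

such that

$$(f,\Phi_t(\mu)) = (f,\Upsilon_t(\Psi_t(\mu))) = \frac{(fg_t,\kappa_t\mu)}{(g_t,\kappa_t\mu)} = (f,g_t \cdot \kappa_t\mu), \tag{2}$$

which obviously implies $\pi_t = \Phi_t(\pi_{t-1})$. If we additionally define the composition of PU operators

$$\Phi_{t|k} := \Phi_t \circ \Phi_{t-1} \circ \cdots \circ \Phi_{k+1}$$

then we can compactly represent the evolution of the filter over $t-k$ consecutive steps as $\pi_t = \Phi_{t|k}(\pi_k)$. Note that the map $\Phi_{t|k}$ depends on the Markov kernels $\kappa_{k+1:t}$ and the likelihoods $g_{k+1:t}$ alone (and not on the prior measure $\pi_0$).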
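The prediction, update and PU maps can be illustrated with a minimal sketch on a finite state space. This is only an illustration under stated assumptions, not the setting of the paper: measures are represented as probability vectors, a Markov kernel is a row-stochastic matrix with `kernel[i][j]` standing for $\kappa_t(\{j\}|i)$, and the function names (`predict`, `update`, `pu_step`, `pu_compose`) and the numerical values are hypothetical.

```python
def predict(kernel, mu):
    """Prediction map Psi_t: (kappa_t mu)(j) = sum_i mu(i) * kernel[i][j]."""
    n_out = len(kernel[0])
    return [sum(mu[i] * kernel[i][j] for i in range(len(mu))) for j in range(n_out)]


def update(g, mu):
    """Update map Upsilon_t: (g . mu)(j) = g(j) * mu(j) / (g, mu)."""
    norm = sum(gj * mj for gj, mj in zip(g, mu))  # normalisation constant (g, mu)
    if norm <= 0.0:
        raise ValueError("the likelihood has zero mass under the predictive measure")
    return [gj * mj / norm for gj, mj in zip(g, mu)]


def pu_step(kernel, g, mu):
    """Prediction-update map Phi_t = Upsilon_t o Psi_t, as in Eq. (1)."""
    return update(g, predict(kernel, mu))


def pu_compose(kernels, likelihoods, mu):
    """Composition Phi_{t|k}: apply Phi_{k+1}, ..., Phi_t in this order."""
    for kernel, g in zip(kernels, likelihoods):
        mu = pu_step(kernel, g, mu)
    return mu


# Toy two-state example (all numbers are made up for illustration).
kernel = [[0.9, 0.1], [0.2, 0.8]]   # rows sum to one
g = [0.5, 0.1]                      # likelihood of the current observation per state
pi0 = [0.5, 0.5]                    # prior measure
pi1 = pu_step(kernel, g, pi0)       # filter after one prediction-update step
```

Note that `pu_compose` takes only kernels and likelihoods as its model inputs, which mirrors the remark that $\Phi_{t|k}$ does not depend on the prior.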

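When the observations are random, the PU operator depends on the function-valued r.v. $g_t^{Y_t}$ and is itself a random map (that we denote as $\Phi_t^{Y_t}$). We readily obtain $\pi_t^{Y_{1:t}} = \Phi_t^{Y_t}(\pi_{t-1}^{Y_{1:t-1}})$ and $\pi_t^{Y_{1:t}} = \Phi_{t|k}^{Y_{k+1:t}}(\pi_k^{Y_{1:k}})$.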

### 2.4 Stability of the optimal filter

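Given a sequence of Markov kernels $\{\kappa_t\}_{t \ge 1}$ and likelihood functions $\{g_t\}_{t \ge 1}$, we say that “the optimal filter is stable” when the sequences of measures

$$\begin{aligned} \pi_0 = \kappa_0 &\longrightarrow \pi_1 = g_1 \cdot \kappa_1\pi_0 \longrightarrow \pi_2 = g_2 \cdot \kappa_2\pi_1 \longrightarrow \cdots \longrightarrow \pi_t = g_t \cdot \kappa_t\pi_{t-1} \longrightarrow \cdots \\ \tilde\pi_0 = \tilde\kappa_0 &\longrightarrow \tilde\pi_1 = g_1 \cdot \kappa_1\tilde\pi_0 \longrightarrow \tilde\pi_2 = g_2 \cdot \kappa_2\tilde\pi_1 \longrightarrow \cdots \longrightarrow \tilde\pi_t = g_t \cdot \kappa_t\tilde\pi_{t-1} \longrightarrow \cdots \end{aligned}$$

converge to each other even if $\pi_0 \ne \tilde\pi_0$, i.e., when $\lim_{t\to\infty}|(f,\pi_t) - (f,\tilde\pi_t)| = 0$ for any bounded test function $f$ and any pair of priors $\pi_0$, $\tilde\pi_0$.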

The expression “stability of the filter” can sometimes be misleading, because stability is a property of the PU operator $\Phi_{t|0}$ (i.e., a property of the pair $(\kappa_t,g_t)$). Indeed, we have stability if, and only if,

$$\lim_{t\to\infty}|(f,\Phi_{t|0}(\alpha)) - (f,\Phi_{t|0}(\beta))| = 0$$

for any pair of probability measures $\alpha$, $\beta$ on the state space and any bounded function $f$. In the sequel, we often refer to the stability of the map $\Phi_{t|0}$ instead of the stability of the filter $\pi_t$.
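The definition can be explored numerically with a small sketch. It assumes a finite (two-state) state space, a time-homogeneous kernel with strictly positive entries and an arbitrary, made-up sequence of likelihoods; none of these choices come from the paper, and the names `pu_step` and `total_variation` are hypothetical. The sketch runs the same PU operator from two different priors and records the total variation distance between the resulting filters.

```python
def pu_step(kernel, g, mu):
    """One prediction-update step on a finite state space."""
    n = len(mu)
    pred = [sum(mu[i] * kernel[i][j] for i in range(n)) for j in range(n)]
    weights = [g[j] * pred[j] for j in range(n)]
    norm = sum(weights)
    return [w / norm for w in weights]


def total_variation(alpha, beta):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(alpha, beta))


kernel = [[0.7, 0.3], [0.4, 0.6]]             # strictly positive, row-stochastic
likelihoods = [[0.6, 0.2], [0.1, 0.5]] * 10   # hypothetical g_t for 20 time steps

alpha, beta = [1.0, 0.0], [0.0, 1.0]          # two maximally different priors
distances = [total_variation(alpha, beta)]
for g in likelihoods:
    alpha = pu_step(kernel, g, alpha)          # Phi_{t|0}(alpha)
    beta = pu_step(kernel, g, beta)            # Phi_{t|0}(beta), same kernel and g_t
    distances.append(total_variation(alpha, beta))
```

In this toy example the distance starts at 1 and becomes negligible after 20 steps, which is the behaviour that the stability definition formalises; it says nothing about models whose kernels fail to mix.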

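When the observations are random, we say that the PU operator $\Phi_{t|0}^{Y_{1:t}}$ is stable $P$-a.s. when there exists a set $\Omega' \subseteq \Omega$ such that $P_\Omega(\Omega') = 1$ and, for every $\omega \in \Omega'$, the sequence of observations $\{Y_t(\omega)\}_{t \ge 1}$ yields a stable PU operator.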

## 3 Truncated filters

### 3.1 Truncated state space models

We are going to use truncated filters as building blocks. For a fixed but arbitrary sequence of observations $\{y_t\}_{t \ge 1}$, let $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ be a state space model yielding the sequence of filters $\{\pi_t\}_{t \ge 1}$. We can construct a truncated version of the model (and, hence, a sequence of filters for the truncated model) by

• choosing a sequence of compact subsets of the state space, denoted $\{C_t\}_{t \ge 1}$, and

• defining the truncated likelihoods

$$g_t^c(x) = 1_{C_t}(x)g_t(x), \tag{3}$$

where $1_{C_t}$ is the indicator function, i.e., $1_{C_t}(x) = 1$ for $x \in C_t$ and $1_{C_t}(x) = 0$ otherwise.

The truncated model is denoted $\mathcal{S}^c$ and it yields the sequence of filters $\{\pi_t^c\}_{t \ge 1}$ and the sequence of predictive measures $\{\xi_t^c\}_{t \ge 1}$, with $\pi_t^c = g_t^c \cdot \xi_t^c$.

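Usually, we construct a truncated model in order to approximate the sequence of filters produced by a prescribed model $\mathcal{S}$. For example, if we are given $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ then we select a sequence of compact subsets $\{C_t\}_{t \ge 1}$ and construct the truncated approximation $\mathcal{S}^c = (\pi_0,\tilde\kappa_t^c,g_t^c)$ so that

$$\pi_t^c = g_t^c \cdot \tilde\kappa_t^c\pi_{t-1}^c \approx \pi_t = g_t \cdot \kappa_t\pi_{t-1}$$

for every $t \ge 1$. Note that, in order to make the approximation accurate, besides the truncation of $g_t$, we may want to modify the Markov kernel, i.e., to choose some suitable $\tilde\kappa_t^c \ne \kappa_t$.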

### 3.2 Approximation error

Let $\Phi_{t|0}$ be an unstable PU operator. It is not possible to obtain a stable map $\bar\Phi_{t|0}$ such that, for any $t \ge 1$,

$$|(f,\Phi_{t|0}(\pi_0)) - (f,\bar\Phi_{t|0}(\pi_0))| < C\epsilon$$

for an arbitrary initial measure $\pi_0$, an arbitrary test function $f$, an arbitrarily small $\epsilon > 0$ and a finite constant $C$ (otherwise the operator $\Phi_{t|0}$ would be stable itself).

However, if we fix the prior measure $\pi_0$, i.e., we specify a complete state space model, it is possible to define a truncated state space model that generates truncated filters $\pi_t^c$ arbitrarily close to the original measures $\pi_t$ given the fixed prior $\pi_0$.

Since $\Phi_{t|0}$ is determined by the pair $(\kappa_t,g_t)$, the state space model $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ with prior measure $\pi_0$, Markov kernel $\kappa_t$ and likelihood $g_t$ is equivalently specified by $\pi_0$ and $\Phi_{t|0}$. We aim at constructing a truncated state space model $\mathcal{S}^c$, yielding the sequence of filters $\pi_t^c$ and predictors $\xi_t^c$, such that

$$|(f,\pi_t) - (f,\pi_t^c)| = |(f,\Phi_{t|0}(\pi_0)) - (f,\Phi^{c,\epsilon}_{t|0}(\pi_0))| < \|f\|_\infty\epsilon \tag{4}$$

for every $t \ge 1$, even if $\Phi_{t|0}$ is possibly unstable. It turns out that truncated state space models can be constructed in a systematic way. The key ingredient is the choice of a reshaped kernel $\tilde\kappa_t^c$ that can be obtained from any given $\kappa_t$.

###### Definition 1

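Let $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ be a state space model and let $\{C_t\}_{t \ge 1}$ be a sequence of compact subsets of $\mathcal{X}$. We define the reshaped Markov kernel as $\tilde\kappa_1^c := \kappa_1$ and

$$\tilde\kappa_t^c(dx|x') := \kappa_t(dx|x')\pi_{t-1}(C_{t-1}) + \rho_t(dx)$$

for $t \ge 2$, where

$$\rho_t(dx) := \int 1_{\bar C_{t-1}}(x')\kappa_t(dx|x')\pi_{t-1}(dx'). \tag{5}$$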
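As a quick sanity check (a short worked computation added for illustration, assuming only that $\kappa_t(\cdot|x')$ is a probability measure for every $x'$ and that $\pi_{t-1}$ is a probability measure), the reshaped kernel of Definition 1 has unit total mass for every $x'$, so it is again a Markov kernel:

```latex
\begin{aligned}
\int \tilde\kappa_t^c(dx|x')
  &= \pi_{t-1}(C_{t-1}) \int \kappa_t(dx|x')
     + \int 1_{\bar C_{t-1}}(x'') \left[ \int \kappa_t(dx|x'') \right] \pi_{t-1}(dx'') \\
  &= \pi_{t-1}(C_{t-1}) + \pi_{t-1}(\bar C_{t-1}) = 1 .
\end{aligned}
```

The measure $\rho_t$ therefore carries exactly the mass $\pi_{t-1}(\bar C_{t-1})$ that the first term of the reshaped kernel lacks.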

The two lemmas stated (and proved) below guarantee that the inequality (4) can be satisfied by the filters generated by the state space model with reshaped kernels $\tilde\kappa_t^c$ and truncated likelihoods $g_t^c$, provided the compact subsets $C_t$ have sufficiently large probability mass.

###### Lemma 1

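Let $\mathcal{S} = (\pi_0,\kappa_t,g_t)$ be a state space model and let $\{C_t\}_{t \ge 1}$ be a sequence of compact subsets of $\mathcal{X}$. The truncated state space model $\mathcal{S}^c = (\pi_0,\tilde\kappa_t^c,g_t^c)$ yields sequences of predictive and filtering probability measures ($\xi_t^c$ and $\pi_t^c$, respectively) such that, for every $t \ge 1$ and every bounded test function $f$,

$$(1_{C_t}f,\xi_t) = (1_{C_t}f,\xi_t^c), \quad \text{and} \tag{6}$$

$$(1_{C_t}f,\pi_t) = (f,\pi_t^c)\pi_t(C_t). \tag{7}$$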
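The identities (6) and (7) can be checked numerically with a minimal sketch. The following assumptions are for illustration only and are not taken from the paper: the state space is finite, the kernel is time-homogeneous with `kernel[i][j]` standing for $\kappa_t(\{j\}|i)$, each compact set $C_t$ is a Python set of state indices, and all names and numbers are hypothetical. Testing the indicator of each single state in $C_t$ is enough here, because both identities are linear in $f$.

```python
def predict(kernel, mu):
    n = len(mu)
    return [sum(mu[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def update(g, mu):
    weights = [gj * mj for gj, mj in zip(g, mu)]
    norm = sum(weights)
    return [w / norm for w in weights]


def reshaped_kernel(kernel, pi_prev, c_prev):
    """Definition 1 on a finite state space, for t >= 2."""
    n = len(kernel)
    mass = sum(pi_prev[i] for i in c_prev)                  # pi_{t-1}(C_{t-1})
    rho = [sum(kernel[i][j] * pi_prev[i] for i in range(n) if i not in c_prev)
           for j in range(n)]                               # Eq. (5)
    return [[kernel[i][j] * mass + rho[j] for j in range(n)] for i in range(n)]


def lemma1_gaps(pi0, kernel, likelihoods, compact_sets):
    """Largest violations of Eqs. (6) and (7) over all times and states in C_t."""
    n = len(pi0)
    pi, pi_c = list(pi0), list(pi0)                         # same prior for both models
    gap6 = gap7 = 0.0
    c_prev = None
    for t, (g, c) in enumerate(zip(likelihoods, compact_sets), start=1):
        k_tilde = kernel if t == 1 else reshaped_kernel(kernel, pi, c_prev)
        xi, xi_c = predict(kernel, pi), predict(k_tilde, pi_c)
        g_c = [g[j] if j in c else 0.0 for j in range(n)]   # Eq. (3)
        pi, pi_c = update(g, xi), update(g_c, xi_c)
        mass = sum(pi[j] for j in c)                        # pi_t(C_t)
        gap6 = max(gap6, max(abs(xi[j] - xi_c[j]) for j in c))
        gap7 = max(gap7, max(abs(pi[j] - pi_c[j] * mass) for j in c))
        c_prev = c
    return gap6, gap7


kernel = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
likelihoods = [[0.9, 0.5, 0.1], [0.2, 0.7, 0.4], [0.3, 0.3, 0.8], [0.6, 0.1, 0.5]]
compact_sets = [{0, 1}, {1, 2}, {0, 1}, {0, 2}]
gap6, gap7 = lemma1_gaps([0.2, 0.5, 0.3], kernel, likelihoods, compact_sets)
```

Both gaps should be at the level of floating-point round-off. Note that the reshaped kernel at time $t$ is built from the original filter $\pi_{t-1}$, not from the truncated one, exactly as in Definition 1.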

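Proof: We prove this result by an induction argument. For $t = 1$, the result is straightforward. The models $\mathcal{S}$ and $\mathcal{S}^c$ share the same prior $\pi_0$ and, from Definition 1, $\tilde\kappa_1^c = \kappa_1$, hence

$$\xi_1^c = \tilde\kappa_1^c\pi_0 = \kappa_1\pi_0 = \xi_1 \tag{8}$$

and Eq. (6) holds. As for the relationship between $\pi_1$ and $\pi_1^c$, we have

$$\begin{aligned} (1_{C_1}f,\pi_1) &= \frac{(1_{C_1}fg_1,\xi_1)}{(g_1,\xi_1)} && (9) \\ &= \frac{(1_{C_1}g_1,\xi_1)}{(g_1,\xi_1)} \times \frac{(1_{C_1}fg_1,\xi_1)}{(1_{C_1}g_1,\xi_1)} && (10) \\ &= \pi_1(C_1) \times \frac{(fg_1^c,\xi_1^c)}{(g_1^c,\xi_1^c)} && (11) \\ &= \pi_1(C_1)(f,\pi_1^c), && (12) \end{aligned}$$

where Eq. (9) follows from Bayes’ theorem, we obtain Eq. (10) by multiplying and dividing by $(1_{C_1}g_1,\xi_1)$, and Eq. (11) follows from the relationship $\pi_1(C_1) = (1_{C_1}g_1,\xi_1)/(g_1,\xi_1)$ together with the identities $\xi_1^c = \xi_1$ (obtained in (8)) and $g_1^c = 1_{C_1}g_1$ (see Eq. (3)). The last equality results readily from $\pi_1^c = g_1^c \cdot \xi_1^c$ and completes the proof for $t = 1$.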

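For the induction step, let us assume that

$$(1_{C_{t-1}}f,\pi_{t-1}) = (f,\pi_{t-1}^c)\pi_{t-1}(C_{t-1}) \tag{13}$$

for any bounded test function $f$. We evaluate the difference $(1_{C_t}f,\xi_t) - (1_{C_t}f,\xi_t^c)$ first. We recall that $\xi_t = \kappa_t\pi_{t-1}$ and $\xi_t^c = \tilde\kappa_t^c\pi_{t-1}^c$, hence,

$$\begin{aligned} (1_{C_t}f,\xi_t) - (1_{C_t}f,\xi_t^c) &= (1_{C_t}f,\kappa_t\pi_{t-1}) - (1_{C_t}f,\tilde\kappa_t^c\pi_{t-1}^c) \\ &= ((1_{C_t}f,\kappa_t),\pi_{t-1}) - ((1_{C_t}f,\tilde\kappa_t^c),\pi_{t-1}^c). \end{aligned} \tag{14}$$

Furthermore, if we note that, for any integrable function $h$, $(h,\pi_{t-1}) = (h1_{C_{t-1}},\pi_{t-1}) + (h1_{\bar C_{t-1}},\pi_{t-1})$, and recall that $\tilde\kappa_t^c(dx|x') = \kappa_t(dx|x')\pi_{t-1}(C_{t-1}) + \rho_t(dx)$, then we obtain

$$\begin{aligned} ((1_{C_t}f,\kappa_t),\pi_{t-1}) - ((1_{C_t}f,\tilde\kappa_t^c),\pi_{t-1}^c) ={}& ((1_{C_t}f,\kappa_t)1_{C_{t-1}},\pi_{t-1}) + ((1_{C_t}f,\kappa_t)1_{\bar C_{t-1}},\pi_{t-1}) \\ & - \pi_{t-1}(C_{t-1})((1_{C_t}f,\kappa_t),\pi_{t-1}^c) - ((1_{C_t}f,\rho_t),\pi_{t-1}^c). \end{aligned} \tag{15}$$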

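However, we note that

$$((1_{C_t}f,\rho_t),\pi_{t-1}^c) = (1_{C_t}f,\rho_t)(1_{\mathcal{X}},\pi_{t-1}^c) = (1_{C_t}f,\rho_t) \tag{16}$$

and substituting (16) into (15) yields

$$\begin{aligned} ((1_{C_t}f,\kappa_t),\pi_{t-1}) - ((1_{C_t}f,\tilde\kappa_t^c),\pi_{t-1}^c) ={}& ((1_{C_t}f,\kappa_t)1_{C_{t-1}},\pi_{t-1}) + ((1_{C_t}f,\kappa_t)1_{\bar C_{t-1}},\pi_{t-1}) \\ & - \pi_{t-1}(C_{t-1})((1_{C_t}f,\kappa_t),\pi_{t-1}^c) - (1_{C_t}f,\rho_t). \end{aligned} \tag{17}$$

Taking (14) and (17) together, we have the identity

$$\begin{aligned} (1_{C_t}f,\xi_t) - (1_{C_t}f,\xi_t^c) ={}& ((1_{C_t}f,\kappa_t)1_{C_{t-1}},\pi_{t-1}) + ((1_{C_t}f,\kappa_t)1_{\bar C_{t-1}},\pi_{t-1}) \\ & - \pi_{t-1}(C_{t-1})((1_{C_t}f,\kappa_t),\pi_{t-1}^c) - (1_{C_t}f,\rho_t). \end{aligned} \tag{18}$$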

Let us now compare the first and third terms in Eq. (18). If we define the function

$$f_t(x) := \int 1_{C_t}(x')f(x')\kappa_t(dx'|x) \tag{19}$$

then it is straightforward to see that the first term on the r.h.s. of (18) can be rewritten as

$$((1_{C_t}f,\kappa_t)1_{C_{t-1}},\pi_{t-1}) = (1_{C_{t-1}}f_t,\pi_{t-1}), \tag{20}$$

while, for the third term,

$$\pi_{t-1}(C_{t-1})((1_{C_t}f,\kappa_t),\pi_{t-1}^c) = \pi_{t-1}(C_{t-1})(f_t,\pi_{t-1}^c). \tag{21}$$

However, using the induction hypothesis (13),

$$\pi_{t-1}(C_{t-1})(f_t,\pi_{t-1}^c) = (1_{C_{t-1}}f_t,\pi_{t-1}), \tag{22}$$

hence putting Eqs. (20)–(22) together yields

$$((1_{C_t}f,\kappa_t)1_{C_{t-1}},\pi_{t-1}) - \pi_{t-1}(C_{t-1})((1_{C_t}f,\kappa_t),\pi_{t-1}^c) = 0. \tag{23}$$

We are now left with the comparison of the second and fourth terms in (18). For the second term, it is straightforward to see that

$$((1_{C_t}f,\kappa_t)1_{\bar C_{t-1}},\pi_{t-1}) = (1_{\bar C_{t-1}}f_t,\pi_{t-1}). \tag{24}$$

The calculation for the fourth term is also straightforward. From the definition of $\rho_t$ in Eq. (5),

$$\begin{aligned} (1_{C_t}f,\rho_t) &= \int 1_{C_t}(x)f(x)\int 1_{\bar C_{t-1}}(x')\kappa_t(dx|x')\pi_{t-1}(dx') \\ &= \int 1_{\bar C_{t-1}}(x')\left[\int 1_{C_t}(x)f(x)\kappa_t(dx|x')\right]\pi_{t-1}(dx') \\ &= (1_{\bar C_{t-1}}f_t,\pi_{t-1}), \end{aligned} \tag{25}$$

where the last equality is obtained from the definition of $f_t$ in Eq. (19). If we combine Eqs. (24) and (25) we arrive at

 ((1Ctf,κt)1¯C