1 Introduction
It is well known that, except for a few special cases [Segall_Point1976, Marcus1979, Elliott1994Exact, Elliott1994_HowToCount, Daum2005], general nonlinear filters of partially observable Markov processes (or Hidden Markov Models (HMMs)) do not admit finite dimensional (recursive) representations [Segall1976, Elliott1994Hidden]. Nonlinear filtering problems, though, arise naturally in a wide variety of important applications, including target tracking [Tracking_1_2002, Tracking_2_2004], localization and robotics [Roumeliotis_2000, Volkov_2015], mathematical finance [OXFORD_Crisan_2011] and channel prediction in wireless sensor networks [KalPetChannelMarkov2014], just to name a few. Adopting the Minimum Mean Square Error (MMSE) as the standard optimality criterion, in most cases, the nonlinear filtering problem results in a dynamical system in the infinite dimensional space of measures, making the need for robust approximate solutions imperative.
Approximate nonlinear filtering methods can be primarily categorized into two major groups [Chen_FILTERS_2003]: local and global. Local methods include the celebrated extended Kalman filter [Elliott2010620], the unscented Kalman filter [UNSCENTED], Gaussian approximations [ItoXiong1997], cubature Kalman filters [Cubature_2009] and quadrature Kalman filters [Elliott_GAUSS]. These methods are mainly based on the local “assumed form of the conditional density” approach, which dates back to the 1960s [Kushner1967_Approximations]. Local methods are characterized by relatively small computational complexity, making them applicable to relatively higher dimensional systems. However, they are strictly suboptimal and, thus, at most constitute efficient heuristics, without explicit theoretical guarantees. On the other hand, global methods, which include grid based approaches (relying on proper quantizations of the state space of the state process [Pages2005optimal, Kushner2001_BOOK, Kushner2008]) and Monte Carlo approaches (particle filters and related methods [PARTICLE2002tutorial]), provide approximations to the whole posterior measure of the state. Global methods possess very powerful asymptotic optimality properties, providing explicit theoretical guarantees and predictable performance. For that reason, they are very important both in theory and practice, either as solutions, or as benchmarks for the evaluation of suboptimal techniques. The main common disadvantage of global methods is their high computational complexity as the dimensionality of the underlying model increases. This is true both for grid based and particle filtering techniques [Bengtsson2008Curse, Quang2010Insight, RebeschiniRamon2013_1, Rebeschini2014Nonlinear, Sellami_2008_Compare].
In this paper, we focus on grid based approximate filtering of Markov processes observed in conditionally Gaussian noise, constructed by exploiting uniform quantizations of the state. Two types of state quantizations are considered: the Markovian and the marginal ones (see [Pages2005optimal] and/or Section 3). Based on existing results [Elliott1994Hidden, Chen_FILTERS_2003, Pages2005optimal], one can derive grid based, recursive nonlinear filtering schemes, exploiting the properties of the aforementioned types of state approximations. The novelty of our work lies in the development of an original convergence analysis of those schemes, under generic assumptions on the expansiveness of the observations (see Section 2). Our contributions can be summarized as follows:
1) For marginal state quantizations, we propose the notion of conditional regularity of Markov kernels (Definition 2), which is an easily verifiable condition for guaranteeing strong asymptotic consistency of the resulting grid based filter. Conditional regularity is a simple and relaxed condition, in contrast to more complicated and potentially stronger conditions found in the literature, such as the Lipschitz assumption imposed on the stochastic kernel(s) of the underlying process in [Pages2005optimal].
2) Under certain conditions, we show that all grid based filters considered here converge to the true optimal nonlinear filter in a strong and controllable sense (Theorems 3 and 4). In particular, the convergence is compact in time and uniform in a measurable set occurring with probability almost 1; this event is completely characterized in terms of the filtering horizon and the dimensionality of the observations.
3) We show that all our results can be easily extended to support filters of functionals of the state and recursive, grid based approximate prediction (Theorem 5). More specifically, we show that grid based filters are asymptotically optimal as long as the state functional is bounded and continuous; this is a typical assumption (see also [Elliott1994Hidden, Kushner2008, Crisan2002Survey]). Of course, this latter assumption is in addition to and independent of any other condition (e.g., conditional regularity) imposed on the structure of the partially observable system under consideration. In a companion paper [KalPetChannelMarkov2014], this simple property has proven particularly useful in the context of channel estimation in wireless sensor networks. The assumption of a bounded and continuous state functional is more relaxed than the respective bounded and Lipschitz assumption found in [Pages2005optimal].
Another novel aspect of our contribution is that our original theoretical development is based more on linear-algebraic arguments and less on measure theoretic ones, making the presentation shorter, clearer and easier to follow.
Relation to the Literature
In this paper, conditional regularity is presented as a relaxed sufficient condition for asymptotic consistency of discrete time grid based filters employing marginal state quantizations. Another set of conditions ensuring asymptotic convergence of state approximations to optimal nonlinear filters is given by Kushner’s local consistency conditions (see, for example, [Kushner2008, Kushner2001_BOOK]). These refer to Markov chain approximations for continuous time Gaussian diffusion processes and the related standard nonlinear filtering problem.
It is important to stress that, as can be verified in Section 4, the constraints which conditional regularity imposes on the stochastic kernel of the hidden Markov process under consideration are general and do not require the assumption of any specific class of hidden models. In this sense, conditional regularity is a nonparametric condition for ensuring convergence to the optimal nonlinear filter. For example, hidden Markov processes driven by strictly non-Gaussian noise are equally supported as their Gaussian counterparts, provided the same conditions are satisfied, as suggested by conditional regularity (see Section 4). Consequently, it is clear that the conditional regularity advocated in this paper is different in nature from Kushner’s local consistency conditions [Kushner2008, Kushner2001_BOOK]. In fact, putting the differences between continuous and discrete time aside, conditional regularity is more general as well.
Convergence of discrete time approximate nonlinear filters (not necessarily recursive) is studied in [KalPetNonlinear_2015]. No special properties of the state are assumed, such as the Markov property; it is only assumed that the state is almost surely compactly supported. In this work, the results of [KalPetNonlinear_2015] provide the tools for showing asymptotic optimality of grid based, recursive approximate estimators. Further, our results have been leveraged in [KalPetChannelMarkov2014, KalPet_SPAWC_2015], showing asymptotic consistency of sequential spatiotemporal estimators/predictors of the magnitude of the wireless channel over a geographical region, as well as its variance. The estimation is based on limited channel observations, obtained by a small number of sensors.
The paper is organized as follows. In Section 2, we define the system model under consideration and formulate the respective filtering approximation problem. In Section 3, we present useful results on the asymptotic characterization of the Markovian and marginal quantizations of the state (Lemmata 2 and 4). Exploiting these results, Section 4 is devoted to: (a) showing convergence of the respective (not necessarily finite dimensional) grid based filters (Theorem 2), and (b) deriving the respective recursive, asymptotically optimal filtering schemes, based on the Markov property and any other conditions imposed on the state (Theorem 3 and Lemmata 5 and 6, leading to Theorem 4). Extensions to the main results are also presented (Theorem 5), and recursive filter performance evaluation is also discussed (Theorem 6). Some analytical examples supporting our investigation are discussed in Section 5, along with some numerical simulations. Finally, Section 6 concludes the paper.
Notation: In the following, the state vector will be represented as , its innovative part as (if it exists), its approximations as , and all other matrices and vectors, either random or not, will be denoted by boldface letters (to be clear from the context). Real valued random variables will be denoted by uppercase letters. Calligraphic letters and formal script letters will denote sets and σ-algebras, respectively. For any random variable (same for vector) , will denote the σ-algebra generated by . The essential supremum (with respect to some measure to be clear from the context) of a function over a set will be denoted by . The operators , and will denote transposition, minimum and maximum eigenvalue, respectively. The norm of a vector is , for all naturals . For any Euclidean space , will denote the respective identity operator. For collections of sets and , the usual Cartesian product is overloaded by defining . Additionally, we employ the identifications , , , and , for any positive natural .
2 System Model & Problem Formulation
2.1 System Model & Technical Assumptions
All stochastic processes considered below are defined on a common complete probability space (the base space), defined by a triplet . Also, for a set , denotes the respective Borel σ-algebra.
Let be Markov with known dynamics (stochastic kernel) (footnote 1: hereafter, we employ the usual notation , for Borel )
(1) 
which, together with an initial probability measure on , completely describe its stochastic behavior. Generically, the state is assumed to be compactly supported in , that is, for all , . We may also alternatively assume the existence of an explicit state transition model describing the temporal evolution of the state, as
(2) 
where, for each , constitutes a measurable nonlinear state transition mapping with somewhat “favorable” analytical behavior (see below) and , for , , denotes a white noise process with state space . The recursion defined in (2) is initiated by choosing , independently of .
The state is partially observed through the conditionally Gaussian process
(3) 
with conditional means and variances known a priori, for all . Additionally, we assume that , with , for all , where is bounded. The observations (3) can also be rewritten in the canonical form , for all , where constitutes a standard Gaussian white noise process and, for all , . The process is assumed to be independent of , and of the innovations , in case .
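To make the canonical form of the observations concrete, the following is a minimal Python sketch, under stated assumptions, of how conditionally Gaussian observations could be simulated given a state path. The function names `simulate_observations`, `mean_fn` and `cov_fn`, the example mean and covariance maps, and the use of a Cholesky factor as the matrix square root are all illustrative choices of ours and are not taken from the model above.

```python
import numpy as np

def simulate_observations(states, mean_fn, cov_fn, rng):
    """Draw y_t = mu(x_t) + C(x_t) u_t, where C C^T = Sigma(x_t) and u_t ~ N(0, I)."""
    observations = []
    for x in states:
        mu = np.asarray(mean_fn(x), dtype=float)
        sigma = np.asarray(cov_fn(x), dtype=float)
        chol = np.linalg.cholesky(sigma)        # one valid square root of Sigma(x_t)
        u = rng.standard_normal(mu.shape[0])    # standard Gaussian white noise
        observations.append(mu + chol @ u)
    return np.array(observations)

# Hypothetical scalar state observed by two sensors (assumed maps, for illustration).
mean_fn = lambda x: np.array([np.sin(x), np.cos(x)])
cov_fn = lambda x: (1.0 + x ** 2) * np.eye(2)   # smallest eigenvalue is at least 1
rng = np.random.default_rng(0)
states = np.linspace(-1.0, 1.0, 5)
y = simulate_observations(states, mean_fn, cov_fn, rng)
```

Any other factor whose product with its own transpose equals the conditional covariance would serve equally well in place of the Cholesky factor.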
The class of partially observable systems described above is very wide, containing all (first order) Hidden Markov Models (HMMs) with compactly supported state processes and conditionally Gaussian measurements. Hereafter, without loss of generality and in order to facilitate the presentation, we will assume stationarity of state transitions, dropping the subscript “t” in the respective stochastic kernels and/or transition mappings. However, we should mention that all subsequent results hold true also for the nonstationary case, if one assumes that any condition hereafter imposed on the mechanism generating holds for all , that is, for all different “modes” of the state process. As in [KalPetNonlinear_2015], the following additional technical assumptions are made.
Assumption 1: (Boundedness) The quantities , are each uniformly upper bounded both with respect to and , with finite bounds and , respectively. For technical reasons, it is also true that . This can always be satisfied by normalization of the observations. If is substituted by the , then all the above continue to hold almost everywhere.
Assumption 2: (Continuity & Expansiveness) All members of the family are uniformly Lipschitz continuous on with respect to the norm. Additionally, all members of the family are elementwise uniformly Lipschitz continuous on with respect to the norm. If is regarded as the essential state space of , then all the above statements are understood essentially.
Remark 1.
In certain applications, conditional Gaussianity of the observations given the state may not be a valid modeling assumption. However, such a structural assumption not only allows for analytical tractability when it holds, but also provides important insights related to the performance of the respective approximate filter, even if the conditional distribution of the observations is not Gaussian, provided it is “sufficiently smooth and unimodal”.
2.2 Prior Results & Problem Formulation
Before proceeding and for later reference, let us define the complete natural filtrations generated by the processes and as and , respectively.
Adopting the MMSE as an optimality criterion for inferring the hidden process on the basis of the observations, one would ideally like to discover an efficient way of evaluating the conditional expectation or filter of the state, given the available information encoded in , sequentially in time. Unfortunately, except for some very special cases [Segall_Point1976, Marcus1979, Elliott1994Exact, Elliott1994_HowToCount], it is well known that the optimal nonlinear filter does not admit an explicit finite dimensional representation [Segall1976, Elliott1994Hidden].
As a result, one must resort to properly designed approximations to the general nonlinear filtering problem, leading to well behaved, finite dimensional, approximate filtering schemes. Such schemes are typically derived by approximating the desired quantities of interest either heuristically (see, e.g., [Kushner1967_Approximations, ItoXiong1997]), or in some more powerful, rigorous sense (see, e.g., Markov chain approximations [Kushner2001_BOOK, Kushner2008, Pages2005optimal], or particle filtering techniques [PARTICLE2002tutorial, Crisan2002Survey]). In this paper, we follow the latter direction and propose a novel, rigorous development of grid based approximate filtering, focusing on the class of partially observable systems described in Section 2.1. For this, we exploit the general asymptotic results presented in [KalPetNonlinear_2015].
Our analysis is based on a well known representation of the optimal filter, employing the simple concept (at least in discrete time) of change of probability measures (see, e.g., [Elliott1994_HowToCount, Elliott1994Exact, Elliott1994Hidden, Elliott2005JUMP]). Let denote the filter of given , under the base measure . Then, there exists another (hypothetical) probability measure [Elliott1994Hidden, KalPetNonlinear_2015], such that
(4) 
where and , for all , with denoting the multivariate Gaussian density as a function of , with mean and covariance matrix . Here, we also define . The most important part is that, under , the processes (including the initial value ) and are mutually statistically independent, with being the same as under the original measure and being a Gaussian vector white noise process with zero mean and identity covariance matrix. As one might guess, the measure is more convenient to work with. It is worth mentioning that the Feynman-Kac formula (4) is true regardless of the nature of the state , that is, it holds even if is not Markov. In fact, the machinery of change of measures can be applied to any nonlinear filtering problem and is not tied to the particular filtering formulations considered in this paper [Elliott1994Hidden].
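The ratio structure of the change of measure representation can be illustrated numerically. The sketch below is ours and rests on an assumption that cannot be checked against the display of (4) here: that the likelihood ratio is the product, over time, of the Gaussian density of each observation given the state divided by the standard Gaussian density. State paths are supplied by the caller and are taken to be independent of the observations, as is the case under the reference measure. All function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log of the multivariate Gaussian density N(y; mean, cov)."""
    y = np.asarray(y, dtype=float)
    diff = y - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + quad)

def log_likelihood_ratio(path, observations, mean_fn, cov_fn):
    """Assumed form: sum over time of log N(y; mu(x), Sigma(x)) - log N(y; 0, I)."""
    total = 0.0
    for x, y in zip(path, observations):
        n = np.asarray(y).size
        total += gaussian_logpdf(y, mean_fn(x), cov_fn(x))
        total -= gaussian_logpdf(y, np.zeros(n), np.eye(n))
    return total

def reference_measure_filter(paths, observations, mean_fn, cov_fn):
    """Self-normalized ratio: sum_k w_k x_k(t) / sum_k w_k over the supplied paths."""
    log_w = np.array([log_likelihood_ratio(p, observations, mean_fn, cov_fn)
                      for p in paths])
    w = np.exp(log_w - log_w.max())             # common factor cancels in the ratio
    terminal = np.array([p[-1] for p in paths], dtype=float)
    return (w @ terminal) / w.sum()
```

Because numerator and denominator share the same weights, any factor common to all paths (such as the standard Gaussian density of the observations) cancels, which is why the weights can be rescaled freely before normalization.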
Let us now replace in the RHS of (4) with another process , called the approximation, with resolution or approximation parameter (conventionally), also independent of the observations under , for which the evaluation of the resulting “filter” might be easier. Then, we can define the approximate filter of the state
(5) 
It was shown in [KalPetNonlinear_2015] that, under certain conditions, this approximate filter is asymptotically consistent, as follows.
Hereafter, denotes the indicator of . Given and for any Borel , constitutes a Dirac (atomic) probability measure. Equivalently, we write . Also, convergence in probability is meant to be with respect to the norm of the random elements involved. Additionally, below we refer to the concept of conditional weak convergence, which is nothing but weak convergence [BillingsleyMeasures] of conditional probability distributions [Berti2006, Grubel2014]. For a sufficient definition, the reader is referred to [KalPetNonlinear_2015].
Theorem 1.
(Convergence to the Optimal Filter [KalPetNonlinear_2015]) Pick any natural and suppose either of the following:

For all , the sequence is marginally weakly convergent to , given , that is,
(6) 
For all , the sequence is (marginally) convergent to in probability, that is,
(7)
Then, there exists a measurable subset with measure at least , such that
(8) 
for any free, finite constant . In other words, the convergence of the respective approximate filtering operators is compact in and, with probability at least , uniform in .
Remark 2.
It should be mentioned here that Theorem 1 holds for any process , Markov or not, as long as is almost surely compactly supported.
Remark 3.
The mode of filter convergence reported in Theorem 1 is particularly strong. It implies that inside any fixed finite time interval and among almost all possible paths of the observations process, the approximation error between the true and approximate filters is finitely bounded and converges to zero, as the grid resolution increases, resulting in a practically appealing asymptotic property. This mode of convergence constitutes, in a sense, a practically useful, quantitative justification of Egorov’s Theorem [Richardson2009measure], which abstractly relates almost uniform convergence with almost sure convergence of measurable functions. Further, it is important to mention that, for fixed , convergence to the optimal filter tends to be in the uniformly almost everywhere sense, at an exponential rate with respect to the dimensionality of the observations, . This shows that, in a sense, the dimensionality of the observations stochastically stabilizes the approximate filtering process.
Remark 4.
Observe that, in the adopted approach to constructing the approximate filter of , the approximation is naturally constructed under the base measure , satisfying the constraint of being independent of the observations, . However, it is easy to see that if, for each in the horizon of interest, is adapted, then it may be defined under the original base measure without any complication; under , (and, thus, ) is independent of by construction. In greater generality, may be constructed under , as long as it can be somehow guaranteed to follow the same distribution and be independent of under . As we shall see below, this is not always obvious or true; in fact, it is strongly dependent on the information (encoded in the appropriate σ-algebra) exploited in order to define the process , as well as the particular choice of the alternative measure .
3 Uniform State Quantizations
Although Theorem 1 presented above provides the required conditions for convergence of the respective approximate filter, it does not single out any particular class of processes to be used as the required approximations. In order to satisfy either of the conditions of Theorem 1, must be strongly dependent on . For example, if the approximation is merely weakly convergent to the original state process (as, for instance, in particle filtering techniques), the conditions of Theorem 1 will not be fulfilled. In this paper, the state is approximated by another closely related process with discrete state space, constituting a uniformly quantized approximation of the original one.
Similarly to [Pages2005optimal], we will consider two types of state approximations: Marginal Quantizations and Markovian Quantizations. Specifically, in the following, we study pathwise properties of the aforementioned state approximations. Nevertheless, and as in every meaningful filtering formulation, neither the state nor its approximations need to be known or constructed by the user. Only the (conditional) laws of the approximations need to be known. To this end, let us state a general definition of a quantizer.
Definition 1.
(Quantizers) Consider a compact subset , a partition of and let be a discrete set consisting of distinct reconstruction points, with . Then, an level Euclidean Quantizer is any bounded and measurable function , defined by assigning all to a unique , such that the mapping between the elements of and is one to one and onto (a bijection).
3.1 Uniformly Quantizing
For simplicity and without any loss of generality, suppose that (for and with obviously ), representing the compact set of support of the state . Also, consider a uniform set partition of the interval , and, additionally, let be the overloaded Cartesian product of copies of the partitions defined above, with cardinality . As usual, our reconstruction points will be chosen as the centers of mass of the hyperrectangles comprising the hyperpartition , denoted as , where . According to some predefined ordering, we make the identification , . Further, let and define the quantizer , where
(9) 
Given the definitions stated above, the following simple and basic result is true. The proof, being elementary, is omitted.
Lemma 1.
(Uniform Convergence of Quantized Values) It is true that
(10) 
that is, converges as , uniformly in .
Remark 5.
We should mention here that Lemma 1, as well as all the results to be presented below, hold equally well when the support of is different in each dimension, or when different quantization resolutions are chosen in each dimension, at the cost of additional complexity in the respective arguments.
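The uniform quantizer described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the support is the hypercube obtained from a common interval with endpoints `a` and `b`, each coordinate is split into `levels` equal cells, and reconstruction points are the cell midpoints. The names `make_uniform_quantizer`, `a`, `b` and `levels` are hypothetical. Per coordinate, the quantization error is at most half a cell width, so it vanishes uniformly as the resolution increases, which is the qualitative content of Lemma 1.

```python
import numpy as np

def make_uniform_quantizer(a, b, levels):
    """Return (quantize, points) for a uniform partition of [a, b] into `levels` cells."""
    width = (b - a) / levels
    points = a + width * (np.arange(levels) + 0.5)    # cell midpoints

    def quantize(x):
        # Applied coordinate-wise, so x may be a scalar or an array in [a, b]^M.
        x = np.asarray(x, dtype=float)
        idx = np.floor((x - a) / width).astype(int)
        idx = np.clip(idx, 0, levels - 1)             # x == b is assigned to the last cell
        return points[idx]

    return quantize, points

# Worst-case coordinate error over random points, for increasing resolution.
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(1000, 3))
for levels in (4, 16, 64):
    quantize, _ = make_uniform_quantizer(-1.0, 1.0, levels)
    worst = np.max(np.abs(quantize(samples) - samples))
    print(levels, worst, (1.0 - (-1.0)) / (2 * levels))
```

The last printed column is the half cell width, against which the observed worst-case error can be compared.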
3.2 Marginal Quantization
The first class of state process approximations of interest is that of marginal state quantizations, according to which is approximated by its nearest neighbor
(11) 
, where is identified as the approximation parameter. Next, we present another simple but important lemma, concerning the behavior of the quantized stochastic process , as gets large. Again, the proof is relatively simple, and it is omitted.
Lemma 2.
(Uniform Convergence of Marginal State Quantizations) For , for all , almost surely, it is true that
(12) 
that is, converges as , uniformly in and uniformly almost everywhere in .
Remark 6.
One drawback of marginal approximations is that they do not possess the Markov property any more. This fact introduces considerable complications in the development of recursive estimators, as shown later in Section 4. However, marginal approximations are practically appealing, because they do not require explicit knowledge of the stochastic kernel describing the transitions of [KalPetChannelMarkov2014, KalPet_SPAWC_2015].
Remark 7.
Note that the implications of Lemma 2 continue to be true under the base measure . This is true because is adapted, and also due to the fact that the “local” probability spaces and are completely identical. Here, constitutes the join of the filtration . In other words, the restrictions of and on the collection of events ever to be generated by  coincide; that is, .
3.3 Markovian Quantization
The second class of approximations considered is that of Markovian quantizations of the state. In this case, we assume explicit knowledge of a transition mapping, modeling the temporal evolution of . In particular, we assume a recursion as in (2), where the process acts as the driving noise of the state and constitutes an intrinsic characteristic of it. Then, the Markovian quantization of is defined as
(13) 
with , , and which satisfies the Markov property trivially; since is finite, it constitutes a (time-homogeneous) finite state space Markov chain. A scheme for generating is shown in Fig. 1.
At this point, it is very important to observe that, whereas is guaranteed to be Markov with the same dynamics and independent of under , we cannot immediately say the same for the Markovian approximation . The reason is that is measurable with respect to the filtration generated by the initial condition and the innovations process and not with respect to . Without any additional considerations, may very well be partially correlated with and/or , and/or even be non-white itself! Nevertheless, may be chosen such that indeed satisfies the aforementioned properties in question, as the following result suggests.
Lemma 3.
(Choice of ) Without any other modification, the base measure may be chosen such that the initial condition and the innovations process follow the same distributions as under and are all mutually independent relative to the observations, .
Proof of Lemma 3.
See Appendix F. ∎
Lemma 3 essentially implies that Markovian quantizations may be constructed and analyzed either under or , interchangeably. Remark 7 can be adapted to this case as well.
Under the assumption of a transition mapping, every possible path of is completely determined by fixing and at any particular realization, for each . As in the case of marginal quantizations, the goal of the Markovian quantization is the pathwise approximation of by , for almost all realizations of the white noise process and initial value . In practice, however, as noted in the beginning of this section, knowledge of is of course not required by the user. What is required by the user is the transition matrix of the Markov chain , which could be obtained via, for instance, simulation (also see Section 4).
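As a rough illustration of how the transition matrix of a Markovian quantization might be obtained by simulation, consider the following Python sketch for a scalar state. It is only one possible approach under our own assumptions: the function name and arguments are hypothetical, the matrix is stored column stochastically (column `j` holds the estimated law of the next quantized state given the current grid point `j`), and states leaving the interval are clipped back to it, which is our way of enforcing compact support in the sketch.

```python
import numpy as np

def estimate_transition_matrix(points, transition, noise_sampler, a, b, n_samples, rng):
    """Monte Carlo estimate of P[i, j] = Prob(next cell is i | current grid point is j)."""
    levels = len(points)
    width = (b - a) / levels
    P = np.zeros((levels, levels))
    for j, x in enumerate(points):
        w = noise_sampler(rng, n_samples)             # fresh driving noise samples
        nxt = np.clip(transition(x, w), a, b)         # assumption: clip to the support
        idx = np.floor((nxt - a) / width).astype(int)
        idx = np.clip(idx, 0, levels - 1)
        P[:, j] = np.bincount(idx, minlength=levels) / n_samples
    return P

# Hypothetical contractive linear model with Gaussian driving noise.
a, b, levels = -4.0, 4.0, 16
width = (b - a) / levels
points = a + width * (np.arange(levels) + 0.5)
rng = np.random.default_rng(0)
P = estimate_transition_matrix(points, lambda x, w: 0.6 * x + w,
                               lambda r, n: r.standard_normal(n),
                               a, b, 5000, rng)
```

Each column of the estimate sums to one by construction; its accuracy is governed by the number of noise samples drawn per grid point.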
For analytical tractability, we will impose the following reasonable regularity assumption on the expansiveness of the transition mapping :
Assumption 3 (Expansiveness of Transition Mappings): For all , is Lipschitz continuous in , that is, possibly dependent on each , there exists a nonnegative, bounded constant , where exists and is finite, such that
(14) 
. If, additionally, , then will be referred to as uniformly contractive.
Employing Assumption 3, the next result presented below characterizes the convergence of the Markovian state approximation to the true process , as the quantization of the state space gets finer and under appropriate conditions.
Lemma 4.
(Uniform Convergence of Markovian State Quantizations) Suppose that the transition mapping of the Markov process is Lipschitz, almost surely and for all . Also, consider the approximating Markov process , as defined in (13). Then,
(15) 
that is, converges as , in the pointwise sense in and uniformly almost everywhere in . If, additionally, is uniformly contractive, almost surely and for all , then it is true that
(16) 
that is, the convergence is additionally uniform in .
Proof of Lemma 4.
See Appendix A. ∎
Especially concerning temporally uniform convergence of the quantization schemes under consideration, and to highlight its great practical importance, it would be useful to illustrate the implications of Lemmata 2 and 4 by means of the following simple numerical example.
Example 1.
Let be a scalar, first order autoregressive process (AR(1)), defined via the linear stochastic difference equation
(17) 
where . In our example, the parameter is known a priori and controls the stability of the process, with the case where corresponding to a Gaussian random walk. Of course, it is true that the state space of the process defined by (17) is the whole , which means that, strictly speaking, there are no finite and such that , with probability . However, it is true that for sufficiently large but finite and , there exists a “large” measurable set of possible outcomes for which , being a Gaussian process, indeed belongs to with very high probability. Whenever this happens, we should be able to verify Lemmata 2 and 4 directly.
Additionally, it is trivial to verify that the linear transition function in (17) is always a contraction, with Lipschitz constant , whenever the process of interest is stable, that is, whenever .
Fig. 2(a) and 2(b) show the absolute errors between two processes and their quantized versions according to Lemmata 2 and 4, for and , respectively. From the figure, one can readily observe that the marginal quantization of always converges to uniformly in time, regardless of the particular value of , experimentally validating Lemma 2. On the other hand, it is obvious that when the transition function of our system is not a contraction (Lemma 4), uniform convergence of the respective Markovian quantization to the true state cannot be guaranteed. Of course, we have not proved any additional necessity regarding our sufficiency assumption related to the contractiveness of the transition mapping of the process of interest, meaning that there might exist processes which do not fulfill this requirement and still converge uniformly. However, for uniform contractions, the convergence will always be uniform whenever the process is bounded in .
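An experiment in the spirit of Example 1 can be reproduced with the short Python sketch below. The parameter values (autoregressive coefficient, interval, resolution, horizon, seed) are our own hypothetical choices, since the values behind Fig. 2 are not restated here, and clipping to the interval before quantizing is likewise an assumption of the sketch. Both quantized processes are driven by the same noise realization as the true state.

```python
import numpy as np

def quantize(x, a, b, levels):
    """Midpoint quantizer on [a, b] with `levels` cells (inputs are clipped to [a, b])."""
    width = (b - a) / levels
    idx = np.clip(np.floor((np.clip(x, a, b) - a) / width), 0, levels - 1)
    return a + width * (idx + 0.5)

def ar1_quantization_errors(alpha, levels, horizon, a=-10.0, b=10.0, seed=0):
    """Absolute errors of the marginal and Markovian quantizations of an AR(1) path."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(horizon)
    x = np.zeros(horizon + 1)
    x_markov = np.zeros(horizon + 1)
    x_markov[0] = quantize(x[0], a, b, levels)
    for t in range(1, horizon + 1):
        x[t] = alpha * x[t - 1] + noise[t - 1]
        # Markovian quantization: iterate the dynamics on the quantized state itself.
        x_markov[t] = quantize(alpha * x_markov[t - 1] + noise[t - 1], a, b, levels)
    x_marginal = quantize(x, a, b, levels)            # nearest grid point of the true state
    return x, np.abs(x - x_marginal), np.abs(x - x_markov)

x, err_marginal, err_markov = ar1_quantization_errors(alpha=0.6, levels=200, horizon=500)
half_width = (10.0 - (-10.0)) / (2 * 200)
print(err_marginal.max(), half_width)                 # marginal error vs. half cell width
print(err_markov.max(), half_width / (1.0 - 0.6))     # Markovian error vs. geometric bound
```

As long as the true path stays inside the interval, the marginal error is at most half a cell width at every time, while in the contractive case the Markovian error obeys the recursion "error at time t is at most the coefficient times the error at time t-1, plus half a cell width", which gives the time-uniform geometric bound printed above. Rerunning with a coefficient equal to one removes that guarantee, in line with the discussion of the example.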
4 Grid Based Approximate Filtering: Recursive Estimation & Asymptotic Optimality
It is indeed easy to show that, when used as candidate state approximations for defining approximate filtering operators in the fashion of Section 2.2, both the marginal and Markovian quantization schemes presented in Sections 3.2 and 3.3, respectively, yield approximate filters which converge to the optimal nonlinear filter of the state . Convergence is in the sense of Theorem 1 presented in Section 2.2, corroborating asymptotic optimality under a unified convergence criterion.
Specifically, under the respective (and usual) assumptions, Lemmata 2 and 4 presented above imply that both the marginal and Markovian approximations converge to the true state at least in the almost sure sense, for all . Therefore, both will also converge to the true state in probability, satisfying the second sufficient condition of Theorem 1. The following result is true. Its proof, being apparent, is omitted.
Theorem 2.
(Convergence of Approximate Filters) Pick any natural and let the process represent either the marginal or the Markovian approximation of the state . Then, under the respective assumptions implied by Lemmata 2 and 4, the approximate filter converges to the true nonlinear filter , in the sense of Theorem 1.
Although Theorem 2 shows asymptotic consistency of the marginal and Markovian approximate filters in a strong sense, it does not imply the existence of any finite dimensional scheme for actually realizing these estimators. This is the purpose of the next subsections. In particular, we develop recursive representations for the asymptotically optimal (as ) filter , as defined previously in (5).
For later reference, let us define the bijective mapping (a trivial quantizer) , where the set contains the complete standard basis in . Since is bijectively mapped to for all , we can write , where constitutes the respective reconstruction matrix. From this discussion, it is obvious that
(18) 
leading to the expression
(19) 
for all , regardless of the type of state quantization employed. We additionally define the likelihood matrix
(20) 
Also to be subsequently used, given the quantization type, define the column stochastic matrix as
(21) 
for all .
At this point, it is important to note that the transition matrix defined in (21) is implicitly assumed to be time invariant, regardless of the state approximation employed. Under the system model established in Section 2.1 (assuming temporal homogeneity for the original Markov process ), this is unconditionally true when one considers Markovian state quantizations, simply because the resulting approximating process constitutes a Markov chain with finite state space, as stated earlier in Section 3.3. On the other hand, the situation is quite different when one considers marginal quantizations of the state. In that case, the conditional probabilities
(22) 
which would correspond to the th element of the resulting transition matrix, are, in general, no longer time invariant, even if the original Markov process is time homogeneous. Nevertheless, assuming the existence of at least one invariant measure (a stationary distribution) for the Markov process , also chosen as its initial distribution, the aforementioned probabilities are indeed time invariant. This is a very common and reasonable assumption employed in practice, especially when tracking stationary signals. For notational simplicity and ease of intuition, and in order to present a unified treatment of all the approximate filters considered in this paper, the aforementioned assumption will also be adopted in the analysis that follows.
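To make the role of stationarity concrete, the sketch below estimates the cell-to-cell conditional probabilities of the form (22) empirically, by counting transitions along one long simulated trajectory. This is an illustrative Monte Carlo approximation, not a construction used in the paper. The clipped AR(1) dynamics, the burn-in used as a stand-in for starting from the invariant distribution, and the helper names are all assumptions; the time average is meaningful only if the simulated process is ergodic.

```python
import random

def simulate_clipped_ar1(n_steps, a, noise_std, lo, hi, rng, burn_in=1000):
    """Simulate x_next = clip(a * x + w, lo, hi), w ~ N(0, noise_std^2).

    The burn-in samples are discarded so that the retained path is
    approximately stationary (an assumption, not a guarantee).
    """
    x = 0.0
    path = []
    for t in range(n_steps + burn_in):
        x = min(hi, max(lo, a * x + rng.gauss(0.0, noise_std)))
        if t >= burn_in:
            path.append(x)
    return path

def cell_index(x, lo, hi, n_cells):
    """Index of the uniform quantization cell of [lo, hi] that contains x."""
    k = int((x - lo) / (hi - lo) * n_cells)
    return min(max(k, 0), n_cells - 1)

def empirical_transition_matrix(path, lo, hi, n_cells):
    """Column stochastic estimate of Prob(next in cell i | current in cell j)."""
    counts = [[0] * n_cells for _ in range(n_cells)]
    for x_prev, x_next in zip(path, path[1:]):
        i = cell_index(x_next, lo, hi, n_cells)
        j = cell_index(x_prev, lo, hi, n_cells)
        counts[i][j] += 1
    P = [[0.0] * n_cells for _ in range(n_cells)]
    for j in range(n_cells):
        total = sum(counts[i][j] for i in range(n_cells))
        for i in range(n_cells):
            # A cell that was never visited gets a uniform column,
            # so that P remains column stochastic.
            P[i][j] = counts[i][j] / total if total > 0 else 1.0 / n_cells
    return P

rng = random.Random(0)
path = simulate_clipped_ar1(50000, a=0.9, noise_std=0.2, lo=-1.0, hi=1.0, rng=rng)
P_marginal = empirical_transition_matrix(path, -1.0, 1.0, 8)
```

Pooling all time steps into a single count matrix is justified only under the stationarity assumption stated above; without it, a separate matrix would be needed for each time index.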
4.1 Markovian Quantization
We start with the case of Markovian quantizations, since it is easier and more straightforward. Here, the development of the respective approximate filter is based on the fact that constitutes a Markov chain. Actually, this fact is the only requirement for the existence of a recursive realization of the filter, with Lemma 3 providing a sufficient condition, ensuring asymptotic optimality. The resulting recursive scheme is summarized in the following result. The proof is omitted, since it involves standard arguments in nonlinear filtering, similar to the ones employed in the derivation of the filtering recursions for a partially observed Markov chain with finite state space [Elliott1994Exact, Elliott1994Hidden, Cappe_BOOK2005], as previously mentioned.
Theorem 3.
(The Markovian Filter) Consider the Markovian state approximation and define , for all . Then, under the appropriate assumptions (Lipschitz property of Lemma 4), the asymptotically optimal in approximate grid based filter can be expressed as
(23) 
where the process satisfies the linear recursion
(24) 
The filter is initialized by setting .
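A minimal sketch of a grid based filter of this type is given below. It assumes the standard finite state form of the recursion, in which the unnormalized weight vector is first propagated through the column stochastic transition matrix and then reweighted by the observation likelihoods, and in which the estimate is the weighted average of the grid points divided by the sum of the weights. The scalar observation model y = x + Gaussian noise, the toy three-point grid, and the function names are assumptions made for illustration only.

```python
import math

def gaussian_likelihoods(y, points, obs_std):
    """Likelihood of observation y at each grid point, up to a constant.

    The Gaussian normalizing constant is omitted because it cancels
    when the weights are normalized.
    """
    return [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in points]

def grid_filter(points, P, prior, observations, obs_std):
    """Run the recursion E_t = Lambda_t * P * E_{t-1} and return the estimates.

    points: grid (reconstruction) points
    P:      column stochastic matrix, P[i][j] = Prob(i | j)
    prior:  initial weight vector over the grid points
    """
    n = len(points)
    E = list(prior)
    estimates = []
    for y in observations:
        # Prediction step: propagate the weights through the transition matrix.
        predicted = [sum(P[i][j] * E[j] for j in range(n)) for i in range(n)]
        # Update step: multiply by the diagonal likelihood matrix.
        lam = gaussian_likelihoods(y, points, obs_std)
        E = [lam[i] * predicted[i] for i in range(n)]
        total = sum(E)  # may underflow to 0 if y is very far from every grid point
        estimates.append(sum(points[i] * E[i] for i in range(n)) / total)
        # Rescaling changes E only by a positive factor, so the estimates
        # are unaffected; it merely avoids numerical underflow.
        E = [e / total for e in E]
    return estimates

points = [-1.0, 0.0, 1.0]
P = [[0.8, 0.1, 0.0],
     [0.2, 0.8, 0.2],
     [0.0, 0.1, 0.8]]
prior = [1.0 / 3, 1.0 / 3, 1.0 / 3]
estimates = grid_filter(points, P, prior, [0.9, 1.1, 0.8], obs_std=0.5)
```

Each step costs one matrix-vector product plus an elementwise scaling, so the per-observation cost is quadratic in the number of grid points.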
Remark 8.
It is worth mentioning that, although formally similar to the filter of a Markov chain with finite state space, the approximate filter introduced in Theorem 3 does not refer to such a chain, because the observations process utilized in the filtering iterations corresponds to that of the real partially observable system under consideration. The quantity does not constitute a conditional expectation of the Markov chain associated with , because the latter process does not follow the probability law of the true state process .
Remark 9.
In fact, may be interpreted as a vector encoding an unnormalized point mass function, which, roughly speaking, expresses the belief of the quantized state, given the observations up to and including time . Normalization by corresponds precisely to a point mass function.
Remark 10.
For the benefit of the reader, we should mention that the Markovian filter considered above essentially coincides with the approximate grid based filter reported in ([PARTICLE2002tutorial], Section IV.B), although the construction of the two filters is different: the former is constructed via a Markovian quantization of the state, whereas the latter [PARTICLE2002tutorial] is based on a “quasimarginal” approach (compare with (22)). Nevertheless, given our assumptions on the HMM under consideration, both formulations result in exactly the same transition matrix. Therefore, the optimality properties of the Markovian filter are indeed inherited by the grid based filter described in [PARTICLE2002tutorial].
4.2 Marginal Quantization
We now move on to the case of marginal quantizations. In order to be able to come up with a simple, Markov chain based, recursive filtering scheme, as in the case of Markovian quantizations previously treated, it turns out that a further assumption is required, this time concerning the stochastic kernel of the Markov process . But before embarking on the relevant analysis, let us present some essential definitions.
First, for any process , we will say that a sequence of functions is , if is Uniformly Integrable with respect to the pushforward measure induced by , , where , i.e.,
(25) 
Second, given , recall from Section 3.A that the set contains as members all quantization regions of , , . Then, given the stochastic kernel associated with the time invariant transitions of and for each , we define the cumulative kernel
(26) 
for all Borel and all , where denotes the unique quantization region, which includes . Note that if is substituted by , the resulting quantity constitutes an predictable set-valued random element. Now, if, for any , admits a stochastic kernel density suggestively denoted as , we define, in exactly the same fashion as above, the cumulative kernel density
(27) 
for all . The fact that is indeed a Radon-Nikodym derivative of readily follows by definition of the latter and Fubini’s Theorem.
Remark 11.
Observe that, although integration is with respect to on the RHS of (26), is time invariant. This is due to stationarity of , as assumed in the beginning of Section 4, implying time invariance of the marginal measure , for all . Additionally, for each , when is restricted to , corresponds to an entry of the (time invariant) matrix , also defined earlier. In the general case, where the aforementioned cumulative kernel is time varying, all subsequent analysis continues to be valid, at the cost of additional notational complexity.
With respect to the relevant assumption required on , as asserted above, let us now present the following definition.
Definition 2.
(Cumulative Conditional Regularity of Markov Kernels) Consider the kernel , associated with , for all . We say that is Conditionally Regular of Type I (CRT I), if, for almost all , there exists a sequence with , such that
(28) 
If, further, for almost all , the measure admits a density and if there exists another sequence with , such that
(29) 
is called Conditionally Regular of Type II (CRT II). In any case, will also be called conditionally regular.
A consequence of conditional regularity is the following Martingale Difference (MD) [Segall1976, Elliott1994Hidden] type representation of the marginally quantized process .
Lemma 5.
(Semirecursive MD-type Representation of Marginal Quantizations) Assume that the state process is conditionally regular. Then, the quantized process admits the representation
(30) 
where, under the base measure , constitutes an MD process and constitutes a predictable process, such that

if is CRT I, then
(31) 
whereas, if is CRT II, then
(32)
everywhere in time.
Proof of Lemma 5.
See Appendix B. ∎
Now, consider an auxiliary Markov chain , with (defined as in (21)) as its transition matrix and with initial distribution to be specified. Of course, can be represented as , where constitutes an MD process, with being the complete natural filtration generated by .
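The martingale difference representation of a finite state chain can be checked numerically. The sketch below simulates a chain whose states are encoded as standard basis (one-hot) vectors, forms the increments "next one-hot vector minus the transition matrix applied to the current one-hot vector", and averages them separately for each current state. If the representation holds, these conditional averages should be close to zero. The two-state matrix and the function names are hypothetical and serve only as an illustration.

```python
import random

def step(state, P, rng):
    """Sample the next state from column `state` of the column stochastic P."""
    u = rng.random()
    cumulative = 0.0
    for i in range(len(P)):
        cumulative += P[i][state]
        if u < cumulative:
            return i
    return len(P) - 1  # guard against floating point round-off

def mean_md_increment(P, n_steps, rng):
    """Average of e(next) - P e(current), grouped by the current state.

    Row j of the result estimates the conditional mean of the increment
    given that the current state is j; each row should be near zero.
    """
    n = len(P)
    sums = [[0.0] * n for _ in range(n)]
    visits = [0] * n
    state = 0
    for _ in range(n_steps):
        nxt = step(state, P, rng)
        for i in range(n):
            one_hot = 1.0 if i == nxt else 0.0
            sums[state][i] += one_hot - P[i][state]
        visits[state] += 1
        state = nxt
    return [[s / max(v, 1) for s in row] for row, v in zip(sums, visits)]

P = [[0.9, 0.2],
     [0.1, 0.8]]
md_means = mean_md_increment(P, 20000, random.Random(0))
```

The averages vanish only in the limit of a long trajectory, so a finite run yields small but nonzero values.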
Due to the existence of the “bias” process in the martingale difference representation of (see Lemma 5), the direct derivation of a filtering recursion for this process is difficult. However, it turns out that the approximate filter involving the marginal state quantization , , can itself be approximated by the filter