Distributed state estimation and tracking of partially observed processes constitute central problems in modern Distributed Networks of Agents (DNAs), where, in the absence of a powerful fusion center, a group of power constrained sensing devices with limited computation and/or communication capabilities receive noisy measurements of some common, possibly rapidly varying process of global network interest. As also surveyed in [DPF_SPM_2013], important applications of DNAs, where potentially the aforementioned problems arise, include environmental and agricultural monitoring [APP1_2010], health care monitoring [APP2_2010], pollution source localization [APP4_2007], surveillance [APP5_2009], chemical plume tracking [APP3_2004_zhao2004wireless], target tracking [APP3_2004_zhao2004wireless] and habitat monitoring [APP6_2002], to name a few.
In the context of linear state estimation, in recent years and in parallel with the rapid development of fast averaging consensus protocols and algorithms in networked measurement systems [Consensus1_2004_Olfati-Saber, Consensus2_2005_Olfati-Saber, Boyd_2006Gossip, Consensus3_2010, Consensus4_2010_Oreshkin, Consensus_Emiliano_2011]
, there has been extensive research on the important basic problem of distributed Kalman filtering over DNAs, under several different perspectives, such as utilizing control theoretic consensus algorithms[DKF1_2005_Olfati-Saber, DKF4_Olfati-Saber, DKF2_2007_Olfati-Saber], the sign-of-innovations approach [DKF5_2006_SOIKF_Riberio], the Alternating Direction Method of Multipliers (ADMM) [DKF6_2008_Schizas], which will also be considered in this work and others, based on more customized consensus strategies [DKF3_2006_alriksson, DKF7_2008]. For a relatively complete list of references, see the extensive survey [DKF_2013_Survey].
Although not as rich, the literature concerning the problem of distributed nonlinear filtering of general Markov processes is itself quite extensive. Since, in general, most nonlinear filters do not admit finite dimensional representations, the focus in this case is the derivation of distributed schemes for the implementation of well defined, finite dimensional nonlinear filtering approximations, which would allow for efficient, real time state estimation. In this direction, successful examples include distributed particle filters [DPF_SPM_2013], distributed extended Kalman filters [DKF4_Olfati-Saber, DKF5_2006_SOIKF_Riberio], as extensions to the linear case mentioned above and, more recently, the Bayesian consensus filtering approach proposed in [BCF1_2014].
In this paper, we focus on distributed state estimation of Markov chains with finite state space, partially observed in conditionally Gaussian noise
. Hereafter, we will use the term Gaussian Hidden Markov Model (HMM). This special case is, however, of broad practical interest. Further, it is known that the posterior probability measure of the chain at a certain time, relative to the measurements obtained so far, admits a recursive representation[Elliott1994Exact, Elliott1994Hidden]. This posterior can subsequently be used in order to produce any optimal estimate of interest, such as the MMSE estimator of the chain, the respective MAP estimator, etc. The noisy observations of the chain are obtained distributively over a DNA, whose connectivity pattern follows a connected Random Geometric Graph (RGG) [RGG_Dall2002, RGG_Penrose2003].
Under this setting, we consider a distributed filtering scheme, which relies on the distributed evaluation of the likelihood part of the centralized posterior under consideration and is based on a particular specialization of the ADMM for fast average consensus [Consensus_Emiliano_2011]. Our idea is similar to the one conveyed by the respective formulation for particle filters [DPF_SPM_2013]. In order to account for rapidly varying Markov chains with arbitrary statistical structure, in our formulation, nonlinear filtering and distributed average consensus are implemented in different time scales, with the message exchange rate between any sensor and its neighbors being much larger than the rate of measurement acquisition [DKF4_Olfati-Saber]. This is a realistic assumption in systems where measurements are acquired at relatively distant times; for example, every minute, hour or day, or even more dynamically, possibly at event triggered time instants or a sequence of stopping times. Additionally, in this paper we assume perfect communications among sensors. This is a reasonable assumption, provided, for instance, that the filtering process is implemented in a higher than the physical layer of the network under consideration. Our contributions are summarized as follows:
1) First, based on the matrix equivalent formulation of the ADMM for implementing average consensus and the respective convergence results presented in [Consensus_Emiliano_2011], we focus on optimizing the respective consensus error bound, with respect to two free parameters: a scalar([Consensus_Emiliano_2011], also see Section 2). Under indeed mild conditions, we show analytically that the error bound is optimized (in a certain sense) at an explicitly defined and by choosing such that its second largest eigenvalue is minimized, the latter being a convex optimization problem, which can be efficiently solved [Boyd_2006Gossip]. Our results essentially complement earlier work on ADMM-based consensus, previously presented in [Consensus_Emiliano_2011].
2) Then, utilizing the aforementioned optimized bounds and assuming the same number of consensus iterations between any two consecutive measurements for each sensor in the network, we fully characterize a minimal number of such iterations required, such that the distributed filter remains uniformly stable with a prescribed accuracy level, , within a finite operational horizon, . Uniform stability is in the sense of the supremum of the -norm between the centralized and distributed versions of the posterior in each sensor, over all sensors and over all times within the operational horizon of interest. Roughly speaking, our result shows that, under very reasonable assumptions, the stability of the distributed filtering process depends only loglinearly on and the total number of measurements in the network, , and only logarithmically on . Additionally, if this total loglinear bound is fulfilled, any additional consensus iterations will incur an exponential decrease in the aforementioned -norm. The result is fundamental and universal, since, apart from the assumed conditional Gaussianity of the observations, it is virtually independent of the internal structure of the particular HMM under consideration. This fact makes it particularly attractive in highly heterogeneous DNAs.
The problem of distributed inference in HMMs has been considered earlier in [DHMM1_2010]. However, the approach taken in [DHMM1_2010] is distinctly different from our proposed approach. In particular, the formulation in [DHMM1_2010] is based on control theoretic consensus [Consensus2_2005_Olfati-Saber] and stochastic approximation, while our distributed filtering formulation is based on optimized ADMM-based consensus. Also, the method of [DHMM1_2010] requires certain assumptions on the statistical structure of the hidden Markov chain (e.g., primitiveness), while our work makes no such assumptions. Further, although the analysis presented in [DHMM1_2010] does provide some limited convergence guarantees, it does not provide any results on the rate of convergence of the obtained distributed estimators. Here, we provide a fully tractable stability analysis of the distributed filtering scheme considered, with explicit, optimistic and universal bounds on the rate of convergence, as well as the degree of consensus achieved. Another important difference is that, in [DHMM1_2010], filtering and consensus are implemented simultaneously. Such setting excludes cases where the underlying Markov chain is very rapidly varying and cannot fully exploit the benefits of heterogeneity in the information observed by the sensors of the DNA under consideration. This is due to the fact that, in [DHMM1_2010], global consensus cannot, in general, be guaranteed within a reasonable and quantitatively predictable error margin. Contrary to [DHMM1_2010], the distributed filtering scheme advocated herein efficiently exploits sensor heterogeneity, since uniform diffusion of local information is achieved across the network.
The paper is organized as follows. In Section II, we present the system model in detail, along with some very basic preliminaries on nonlinear filtering. Section III introduces our distributed filtering formulation and, subsequently, focuses on the optimization of the consensus error bounds produced by the ADMM iterations. In Section IV, we study stability of the distributed filtering scheme considered in the sense mentioned above, and we present the relevant results, along with complete proofs. In Section V, we present some numerical simulations, experimentally validating some of the properties of the proposed approach, as well as a relevant discussion. Finally, Section VI concludes the paper.
: In the following, the state vector will be represented as
, and all other matrices and vectors will be denoted by boldface uppercase and boldface lowercase letters, respectively. Real valued random variables will be denoted by uppercase letters. Calligraphic letters and formal script letters will denote sets and-algebras, respectively. The operators , and will denote transposition, minimum and maximum eigenvalue, respectively. The -norm of a vector is , for all naturals . For any Euclidean space, will denote the respective identity operator of appropriate dimension. For any set , its complement is denoted as . Additionally, we employ the identifications , , and , for any positive naturals .
2 System Model & Preliminaries
In this section, we first present a generic model of the class of systems under consideration, where multiple, possibly indirectly connected sensors observe noisy versions of a common, hidden (i.e., unobserved), stochastically evolving underlying signal of interest. Second, we briefly discuss the centralized solution to problem of inference of the underlying signal on the basis of the observations at the sensors.
2.1 System Model: HMMs over DNAs
Without loss of generality, we consider a wireless network consisting of sensors, located inside the fixed square geometric region . The connectivity pattern of the DNA under consideration is assumed to obey an undirected RGG model [RGG_Dall2002, RGG_Penrose2003]. According to such a model, the positions of the sensors, , are chosen uniformly at random in , whereas two sensors and are considered connected if and only if , where denotes the connectivity threshold of the network. Throughout the paper, we assume that the underlying RGG of the network is simply connected, which constitutes an event happening with very high probability for sufficiently large number of sensors and/or sufficiently large connectivity threshold [RGG_Penrose2003]. Also, the one hop neighborhood of each sensor , including itself, is denoted as , for all .
At each discrete time instant , sensor measures the vector process , for all , where . It is assumed that the stacked process constitutes a noisy version (i.e., a functional) of a (for simplicity) time-homogeneous Markov chain , called the state, with finite state space of cardinality . Conditioned on , the process is distributed according to a jointly Gaussian law as
where and constitute known measurable functionals of , for all . In particular, is assumed to be of block diagonal form, constituting of the submatrices , for all , also implying that , for all . As a result, the measurements at each sensor are statistically independent, given . Likewise, we define , where , for all and . Further, the following additional technical assumptions are made.
Assumption: (Boundedness) The quantities , are both uniformly upper bounded in and , with finite bounds and , respectively. For technical reasons, it is also true that , a requirement which can always be satisfied by normalization of the observations.
Regarding the hidden Markov chain under consideration, it is true that , where denotes the -th state of the chain, for . Although it is actually irrelevant, for simplicity we will assume that , for all . The temporal dynamics of are expressed in terms of the initial distribution of the chain, encoded in the vector , as well as the column stochastic transition matrix , defined, as usual, as for all .
2.2 Centralized Recursive Estimates
Let be the complete filtration generated by the observations . A quantity of central importance in nonlinear filtering is the posterior probability measure of the state given , which, since the state space of is finite, can be represented by a random vector . Under the system model defined above, this random vector can be exactly evaluated in real time as [Elliott1994Hidden, Elliott1994Exact]
where the process satisfies the linear recursion for all , with ,
Of course, under these circumstances and defining , the MMSE estimator of given can be easily obtained as for all . Similar estimates may be obtained for other quantities of possible interest, such as the posterior error covariance matrix between and , etc. In the following, we focus on the distributed estimation of the random probability measure encoded in (and not specific functionals of it).
The structure of the nonlinear filtering procedure outlined above may be interpreted as follows. Quantity (4) constitutes the conditional likelihood of the observations, given that the state (the hidden process) is fixed. As a result, the filtering recursion for may be decomposed into two intuitively informative parts, namely, a one-step ahead prediction step , and a correction/update step . Then, (re)normalization by converts the produced estimate into a valid probability measure.
3 Distributed Estimation via the ADMM
Employing somewhat standard notation, used, for instance, in [Consensus_Emiliano_2011], exploitting the block-diagonal structure of and, therefore, of as well, and with , we may reexpress (4) as
for all and . Let
. Then, it is true that
Of course, sensor of the network observes , for all . It is then clear that knowledge of at each sensor is equivalent to the ability of locally evaluating the posterior , since , which is the likelihood part of the filter at time . There are exactly possibilities for and, as a result, in order to make the computation of the likelihood matrix locally feasible, each sensor should be able to acquire , for all . But since there are no functional dependencies among the latter quantities, each can be separately computed, in a completely parallel fashion. Consequently, it suffices to focus exclusively on , for some fixed .
Freeze time temporarily and let . Also, consider a symmetric (doubly) stochastic matrix , with being non zero if and only if , for all ; precise selection of will be considered later (see Theorem 3 and relevant discussion). Then, employing the ADMM based, distributed averaging Method B as discussed in [Consensus_Emiliano_2011], with as the matrix of augmentation constants [Consensus_Emiliano_2011] and exploitting its symmetricity, it can be easily shown that the ADMM iterations [Consensus_Emiliano_2011] for the distributed computation of the average (8), at sensor , are reduced to
for all . The scheme is initialized as
Now, letting vary in , at each iteration and at each sensor , the true posterior measure is approximated as
where the process satisfies the linear recursion for all , with and where is defined as in (3), but with the likelihoods being replaced by
completing the algorithmic description of the distributed HMM under consideration. In the above, denotes the iteration index at time . In this paper, mainly for analytical and intuitional simplicity, we will assume that for all . This simply means that each estimate is recursively constructed using the “same-iteration” estimate , at the previous time step. See Fig. 1 for a schematic representation of the procedure explained above, for each sensor in the DNA under consideration.
For practical considerations, at this point, it would be worth specifying both the computational complexity and the information exchange complexity of the distributed filtering considered, per sensor , per slow time . With regards to computational complexity, it is easy to see that, at sensor , each consensus iteration incurs an order of operations of combined multiply-adds. Therefore, multiplying by the total number of consensus iterations per slow time, (the fast time), and adding the computational complexity incurred by filtering, results in total computational complexity at sensor , for each slow time, of the order of . One can see that, for a hidden chain with state space of large cardinality, , if , then the computational complexity of the distributed filtering scheme under consideration is of the same order as that of filtering alone. As far as information exchange complexity is concerned, following a similar procedure, it can be readily shown that an order of bidirectional (at worst) message exchanges have to take place at sensor , per slow time .
Arguing as in [Consensus_Emiliano_2011], it is then straightforward to show that of central importance concerning the convergence of the ADMM scheme is the eigenstructure and, in particular, the Second Largest Eigenvalue Modulus (SLEM) [Boyd2004Fastest] of the matrix
Hereafter, let the sets and contain the eigenvalues of and , respectively. Then, recalling that, due to symmetricity, the spectrum of is real, set and . Because the zero pattern of is the same as that of the adjacency matrix of the connected RGG modeling the connectivity of the DNA under consideration (with ones in its diagonal entries), it can be easily shown that corresponds to the transition matrix of a Markov chain, which is both irreducible and aperiodic. Therefore, the SLEM of is strictly smaller than one, and, consequently, as well [Boyd2004Fastest]. Likewise, since has a unique unit eigenvalue [Consensus_Emiliano_2011], let , and also let (also see [Consensus_Emiliano_2011]) denote the SLEM of , that is, . For later reference, define the auxiliary vector .When applied to the average consensus problem, and specifically to the distributed filtering problem under consideration, the rate of convergence of the ADMM is characterized as follows.
(ADMM Consensus Error Bound [Consensus_Emiliano_2011]) Fix a natural , denoting a finite operational horizon. For each and an arbitrary , it is true that, for all ,
An apparent conclusion of Theorem 1 is that the rate of convergence of the distributed averaging procedure depends to a large extent on and of course the SLEM of , , which, by construction, is also a function of . On the other hand, depends on the eigenstructure of , also by construction. As it turns out, the full spectrum of is explicitly related to the spectrum of . The relevant result follows, complementing the analysis previously presented in [Consensus_Emiliano_2011].
(Characterization of the Spectrum of ) Given and for any , the eigenvalues of may be expressed in pairs as
for all . Then, the SLEM of can be expressed as
where, for all ,
Further, for fixed , is globally minimized at
with optimal value
constituting a continuous and increasing function of the second largest eigenvalue of , .
Proof of Theorem 2.
It will be important to note that all results presented so far regarding convergence of the ADMM based consensus schema, as well as the eigenstructure of , hold equally well even if we relax our obvious demand that the elements of the symmetric (doubly) stochastic matrix , corresponding to the non zero entries of the adjacency matrix of the RGG under consideration, are strictly positive. Such an assertion can be validated by inspecting the respective proof of Theorem 1 in [Consensus_Emiliano_2011] and observing that the aforementioned assumption on the entries is actually not required; it is the eigenstructure of that really matters. On the other hand, the property of having its second largest eigenvalue strictly smaller than unity is not destroyed by relaxing strict positivity of its elements; the relaxation corresponds to an enlargement of the space of possible choices for [Boyd2004Fastest]. In addition to the above, it can be readily verified by Theorem 2 that, if , will have exactly one unit eigenvalue and that, under the same condition, , satisfying the requirement for convergence implied by Theorem 1.
All the above will be critical for us, since if we assume that the diagonal entries of
may be nonnegative, then there are increased degrees of freedom for choosing the best, such that the consensus error bound of Theorem 1 is optimized, and there are well established tools for achieving this goal [Boyd_2006Gossip], as we will discuss below. However, it should be mentioned that mere nonnegativity of the entries of implies that the consensus scheme employed for distributed filtering does not constitute an ordinary ADMM variant anymore, since in the usual ADMM based consensus formulation, the constants multiplying the quadratic part of the augmented Lagrangian are positive by default. This fact though does not compromise performance; in fact, it results in a faster distributed averaging scheme. Therefore, hereafter, we will assume that the zero pattern of “includes” that of the adjacency matrix of the underlying RGG, in the sense that all elements of are allowed to be nonnegative, except for those corresponding to the zero pattern of the aforementioned adjacency matrix.
Fig. 2 shows the graph of the optimal SLEM as a function of , as described in Theorem 2, as well as the nonoptimized SLEM , for a set of fixed suboptimal choices of ’s. The obvious fact that is a convex, non affine and increasing function of is very important, because it implies that can be further optimized with respect to the matrix , resulting in substantial performance gain as moves towards zero. In particular, it is true that
possibly for different decisions on on the center and right of (22), where denotes the feasible set of symmetric (doubly) stochastic matrices, with being zero if , for all . It is well known that the optimization problem on the right of (22) is convex and that it can be solved efficiently either via semidefinite programming, or subgradient methods [Boyd_2006Gossip]. Therefore, the objective on the left of (22) can be globally optimized as well.
Quite surprisingly, under some additional, albeit very mild, conditions on the ADMM iteration index, , optimization of the SLEM of , either with respect to or both and indeed implies optimization of the consensus error bound of the ADMM, as stated in Theorem 1, under a certain sense. In this respect, we present the next result.
(ADMM Optimal Consensus Error Bound I) Fix a natural and choose , , for some . For each and any , as long as , the choice is Pareto optimal for the scalar, multiobjective program
optimizing the RHS of (16), resulting in the optimal consensus error bound
being a continuous and decreasing function of in . Additionally, choose any . If, under the same setting as above, is replaced by the optimal value in (24), then the resulting bound is optimal, in the sense that
Proof of Theorem 3.
See Appendix B. ∎
A graphical demonstration of the second, somewhat technical part of Theorem 3 is shown in Fig. 3. As it can be readily observed, given an arbitrary, possibly “bad” , corresponding to a specific, possibly large, second largest eigenvalue (in the figure, this value equals ), the best convergence rate is attained at , when the feasible set is constrained such that . Also note that, because is increasing in , Pareto efficiency of (24) is preserved when is replaced by .
Some of the technicalities of Theorem 3 may be avoided if one focuses on optimizing the looser consensus error bound
since . Indeed, the following result holds; proof is omitted.
(ADMM Optimal Consensus Error Bound II) Fix a natural and choose , , for some . For each and any , the globally optimal consensus error bound corresponding to the RHS of (27), with respect to
is given by
for all .
Finally, in our main results, presented in Section 4, the quantity will be of great importance in establishing a minimal number of ADMM iterations required between each pair of subsequent observation times, ensuring stability of the distributed nonlinear filter under consideration, over a finite horizon of filtering operation, . Interestingly, does not appear coincidentally in our analysis; when analyzing the mixing properties of Markov chains over graphs, constitutes a known quantity, called the mixing time of the chain [Boyd2004Fastest]. In this fashion, we call the mixing time of , characterizing the convergence rate of the ADMM.
Of course, it is both expected and intuitively correct that and, therefore, , will be both dependent on the connectivity density of the underlying RGG (measured, for instance, through the sparsity of the adjacency matrix of the graph), modeling the connectivity of the DNA under study. More specifically, we should at least expect that (which constitutes a random variable) will be in general smaller in “more strongly” connected RGGs, and the same for . Also, since more sensors uniformly scattered in the same area mean essentially higher connectivity density, the same behavior is expected as the number of sensors increase. For example, consider the case where we are given a DNA corresponding to a strongly connected RGG. Unfortunately though, an analytic characterization of the aforementioned properties boils down to analyzing the relevant properties of , which constitutes a notoriously difficult (and still open) mathematical problem. Despite of this difficulty, we can verify the aforementioned behavior experimentally. Indeed, Fig. 4 shows (a) and (b) as functions of the number of sensors, . For each value of , trials are presented. In this example, is suboptimally chosen according to the maximum degree chain construction [Boyd2004Fastest], whereas is optimally chosen. We readily observe that our intuitive assertions stated above are successfully verified. Based on these results, in all the subsequent analysis, we will consider and to be bounded and in general decreasing functions of the number of sensors in the network, , and/or the connectivity threshold, . This fact constitutes the direct conclusion of Fig. 4. It is not straightforward to determine how the actual performance of the whole distributed filtering scheme, is affected by , the threshold and the corresponding values of and . This will be precisely the topic of interest in the next section, where we show that, in order to guarantee -global agreement on filtering estimates, it is sufficient that the number of consensus iterations per sensor, per slow time, , scales linearly with and, at worst, loglinearly with (note that, on average, is inversely proportional to , as discussed above).
4 Uniform Stability of Distributed Filtering
This section is devoted to the presentation and proof of the main results of the paper. Hereafter, for notational brevity, we will set and , in accordance to Theorem 3 presented in Section 3. Therefore, it will also be assumed that everything holds as long as , on top of any other condition on the iteration index, . If, for any reason, one wishes to consider Theorems 1 or 4, the aforementioned statements have to be modified accordingly.
The following lemmata, borrowed from [KalPetNonlinear_2015], will be helpful in the subsequent analysis.
(Telescoping Bound for Matrix Products [KalPetNonlinear_2015]) Consider the collections of arbitrary, square matrices
For any submultiplicative matrix norm , it is true that111Hereafter, we adopt the conventions .
(Stability of Observations [KalPetNonlinear_2015]) Consider the random quadratic form
Then, for any fixed and any freely chosen , there exists a bounded constant , such that the measurable set
that is, the sequence of quadratic forms is uniformly bounded with overwhelmingly high probability, under the probability measure .
The next result constitutes a corollary of Theorem 1. Here, for our purposes, we state it as a preliminary lemma.
(Rate of each Iterate) For each and an arbitrary , it is true that, for all ,
for all .
Proof of Lemma 3.
See Appendix C. ∎
Employing Lemma 2, we can characterize the growth of the the initial vector , for , as follows.
(Growth of Initial Values) There exists a , such that
with probability at least , for any free .
Proof of Lemma 4.
See Appendix D. ∎
Let us now present another lemma, providing a probabilistic uniform lower bound concerning the iterates , under appropriate conditions on the number of iterations, .
(A lower bound on each Iterate) It is true that, as long as and
the iterates generated by the ADMM are lower bounded as
with probability at least .
Proof of Lemma 5.
See Appendix E. ∎
Leveraging the results presented above, we may now provide a probabilistic bound on the required number of ADMM iterations, such that the norm between the unnormalized filtering estimates and is small.
(Stability of Unnormalized Distributed Estimates) Fix and choose any global accuracy level . Then, there exists such that, as long as and
the absolute error between and satisfies
with probability at least . That is, provided (38) holds locally at each sensor , at each , equals within , with overwhelmingly high probability.
Proof of Theorem 5.
First, we know that , for all , where . Therefore, simple induction shows that
Likewise, by construction of our distributed state estimator,
and for all , where , identically. Taking the difference between and , we can write
Of course, the -norm for a matrix (induced by the usual -norm for vectors) coincides with its maximum column sum. As a result, it will be true that , since constitutes a column stochastic matrix. Now, recalling the internal structure of the diagonal matrices and , , the fact that is strictly greater than unity and known apriori and Lemma 5 stated and proved above, it will be true that
for all , under the conditions the aforementioned lemma suggests. Then, the RHS of (43) can be further bounded as
Next, focusing on each term inside the summation above, we have (see Lemma 4)
Using the above into (45), we arrive at the inequality
But because and , we get
holding true for all , with probability at least , provided that
Finally, choose an . Then, a sufficient condition such that , for all and