The Kalman filter, an optimal state estimator for dynamic systems, has had a huge impact on fields such as engineering, science, and economics [1, 2, 3, 4]. Basically, the filter predicts the expectation of the system state and its covariance based on the dynamic model and the statistical information on the model uncertainty or process noise, and then corrects them using the new measurement, the sensor model, and the information on the measurement noise. When multiple sensors, possibly of different types, are available, we can simply combine the sensor models to process the measurements altogether.
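The predict/correct cycle described above can be sketched in a few lines. The matrices below are generic placeholders, not taken from this paper; this is an illustrative sketch of the standard filter, not the algorithm proposed later.

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/correct cycle of the standard Kalman filter (sketch)."""
    # Prediction: propagate the estimate and its covariance through the model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correction: fuse the new measurement y via the Kalman gain.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```

With multiple sensors, stacking the sensor models into one tall `C` and block-diagonal `R` reduces the multi-sensor case to this single correction, as noted in the text.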
Thanks to the rapid development of sensor devices and communication technology, we are now able to monitor large-scale systems or environments such as traffic networks, plants, forests, and seas. In those systems, sensors are geographically distributed, may be of different types, and are usually not synchronized. To process the measurements, the basic idea would be to deliver all the data to one place, usually called the fusion center, and perform the correction step as in the case of multiple sensors. This is called centralized Kalman-filtering (CKF). As expected, CKF requires a powerful computing device to handle a large number of measurements and sensor models, is exposed to a single point of failure, and is difficult to scale up. In order to overcome these drawbacks, researchers developed distributed Kalman-filtering (DKF), in which each sensor in the network solves the problem by using local measurements and communicating with its neighbors. Compared with CKF, DKF is advantageous in terms of scalability, robustness to component loss, and computational cost, and thus the literature on this topic is expanding rapidly [5, 6, 7, 8, 9, 10, 11, 12]. For more details on DKF, see the survey and the references therein.
Some relevant results are summarized as follows. In , the author proposed scalable distributed Kalman-Bucy filtering algorithms in which each node communicates only with its neighbors. An algorithm with average consensus filters using the internal models of the signals being exchanged is proposed in . It is noted that the algorithm works on a single time scale. In , the authors proposed a continuous-time algorithm that keeps the norm of every local error covariance matrix bounded, thus overcoming a major drawback of . In , an algorithm with a high-gain coupling term in the error covariance matrix is introduced, and it is shown that the local error covariance matrix approximately converges to that of the steady-state centralized Kalman filter. In-depth discussions on the distributed Kalman-filtering problem have been provided in [14, 15], where algorithms that exchange the measurements themselves, or certain signals instead of the measurements, are proposed, respectively.
Although each of the existing algorithms has its own novel ideas and advantages, to the best of the authors' knowledge, a unified viewpoint on the DKF problem is still missing. Motivated by this, the aim of this paper is to provide a framework for the problem from the perspective of distributed optimization.
We start by observing that the correction step of Kalman-filtering is basically an optimization problem [2, 3, 4], and then formulate the DKF problem as a consensus optimization problem, which provides a fresh look at the problem. As a consequence, the DKF problem can be solved by many existing distributed optimization algorithms [16, 17, 18, 19, 20], from which various DKF algorithms can be derived. As an instance, a new DKF algorithm employing the dual ascent method, one of the basic algorithms for distributed optimization problems, is provided in this paper.
This paper is organized as follows. In Section II, we recall the CKF problem from the optimization perspective and connect the DKF problem to a distributed optimization problem. A new DKF algorithm based on the dual ascent method is proposed in Section III, and numerical experiments evaluating the proposed algorithm are conducted in Section IV.
Notation: For matrices $A_1, \dots, A_N$, $\mathrm{diag}(A_1, \dots, A_N)$ denotes the block diagonal matrix composed of $A_1$ to $A_N$. For scalars $a_1, \dots, a_N$, $\mathrm{col}(a_1, \dots, a_N) := [a_1\ \cdots\ a_N]^T$, and $\mathrm{col}(A_1, \dots, A_N)$ with matrices $A_i$'s is defined similarly. $\mathbf{1}_N$ denotes the vector whose components are all 1, and $I_n$ is the identity matrix whose dimension is $n$. The maximum and minimum eigenvalues of a matrix $A$ are denoted by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively. For a random variable $v$, $v \sim \mathcal{N}(\mu, \Sigma)$ denotes that $v$ is normally distributed with the mean $\mu$ and the variance $\Sigma$, and $\mathbb{E}[v]$ denotes the expected value of a random variable $v$. The half vectorization of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is denoted by $\mathrm{vech}(A)$, whose elements are filled in column-major order, where $a_{ij}$ is the $(i,j)$ element of $A$, and $\mathrm{vech}^{-1}$ denotes the inverse function of $\mathrm{vech}$. For a function $f(x)$, $\nabla f(x)$ denotes the gradient vector of $f$ with respect to $x$.
Graph theory: For a network consisting of $N$ nodes, the communication among nodes is modeled by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Let $\mathcal{A} = [a_{ij}]$ be an adjacency matrix associated to $\mathcal{G}$, where $a_{ij}$ is the weight of the edge between nodes $i$ and $j$. If node $j$ communicates to node $i$, then $a_{ij} > 0$, or $a_{ij} = 0$ if not. Assume there is no self edge, i.e., $a_{ii} = 0$. The Laplacian matrix associated to the graph $\mathcal{G}$, denoted by $\mathcal{L} = [l_{ij}]$, is a matrix such that $l_{ii} = \sum_{j \neq i} a_{ij}$ and $l_{ij} = -a_{ij}$ for $i \neq j$. $\mathcal{N}_i$ is the set of nodes communicating with node $i$, i.e., $\mathcal{N}_i = \{ j \mid a_{ij} > 0 \}$.
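The Laplacian defined above can be built directly from the adjacency matrix; the following sketch follows the definition $l_{ii} = \sum_{j} a_{ij}$, $l_{ij} = -a_{ij}$ (the example graph is illustrative, not from the paper).

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A of a weighted graph, exactly as defined
    in the text: diagonal entries are row sums, off-diagonals are -a_ij."""
    A = np.asarray(adj, dtype=float)
    return np.diag(A.sum(axis=1)) - A
```

For an undirected graph the resulting matrix is symmetric, every row sums to zero, and the all-ones vector spans the null space when the graph is connected.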
II Distributed Kalman-filtering and Its Connection to Consensus Optimization
In this section, we recall the CKF problem in terms of optimization, namely maximum likelihood estimation, and establish a connection between DKF and distributed optimization.
Consider a discrete-time linear system with $N$ sensors described by
$$x_{k+1} = A x_k + w_k, \qquad y_k = C x_k + v_k, \tag{1}$$
where $x_k \in \mathbb{R}^n$ is the state vector of the dynamic system, $y_k \in \mathbb{R}^m$ is the output vector, and $y_k^i \in \mathbb{R}^{m_i}$ is the output associated to sensor $i$, so that $y_k = \mathrm{col}(y_k^1, \dots, y_k^N)$. The $m_i$'s satisfy $\sum_{i=1}^N m_i = m$. $A$ is the system matrix and $C = \mathrm{col}(C_1, \dots, C_N)$ is the output matrix consisting of $C_i$, which is the output matrix associated to sensor $i$. $w_k \sim \mathcal{N}(0, Q)$ with $Q \succ 0$ is the process noise, $v_k^i \sim \mathcal{N}(0, R_i)$ is the measurement noise on sensor $i$, and $v_k = \mathrm{col}(v_k^1, \dots, v_k^N) \sim \mathcal{N}(0, R)$ with $R = \mathrm{diag}(R_1, \dots, R_N)$. Assume that the pair $(A, C)$ is observable, and each $v_k^i$ is uncorrelated to $v_k^j$ for $i \neq j$.
II-A Centralized Kalman-filtering problem from the optimization perspective
If all the measurements from the sensors are collected and processed altogether, the problem can be seen as one with an imaginary sensor that measures $y_k$ with complete knowledge of $C$ and $R$, and is thus called centralized Kalman-filtering. The filtering consists of two steps, prediction and correction. In the prediction step, the predicted estimate and error covariance matrix are obtained based on the previous estimate, error covariance matrix, and the system dynamics. The update rules are given by
$$\bar{x}_k = A \hat{x}_{k-1}, \qquad \bar{P}_k = A \hat{P}_{k-1} A^T + Q,$$
where $\hat{x}_{k-1}$ and $\hat{P}_{k-1}$ are the estimate and error covariance matrix at the previous time, respectively. Assume that the error covariance matrix is initialized as a positive definite matrix.
In the correction step, the predicted estimate and the error covariance matrix are updated based on the current measurements, which contain measurement noise. The correction step can be regarded as a process to find the optimal parameter (estimate) from the predicted estimate $\bar{x}_k$, the error covariance $\bar{P}_k$, and the observation $y_k$. In fact, it is known that this step is an optimization problem (maximum likelihood estimation, MLE), and we recall the details below.
Let $\bar{y}_k := \mathrm{col}(\bar{x}_k, y_k)$ and $\bar{C} := \mathrm{col}(I_n, C)$. Then, $\bar{y}_k = \bar{C} x_k + \bar{v}_k$, where $\bar{v}_k \sim \mathcal{N}(0, \bar{R})$ with $\bar{R} := \mathrm{diag}(\bar{P}_k, R)$. For the random variable $\bar{y}_k$, the likelihood function is given by
$$\ell(x; \bar{y}_k) = \frac{1}{\sqrt{(2\pi)^{n+m} \det \bar{R}}} \exp\!\left( -\frac{1}{2} (\bar{y}_k - \bar{C} x)^T \bar{R}^{-1} (\bar{y}_k - \bar{C} x) \right),$$
where the right-hand side is nothing but the probability density function of $\bar{y}_k$ with the free variable $x$.
Now, the maximum likelihood estimate is defined as
$$\hat{x}_k := \operatorname*{arg\,max}_{x} \; \ell(x; \bar{y}_k).$$
Since $-\log(\cdot)$ is a monotonically decreasing function with respect to its argument, $\hat{x}_k$ can also be obtained by
$$\hat{x}_k = \operatorname*{arg\,min}_{x} \; (x - \bar{x}_k)^T \bar{P}_k^{-1} (x - \bar{x}_k) + (y_k - C x)^T R^{-1} (y_k - C x),$$
whose closed-form solution is $\hat{x}_k = \bar{x}_k + K_k (y_k - C \bar{x}_k)$ with $K_k = (\bar{P}_k^{-1} + C^T R^{-1} C)^{-1} C^T R^{-1}$. With the matrix inversion lemma, the Kalman gain can be written as $K_k = \bar{P}_k C^T (C \bar{P}_k C^T + R)^{-1}$, which appears in the standard Kalman-filtering.
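The equivalence between the MLE view and the gain form of the correction can be checked numerically. The sketch below uses illustrative random matrices; both functions implement standard textbook formulas, and by the matrix inversion lemma they return the same corrected estimate.

```python
import numpy as np

def mle_correct(x_pred, P_pred, y, C, R):
    """Correction as MLE: solve the normal equations of the quadratic
    objective (P^-1 + C^T R^-1 C) x = P^-1 x_pred + C^T R^-1 y."""
    Pinv = np.linalg.inv(P_pred)
    Rinv = np.linalg.inv(R)
    H = Pinv + C.T @ Rinv @ C
    return np.linalg.solve(H, Pinv @ x_pred + C.T @ Rinv @ y)

def gain_correct(x_pred, P_pred, y, C, R):
    """Correction via the Kalman gain K = P C^T (C P C^T + R)^-1."""
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    return x_pred + K @ (y - C @ x_pred)
```

The MLE form inverts an $n \times n$ matrix while the gain form inverts an $m \times m$ one, which is why the gain form is preferred when there are few measurements.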
II-B Derivation of distributed Kalman-filtering problem
Now, we consider a sensor network which consists of $N$ sensors, and suppose that each sensor runs an estimator without a fusion center. Each estimator in the network tries to find the optimal estimate by processing the local measurement and exchanging information with its neighbors through the communication network. The communication network among the estimators is modeled by a graph $\mathcal{G}$, and the Laplacian matrix associated with $\mathcal{G}$ is denoted by $\mathcal{L}$. Under the setting (1), estimator $i$ measures only the local measurement $y_k^i$, and the parameters $C_i$ and $R_i$ are kept private to estimator $i$. It is noted that the pair $(A, C_i)$ is not necessarily observable. We assume that the graph is connected and undirected, i.e., $a_{ij} = a_{ji}$, and that $A$ and $Q$ are open to all estimators.
Similar to CKF, DKF has two steps, local prediction and distributed correction. In the local prediction step, each estimator predicts
$$\bar{x}_k^i = A \hat{x}_{k-1}^i, \qquad \bar{P}_k^i = A \hat{P}_{k-1}^i A^T + Q,$$
where $\bar{x}_k^i$ and $\bar{P}_k^i$ are the local estimates of $\bar{x}_k$ and $\bar{P}_k$, respectively, that estimator $i$ holds.
In the distributed correction step, each estimator solves the maximum likelihood estimation in a distributed manner. The objective function of CKF can be rewritten as
$$J(x) = \sum_{i=1}^{N} f_i(x), \qquad f_i(x) := \frac{1}{N} (x - \bar{x}_k)^T \bar{P}_k^{-1} (x - \bar{x}_k) + (y_k^i - C_i x)^T R_i^{-1} (y_k^i - C_i x),$$
where the prior term is split evenly among the $N$ estimators. We assume that $\bar{x}_k^i = \bar{x}_k$ and $\bar{P}_k^i = \bar{P}_k$ for all $i$. This makes sense when each sensor reached a consensus on $\hat{x}_{k-1}$ and $\hat{P}_{k-1}$ in the previous correction step.
Assuming that each estimator $i$ holds its own optimization variable $x_i$ for $x$, the DKF problem is written as the following consensus optimization problem:
$$\min_{x_1, \dots, x_N} \; \sum_{i=1}^{N} f_i(x_i) \qquad \text{subject to} \qquad (\mathcal{L} \otimes I_n)\, \mathrm{col}(x_1, \dots, x_N) = 0. \tag{4}$$
If there exists a distributed algorithm that finds a minimizer, we say that the algorithm solves the DKF problem.
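Written out with the local objectives, the consensus problem takes the following form; the even $1/N$ split of the prior term across estimators is an assumption where the extraction dropped the original display:

```latex
\min_{x_1,\dots,x_N}\ \sum_{i=1}^{N}
  \Big( \tfrac{1}{N}\,(x_i-\bar{x}_k)^{T}\bar{P}_k^{-1}(x_i-\bar{x}_k)
      + (y_k^i - C_i x_i)^{T} R_i^{-1} (y_k^i - C_i x_i) \Big)
\quad \text{s.t.} \quad
(\mathcal{L}\otimes I_n)\,\mathrm{col}(x_1,\dots,x_N) = 0 .
```

Since the graph is connected, the constraint forces $x_1 = \dots = x_N$, and at any feasible point the sum of local objectives equals the centralized objective, which is why a minimizer of this problem recovers the CKF estimate.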
The Lagrangian associated with (4) is
$$L(\mathbf{x}, \lambda) = \sum_{i=1}^{N} f_i(x_i) + \lambda^T (\mathcal{L} \otimes I_n)\, \mathbf{x},$$
where $\lambda = \mathrm{col}(\lambda_1, \dots, \lambda_N)$ is the Lagrange multiplier (dual variable) associated with (4b) and $\mathbf{x} = \mathrm{col}(x_1, \dots, x_N)$. We decompose the Lagrangian into local ones defined by
For the Lagrangian (5), the partial derivatives with respect to $x_i$ and $\lambda_i$ are given by
Then, the optimality condition for the optimal primal-dual pair $(\mathbf{x}^*, \lambda^*)$ becomes the following saddle point equation (KKT conditions), namely
where and .
The solutions to the DKF problem are parameterized as $(\mathbf{x}^*, \lambda^*)$ with $\lambda^* = \lambda_p^* + (\mathbf{1}_N \otimes I_n) c$, where $\mathbf{x}^*$ and $\lambda_p^*$ are unique vectors and $c$ is an arbitrary vector. If $(\mathbf{x}^*, \lambda^*)$ is an optimal solution to the DKF problem, then $x_i^*$ is the optimal solution to the CKF problem.
By multiplying $(\mathbf{1}_N^T \otimes I_n)$ to the dual feasibility equation in (7), one can obtain the sum of the local optimality conditions. Since $\mathbf{1}_N^T \mathcal{L} = 0$ and $x_i^* = x^*$ for all $i$ by primal feasibility, it follows that
where, by the matrix inversion lemma, the coefficient can be rewritten in the Kalman gain form. From the fact that the right-hand side of the above equation is the same as the update rule (2) of CKF, it follows that $x^*$ is the optimal estimate of CKF.
On the other hand, one can observe that the optimal dual variable is not unique, since the coefficient matrix of the dual feasibility equation
is singular. To find $\lambda^*$, consider the orthonormal matrix $V = [\frac{1}{\sqrt{N}} \mathbf{1}_N \;\; V_2]$ such that $V^T \mathcal{L} V = \mathrm{diag}(0, \Lambda)$, where $V_2$
consists of the eigenvectors associated with the non-zero eigenvalues of $\mathcal{L}$, denoted by $\sigma_2, \dots, \sigma_N$, and $\Lambda = \mathrm{diag}(\sigma_2, \dots, \sigma_N)$. Left-multiplying $(V^T \otimes I_n)$ to the equation (10) yields
a transformed equation in which the component along $\mathbf{1}_N$ is free and the remaining components are uniquely determined. Hence, the optimal dual variable becomes $\lambda^* = \lambda_p^* + (\mathbf{1}_N \otimes I_n) c$, where $c$ is an arbitrary vector. This completes the proof.
II-C Information form of DKF problem
It is well known that the dual of the Kalman filter is the information filter, which uses the canonical parameterization $(\Omega, q) = (P^{-1}, P^{-1} x)$
to represent the normal (Gaussian) distribution. With the canonical parameterization, the DKF problem (4) can also be written in information form.
Let $z_i$ be the local decision variable for the information vector of estimator $i$, and let $\bar{\Omega}_i := (\bar{P}_k^i)^{-1}$ and $\bar{q}_i := \bar{\Omega}_i \bar{x}_k^i$ be the locally predicted information matrix and information vector, respectively. With these transformations, we rewrite the problem (4) as
and . For the distributed problem (11), the Lagrangian is given by
where and is the Lagrange multipliers. The associated saddle point equation becomes
where , , and .
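The canonical parameterization used above is a simple invertible transformation; the following sketch (variable names are illustrative, not the paper's notation) shows the conversion in both directions and why the information form is convenient: a measurement update is additive in $(\Omega, q)$.

```python
import numpy as np

def to_information(x, P):
    """Canonical parameterization: Omega = P^{-1}, q = P^{-1} x."""
    Omega = np.linalg.inv(P)
    return Omega, Omega @ x

def from_information(Omega, q):
    """Recover the moment parameterization (x, P) from (Omega, q)."""
    P = np.linalg.inv(Omega)
    return P @ q, P
```

In information form, fusing a measurement with output matrix `C` and noise information `Rinv` amounts to `Omega += C.T @ Rinv @ C` and `q += C.T @ Rinv @ y`, which is what makes the distributed correction step additive across sensors.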
II-D Interpretations of existing DKF algorithms from the optimization perspective
One of the recent DKF algorithms, Consensus on Information (CI) [14, 15], can be interpreted in the provided framework. CI consists of three steps: prediction, local correction, and consensus. In the prediction step, each estimator predicts the estimate based on the system dynamics and the previous estimate, similar to the standard information filter algorithm. Each estimator also updates the estimate with the local measurements and output matrix in the local correction step. After that, the estimators find an agreed estimate by averaging the local estimates in the consensus step.
In the provided framework, CI can be viewed as an algorithm which solves the problem (11) through two steps, the local correction step and the consensus step. In the former step, each estimator finds the local minimizer (estimate) of its local objective function. Since the partial derivative of the local objective function can be computed explicitly, the local minimizer can be obtained in closed form, which is the local update rule of CI (in CI, a scalar factor is neglected). The local minimizer, however, can differ among estimators, since it minimizes only the local objective function, which violates the constraint (11b).
The consensus step of CI serves to find an agreed (average) value of the local estimates using a doubly stochastic matrix, and the results of the consensus step satisfy the constraint (11b). The agreed estimate, however, may not be the global minimizer of (11), which means that the consensus step cannot guarantee convergence of the estimates to that of CKF.
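The two CI steps discussed above can be sketched as follows. This is an illustrative reconstruction, not the exact algorithm of [14, 15] (the weighting and number of consensus iterations are assumptions): each node first fuses its own measurement additively in information form, then averages information pairs with a doubly stochastic matrix `W`.

```python
import numpy as np

def ci_step(Omegas, qs, Cs, Rinvs, ys, W, consensus_iters=1):
    """One local-correction + consensus cycle in the style of CI (sketch)."""
    N = len(Omegas)
    # Local correction: fuse only the local measurement (additive in
    # information form), which may differ from node to node.
    Om = [Omegas[i] + Cs[i].T @ Rinvs[i] @ Cs[i] for i in range(N)]
    q = [qs[i] + Cs[i].T @ Rinvs[i] @ ys[i] for i in range(N)]
    # Consensus: repeated weighted averaging with a doubly stochastic W.
    for _ in range(consensus_iters):
        Om = [sum(W[i][j] * Om[j] for j in range(N)) for i in range(N)]
        q = [sum(W[i][j] * q[j] for j in range(N)) for i in range(N)]
    return Om, q
```

With finitely many consensus iterations the nodes only approximate the network average, which mirrors the remark above that the consensus step alone does not reach the global minimizer of (11).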
III A Solution to DKF Problem
One can observe that (5) is strictly convex and differentiable, and the local objective function is quadratic; hence strong duality holds. In addition, from the fact that the coefficient matrix is a nonsingular block diagonal matrix, the optimality conditions (7) are already in a distributed form. This implies that the minimizer can be obtained in a distributed manner as long as the optimal dual variable is given.
where $\alpha$ is a step size. The update rule (12) can be written locally as
$$x_i^{s+1} = \operatorname*{arg\,min}_{x_i} \; L_i(x_i, \lambda^s), \qquad \lambda_i^{s+1} = \lambda_i^s + \alpha \sum_{j \in \mathcal{N}_i \cup \{i\}} l_{ij}\, x_j^{s+1},$$
where $s$ is the iteration index to find the minimizer.
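The dual ascent iteration can be sketched concretely. This is an illustrative implementation under stated assumptions, not the paper's exact algorithm: the prior term is split evenly ($1/N$) across nodes, variable names are ours, and the network is encoded by its Laplacian `L` so each node only uses its own and its neighbors' variables.

```python
import numpy as np

def dual_ascent_dkf(x_pred, P_pred, Cs, Rinvs, ys, L, alpha, iters=3000):
    """Dual ascent for the consensus MLE (sketch). Node i keeps a primal
    variable x_i and a dual variable lam_i."""
    N, n = len(Cs), len(x_pred)
    Pinv = np.linalg.inv(P_pred)
    xs = [x_pred.copy() for _ in range(N)]
    lam = [np.zeros(n) for _ in range(N)]
    for _ in range(iters):
        # Primal step: each node minimizes its local quadratic exactly;
        # only row i of the Laplacian (itself and neighbors) is needed.
        for i in range(N):
            Llam_i = sum(L[i][j] * lam[j] for j in range(N))
            H = Pinv / N + Cs[i].T @ Rinvs[i] @ Cs[i]
            g = Pinv @ x_pred / N + Cs[i].T @ Rinvs[i] @ ys[i] - Llam_i
            xs[i] = np.linalg.solve(H, g)
        # Dual step: gradient ascent on the consensus constraint (L x = 0).
        lam = [lam[i] + alpha * sum(L[i][j] * xs[j] for j in range(N))
               for i in range(N)]
    return xs
```

At a fixed point the dual update forces $\mathcal{L}\mathbf{x} = 0$ (consensus), and summing the local stationarity conditions shows the common value is the centralized MLE, matching the convergence result stated next.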
Regarding the convergence of the update rule (13), we have the following result.
Assume that the network is undirected and connected. Then, the sequence generated by the dual ascent method (13) converges to the optimal estimate of the CKF problem (2) as the iteration index $s$ goes to infinity, provided that the step size $\alpha$ is chosen such that
where $\lambda_{\max}(\cdot)$ denotes the maximum eigenvalue of its argument. Moreover, the dual sequence converges to a vector which is uniquely determined by the initial conditions of the $\lambda_i$'s.
Substituting the dual feasibility equation into the primal feasibility equation of (7) yields
Now let . Then, one obtains
From the identity (15), we have
Here, the coefficient matrix is symmetric positive semi-definite with a simple zero eigenvalue. Since the corresponding eigenvalue of the iteration matrix is 1, it follows that if the step size $\alpha$ is chosen appropriately, all eigenvalues of the iteration matrix, except the one at 1, are located inside the unit circle. The bound (14) ensures this.
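The eigenvalue argument above can be illustrated numerically. As a simplified stand-in for the matrix in the proof (an assumption; the actual matrix also involves the local Hessians), we use the bare graph Laplacian: for a symmetric positive semi-definite `M` with a simple zero eigenvalue, any `alpha < 2 / lambda_max(M)` places every eigenvalue of `I - alpha*M` except the one at 1 strictly inside the unit circle.

```python
import numpy as np

def contraction_check(L, alpha):
    """Sorted eigenvalue magnitudes of I - alpha*L (illustrative check)."""
    n = L.shape[0]
    return np.sort(np.abs(np.linalg.eigvals(np.eye(n) - alpha * L)))

# Path graph on 3 nodes; any alpha in (0, 2/lambda_max) works.
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
alpha = 1.9 / np.max(np.linalg.eigvalsh(L))
mags = contraction_check(L, alpha)
```

The eigenvalue at 1 corresponds to the consensus direction (the null space of the Laplacian), which is exactly the direction along which the dual variable stays at its initial value.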
Regarding the convergence of the dual variable, we proceed as follows. With the orthonormal matrix $V$ used in Lemma 1, the iteration matrix can be written as
a block form, where the lower-right block is the submatrix obtained by removing the first $n$ rows and the first $n$ columns. In the new coordinates defined by the transformation $(V^T \otimes I_n)$, the error dynamics of the dual variable can be expressed as
From this equation, we know that the first $n$ components of the transformed dual variable remain the same for every iteration, and are thus uniquely determined by the initial conditions. Moreover, with the step size chosen as in (14), which guarantees that the iteration matrix has all its eigenvalues except 1 inside the unit circle, the remaining components converge, from which it follows that
Recalling that , we have from (17)
Applying (for and , see the proof of Lemma 1), we have