# Distributed Kalman-filtering: Distributed optimization viewpoint

We consider the Kalman-filtering problem with multiple sensors connected through a communication network. If all measurements are delivered to one place, called the fusion center, and processed together, we call the process centralized Kalman-filtering (CKF). When there is no fusion center, each sensor can instead solve the problem by using local measurements and exchanging information with its neighboring sensors, which is called distributed Kalman-filtering (DKF). Noting that the CKF problem is a maximum likelihood estimation problem, which is a quadratic optimization problem, we reformulate the DKF problem as a consensus optimization problem, so that it can be solved by many existing distributed optimization algorithms. A new DKF algorithm employing the distributed dual ascent method is provided and its performance is evaluated through numerical experiments.

## I Introduction

The Kalman-filter, an optimal state estimator for dynamic systems, has had a huge impact on fields such as engineering, science, and economics [1, 2, 3, 4]. The filter predicts the expectation of the system state and its covariance based on the dynamic model and the statistical information on the model uncertainty or process noise, and then corrects them using the new measurement, the sensor model, and the information on the measurement noise. When multiple sensors, possibly of different types, are available, we can simply combine the sensor models to process the measurements together.

Thanks to the rapid development of sensor devices and communication technology, we are now able to monitor large-scale systems or environments such as traffic networks, plants, forests, and seas. In those systems, sensors are geographically distributed, may be of different types, and are usually not synchronized. To process the measurements, the basic idea would be to deliver all the data to one place, usually called the fusion center, and perform the correction step as in the case of multiple sensors. This is called centralized Kalman-filtering (CKF). As expected, CKF requires a powerful computing device to handle a large number of measurements and sensor models, is exposed to a single point of failure, and is difficult to scale up. To overcome these drawbacks, researchers developed distributed Kalman-filtering (DKF), in which each sensor in the network solves the problem by using local measurements and communicating with its neighbors. Compared with CKF, DKF has advantages in terms of scalability, robustness to component loss, and computational cost, and thus the literature on this topic is expanding rapidly [5, 6, 7, 8, 9, 10, 11, 12]. For more details on DKF, see the survey [13] and references therein.

Some relevant results are summarized as follows. In [5], the author proposed scalable distributed Kalman-Bucy filtering algorithms in which each node only communicates with its neighbors. An algorithm with average consensus filters using the internal models of the signals being exchanged is proposed in [7]; it is noted that the algorithm works in a single time scale. In [11], the authors proposed a continuous-time algorithm that keeps the norms of all local error covariance matrices bounded, and thus overcomes a major drawback of [5]. In [10], an algorithm with a high-gain coupling term in the error covariance matrix is introduced, and it is shown that the local error covariance matrix approximately converges to that of the steady-state centralized Kalman-filter. An in-depth discussion of the distributed Kalman-filtering problem is provided in [14, 15], where algorithms that exchange the measurements themselves, or that exchange certain signals instead of the measurements, are proposed, respectively.

Although each of the existing algorithms has its own novel ideas and advantages, to the best of the authors’ knowledge, there is no unified viewpoint on the DKF problem. Motivated by this, the aim of this paper is to provide a framework for the problem from the perspective of distributed optimization.

We start by observing that the correction step of Kalman-filtering is basically an optimization problem [2, 3, 4], and then formulate the DKF problem as a consensus optimization problem, which provides a fresh look at the problem. As a result, the DKF problem can be solved by many existing distributed optimization algorithms [16, 17, 18, 19, 20], and various DKF algorithms can be expected to be derived in this way. As an instance, a new DKF algorithm employing the dual ascent method [20], one of the basic algorithms for distributed optimization problems, is provided in this paper.

This paper is organized as follows. In Section II, we recall the CKF problem from the optimization perspective, and connect the DKF problem to a distributed optimization problem. A new DKF algorithm based on the dual ascent method is proposed in Section III, and numerical experiments evaluating the proposed algorithm are conducted in Section IV.

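Notation: For matrices $A_1, \dots, A_N$, $\mathrm{diag}(A_1, \dots, A_N)$ denotes the block diagonal matrix composed of $A_1$ to $A_N$. For scalars $a_1, \dots, a_N$, $[a_1; \cdots; a_N] := [a_1 \ \cdots \ a_N]^\top$, and $[A_1; \cdots; A_N]$ with matrices $A_i$’s is defined similarly. $1_N \in \mathbb{R}^N$ denotes the vector whose components are all 1, and $I_n$ is the identity matrix whose dimension is $n \times n$. The maximum and minimum eigenvalue of a matrix $A$ are denoted by $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$, respectively. For a random variable $x$, $x \sim \mathcal{N}(\mu, \sigma^2)$ denotes that $x$ is normally distributed with the mean $\mu$ and the variance $\sigma^2$, and $E\{x\}$ denotes the expected value of a random variable $x$. The half vectorization of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is denoted by $\mathrm{vech}(A)$, whose elements are filled in column-major order, i.e., $\mathrm{vech}(A) = [a_{11}; a_{21}; \cdots; a_{n1}; a_{22}; \cdots; a_{nn}]$ where $a_{ij}$ is the $(i,j)$ element of $A$, and $\mathrm{vech}^{-1}$ denotes the inverse function of $\mathrm{vech}$, $\mathrm{vech}^{-1}(\mathrm{vech}(A)) = A$. For a function $f: \mathbb{R}^n \to \mathbb{R}$, $\nabla f$ denotes the gradient vector $[\partial f / \partial x_1; \cdots; \partial f / \partial x_n]$.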
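The half vectorization and its inverse can be sketched in a few lines of NumPy. This is only an illustration of the column-major ordering described above; the function names `vech` and `vech_inv` and the example matrix are assumptions made for this sketch, not taken from the paper.

```python
import numpy as np

def vech(A):
    """Stack the lower-triangular part of a symmetric matrix column by column."""
    A = np.asarray(A)
    rows, cols = np.triu_indices(A.shape[0])
    # Indexing with (cols, rows) walks the lower triangle in column-major order.
    return A[cols, rows]

def vech_inv(v):
    """Rebuild the symmetric matrix from its half vectorization."""
    v = np.asarray(v)
    # Solve n(n+1)/2 = len(v) for the matrix dimension n.
    n = int(round((np.sqrt(8 * v.size + 1) - 1) / 2))
    rows, cols = np.triu_indices(n)
    A = np.zeros((n, n), dtype=v.dtype)
    A[cols, rows] = v   # lower triangle
    A[rows, cols] = v   # mirrored upper triangle
    return A

# Hypothetical symmetric matrix (for instance, an error covariance).
P = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 6.0]])
v = vech(P)
P_back = vech_inv(v)
```

A symmetric $n \times n$ matrix is thus carried by $n(n+1)/2$ numbers instead of $n^2$.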

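Graph theory: For a network consisting of $N$ nodes, the communication among nodes is modeled by a graph $\mathcal{G}$. Let $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$ be an adjacency matrix associated to $\mathcal{G}$, where $a_{ij}$ is the weight of an edge between nodes $i$ and $j$. If node $i$ communicates with node $j$, then $a_{ij} > 0$; otherwise $a_{ij} = 0$. Assume there is no self edge, i.e., $a_{ii} = 0$. The Laplacian matrix associated to the graph $\mathcal{G}$, denoted by $L = [l_{ij}] \in \mathbb{R}^{N \times N}$, is a matrix such that $l_{ij} = -a_{ij}$ for $i \neq j$, and $l_{ii} = \sum_{j=1}^{N} a_{ij}$. $\mathcal{N}_i$ is the set of nodes communicating with node $i$, i.e., $\mathcal{N}_i = \{ j : a_{ij} > 0 \}$.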
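As a small illustration of these graph-theoretic objects, the sketch below builds the Laplacian and the neighbor sets from a weighted adjacency matrix. The 4-node ring with unit weights and the helper name `laplacian` are assumptions chosen only for this example.

```python
import numpy as np

def laplacian(adjacency):
    """Graph Laplacian: degree matrix minus adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    return np.diag(A.sum(axis=1)) - A

# Hypothetical undirected 4-node ring, unit edge weights, no self edges.
A_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
L_ring = laplacian(A_ring)

# Neighbor set of each node: indices j with a_ij > 0.
neighbors = {i: np.flatnonzero(A_ring[i] > 0).tolist() for i in range(4)}

# For a connected undirected graph the all-ones vector spans the kernel,
# so the Laplacian has exactly one zero eigenvalue.
eigenvalues = np.linalg.eigvalsh(L_ring)
```

The single zero eigenvalue with eigenvector of all ones is the property that later lets the consensus constraint be written through the Laplacian.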

## II Distributed Kalman-filtering and Its Connection to Consensus Optimization

In this section, we recall the CKF problem in terms of optimization, namely maximum likelihood estimation [2], and establish a connection between DKF and distributed optimization.

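Consider a discrete-time linear system with $N$ sensors described by

$$x_{k+1} = F x_k + w_k \tag{1a}$$

$$y_k = H x_k + v_k = \begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_N \end{bmatrix} x_k + \begin{bmatrix} v_{1,k} \\ v_{2,k} \\ \vdots \\ v_{N,k} \end{bmatrix} \tag{1b}$$

where $x_k \in \mathbb{R}^n$ is the state vector of the dynamic system, $y_k \in \mathbb{R}^m$ is the output vector, and $y_{i,k} \in \mathbb{R}^{m_i}$ is the output associated to sensor $i$. The $m_i$’s satisfy $\sum_{i=1}^{N} m_i = m$. $F$ is the system matrix and $H$ is the output matrix consisting of the $H_i$, where $H_i$ is the output matrix associated to sensor $i$. $w_k$ with $w_k \sim \mathcal{N}(0, Q)$ is the process noise, $v_{i,k} \sim \mathcal{N}(0, R_i)$ is the measurement noise on sensor $i$, and $v_k \sim \mathcal{N}(0, R)$ with $R = \mathrm{diag}(R_1, \dots, R_N)$. Assume that the pair $(F, H)$ is observable, and each $v_{i,k}$ is uncorrelated to $v_{j,k}$ for $i \neq j$.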

### II-A Centralized Kalman-filtering problem from the optimization perspective

If all the measurements from the $N$ sensors are collected and processed together, the problem can be seen as one with an imaginary sensor that measures $y_k$ with complete knowledge of $H$ and $R$, and is thus called centralized Kalman-filtering. The filtering consists of two steps, prediction and correction. In the prediction step, the predicted estimate and error covariance matrix are obtained based on the previous estimate, the previous error covariance matrix, and the system dynamics. The update rules are given by

$$\begin{aligned} \hat{x}_{k|k-1} &= F \hat{x}_{k-1} \\ P_{k|k-1} &= E\{ e_{k|k-1} e_{k|k-1}^\top \} \\ &= F E\{ e_{k-1} e_{k-1}^\top \} F^\top + E\{ w_{k-1} w_{k-1}^\top \} \\ &= F P_{k-1} F^\top + Q \end{aligned}$$

where $\hat{x}_{k-1}$ and $P_{k-1}$ are the estimate and error covariance matrix at the previous time, respectively, and $e_{k|k-1} := x_k - \hat{x}_{k|k-1}$, $e_{k-1} := x_{k-1} - \hat{x}_{k-1}$. Assume that $P_0$ is initialized as a positive definite matrix.
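The prediction step translates directly into code. The following is a minimal sketch; the function name `predict` and the constant-velocity example values are assumptions made for illustration.

```python
import numpy as np

def predict(x_prev, P_prev, F, Q):
    """Prediction step: propagate the estimate and its error covariance."""
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    return x_pred, P_pred

# Hypothetical constant-velocity model with unit sampling time.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.1 * np.eye(2)
x_prev = np.array([0.0, 1.0])
P_prev = np.eye(2)

x_pred, P_pred = predict(x_prev, P_prev, F, Q)
```

In the distributed setting discussed later, every estimator can run this same step locally, provided the system matrix and process noise covariance are known to all of them.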

In the correction step, the predicted estimate and the error covariance matrix are updated based on the current measurements, which contain the measurement noise. The correction step can be regarded as a process of finding the optimal parameter (estimate) from the predicted estimate $\hat{x}_{k|k-1}$, the error covariance $P_{k|k-1}$, and the observation $y_k$. In fact, it is known that this step is an optimization problem (maximum likelihood estimation, MLE [2]), and we recall the details below.

Let $z_k := [\hat{x}_{k|k-1}; y_k]$ and $\bar{H}_c := [I_n; H]$. Then, $z_k \sim \mathcal{N}(\bar{H}_c x_k, S_k)$ where $S_k := \mathrm{diag}(P_{k|k-1}, R)$. For the random variable $z_k$, the likelihood function is given by

$$L(\xi_c) = \frac{1}{\sqrt{(2\pi)^{(m+n)} |S_k|}} \, e^{-\frac{1}{2} (z_k - \bar{H}_c \xi_c)^\top S_k^{-1} (z_k - \bar{H}_c \xi_c)}$$

where the right-hand side is nothing but the probability density function of $z_k$ with the free variable $\xi_c$.

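Now, the maximum likelihood estimate is defined as

$$\hat{x}_k := \arg\max_{\xi_c} L(\xi_c).$$

Since $L(\xi_c)$ is a monotonically decreasing function with respect to $f_c(\xi_c) := \frac{1}{2} (z_k - \bar{H}_c \xi_c)^\top S_k^{-1} (z_k - \bar{H}_c \xi_c)$, $\hat{x}_k$ can also be obtained by

$$\hat{x}_k = \arg\min_{\xi_c} f_c(\xi_c) = \hat{x}_{k|k-1} + K (y_k - H \hat{x}_{k|k-1}) \tag{2}$$

where $K = (H^\top R^{-1} H + P_{k|k-1}^{-1})^{-1} H^\top R^{-1}$. With the matrix inversion lemma, the Kalman-gain can be written as $K = P_{k|k-1} H^\top (H P_{k|k-1} H^\top + R)^{-1}$, which is the form that appears in standard Kalman-filtering.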

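On the other hand, by the definition of $P_k$, the update rule of the error covariance matrix of CKF is given by

$$\begin{aligned} P_k &= (\bar{H}_c^\top S_k^{-1} \bar{H}_c)^{-1} = (H^\top R^{-1} H + P_{k|k-1}^{-1})^{-1} \\ &= P_{k|k-1} - (H^\top R^{-1} H + P_{k|k-1}^{-1})^{-1} H^\top R^{-1} H P_{k|k-1}. \end{aligned} \tag{3}$$

For more details, see [4, 2, 3].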
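The equivalence between the weighted least-squares view of the correction step and the familiar gain form can be checked numerically. The sketch below assumes the stacked quantities are the predicted estimate on top of the measurement, the identity on top of $H$, and a block-diagonal weight built from $P_{k|k-1}$ and $R$; the function names and the random test data are hypothetical.

```python
import numpy as np

def correct_mle(x_pred, P_pred, y, H, R):
    """Correction as weighted least squares on the stacked 'measurement' z."""
    n, m = x_pred.size, y.size
    z = np.concatenate([x_pred, y])        # stacked prediction and measurement
    Hc = np.vstack([np.eye(n), H])         # stacked output matrix
    S = np.zeros((n + m, n + m))           # block-diagonal weight diag(P_pred, R)
    S[:n, :n] = P_pred
    S[n:, n:] = R
    S_inv = np.linalg.inv(S)
    info = Hc.T @ S_inv @ Hc               # equals inv(P_pred) + H^T inv(R) H
    x_hat = np.linalg.solve(info, Hc.T @ S_inv @ z)
    return x_hat, np.linalg.inv(info)

def correct_gain(x_pred, P_pred, y, H, R):
    """Standard Kalman correction written with the Kalman gain."""
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    return x_pred + K @ (y - H @ x_pred), P_pred - K @ H @ P_pred

# Hypothetical data: 3 states, 2 measurements.
rng = np.random.default_rng(0)
n, m = 3, 2
H = rng.standard_normal((m, n))
P_pred = np.eye(n) + 0.5 * np.ones((n, n))   # symmetric positive definite
R = np.diag([0.5, 2.0])
x_pred = rng.standard_normal(n)
y = rng.standard_normal(m)

x_mle, P_mle = correct_mle(x_pred, P_pred, y, H, R)
x_kf, P_kf = correct_gain(x_pred, P_pred, y, H, R)
```

Both routes return the same estimate and covariance up to floating-point error, which is the observation that the distributed formulation builds on.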

### II-B Derivation of distributed Kalman-filtering problem

Now, we consider a sensor network which consists of $N$ sensors and suppose that each sensor runs an estimator without the fusion center. Each estimator in the network tries to find the optimal estimate by processing the local measurement and exchanging information with its neighbors through the communication network. The communication network among estimators is modeled by a graph $\mathcal{G}$, and the Laplacian matrix associated with $\mathcal{G}$ is denoted by $L$. Under the setting (1), estimator $i$ measures only the local measurement $y_{i,k}$, and the parameters $H_i$ and $R_i$ are kept private to estimator $i$. It is noted that the pair $(F, H_i)$ is not necessarily observable. We assume that the graph is connected and undirected, i.e., $L = L^\top$, and that $F$ and $Q$ are open to all estimators.

Similar to CKF, DKF has two steps, local prediction and distributed correction. In the local prediction step, each estimator predicts

$$\begin{aligned} \hat{x}_{i,k|k-1} &= F \hat{x}_{i,k-1} \\ P_{i,k|k-1} &= F P_{i,k-1} F^\top + Q \end{aligned}$$

where $\hat{x}_{i,k-1}$ and $P_{i,k-1}$ are the local estimates of $\hat{x}_{k-1}$ and $P_{k-1}$, respectively, that estimator $i$ holds.

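In the distributed correction step, each estimator solves the maximum likelihood estimation in a distributed manner. The objective function of CKF can be rewritten as

$$\sum_{i=1}^{N} f_i(\xi_c) = \sum_{i=1}^{N} \frac{1}{2} (\bar{z}_{i,k} - \bar{H}_i \xi_c)^\top \bar{S}_{i,k}^{-1} (\bar{z}_{i,k} - \bar{H}_i \xi_c)$$

where $\bar{z}_{i,k} := [\hat{x}_{i,k|k-1}; y_{i,k}]$, $\bar{H}_i := [I_n; H_i]$, $\bar{S}_{i,k} := \mathrm{diag}(N P_{i,k|k-1}, R_i)$. We assume that $\hat{x}_{i,k|k-1} = \hat{x}_{k|k-1}$ and $P_{i,k|k-1} = P_{k|k-1}$ for all $i$. This makes sense when the sensors reached a consensus on $\hat{x}_{k-1}$ and $P_{k-1}$ in the previous correction step.

Assuming that each estimator holds its own optimization variable $\xi_i \in \mathbb{R}^n$ for $\xi_c$, the DKF problem is written as the following consensus optimization problem.

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{N} f_i(\xi_i) && \text{(4a)} \\ \text{subject to} \quad & \xi_1 = \cdots = \xi_N. && \text{(4b)} \end{aligned}$$

If there exists a distributed algorithm that finds a minimizer $\xi^*$ of (4), we say that the algorithm solves the DKF problem.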
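The splitting of the centralized objective into local ones can be verified numerically. The sketch below assumes that each local weight uses $N$ times the predicted covariance, so that the $N$ copies of the prior term add up to the centralized one; the function names, sensor models, and numbers are hypothetical.

```python
import numpy as np

def central_cost(xi, x_pred, P_pred, y, H, R):
    """Centralized objective: prior term plus stacked measurement term."""
    ex, ey = x_pred - xi, y - H @ xi
    return 0.5 * (ex @ np.linalg.solve(P_pred, ex) + ey @ np.linalg.solve(R, ey))

def local_cost(xi, x_pred, P_pred, y_i, H_i, R_i, N):
    """Local objective of one estimator; the prior is weighted by 1/N (assumption)."""
    ex, ey = x_pred - xi, y_i - H_i @ xi
    return 0.5 * (ex @ np.linalg.solve(N * P_pred, ex) + ey @ np.linalg.solve(R_i, ey))

# Hypothetical network: 3 scalar sensors observing a 2-dimensional state.
N = 3
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[0.5]]), np.array([[1.0]]), np.array([[2.0]])]
y_list = [np.array([1.0]), np.array([2.0]), np.array([2.5])]
x_pred = np.array([0.5, 1.5])
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])

# Stacked centralized quantities.
H = np.vstack(H_list)
R = np.diag([R_i[0, 0] for R_i in R_list])
y = np.concatenate(y_list)

xi = np.array([0.2, -0.7])   # arbitrary common test point
total_local = sum(local_cost(xi, x_pred, P_pred, y_i, H_i, R_i, N)
                  for y_i, H_i, R_i in zip(y_list, H_list, R_list))
total_central = central_cost(xi, x_pred, P_pred, y, H, R)
```

When all estimators evaluate their local cost at the same point, the sum matches the centralized cost, which is why enforcing consensus recovers the centralized problem.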

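Since the kernel of the Laplacian $L$ is $\mathrm{span}\{1_N\}$, the constraint (4b) can be written with $\xi := [\xi_1; \cdots; \xi_N]$ as $\bar{L} \xi = 0$ where $\bar{L} := L \otimes I_n$. To proceed, we define the Lagrangian for the problem (4) as

$$\mathcal{L}(\xi, \lambda) = \sum_{i=1}^{N} f_i(\xi_i) + \lambda^\top \bar{L} \xi \tag{5}$$

where $\lambda := [\lambda_1; \cdots; \lambda_N] \in \mathbb{R}^{nN}$ is the vector of Lagrange multipliers (dual variable) associated with (4b) and $\lambda_i \in \mathbb{R}^n$. We decompose the Lagrangian into local ones defined by

$$\mathcal{L}_i(\xi_i, \lambda_i) = f_i(\xi_i) + \lambda_i^\top \sum_{j \in \mathcal{N}_i} a_{ij} (\xi_i - \xi_j). \tag{6}$$

For the Lagrangian (5), the partial derivatives with respect to $\xi$ and $\lambda$ are given by

$$\begin{aligned} \nabla_\xi \mathcal{L}(\xi, \lambda) &= -\bar{H}^\top \bar{S}_k^{-1} (\bar{z}_k - \bar{H} \xi) + \bar{L} \lambda \\ \nabla_\lambda \mathcal{L}(\xi, \lambda) &= \bar{L} \xi, \end{aligned}$$

where $\bar{H} := \mathrm{diag}(\bar{H}_1, \dots, \bar{H}_N)$, $\bar{S}_k := \mathrm{diag}(\bar{S}_{1,k}, \dots, \bar{S}_{N,k})$, and $\bar{z}_k := [\bar{z}_{1,k}; \cdots; \bar{z}_{N,k}]$. Then, the optimality condition for ($\xi^*$, $\lambda^*$) becomes the following saddle point equation (KKT conditions), namely

$$\begin{bmatrix} -\bar{H}^\top \bar{S}_k^{-1} \bar{H} & -\bar{L} \\ \bar{L} & 0 \end{bmatrix} \begin{bmatrix} \xi^* \\ \lambda^* \end{bmatrix} = \begin{bmatrix} -\bar{H}^\top \bar{S}_k^{-1} \bar{z}_k \\ 0 \end{bmatrix} \tag{7}$$

where $\xi^* = [\xi_1^*; \cdots; \xi_N^*]$ and $\lambda^* = [\lambda_1^*; \cdots; \lambda_N^*]$.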

###### Lemma 1

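The solutions to the DKF problem are parameterized as $\xi^* = (1_N \otimes I_n) \xi^\dagger$ and $\lambda^* = \bar{\lambda} + (1_N \otimes I_n) \tilde{\lambda}$ where $\xi^\dagger$ and $\bar{\lambda}$ are unique vectors and $\tilde{\lambda}$ is an arbitrary vector. If $(\xi^*, \lambda^*)$ is an optimal solution to the DKF problem, then $\xi^\dagger$ is the optimal solution to the CKF problem.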

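By multiplying the dual feasibility equation in (7) from the left by $(1_N^\top \otimes I_n)$, and noting that $(1_N^\top \otimes I_n) \bar{L} = 0$, one can obtain

$$(1_N^\top \otimes I_n) \bar{H}^\top \bar{S}_k^{-1} \bar{H} \xi^* = (1_N^\top \otimes I_n) \bar{H}^\top \bar{S}_k^{-1} \bar{z}_k. \tag{8}$$

The primal feasibility equation in (7) implies that $\xi^* = (1_N \otimes I_n) \xi^\dagger$ for some $\xi^\dagger \in \mathbb{R}^n$, hence (8) becomes

$$(1_N^\top \otimes I_n) \bar{H}^\top \bar{S}_k^{-1} \bar{H} (1_N \otimes I_n) \xi^\dagger = (1_N^\top \otimes I_n) \bar{H}^\top \bar{S}_k^{-1} \bar{z}_k.$$

From the definitions of $\bar{H}$, $\bar{S}_k$, and $\bar{z}_k$, one has

$$\Big( P_{k|k-1}^{-1} + \sum_{i=1}^{N} H_i^\top R_i^{-1} H_i \Big) \xi^\dagger = P_{k|k-1}^{-1} \hat{x}_{k|k-1} + \sum_{i=1}^{N} H_i^\top R_i^{-1} y_{i,k}.$$

Since $\sum_{i=1}^{N} H_i^\top R_i^{-1} H_i = H^\top R^{-1} H$ and $\sum_{i=1}^{N} H_i^\top R_i^{-1} y_{i,k} = H^\top R^{-1} y_k$, it follows that

$$\xi^\dagger = \hat{x}_{k|k-1} + K_k (y_k - H \hat{x}_{k|k-1}) \tag{9}$$

where $K_k = (P_{k|k-1}^{-1} + H^\top R^{-1} H)^{-1} H^\top R^{-1}$ and, by the matrix inversion lemma, we have $K_k = P_{k|k-1} H^\top (H P_{k|k-1} H^\top + R)^{-1}$. Since the right-hand side of the above equation is the same as the update rule (2) of CKF, it follows that $\xi^\dagger$ is the optimal estimate of CKF.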

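On the other hand, one can observe that the optimal dual variable $\lambda^*$ is not unique since the dual feasibility equation

$$(L \otimes I_n) \lambda^* = \bar{H}^\top \bar{S}_k^{-1} (\bar{z}_k - \bar{H} (1_N \otimes I_n) \xi^\dagger) \tag{10}$$

is singular. To find $\lambda^*$, consider the orthonormal matrix $U = [U_1 \ \bar{U}]$ such that $L = U \Lambda U^\top$ where $U_1 = \frac{1}{\sqrt{N}} 1_N$, $\bar{U}$ consists of the eigenvectors associated with the non-zero eigenvalues of $L$, denoted by $\sigma_2, \dots, \sigma_N$, and $\Lambda = \mathrm{diag}(0, \bar{\Lambda})$ with $\bar{\Lambda} = \mathrm{diag}(\sigma_2, \dots, \sigma_N)$. Left multiplying the equation (10) by $(U^\top \otimes I_n)$ yields

$$(\Lambda U^\top \otimes I_n) \lambda^* = (U^\top \otimes I_n) b$$

where $b := \bar{H}^\top \bar{S}_k^{-1} (\bar{z}_k - \bar{H} (1_N \otimes I_n) \xi^\dagger)$. The first $n$ rows hold by (8), and the remaining rows give $(\bar{U}^\top \otimes I_n) \lambda^* = (\bar{\Lambda}^{-1} \bar{U}^\top \otimes I_n) b$. Hence, the optimal dual variable becomes $\lambda^* = (\bar{U} \bar{\Lambda}^{-1} \bar{U}^\top \otimes I_n) b + (1_N \otimes I_n) \tilde{\lambda}$ where $\tilde{\lambda} \in \mathbb{R}^n$ is an arbitrary vector. This completes the proof.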
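Lemma 1 can be illustrated numerically by assembling the saddle point system (7) for a toy network and solving it in the least-squares sense. This is a sketch under assumptions: the sensor models, numbers, and the complete 3-node graph are hypothetical, and the local blocks are built from the $1/N$-weighted prior used in the reformulated objective.

```python
import numpy as np

# Hypothetical toy problem: 3 scalar sensors, 2-dimensional state, complete graph.
N, n = 3, 2
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])]
y_list = [np.array([1.0]), np.array([2.0]), np.array([2.5])]
x_pred, P_pred = np.array([0.5, 1.5]), np.eye(n)
Lap = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]])

# Block-diagonal Hessian and linear term of the stacked local objectives.
P_inv = np.linalg.inv(P_pred)
Phi = np.zeros((N * n, N * n))
q = np.zeros(N * n)
for i in range(N):
    Ri_inv = np.linalg.inv(R_list[i])
    sl = slice(i * n, (i + 1) * n)
    Phi[sl, sl] = P_inv / N + H_list[i].T @ Ri_inv @ H_list[i]
    q[sl] = P_inv @ x_pred / N + H_list[i].T @ Ri_inv @ y_list[i]

# Saddle point (KKT) system; it is singular because the dual is not unique.
L_bar = np.kron(Lap, np.eye(n))
KKT = np.block([[-Phi, -L_bar], [L_bar, np.zeros((N * n, N * n))]])
rhs = np.concatenate([-q, np.zeros(N * n)])
sol, *_ = np.linalg.lstsq(KKT, rhs, rcond=1e-10)   # minimum-norm solution
xi_star = sol[:N * n].reshape(N, n)

# Centralized estimate for comparison.
H = np.vstack(H_list)
R_inv = np.linalg.inv(np.diag([R_i[0, 0] for R_i in R_list]))
y = np.concatenate(y_list)
x_ckf = np.linalg.solve(P_inv + H.T @ R_inv @ H, P_inv @ x_pred + H.T @ R_inv @ y)
kkt_rank = np.linalg.matrix_rank(KKT, tol=1e-10)
```

Every block of the primal solution coincides with the centralized estimate, while the rank deficiency of the system (by $n$) reflects the free component of the dual variable.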

### II-C Information form of DKF problem

It is well known that the dual of the Kalman-filter is the Information filter, which uses the canonical parameterization to represent the normal (Gaussian) distribution [4]. With the canonical parameterization, the DKF problem (4) can also be written in information form.

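Let $\eta_i$, $\Omega_{i,k|k-1} := P_{i,k|k-1}^{-1}$, and $\tau_{i,k|k-1} := P_{i,k|k-1}^{-1} \hat{x}_{i,k|k-1}$ be the local decision variable for the information vector of estimator $i$, the locally predicted information matrix, and the locally predicted information vector, respectively. With these transformations, we rewrite the problem (4) as

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{N} h_i(\eta_i) && \text{(11a)} \\ \text{subject to} \quad & \eta_1 = \cdots = \eta_N && \text{(11b)} \end{aligned}$$

where

$$h_i(\eta_i) = \frac{1}{2} \Big( \eta_i^\top \Phi_i^{-1} \eta_i - 2 \eta_i^\top \Phi_i^{-1} \big( H_i^\top R_i^{-1} y_i + \tfrac{1}{N} \tau_{i,k|k-1} \big) + y_i^\top R_i^{-1} y_i + \tfrac{1}{N} \tau_{i,k|k-1}^\top \Omega_{i,k|k-1}^{-1} \tau_{i,k|k-1} \Big)$$

and $\Phi_i := H_i^\top R_i^{-1} H_i + \frac{1}{N} \Omega_{i,k|k-1}$. For the distributed problem (11), the Lagrangian is given by

$$\mathcal{L}_\eta(\eta, \nu) = \sum_{i=1}^{N} h_i(\eta_i) + \nu^\top \bar{L} \eta$$

where $\eta := [\eta_1; \cdots; \eta_N]$ and $\nu$ is the vector of Lagrange multipliers. The associated saddle point equation becomes

$$\begin{bmatrix} -(\bar{H}^\top \tilde{S}_k^{-1} \bar{H})^{-1} & -\bar{L} \\ \bar{L} & 0 \end{bmatrix} \begin{bmatrix} \eta^* \\ \nu^* \end{bmatrix} = \begin{bmatrix} -\bar{H}^\top \tilde{S}_k^{-1} \tilde{z}_k \\ 0 \end{bmatrix}$$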

where , , and .

### II-D Interpretations of existing DKF algorithm from the optimization perspective

One of the recent DKF algorithms, Consensus on Information (CI) [14, 15] can be interpreted in the provided framework. CI consists of three steps, prediction, local correction, and consensus. In the prediction step, each estimator predicts the estimate based on the system dynamics and previous estimate similar to the standard information filter algorithm. Each estimator also updates the estimate with local measurements and output matrix in the local correction step. After that, the estimators find the agreed estimate by averaging the local estimates in the consensus step.

In the provided framework, CI can be viewed as an algorithm which solves the problem (11) through two steps, the local correction step and the consensus step. In the former step, each estimator finds the local minimizer (estimate) of the local objective function $h_i(\eta_i)$. Since the partial derivative of $h_i(\eta_i)$ is

$$\nabla_{\eta_i} h_i(\eta_i) = \Phi_i^{-1} \eta_i - \Phi_i^{-1} \big( H_i^\top R_i^{-1} y_i + \tfrac{1}{N} \tau_{i,k|k-1} \big),$$

the local minimizer can be obtained by $\eta_i = H_i^\top R_i^{-1} y_i + \frac{1}{N} \tau_{i,k|k-1}$, which is the local update rule of CI (in CI, the scalar $\frac{1}{N}$ is neglected [14]). The local minimizer, however, can differ among estimators, since it minimizes only the local objective function $h_i(\eta_i)$, which violates the constraint (11b).

The consensus step of CI serves to find an agreed (average) value of the local estimates, using a doubly stochastic matrix, and the results of the consensus step satisfy the constraint (11b). The agreed estimate, however, may not be the global minimizer of (11), which means that the consensus step cannot guarantee the convergence of the estimates to that of CKF.

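## III A Solution to DKF Problem

One can observe that (5) is strictly convex and differentiable, and that the local objective function $f_i$ is a quadratic function, hence strong duality holds. In addition, since $\bar{H}^\top \bar{S}_k^{-1} \bar{H}$ is a nonsingular and block diagonal matrix, the optimality conditions (7) are already in a distributed form. This implies that the minimizer $\xi^*$ can be obtained in a distributed manner as long as $\lambda^*$ is given, i.e., $\xi^* = (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} (\bar{H}^\top \bar{S}_k^{-1} \bar{z}_k - \bar{L} \lambda^*)$.

Based on the above discussion, we see that one possible algorithm solving (4), guaranteeing the asymptotic convergence to the global minimizer $\xi^*$, is the dual ascent method [16, 20], which is given by

$$\begin{aligned} \xi_{l+1} &= (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} (\bar{H}^\top \bar{S}_k^{-1} \bar{z}_k - \bar{L} \lambda_l) && \text{(12a)} \\ \lambda_{l+1} &= \lambda_l + \alpha_\lambda \bar{L} \xi_{l+1} && \text{(12b)} \end{aligned}$$

where $\alpha_\lambda > 0$ is a step size. The update rule (12) can be written locally as

$$\begin{aligned} \xi_{i,l+1} &= \hat{x}_{i,k|k-1} + K_{i,k} (y_{i,k} - H_i \hat{x}_{i,k|k-1}) - \psi_{i,l} && \text{(13a)} \\ \lambda_{i,l+1} &= \lambda_{i,l} + \alpha_\lambda \sum_{j \in \mathcal{N}_i} a_{ij} (\xi_{i,l+1} - \xi_{j,l+1}) && \text{(13b)} \end{aligned}$$

where $K_{i,k} = (\bar{H}_i^\top \bar{S}_{i,k}^{-1} \bar{H}_i)^{-1} H_i^\top R_i^{-1}$, $\psi_{i,l} = (\bar{H}_i^\top \bar{S}_{i,k}^{-1} \bar{H}_i)^{-1} \sum_{j \in \mathcal{N}_i} a_{ij} (\lambda_{i,l} - \lambda_{j,l})$, and $l$ is the iteration index used to find the minimizer.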
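A minimal NumPy sketch of the dual ascent correction is given below. It is an illustration under assumptions rather than the authors' implementation: the local gain and the coupling term are derived here from (12a) with the $1/N$-weighted prior, the function name `dual_ascent_correction` is hypothetical, and the toy network, step size, and iteration count are chosen only for the example.

```python
import numpy as np

def dual_ascent_correction(x_pred, P_pred, H_list, R_list, y_list, A, alpha, iters):
    """Distributed correction by dual ascent; returns local estimates and duals."""
    N, n = len(H_list), x_pred.size
    Lap = np.diag(A.sum(axis=1)) - A             # graph Laplacian
    P_inv = np.linalg.inv(P_pred)
    Phi_inv = np.zeros((N, n, n))                # inverse local Hessians
    local = np.zeros((N, n))                     # x_pred + K_i (y_i - H_i x_pred)
    for i, (H_i, R_i, y_i) in enumerate(zip(H_list, R_list, y_list)):
        Ri_inv = np.linalg.inv(R_i)
        Phi_inv[i] = np.linalg.inv(P_inv / N + H_i.T @ Ri_inv @ H_i)
        K_i = Phi_inv[i] @ H_i.T @ Ri_inv        # local gain (derived, assumption)
        local[i] = x_pred + K_i @ (y_i - H_i @ x_pred)
    lam = np.zeros((N, n))
    xi = local.copy()
    for _ in range(iters):
        # Row i of Lap @ lam equals sum_j a_ij (lam_i - lam_j): neighbor data only.
        psi = np.einsum("ijk,ik->ij", Phi_inv, Lap @ lam)
        xi = local - psi                         # primal update, cf. (13a)
        lam = lam + alpha * (Lap @ xi)           # dual update, cf. (13b)
    return xi, lam

# Hypothetical toy problem: 3 scalar sensors, complete graph with unit weights.
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])]
y_list = [np.array([1.0]), np.array([2.0]), np.array([2.5])]
x_pred, P_pred = np.array([0.5, 1.5]), np.eye(2)
A = np.ones((3, 3)) - np.eye(3)

xi, lam = dual_ascent_correction(x_pred, P_pred, H_list, R_list, y_list,
                                 A, alpha=1.0 / 27.0, iters=500)

# Centralized estimate for comparison.
H, y = np.vstack(H_list), np.concatenate(y_list)
x_ckf = np.linalg.solve(np.eye(2) + H.T @ H, x_pred + H.T @ y)
```

In this toy setup every local estimate agrees with the centralized one after the iterations, and each estimator only ever combines its own data with the variables of its neighbors. The step size used here lies below the bound stated in Lemma 2 for this particular network.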

Regarding the convergence of the update rule (13), we have the following result.

###### Lemma 2

Assume that the network is undirected and connected. Then, the sequence $\{\xi_{i,l}\}$ generated by the dual ascent method (13) converges to $\hat{x}_k$ of the CKF problem (2), as $l$ goes to infinity, provided that the step size $\alpha_\lambda$ is chosen such that

$$\alpha_\lambda < \frac{2}{\sigma_N^2 \max_i \{ \| (\bar{H}_i^\top \bar{S}_{i,k}^{-1} \bar{H}_i)^{-1} \| \}} \tag{14}$$

where $\sigma_N$ is the maximum eigenvalue of $L$. Moreover, the sequence $\{\lambda_l\}$ converges to a vector which is uniquely determined by the initial conditions of the $\lambda_i$’s.

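Substituting the dual feasibility equation into the primal feasibility equation of (7) yields

$$\bar{L} (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} \bar{L} \lambda^* = \bar{L} (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} \bar{H}^\top \bar{S}_k^{-1} \bar{z}_k. \tag{15}$$

Now let $e^\lambda_l := \lambda_l - \lambda^*$. Then, one obtains

$$\begin{aligned} e^\lambda_{l+1} &= \lambda_l + \alpha_\lambda \bar{L} \xi_{l+1} - \lambda^* \\ &= \lambda_l + \alpha_\lambda \bar{L} (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} (\bar{H}^\top \bar{S}_k^{-1} \bar{z}_k - \bar{L} \lambda_l) - \lambda^*. \end{aligned}$$

From the identity (15), we have

$$e^\lambda_{l+1} = (I - \alpha_\lambda \bar{L} (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} \bar{L}) e^\lambda_l := (I - \alpha_\lambda \tilde{A}_\lambda) e^\lambda_l. \tag{16}$$

Here, $\tilde{A}_\lambda$ is a symmetric positive semi-definite matrix which has zero eigenvalues of multiplicity $n$, and it holds that $\sigma_{\max}(\tilde{A}_\lambda) \le \sigma_N^2 \max_i \{ \| (\bar{H}_i^\top \bar{S}_{i,k}^{-1} \bar{H}_i)^{-1} \| \}$. Since $\sigma_{\min}(\tilde{A}_\lambda)$ is zero, it follows that if $\alpha_\lambda$ is chosen such that $\alpha_\lambda < 2 / \sigma_{\max}(\tilde{A}_\lambda)$, all eigenvalues of $I - \alpha_\lambda \tilde{A}_\lambda$, except $1$, are located inside the unit circle. The bound (14) ensures this.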
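The step-size argument can be checked numerically on a small example. The sketch below reuses a hypothetical 3-sensor complete graph, forms the matrix governing the dual error dynamics in (16), and compares its spectrum with the bound (14); all names and numbers are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem: 3 scalar sensors, 2-dimensional state, complete graph.
N, n = 3, 2
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])]
P_pred = np.eye(n)
Lap = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]])

# Inverse local Hessians, using the 1/N-weighted prior (assumption of this sketch).
P_inv = np.linalg.inv(P_pred)
Phi_inv = [np.linalg.inv(P_inv / N + H_i.T @ np.linalg.inv(R_i) @ H_i)
           for H_i, R_i in zip(H_list, R_list)]
Phi_bar_inv = np.zeros((N * n, N * n))
for i, block in enumerate(Phi_inv):
    Phi_bar_inv[i * n:(i + 1) * n, i * n:(i + 1) * n] = block

# Matrix of the dual error dynamics e_{l+1} = (I - alpha * A_tilde) e_l.
L_bar = np.kron(Lap, np.eye(n))
A_tilde = L_bar @ Phi_bar_inv @ L_bar
A_tilde = (A_tilde + A_tilde.T) / 2              # remove round-off asymmetry
eigs = np.linalg.eigvalsh(A_tilde)

# Step-size bound (14): uses only the largest Laplacian eigenvalue and local norms.
sigma_N = np.linalg.eigvalsh(Lap).max()
bound = 2.0 / (sigma_N ** 2 * max(np.linalg.norm(B, 2) for B in Phi_inv))

alpha = 0.9 * bound
iteration_eigs = 1.0 - alpha * eigs              # spectrum of I - alpha * A_tilde
num_zero = int(np.sum(np.abs(eigs) < 1e-9))      # modes that never decay
```

In this example the bound is more conservative than the exact limit obtained from the largest eigenvalue of the error-dynamics matrix, but it can be evaluated without assembling that matrix. The modes with eigenvalue one correspond to the component of the dual variable that stays fixed at its initial average.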

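Regarding the convergence of $\lambda_l$, we proceed as follows. With the orthonormal matrix $U$ used in Lemma 1, $\tilde{A}_\lambda$ can be written as

$$\begin{aligned} \tilde{A}_\lambda &= (U \Lambda U^\top \otimes I_n) (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} (U \Lambda U^\top \otimes I_n) \\ &= (U \otimes I_n) \, \mathrm{diag}(0_n, M_{sub}) \, (U^\top \otimes I_n) \end{aligned}$$

where $M_{sub}$ is the submatrix of $(\Lambda U^\top \otimes I_n) (\bar{H}^\top \bar{S}_k^{-1} \bar{H})^{-1} (U \Lambda \otimes I_n)$ with the first $n$ rows and first $n$ columns removed. In the new coordinates $\bar{e}^\lambda_l$, defined by $\bar{e}^\lambda_l := (U^\top \otimes I_n) e^\lambda_l$, the error dynamics of the dual variable can be expressed as

$$\bar{e}^\lambda_{l+1} = \mathrm{diag}(I, I - \alpha_\lambda M_{sub}) \, \bar{e}^\lambda_l.$$

From this equation, we know that the first $n$ components of $\bar{e}^\lambda_l$, denoted by $\tilde{e}^\lambda_l$, remain the same for any $l$, i.e., $\tilde{e}^\lambda_l = \tilde{e}^\lambda_0$, $\forall l \ge 0$, meaning that $(U_1^\top \otimes I_n) e^\lambda_l = (U_1^\top \otimes I_n) e^\lambda_0$, which means that $\sum_{i=1}^{N} \lambda_{i,l} = \sum_{i=1}^{N} \lambda_{i,0}$. Moreover, with $\alpha_\lambda$ chosen as in (14), which guarantees that the matrix $I - \alpha_\lambda \tilde{A}_\lambda$ has all its eigenvalues except 1 inside the unit circle, the remaining components of $\bar{e}^\lambda_l$ converge to zero, from which it follows that

$$\lim_{l \to \infty} e^\lambda_l = (U \otimes I_n) [\tilde{e}^\lambda_0; 0] = (U_1 \otimes I_n) (U_1^\top \otimes I_n) e^\lambda_0. \tag{17}$$

Recalling that $e^\lambda_l = \lambda_l - \lambda^*$, we have from (17)

$$\lim_{l \to \infty} \lambda_l = \lambda^* + (U_1 U_1^\top \otimes I_n) (\lambda_0 - \lambda^*).$$

Applying $\lambda^* = (\bar{U} \bar{\Lambda}^{-1} \bar{U}^\top \otimes I_n) b + (1_N \otimes I_n) \tilde{\lambda}$ (for $\bar{\Lambda}$ and $b$, see the proof of Lemma 1), we have

$$\lim_{l \to \infty} \lambda_l = (\bar{U} \bar{\Lambda}^{-1} \bar{U}^\top \otimes I_n) b + (1_N \otimes I_n) \, \mathrm{avg}(\lambda_{i,0})$$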

where