Iterated Extended Kalman Smoother-based Variable Splitting for L_1-Regularized State Estimation

03/20/2019, by Rui Gao, et al., Aalto University

In this paper, we propose a new framework for solving state estimation problems with an additional sparsity-promoting L_1-regularizer term. We first formulate such problems as minimization of the sum of linear or nonlinear quadratic error terms and an extra regularizer, and then present novel algorithms which solve the linear and nonlinear cases, respectively. The methods are based on a combination of the iterated extended Kalman smoother and variable splitting techniques such as alternating direction method of multipliers (ADMM). We present a general algorithmic framework for variable splitting methods, where the iterative steps involving minimization of the nonlinear quadratic terms can be computed efficiently by iterated smoothing. Due to the use of state estimation algorithms, the proposed framework has a low per-iteration time complexity, which makes it suitable for solving a large-scale or high-dimensional state estimation problem. We also provide convergence results for the proposed algorithms. The experiments show the promising performance and speed-ups provided by the methods.


I Introduction

State estimation problems naturally arise in many signal processing applications including target tracking, smart grids, and robotics [1, 2, 3]. In conventional Bayesian approaches, the estimation task is cast as a statistical inverse problem for restoring the original time series from imperfect measurements, based on a statistical model for the measurements given the signal together with a statistical model for the signal. In linear Gaussian models, this problem admits a closed-form solution, which can be efficiently implemented by the Kalman (or Rauch–Tung–Striebel) smoother (KS) [4, 2]. For nonlinear Gaussian models, we can use various linearization and sigma-point-based methods [2] for approximate inference. In particular, here we use the so-called iterated extended Kalman smoother (IEKS) [5], which is based on analytical linearization of the nonlinear functions. Although the aforementioned smoothers are often used to estimate dynamic signals, they lack a mechanism to promote sparsity in the signals.

One approach for promoting sparsity is to add an $L_1$-term to the cost function formulation of the state estimation problem. This approach imposes sparsity on the signal estimate, based on either a synthesis sparse or an analysis sparse signal model. A synthesis sparse model assumes that the signal can be represented as a linear combination of basis vectors, where the coefficients are subject to, for example, an $L_1$-penalty, thus promoting sparsity. In the past decade, the use of synthesis sparsity for estimating dynamic signals has drawn a lot of attention [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. For example, a pseudo-measurement technique was used in the Kalman update equations to encourage sparse solutions [7]. In [8], compressive sensing was applied to the Kalman innovations or filtering errors to exploit their sparsity. Based on synthesis sparsity, the estimation problem was formulated as an $L_1$-regularized least-squares problem in [14]. Nevertheless, the previously mentioned methods typically only consider synthesis sparsity of the signal and assume a linear dynamic system.

On the other hand, analysis sparsity, also called cosparsity, assumes that the signal is not sparse itself, but rather that it is sparse or compressible in some transform domain, which provides flexibility in the modeling of signals [16, 17, 18, 19, 20]. Analysis sparse models involving an analysis operator – a popular choice being total variation (TV) – have been very successful in image processing. For example, several algorithms [18, 19] have been developed to train an analysis operator, and the trained operators have been used for image denoising. In [21], the authors proposed to use the TV regularizer to improve the quality of image reconstruction. However, these approaches are not ideally suited for reconstructing dynamic signals. In state estimation problems, the available methods for analysis sparse priors are still limited. The main goal of this paper is to introduce these kinds of methods for dynamic state estimation.

Formulating a state estimation problem using synthesis or analysis sparsity leads to a general class of optimization problems which require minimization of composite functions, such as an analysis-$L_1$-regularized least-squares problem. The difficulty arises from the nonsmooth regularizer. There are various batch optimization methods for addressing this problem, such as the proximal gradient method [22], Douglas–Rachford splitting (DRS) [23, 24], Peaceman–Rachford splitting (PRS) [25, 26], the split Bregman method (SBM) [27], the alternating direction method of multipliers (ADMM) [28, 29], and the first-order primal-dual (FOPD) method [32]. However, these general methods do not take the inherent temporal nature of the optimization problem into account, which leads to poor computational and memory scaling for large-scale or high-dimensional data. This often renders the existing methods intractable due to their extensive memory and computational requirements.

As a consequence, we propose to combine a Kalman smoother with variable splitting optimization methods, which allows us to account for the temporal nature of the data in order to speed up the computations. In this paper, we derive novel methods for efficiently estimating dynamic signals with an extra (analysis) $L_1$-regularization term. The developed algorithms use the computationally efficient KS and IEKS for solving the subproblems arising within the steps of the optimization methods. Our experiments demonstrate promising performance of the methods in practical applications. The main contributions are as follows:

  1. We formulate the state estimation problem as an optimization problem that is based upon a general sparse model containing an analysis or synthesis prior. The formulation accommodates a large class of popular sparsifying regularizers (e.g., the synthesis $L_1$-norm, the analysis $L_1$-norm, and the total variation norm) for state estimation.

  2. We present novel practical optimization methods, KS-ADMM and IEKS-ADMM, which are based on combining ADMM with KS and IEKS, respectively.

  3. We also prove the convergence of the KS-ADMM method as well as the local convergence of the IEKS-ADMM method.

  4. We generalize our smoother-based approaches to a general class of variable splitting techniques.

The advantage of the proposed approach is that the computational cost per iteration is much lower than in the conventional batch solutions. Our approach is computationally superior to the state-of-the-art in large-scale or high-dimensional state estimation applications.

The rest of the paper is organized as follows. We conclude this section by reviewing variable splitting methods and the IEKS. Section II first develops the batch optimization with a classical ADMM method, and then presents a new KS-ADMM method for solving the linear dynamic estimation problem. For the nonlinear case, we present an IEKS-ADMM method in Section III and establish its local convergence properties. Section IV introduces a more general smoother-based variable splitting algorithmic framework. In particular, a general IEKS-based optimization method is formulated. Various experimental results in Section V demonstrate the effectiveness and accuracy of the methods in simulated linear and nonlinear state estimation problems. The performance of the algorithms is also illustrated on a real-world tomographic reconstruction problem.

I-A Problem Formulation

Consider the dynamic state-space model [1, 2]

$x_k = f_k(x_{k-1}) + q_{k-1}$,
$y_k = h_k(x_k) + r_k$, $\qquad k = 1, \ldots, T$,   (1)

where $x_k$ denotes the $n$-dimensional state of the system at time step $k$, $y_k$ is an $m$-dimensional noisy measurement signal, $h_k$ is a measurement function (typically with $m \le n$), and $f_k$ is a state transition function at time step $k$. The initial state $x_0$ is assumed to have mean $m_0$ and covariance $P_0$. The errors $q_{k-1}$ and $r_k$ are assumed to be mutually independent random variables with known positive definite covariance matrices $Q_{k-1}$ and $R_k$, respectively. The goal is to estimate the state sequence $x_{1:T} = (x_1, \ldots, x_T)$ from the noisy measurement sequence $y_{1:T} = (y_1, \ldots, y_T)$. In this paper, we focus on estimating $x_{1:T}$ by minimizing the sum of quadratic error terms and an $L_1$ sparsity-promoting penalty.

To promote sparsity, we add an extra $L_1$-penalty on the states $x_k$ and formulate the optimization problem as

$x_{1:T}^* = \arg\min_{x_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T} \|y_k - h_k(x_k)\|_{R_k^{-1}}^2 + \frac{1}{2}\sum_{k=1}^{T} \|x_k - f_k(x_{k-1})\|_{Q_{k-1}^{-1}}^2 + \lambda \sum_{k=1}^{T} \|G x_k\|_1$,   (2)

where $\|v\|_M^2 = v^\top M v$, $x_0 = m_0$, $x_{1:T}^*$ is the optimal state sequence, $G$ is a linear operator, and $\lambda > 0$ is a penalty parameter. The formulation (2) encompasses two particular cases: by setting $G$ to a diagonal matrix (e.g., the identity matrix $I$), a synthesis sparse model is obtained, which assumes that the states $x_k$ themselves are sparse. Such a case arises frequently in state estimation applications [15, 10, 11, 30, 12]. Correspondingly, an analysis sparse model is obtained when a more general $G$ is used. For example, total variation (TV) regularization, which is common in tomographic reconstruction, can be obtained by using a finite-difference matrix as $G$.

More generally, $G$ can be a fixed matrix [31, 20, 16] or a learned matrix [17, 18, 19]. It should be noted that, if the $L_1$-term is not used (i.e., when $\lambda = 0$) in (2), the objective can be minimized by using a linear or nonlinear Kalman smoother [4, 5, 2]. However, when $\lambda > 0$, the smoother alone is no longer applicable, and the cost function is non-differentiable.
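As a concrete illustration of the analysis sparse case (not part of the original derivation), the following NumPy sketch builds a first-order finite-difference matrix $G$, a simple 1-D TV-style operator, and evaluates the regularizer $\lambda\|G x_k\|_1$ on a hypothetical piecewise-constant state; all names and values are illustrative.

```python
import numpy as np

def first_difference_operator(n):
    """Return the (n-1) x n first-order finite-difference matrix G,
    so that (G x)[j] = x[j+1] - x[j]. Using G in (2) gives a 1-D
    total-variation-style analysis regularizer."""
    G = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    G[idx, idx] = -1.0
    G[idx, idx + 1] = 1.0
    return G

# Hypothetical piecewise-constant state: sparse under the analysis operator G,
# but not sparse in the identity (synthesis) sense.
x_k = np.concatenate([np.full(20, 1.5), np.full(30, -0.5)])
G = first_difference_operator(x_k.size)

lam = 0.1  # penalty parameter lambda
print("||x_k||_1        =", np.abs(x_k).sum())            # large: x_k itself is not sparse
print("lam*||G x_k||_1  =", lam * np.abs(G @ x_k).sum())  # small: only one jump in the signal
```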

Since $\|G x_k\|_1$ does not have a closed-form proximal operator for a general $G$, we employ variable splitting techniques for solving the resulting optimization problem. As mentioned above, many variable splitting methods can be used to solve (2), such as PRS [26], SBM [27], ADMM, DRS [23], and FOPD [32]. In particular, ADMM is a popular member of this class. Therefore, we start by presenting algorithms based on ADMM and then extend them to more general variable splitting methods. In the following, we review variable splitting and IEKS methods before presenting our approach in detail.

I-B Variable Splitting

The methods we develop in this paper are based on variable splitting [33, 34]. Consider an unconstrained optimization problem in which the objective function is the sum of two functions

$\min_{x} \; f_1(x) + f_2(\Gamma x)$,   (3)

where $f_1 \colon \mathbb{R}^{n} \to \mathbb{R}$, $f_2 \colon \mathbb{R}^{p} \to \mathbb{R}$, and $\Gamma \in \mathbb{R}^{p \times n}$ is a matrix. Variable splitting refers to the process of introducing an auxiliary constrained variable to separate the components in the cost function. More specifically, we impose the constraint $z = \Gamma x$, which transforms the original minimization problem (3) into an equivalent constrained minimization problem given by

$\min_{x, z} \; f_1(x) + f_2(z) \quad \text{subject to} \quad \Gamma x = z$.   (4)

The minimization problem (4) can be solved efficiently by classical constrained optimization methods [35]. The rationale of variable splitting is that it may be easier to solve the constrained problem (4) than the unconstrained one (3). PRS, SBM, FOPD, ADMM, and their variants [36] are a few well-known variable splitting methods – see also [37, 38] for a recent historical overview.

ADMM [28] is one of the most popular algorithms for solving (4). ADMM defines an augmented Lagrangian function and then alternates between the updates of the split variables. Given initial values $z^{(0)}$, $\eta^{(0)}$, and $\rho$, its iterative steps are:

$x^{(i+1)} = \arg\min_{x} \; f_1(x) + \frac{\rho}{2}\,\|\Gamma x - z^{(i)} + \eta^{(i)}/\rho\|_2^2$,   (5a)
$z^{(i+1)} = \arg\min_{z} \; f_2(z) + \frac{\rho}{2}\,\|\Gamma x^{(i+1)} - z + \eta^{(i)}/\rho\|_2^2$,   (5b)
$\eta^{(i+1)} = \eta^{(i)} + \rho\,(\Gamma x^{(i+1)} - z^{(i+1)})$,   (5c)

where $\eta$ is a Lagrange multiplier (dual variable) and $\rho > 0$ is a penalty parameter.
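To make the generic iteration (5) concrete, the sketch below applies ADMM to a small analysis-$L_1$ least-squares problem $\min_x \frac{1}{2}\|\Phi x - b\|_2^2 + \lambda\|\Gamma x\|_1$, a stand-in for (3) with a quadratic $f_1$ and $f_2 = \lambda\|\cdot\|_1$; the data and names are hypothetical, not the paper's implementation.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Element-wise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_analysis_l1(Phi, b, Gamma, lam, rho=1.0, num_iters=200):
    """ADMM (with unscaled dual eta) for min_x 0.5*||Phi x - b||^2 + lam*||Gamma x||_1,
    using the splitting z = Gamma x as in (4)-(5)."""
    n, p = Phi.shape[1], Gamma.shape[0]
    x, z, eta = np.zeros(n), np.zeros(p), np.zeros(p)
    M = Phi.T @ Phi + rho * (Gamma.T @ Gamma)  # fixed system matrix of the x-update
    for _ in range(num_iters):
        x = np.linalg.solve(M, Phi.T @ b + Gamma.T @ (rho * z - eta))  # (5a)
        z = soft_threshold(Gamma @ x + eta / rho, lam / rho)           # (5b)
        eta = eta + rho * (Gamma @ x - z)                              # (5c)
    return x, z

# Illustrative random problem (synthesis case: Gamma = identity).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 30))
b = rng.standard_normal(40)
x_hat, z_hat = admm_analysis_l1(Phi, b, np.eye(30), lam=0.5)
print("nonzeros in the auxiliary variable z:", np.count_nonzero(z_hat))
```

In this batch form, the $x$-update requires solving a linear system whose size grows with the dimension of $x$; this is precisely the cost that the smoother-based updates of Section II avoid.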

The PRS method [25, 26] is similar to ADMM except that it updates the Lagrange multiplier twice per iteration. The typical iterative steps for (3) are

$x^{(i+1)} = \arg\min_{x} \; f_1(x) + \frac{\rho}{2}\,\|\Gamma x - z^{(i)} + \eta^{(i)}/\rho\|_2^2$,   (6a)
$\eta^{(i+1/2)} = \eta^{(i)} + \rho\,(\Gamma x^{(i+1)} - z^{(i)})$,   (6b)
$z^{(i+1)} = \arg\min_{z} \; f_2(z) + \frac{\rho}{2}\,\|\Gamma x^{(i+1)} - z + \eta^{(i+1/2)}/\rho\|_2^2$,   (6c)
$\eta^{(i+1)} = \eta^{(i+1/2)} + \rho\,(\Gamma x^{(i+1)} - z^{(i+1)})$,   (6d)

where $\rho > 0$.

In SBM [27], we iterate the steps

$x^{(i+1)} = \arg\min_{x} \; f_1(x) + \frac{\rho}{2}\,\|\Gamma x - z^{(i)} + b^{(i)}\|_2^2$,   (7a)
$z^{(i+1)} = \arg\min_{z} \; f_2(z) + \frac{\rho}{2}\,\|\Gamma x^{(i+1)} - z + b^{(i)}\|_2^2$,   (7b)

$N$ times, and update the extra (Bregman) variable $b$ by

$b^{(i+1)} = b^{(i)} + \Gamma x^{(i+1)} - z^{(i+1)}$.   (8)

When $N = 1$, this is equivalent to ADMM.

There are also other variable splitting methods which alternate proximal steps for the primal and dual variables. One example is FOPD [32], where the $i$th iteration consists of the following subproblems

$x^{(i+1)} = \operatorname{prox}_{\tau f_1}\!\big(x^{(i)} - \tau\, \Gamma^\top w^{(i)}\big)$,   (9a)
$\bar{x}^{(i+1)} = x^{(i+1)} + \theta\,(x^{(i+1)} - x^{(i)})$,   (9b)
$w^{(i+1)} = \operatorname{prox}_{\sigma f_2^*}\!\big(w^{(i)} + \sigma\, \Gamma \bar{x}^{(i+1)}\big)$,   (9c)

where $\tau, \sigma > 0$ and $\theta \in [0, 1]$ are parameters, $w$ is the dual variable, $f_2^*$ denotes the convex conjugate of $f_2$, and $\operatorname{prox}$ is the proximal operator.

All these variable splitting algorithms provide simple ways to construct iterative methods with simpler inner subproblems. However, the primal subproblems in (5a), (6a), (7a), and (9a) remain computationally expensive, as they involve solving large linear systems or computing large matrix–vector products when the dimensionality of $x$ is large. We circumvent this problem by combining variable splitting with KS and IEKS.

I-C The Iterated Extended Kalman Smoother

IEKS [5] is an approximate algorithm for solving nonlinear optimal smoothing problems. However, it can also be seen [5] as an efficient implementation of the Gauss–Newton algorithm for solving the problem

$\min_{x_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T} \|y_k - h_k(x_k)\|_{R_k^{-1}}^2 + \frac{1}{2}\sum_{k=1}^{T} \|x_k - f_k(x_{k-1})\|_{Q_{k-1}^{-1}}^2$.   (10)

That is, it produces the maximum a posteriori (MAP) estimate of the trajectory. The IEKS method works by alternating between linearization of $f_k$ and $h_k$ around a previous estimate $x_{1:T}^{(j)}$, as follows:

$f_k(x_{k-1}) \approx f_k(x_{k-1}^{(j)}) + J_{f_k}(x_{k-1}^{(j)})\,(x_{k-1} - x_{k-1}^{(j)})$,   (11a)
$h_k(x_k) \approx h_k(x_k^{(j)}) + J_{h_k}(x_k^{(j)})\,(x_k - x_k^{(j)})$,   (11b)

and solving the linearized problem

$x_{1:T}^{(j+1)} = \arg\min_{x_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T} \|y_k - h_k(x_k^{(j)}) - J_{h_k}(x_k^{(j)})(x_k - x_k^{(j)})\|_{R_k^{-1}}^2 + \frac{1}{2}\sum_{k=1}^{T} \|x_k - f_k(x_{k-1}^{(j)}) - J_{f_k}(x_{k-1}^{(j)})(x_{k-1} - x_{k-1}^{(j)})\|_{Q_{k-1}^{-1}}^2$,   (12)

where $J_{f_k}$ and $J_{h_k}$ denote the Jacobians of $f_k$ and $h_k$, respectively. The solution of (12) can in turn be efficiently obtained by the Rauch–Tung–Striebel (RTS) smoother [4], which first computes the filtering means and covariances $m_k$ and $P_k$ by alternating between prediction

$m_k^- = f_k(x_{k-1}^{(j)}) + J_{f_k}(x_{k-1}^{(j)})\,(m_{k-1} - x_{k-1}^{(j)})$,   (13a)
$P_k^- = J_{f_k}(x_{k-1}^{(j)})\, P_{k-1}\, J_{f_k}^\top(x_{k-1}^{(j)}) + Q_{k-1}$,   (13b)

and update

$S_k = J_{h_k}(x_k^{(j)})\, P_k^-\, J_{h_k}^\top(x_k^{(j)}) + R_k$,   (14a)
$K_k = P_k^-\, J_{h_k}^\top(x_k^{(j)})\, S_k^{-1}$,   (14b)
$m_k = m_k^- + K_k\,\big(y_k - h_k(x_k^{(j)}) - J_{h_k}(x_k^{(j)})(m_k^- - x_k^{(j)})\big)$,   (14c)
$P_k = P_k^- - K_k\, S_k\, K_k^\top$,   (14d)

where $S_k$ and $K_k$ are the innovation covariance matrix and the Kalman gain at time step $k$, respectively. The filtering means and covariances are then corrected in a backwards (smoothing) pass

$C_k = P_k\, J_{f_{k+1}}^\top(x_k^{(j)})\, (P_{k+1}^-)^{-1}$,   (15a)
$m_k^s = m_k + C_k\,(m_{k+1}^s - m_{k+1}^-)$,   (15b)
$P_k^s = P_k + C_k\,(P_{k+1}^s - P_{k+1}^-)\, C_k^\top$,   (15c)

for $k = T-1, \ldots, 1$, initialized with $m_T^s = m_T$ and $P_T^s = P_T$.

Now setting $x_{1:T}^{(j+1)} = (m_1^s, \ldots, m_T^s)$ gives the solution to (12). When the functions $f_k$ and $h_k$ are linear, the above iteration converges in a single step. This algorithm is then the classical RTS smoother, or more briefly, KS [4].
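As a reference point for the smoothers used throughout the paper, here is a compact NumPy sketch of the linear KS building block: a Kalman filter forward pass, cf. (13)-(14) with linear $f_k$ and $h_k$, followed by the RTS backward pass, cf. (15). The model, data, and function name are hypothetical, and a time-invariant model is assumed for brevity.

```python
import numpy as np

def rts_smoother(y, m0, P0, A, H, Q, R):
    """Kalman filter + Rauch-Tung-Striebel smoother for the linear Gaussian model
    x_k = A x_{k-1} + q_{k-1}, y_k = H x_k + r_k. Returns the smoothed means."""
    T, n = len(y), m0.size
    m_pred, P_pred = np.zeros((T, n)), np.zeros((T, n, n))
    m_filt, P_filt = np.zeros((T, n)), np.zeros((T, n, n))
    m, P = m0, P0
    for k in range(T):
        m_p, P_p = A @ m, A @ P @ A.T + Q                 # prediction, cf. (13)
        S = H @ P_p @ H.T + R                             # update, cf. (14)
        K = P_p @ H.T @ np.linalg.inv(S)
        m = m_p + K @ (y[k] - H @ m_p)
        P = P_p - K @ S @ K.T
        m_pred[k], P_pred[k], m_filt[k], P_filt[k] = m_p, P_p, m, P
    ms, Ps = m_filt.copy(), P_filt.copy()
    for k in range(T - 2, -1, -1):                        # backward pass, cf. (15)
        C = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])
        ms[k] = m_filt[k] + C @ (ms[k + 1] - m_pred[k + 1])
        Ps[k] = P_filt[k] + C @ (Ps[k + 1] - P_pred[k + 1]) @ C.T
    return ms

# Illustrative random-walk example.
rng = np.random.default_rng(1)
A, H = np.eye(1), np.eye(1)
Q, R = 0.01 * np.eye(1), 0.25 * np.eye(1)
x_true = np.cumsum(rng.normal(0, 0.1, size=100))
y = x_true[:, None] + rng.normal(0, 0.5, size=(100, 1))
ms = rts_smoother(y, np.zeros(1), np.eye(1), A, H, Q, R)
print("smoothed RMSE:", np.sqrt(np.mean((ms[:, 0] - x_true) ** 2)))
```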

In this paper, we use the KS and IEKS algorithms as efficient methods for solving generalized versions of the optimization problems given in (10), which arise within the steps of variable splitting.

II Linear State Estimation by KS-ADMM

In this section, we present the KS-ADMM algorithm, which is a novel algorithm for solving $L_1$-regularized linear Gaussian state estimation problems. In particular, Section II-A describes the batch solution by ADMM. Then, by defining an artificial measurement noise and a pseudo-measurement, we formulate the KS-ADMM algorithm that solves the primal variable update in Section II-B.

II-A Batch Optimization

Let us assume that the state transition function and the measurement function are linear, denoted by

$f_k(x_{k-1}) = A_{k-1}\, x_{k-1}$, $\qquad h_k(x_k) = H_k\, x_k$,   (16)

where $A_{k-1}$ and $H_k$ are the transition matrix and the measurement matrix, respectively. In order to reduce this problem to (3), we stack the entire state sequence into a vector, which transforms the objective into a batch optimization problem. Thus, we define the following variables

$x = [x_1^\top, \ldots, x_T^\top]^\top$,   (17a)
$y = [y_1^\top, \ldots, y_T^\top]^\top$,   (17b)
$\bar{H} = \operatorname{blkdiag}(H_1, \ldots, H_T)$,   (17c)
$\bar{R} = \operatorname{blkdiag}(R_1, \ldots, R_T)$,   (17d)
$\bar{Q} = \operatorname{blkdiag}(Q_0, \ldots, Q_{T-1})$,   (17e)
$\bar{G} = \operatorname{blkdiag}(G, \ldots, G)$,   (17f)
$c = [(A_0 m_0)^\top, 0, \ldots, 0]^\top$,   (17g)
$D = \begin{bmatrix} I & & & \\ -A_1 & I & & \\ & \ddots & \ddots & \\ & & -A_{T-1} & I \end{bmatrix}$.   (17h)

The optimization problem introduced in Section I-A can now be reformulated as the following batch optimization problem

$\min_{x} \; \frac{1}{2}\|y - \bar{H} x\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|D x - c\|_{\bar{Q}^{-1}}^2 + \lambda \|\bar{G} x\|_1$,   (18)

which in turn can be seen to be a special case of (3). Here, our algorithm for solving (18) builds upon the batch ADMM [28].
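To make the stacking in (17) concrete, the sketch below assembles the block matrices for a small, hypothetical time-invariant model using scipy.sparse; the names mirror the reconstruction above ($\bar{H}$, $\bar{G}$, $D$) and are illustrative rather than the authors' exact notation.

```python
import numpy as np
import scipy.sparse as sp

T, n, m = 5, 3, 2                                   # illustrative sizes: T steps, n states, m measurements
A = 0.9 * np.eye(n)                                 # time-invariant transition matrix A_k
H = np.hstack([np.eye(m), np.zeros((m, n - m))])    # measurement matrix H_k
G = np.eye(n)                                       # regularization operator (synthesis case)

# Block-diagonal stacks over time, as in (17).
H_bar = sp.block_diag([H] * T)                      # stacked measurement model
G_bar = sp.block_diag([G] * T)                      # stacked regularization operator
# Dynamics matrix D: identity blocks on the diagonal, -A on the first sub-block-diagonal,
# so that D x stacks the residuals x_k - A x_{k-1} (the prior term c is omitted for brevity).
D = sp.eye(T * n) - sp.kron(sp.eye(T, k=-1), A)

print(H_bar.shape, G_bar.shape, D.shape)            # (10, 15) (15, 15) (15, 15)
```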

To derive an ADMM algorithm for (18), we introduce an auxiliary variable $z$ and a linear equality constraint $\bar{G} x = z$. The resulting equality-constrained problem is formulated mathematically as

$\min_{x, z} \; \frac{1}{2}\|y - \bar{H} x\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|D x - c\|_{\bar{Q}^{-1}}^2 + \lambda \|z\|_1 \quad \text{subject to} \quad \bar{G} x = z$.   (19)

The main objective here is to find a stationary point of the augmented Lagrangian function associated with (19), given by

$L_\rho(x, z, \eta) = \frac{1}{2}\|y - \bar{H} x\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|D x - c\|_{\bar{Q}^{-1}}^2 + \lambda \|z\|_1 + \eta^\top(\bar{G} x - z) + \frac{\rho}{2}\|\bar{G} x - z\|_2^2$,   (20)

where $\eta$ is the dual variable and $\rho > 0$ is a penalty parameter. As described in Section I-B, at each iteration of ADMM we perform the updates

$x^{(i+1)} = \arg\min_{x} \; L_\rho(x, z^{(i)}, \eta^{(i)})$,   (21a)
$z^{(i+1)} = \arg\min_{z} \; L_\rho(x^{(i+1)}, z, \eta^{(i)})$,   (21b)
$\eta^{(i+1)} = \eta^{(i)} + \rho\,(\bar{G} x^{(i+1)} - z^{(i+1)})$.   (21c)

The update (21a) for the primal sequence $x$ is equivalent to the quadratic optimization problem given by

$x^{(i+1)} = \arg\min_{x} \; \frac{1}{2}\|y - \bar{H} x\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|D x - c\|_{\bar{Q}^{-1}}^2 + \frac{\rho}{2}\|\bar{G} x - z^{(i)} + \eta^{(i)}/\rho\|_2^2$,   (22)

which has the closed-form solution

$x^{(i+1)} = \big(\bar{H}^\top \bar{R}^{-1} \bar{H} + D^\top \bar{Q}^{-1} D + \rho\, \bar{G}^\top \bar{G}\big)^{-1}\big(\bar{H}^\top \bar{R}^{-1} y + D^\top \bar{Q}^{-1} c + \bar{G}^\top(\rho\, z^{(i)} - \eta^{(i)})\big)$.   (23)

For the auxiliary sequence $z$, the iteration in (21b) can be solved in closed form by soft-thresholding [39]

$z^{(i+1)} = S_{\lambda/\rho}\big(\bar{G} x^{(i+1)} + \eta^{(i)}/\rho\big)$,   (24)

where $S_{\kappa}(v)$ applies $[S_{\kappa}(v)]_j = \operatorname{sign}(v_j)\max(|v_j| - \kappa, 0)$ element-wise.
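As a quick sanity check of (24), the following snippet compares the soft-thresholding formula against a brute-force minimization of the scalar $z$-subproblem $\lambda|z| + \frac{\rho}{2}(a - z)^2$; the numbers are arbitrary and $a$ plays the role of one component of $\bar{G} x^{(i+1)} + \eta^{(i)}/\rho$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

lam, rho, a = 0.8, 2.0, 1.1   # illustrative values
closed_form = soft_threshold(np.array(a), lam / rho)
brute = minimize_scalar(lambda z: lam * abs(z) + 0.5 * rho * (a - z) ** 2).x
print(closed_form, brute)     # both are approximately 0.7
```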

While the optimization problem (22) can be solved in closed form, the direct solution (23) is computationally demanding, especially when the number of time steps or the dimensionality of the state is large. However, the problem can be recognized as a special case of the optimization problems whose solutions can be computed by KS (see Section I-C), provided that we add pseudo-measurements to the model. In the following, we present the resulting algorithm.

II-B The KS-ADMM Solver

The proposed KS-ADMM solver is described in Algorithm 1. To extend the batch ADMM to KS-ADMM, we first define an artificial measurement noise $\varepsilon_k$ and a pseudo-measurement $\tilde{z}_k^{(i)} = z_k^{(i)} - \eta_k^{(i)}/\rho$, where $z_k^{(i)}$ and $\eta_k^{(i)}$ denote the blocks of $z^{(i)}$ and $\eta^{(i)}$ corresponding to time step $k$, and then rewrite (22) as

$x^{(i+1)} = \arg\min_{x_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T}\|y_k - H_k x_k\|_{R_k^{-1}}^2 + \frac{1}{2}\sum_{k=1}^{T}\|x_k - A_{k-1} x_{k-1}\|_{Q_{k-1}^{-1}}^2 + \frac{\rho}{2}\sum_{k=1}^{T}\|G x_k - \tilde{z}_k^{(i)}\|_2^2$.   (25)

The solution to (25) can then be computed by running KS on the state estimation problem

$x_k = A_{k-1}\, x_{k-1} + q_{k-1}$,   (26a)
$y_k = H_k\, x_k + r_k$,   (26b)
$\tilde{z}_k^{(i)} = G\, x_k + \varepsilon_k$.   (26c)

Here, $\varepsilon_k$ is an independent random variable with covariance $\frac{1}{\rho} I$. The KS-based solution can be described as a four-stage recursive process: prediction, $y_k$-update, $\tilde{z}_k^{(i)}$-update, and the RTS smoother, where the first three stages are performed for $k = 1, \ldots, T$. First, the prediction step is given by

$m_k^- = A_{k-1}\, m_{k-1}$,   (27a)
$P_k^- = A_{k-1}\, P_{k-1}\, A_{k-1}^\top + Q_{k-1}$,   (27b)

where $m_k^-$ and $P_k^-$ are the predicted mean and covariance at time step $k$. Secondly, the update steps for the measurement $y_k$ are given by

$S_k^y = H_k\, P_k^-\, H_k^\top + R_k$,   (28a)
$K_k^y = P_k^-\, H_k^\top\, (S_k^y)^{-1}$,   (28b)
$m_k^y = m_k^- + K_k^y\,(y_k - H_k\, m_k^-)$,   (28c)
$P_k^y = P_k^- - K_k^y\, S_k^y\, (K_k^y)^\top$.   (28d)

Thirdly, the update steps for the pseudo-measurement $\tilde{z}_k^{(i)}$ are

$S_k^z = G\, P_k^y\, G^\top + \frac{1}{\rho} I$,   (29a)
$K_k^z = P_k^y\, G^\top\, (S_k^z)^{-1}$,   (29b)
$m_k = m_k^y + K_k^z\,(\tilde{z}_k^{(i)} - G\, m_k^y)$,   (29c)
$P_k = P_k^y - K_k^z\, S_k^z\, (K_k^z)^\top$.   (29d)

Here, $S_k^y$ and $S_k^z$, $K_k^y$ and $K_k^z$, $m_k^y$ and $m_k$, and $P_k^y$ and $P_k$ are the innovation covariances, gain matrices, means, and covariances for the updates with $y_k$ and $\tilde{z}_k^{(i)}$ at time step $k$, respectively. Finally, we run an RTS smoother [4] for $k = T-1, \ldots, 1$, which has the steps

$C_k = P_k\, A_k^\top\, (P_{k+1}^-)^{-1}$,   (30a)
$m_k^s = m_k + C_k\,(m_{k+1}^s - m_{k+1}^-)$,   (30b)
$P_k^s = P_k + C_k\,(P_{k+1}^s - P_{k+1}^-)\, C_k^\top$,   (30c)

where $m_T^s = m_T$ and $P_T^s = P_T$ (see [2] for more details). This gives the update for $x$ as

$x^{(i+1)} = \big[(m_1^s)^\top, (m_2^s)^\top, \ldots, (m_T^s)^\top\big]^\top$.   (31)

The remaining updates for $z$ and $\eta$ are given block-wise as

$z_k^{(i+1)} = S_{\lambda/\rho}\big(G\, m_k^s + \eta_k^{(i)}/\rho\big)$,   (32)

where $S_{\lambda/\rho}$ is the soft-thresholding operator from (24), and

$\eta_k^{(i+1)} = \eta_k^{(i)} + \rho\,\big(G\, m_k^s - z_k^{(i+1)}\big)$.   (33)
Input: $y_{1:T}$, $m_0$, $P_0$, $A_{0:T-1}$, $H_{1:T}$, $Q_{0:T-1}$, $R_{1:T}$; parameters $\lambda$ and $\rho$; and initial $z^{(0)}$, $\eta^{(0)}$.
Output: $x^*$
1 while not converged do
2       run the Kalman filter using (27), (28), and (29);
3       run the Kalman smoother using (30);
4       compute $x^{(i+1)}$ by (31);
5       compute $z^{(i+1)}$ by (32);
6       compute $\eta^{(i+1)}$ by (33);
7       set $i \leftarrow i + 1$;
8 end while
Algorithm 1 KS-ADMM

It is useful to note that in Algorithm 1, the covariances and gains are independent of the iteration number and can thus be pre-computed outside the ADMM iterations. Furthermore, when the model is time-invariant, we can often use stationary Kalman filters and smoothers instead of their general counterparts, which further speeds up the computations.
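To illustrate how the smoother plugs into the ADMM loop, here is a simplified NumPy sketch of the KS-ADMM structure for a time-invariant linear model with $G = I$: each outer iteration forms the pseudo-measurements $\tilde{z}_k = z_k - \eta_k/\rho$, runs one filter/smoother sweep with the extra update (29), and then applies the soft-thresholding and dual updates (32)-(33). It is an illustration under these simplifying assumptions, not the authors' reference implementation.

```python
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def ks_admm(y, m0, P0, A, H, Q, R, lam, rho, num_iters=50):
    """Sketch of the KS-ADMM loop (Algorithm 1) for a time-invariant model with G = I."""
    T, n = y.shape[0], m0.size
    I = np.eye(n)
    z, eta, ms = np.zeros((T, n)), np.zeros((T, n)), np.zeros((T, n))
    for _ in range(num_iters):
        z_tilde = z - eta / rho                            # pseudo-measurements, cf. (25)-(26)
        m_pred, P_pred = np.zeros((T, n)), np.zeros((T, n, n))
        m_filt, P_filt = np.zeros((T, n)), np.zeros((T, n, n))
        m, P = m0, P0
        for k in range(T):
            m_p, P_p = A @ m, A @ P @ A.T + Q              # prediction (27)
            S = H @ P_p @ H.T + R                          # y-update (28)
            K = P_p @ H.T @ np.linalg.inv(S)
            m_y = m_p + K @ (y[k] - H @ m_p)
            P_y = P_p - K @ S @ K.T
            Sz = P_y + I / rho                             # pseudo-measurement update (29), G = I
            Kz = P_y @ np.linalg.inv(Sz)
            m = m_y + Kz @ (z_tilde[k] - m_y)
            P = P_y - Kz @ Sz @ Kz.T
            m_pred[k], P_pred[k], m_filt[k], P_filt[k] = m_p, P_p, m, P
        ms, Ps = m_filt.copy(), P_filt.copy()
        for k in range(T - 2, -1, -1):                     # RTS smoother (30)
            C = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])
            ms[k] = m_filt[k] + C @ (ms[k + 1] - m_pred[k + 1])
            Ps[k] = P_filt[k] + C @ (Ps[k + 1] - P_pred[k + 1]) @ C.T
        z = soft_threshold(ms + eta / rho, lam / rho)      # (32), with x^{(i+1)} given by ms
        eta = eta + rho * (ms - z)                         # (33)
    return ms, z

# Illustrative use on a scalar random-walk model with a piecewise signal.
rng = np.random.default_rng(2)
T = 80
x_true = np.where(np.arange(T) < 40, 0.0, 2.0)
y = x_true[:, None] + rng.normal(0, 0.5, size=(T, 1))
ms, z = ks_admm(y, np.zeros(1), np.eye(1), np.eye(1), np.eye(1),
                0.05 * np.eye(1), 0.25 * np.eye(1), lam=0.5, rho=1.0)
print("RMSE:", np.sqrt(np.mean((ms[:, 0] - x_true) ** 2)))
```

Note that, as remarked above, the gains and covariances inside the sweep do not depend on $z$ or $\eta$, so in a more careful implementation they would be computed once and reused across the outer iterations.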

II-C Convergence of KS-ADMM

In this section, we discuss the convergence of KS-ADMM. If the system (26) is detectable [40], then the objective function in (20) is convex. The traditional convergence results for ADMM, such as those in [28, 41], then ensure that the iterates globally converge to a stationary (optimal) point of (20). The result is given in the following.

Theorem 1 (Convergence of KS-ADMM).

Assume that the system (26) is detectable [40]. Then, for a constant penalty parameter $\rho > 0$, the sequence $\{(x^{(i)}, z^{(i)}, \eta^{(i)})\}$ generated by Algorithm 1 from any starting point converges to a stationary point of (20).

Proof.

Due to the detectability assumption, the objective function is convex, and thus the result follows from the classical ADMM convergence proof [28, 41].   

III Nonlinear State Estimation by IEKS-ADMM

When $f_k$ and $h_k$ are nonlinear, the primal subproblem arising in the ADMM iteration cannot be solved in closed form. In the following, we first present a batch solution of the nonlinear case based on a Gauss–Newton (GN) iteration and then show how it can be efficiently implemented using IEKS.

III-A Batch Optimization

Let us now consider the case where the state transition function $f_k$ and the measurement function $h_k$ in (1) are nonlinear. We proceed to rewrite the optimization problem (2) in batch form by defining the following variables

$h(x) = \big[h_1(x_1)^\top, h_2(x_2)^\top, \ldots, h_T(x_T)^\top\big]^\top$,   (34a)
$f(x) = \big[f_1(x_0)^\top, f_2(x_1)^\top, \ldots, f_T(x_{T-1})^\top\big]^\top$.   (34b)

Note that the variables $x$, $y$, $\bar{R}$, $\bar{Q}$, and $\bar{G}$ have the same definitions as in (17). Using these variables, the problem can be naturally transformed into

$\min_{x} \; \frac{1}{2}\|y - h(x)\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|x - f(x)\|_{\bar{Q}^{-1}}^2 + \lambda \|\bar{G} x\|_1$,   (35)

which is also a special case of (3), similarly to the linear case.

Following the ADMM approach, we again introduce the auxiliary variable $z$ with the constraint $\bar{G} x = z$ and define the augmented Lagrangian function associated with (35) as

$L_\rho(x, z, \eta) = \frac{1}{2}\|y - h(x)\|_{\bar{R}^{-1}}^2 + \frac{1}{2}\|x - f(x)\|_{\bar{Q}^{-1}}^2 + \lambda \|z\|_1 + \eta^\top(\bar{G} x - z) + \frac{\rho}{2}\|\bar{G} x - z\|_2^2$.   (36)

Since the nonlinear batch solution is based on ADMM, the iteration steps for $z$ and $\eta$ are the same as in the linear case (see Equations (24) and (21c)). Here, we focus on the solution of the primal variable $x$.

When updating $x$, the objective is no longer a quadratic function. However, the optimization problem can be solved with GN [42]. Here, the subproblem is rewritten as

$x^{(i+1)} = \arg\min_{x} \; \Theta(x), \qquad \Theta(x) = \frac{1}{2}\, r(x)^\top r(x)$,   (37)

where the stacked residual is

$r(x) = \big[\,(\bar{R}^{-1/2}(y - h(x)))^\top,\; (\bar{Q}^{-1/2}(x - f(x)))^\top,\; (\sqrt{\rho}\,(\bar{G} x - z^{(i)} + \eta^{(i)}/\rho))^\top\,\big]^\top$.

Then, the gradient of $\Theta$ is given by

$\nabla \Theta(x) = J(x)^\top r(x)$,   (38)

where $J(x) = \partial r(x)/\partial x$ is the Jacobian of the residual, and the Hessian is $\nabla^2 \Theta(x) = J(x)^\top J(x) + S(x)$, where $S(x) = \sum_{j} r_j(x)\, \nabla^2 r_j(x)$ collects the second-order terms. In GN, to avoid the trouble of computing $S(x)$, we use the approximation $\nabla^2 \Theta(x) \approx J(x)^\top J(x)$, which amounts to assuming that the residual $r(x)$ is small enough. Thus, the primal variable at inner iteration $j$ is updated by:

$x^{(i, j+1)} = x^{(i, j)} - \big(J(x^{(i,j)})^\top J(x^{(i,j)})\big)^{-1} J(x^{(i,j)})^\top r(x^{(i,j)})$.   (40)

The iterations can be stopped after a maximum number of inner iterations or when the condition $\|x^{(i,j+1)} - x^{(i,j)}\|_2 \le \epsilon$ is satisfied, where $\epsilon > 0$ is an error tolerance. If $\epsilon$ is small enough, the above algorithm has (almost) converged. The rest of the ADMM updates can be implemented similarly to the linear Gaussian case.
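As a self-contained illustration of the GN step (40) and the stopping rule above, the sketch below solves a small nonlinear least-squares problem using the $J^\top J$ approximation of the Hessian; the residual and Jacobian are hypothetical stand-ins for $r(x)$ and $J(x)$.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-8, max_iters=50):
    """Gauss-Newton iteration x <- x - (J^T J)^{-1} J^T r, cf. (40),
    stopped when the step norm falls below tol."""
    x = x0.astype(float)
    for _ in range(max_iters):
        r, J = residual(x), jacobian(x)
        step = np.linalg.solve(J.T @ J, J.T @ r)
        x = x - step
        if np.linalg.norm(step) <= tol:
            break
    return x

# Hypothetical example: fit y = a * exp(b * t) to noisy data.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(-1.5 * t) + rng.normal(0, 0.01, size=t.size)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
jacobian = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(gauss_newton(residual, jacobian, np.array([1.0, -1.0])))   # approx. [2.0, -1.5]
```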

III-B The IEKS-ADMM Solver

We now move on to consider the IEKS-ADMM solver. As discussed in Section I-C, IEKS can be seen as an efficient implementation of the GN method, which inspires us to derive an efficient implementation of the batch ADMM.

Now, we rewrite the subproblem (37) in the time-step-wise form

$x^{(i+1)} = \arg\min_{x_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T}\|y_k - h_k(x_k)\|_{R_k^{-1}}^2 + \frac{1}{2}\sum_{k=1}^{T}\|x_k - f_k(x_{k-1})\|_{Q_{k-1}^{-1}}^2 + \frac{\rho}{2}\sum_{k=1}^{T}\|G x_k - \tilde{z}_k^{(i)}\|_2^2$,   (41)

with $\tilde{z}_k^{(i)} = z_k^{(i)} - \eta_k^{(i)}/\rho$ as in the linear case.

At a modest scale (e.g., when $T$ and $n$ are small), $x^{(i+1)}$ can be directly computed by (40), although its computational cost scales as $\mathcal{O}((Tn)^3)$. When $Tn$ is large, the batch ADMM therefore has high memory and computational requirements. In this case, the use of IEKS becomes beneficial due to its linear computational scaling in $T$. The proposed method incorporates IEKS into ADMM to obtain the IEKS-ADMM algorithm for solving the nonlinear case.

In the IEKS algorithm, the Gaussian smoother is run several times, with $f_k$ and $h_k$ and their Jacobians evaluated at the previous (inner-loop) iterate $x_{1:T}^{(i,j)}$. The detailed iteration steps of IEKS-ADMM are described in Algorithm 2. In particular, following the prediction steps (13) in Section I-C, the update steps for the measurement $y_k$ are given by

$S_k^y = J_{h_k}(x_k^{(i,j)})\, P_k^-\, J_{h_k}^\top(x_k^{(i,j)}) + R_k$,   (42a)
$K_k^y = P_k^-\, J_{h_k}^\top(x_k^{(i,j)})\, (S_k^y)^{-1}$,   (42b)
$m_k^y = m_k^- + K_k^y\,\big(y_k - h_k(x_k^{(i,j)}) - J_{h_k}(x_k^{(i,j)})(m_k^- - x_k^{(i,j)})\big)$,   (42c)
$P_k^y = P_k^- - K_k^y\, S_k^y\, (K_k^y)^\top$,   (42d)

and for the pseudo-measurement $\tilde{z}_k^{(i)}$, the update steps are the same as in the linear case; they are given in (29).

Additionally, the RTS smoother steps are as described in Section I-C. We can then obtain the solution as $x^{(i+1)} = [(m_1^s)^\top, \ldots, (m_T^s)^\top]^\top$. Note that the updates of $z$ and $\eta$ can be implemented by (32) and (33), respectively.
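The essential difference from the linear case is that $h_k$ (and $f_k$) are relinearized around the current nominal trajectory at every sweep. The fragment below sketches the relinearized measurement update (42) for a scalar model with a hypothetical measurement function $h(x) = \sin(x)$; for brevity it iterates a single update in the spirit of the iterated EKF rather than running full forward-backward sweeps, so it only illustrates the mechanics of (42).

```python
import numpy as np

# Hypothetical scalar nonlinear measurement model h(x) = sin(x).
h = np.sin
h_jac = np.cos

def iterated_measurement_update(m_pred, P_pred, y_k, R, x_nom):
    """One linearized measurement update, cf. (42): h is linearized around the
    nominal (previous-iteration) state x_nom rather than around m_pred."""
    Hk = h_jac(x_nom)                                     # Jacobian J_h(x_nom)
    S = Hk * P_pred * Hk + R                              # (42a)
    K = P_pred * Hk / S                                   # (42b)
    innovation = y_k - h(x_nom) - Hk * (m_pred - x_nom)   # linearized residual
    m = m_pred + K * innovation                           # (42c)
    P = P_pred - K * S * K                                # (42d)
    return m, P

# Relinearizing around successive estimates (the "iterated" idea):
m_pred, P_pred, y_k, R = 0.3, 0.5, 0.9, 0.05
x_nom = m_pred
for _ in range(5):
    m, P = iterated_measurement_update(m_pred, P_pred, y_k, R, x_nom)
    x_nom = m            # the next sweep linearizes around the new estimate
    print(round(m, 4), round(P, 4))
```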

Input: $y_{1:T}$, $m_0$, $P_0$, $f_{1:T}$, $h_{1:T}$, $Q_{0:T-1}$, $R_{1:T}$; parameters $\lambda$ and $\rho$; and initial $z^{(0)}$, $\eta^{(0)}$;