I. Introduction
STATE estimation problems naturally arise in many signal processing applications, including target tracking, smart grids, and robotics [1, 2, 3]. In conventional Bayesian approaches, the estimation task is cast as a statistical inverse problem of restoring the original time series from imperfect measurements, based on a statistical model for the measurements given the signal together with a statistical model for the signal. In linear Gaussian models, this problem admits a closed-form solution, which can be efficiently implemented by the Kalman (or Rauch–Tung–Striebel) smoother (KS) [4, 2]. For nonlinear Gaussian models, we can use various linearization and sigma-point-based methods [2] for approximate inference. In particular, here we use the so-called iterated extended Kalman smoother (IEKS) [5], which is based on analytical linearization of the nonlinear functions. Although the aforementioned smoothers are often used to estimate dynamic signals, they lack a mechanism to promote sparsity in the signals.
One approach for promoting sparsity is to add an $\ell_1$ term to the cost function formulation of the state estimation problem. This approach imposes sparsity on the signal estimate, based on either a synthesis sparse or an analysis sparse signal model. A synthesis sparse model assumes that the signal can be represented as a linear combination of basis vectors, where the coefficients are subject to, for example, an $\ell_1$ penalty, thus promoting sparsity. In the past decade, the use of synthesis sparsity for estimating dynamic signals has drawn a lot of attention [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. For example, a pseudo-measurement technique was used in the Kalman update equations to encourage sparse solutions [7]. In [8], compressive sensing methods were applied to update the Kalman innovations or filtering errors. Based on synthesis sparsity, the estimation problem was formulated as an $\ell_1$-regularized least-squares problem in [14]. Nevertheless, the aforementioned methods typically only consider synthesis sparsity of the signal and assume a linear dynamic system. On the other hand, analysis sparsity, also called cosparsity, assumes that the signal is not sparse itself, but rather that its image under some transform is sparse or compressible, which leads to flexibility in the modeling of signals [16, 17, 18, 19, 20]. Analysis sparse models involving an analysis operator – a popular choice being total variation (TV) – have been very successful in image processing. For example, several algorithms [18, 19] have been developed to train an analysis operator, and the trained operators have been used for image denoising. In [21], the authors proposed to use the TV regularizer to improve the quality of image reconstruction. However, these approaches are not ideally suited for reconstructing dynamic signals. In state estimation problems, the available methods for analysis sparse priors are still limited. The main goal of this paper is to introduce such methods for dynamic state estimation.
Formulating a state estimation problem using synthesis and analysis sparsity leads to a general class of optimization problems, which require minimization of composite functions such as analysis-$\ell_1$-regularized least-squares problems. The difficulties arise from the appearance of the non-smooth regularizer. There are various batch optimization methods for addressing this problem, such as the proximal gradient method [22], Douglas–Rachford splitting (DRS) [23, 24], Peaceman–Rachford splitting (PRS) [25, 26], the split Bregman method (SBM) [27], the alternating direction method of multipliers (ADMM) [28, 29], and the first-order primal-dual (FOPD) method. However, these general methods do not take the inherent temporal nature of the optimization problem into account, which leads to poor computational and memory scaling for large-scale or high-dimensional data. This often renders the existing methods intractable due to their extensive memory and computational requirements.
As a consequence, we propose to combine a Kalman smoother with variable splitting optimization methods, which allows us to account for the temporal nature of the data in order to speed up the computations. In this paper, we derive novel methods for efficiently estimating dynamic signals with an extra $\ell_1$ (analysis) regularization term. The developed algorithms are based on using the computationally efficient KS and IEKS for solving the subproblems arising within the steps of the optimization methods. Our experiments demonstrate promising performance of the methods in practical applications. The main contributions are as follows:

We formulate the state estimation problem as an optimization problem that is based upon a general sparse model containing an analysis or synthesis prior. The formulation accommodates a large class of popular sparsifying regularizers (e.g., the synthesis $\ell_1$ norm, the analysis $\ell_1$ norm, and the total variation (TV) norm) for state estimation.

We present novel practical optimization methods, KS-ADMM and IEKS-ADMM, which are based on combining ADMM with KS and IEKS, respectively.

We also prove the convergence of the KS-ADMM method as well as the local convergence of the IEKS-ADMM method.

We generalize our smoother-based approaches to a general class of variable splitting techniques.
The advantage of the proposed approach is that the computational cost per iteration is much lower than that of the conventional batch solutions. Our approach is computationally superior to the state of the art in large-scale or high-dimensional state estimation applications.
The rest of the paper is organized as follows. We conclude this section by reviewing variable splitting methods and IEKS. Section II first develops the batch optimization by a classical ADMM method, and then presents a new KS-ADMM method for solving a linear dynamic estimation problem. For the nonlinear case, we present an IEKS-ADMM method in Section III and establish its local convergence properties. Section IV introduces a more general smoother-based variable splitting algorithmic framework; in particular, a general IEKS-based optimization method is formulated. The experimental results in Section V demonstrate the effectiveness and accuracy of the methods in simulated linear and nonlinear state estimation problems. The performance of the algorithms is also illustrated in real-world tomographic reconstruction.
I-A. Problem Formulation
Consider the dynamic state-space model [1, 2]
(1) $\mathbf{x}_k = \mathbf{f}_{k-1}(\mathbf{x}_{k-1}) + \mathbf{q}_{k-1}, \qquad \mathbf{y}_k = \mathbf{h}_k(\mathbf{x}_k) + \mathbf{r}_k,$
where $\mathbf{x}_k$ denotes the $n_x$-dimensional state of the system at the time step $k$, $\mathbf{y}_k$ is an $n_y$-dimensional noisy measurement signal, $\mathbf{h}_k$ is a measurement function (typically with $n_y \le n_x$), and $\mathbf{f}_{k-1}$ is a state transition function at time step $k-1$. The initial state is assumed to have mean $\mathbf{m}_0$ and covariance $\mathbf{P}_0$. The errors $\mathbf{q}_{k-1}$ and $\mathbf{r}_k$ are assumed to be mutually independent random variables with known positive definite covariance matrices $\mathbf{Q}_{k-1}$ and $\mathbf{R}_k$, respectively. The goal is to estimate the state sequence $\mathbf{x}_{1:T}$ from the noisy measurement sequence $\mathbf{y}_{1:T}$. In this paper, we focus on estimating $\mathbf{x}_{1:T}$ by minimizing the sum of quadratic error terms and an $\ell_1$ sparsity-promoting penalty. For the sparsity assumption, we add an extra $\ell_1$ penalty on the states $\mathbf{x}_k$, and then formulate the optimization problem as
(2) $\mathbf{x}_{1:T}^{*} = \operatorname*{arg\,min}_{\mathbf{x}_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T}\Big(\|\mathbf{y}_k - \mathbf{h}_k(\mathbf{x}_k)\|_{\mathbf{R}_k^{-1}}^{2} + \|\mathbf{x}_k - \mathbf{f}_{k-1}(\mathbf{x}_{k-1})\|_{\mathbf{Q}_{k-1}^{-1}}^{2}\Big) + \lambda \sum_{k=1}^{T}\|\mathbf{D}\mathbf{x}_k\|_1,$
where is the optimal state sequence, is a linear operator, and is a penalty parameter. The formulation (2) encompasses two particular cases: by setting
to a diagonal matrix (e.g., identity matrix
), a synthesis sparse model is obtained, which assumes that are sparse. Such a case arises frequently in state estimation applications [15, 10, 11, 30, 12]. Correspondingly, an analysis sparse model is obtained when a more general is used. For example, total variation (TV) regularization, which is common in tomographic reconstruction, can be obtained by using a finitedifference matrix as .More generally, can be a fixed matrix [31, 20, 16] or a learned matrix [17, 18, 19]. It should be noted that, if the term is not used (i.e., when ) in (2), the objective can be solved by using a linear or nonlinear Kalman)smoother [4, 5, 2]. However, when , the smoothing is no longer applicable, and the cost function is nondifferentiable.
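To make the two model classes concrete, the following NumPy sketch (with our own helper names, not part of the paper's algorithms) builds the finite-difference operator used for TV regularization and evaluates the penalty term $\lambda\|\mathbf{D}\mathbf{x}_k\|_1$ for a single state vector:

```python
import numpy as np

def finite_difference_operator(n):
    """First-order finite-difference matrix of size (n-1) x n; a common
    choice of analysis operator D for total-variation regularization."""
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    return D

def analysis_penalty(x, D, lam):
    """Evaluate the sparsity-promoting term lam * ||D x||_1."""
    return lam * np.sum(np.abs(D @ x))
```

For a piecewise-constant state, $\mathbf{D}\mathbf{x}$ is mostly zero, so the analysis penalty is small even though the state itself is not sparse; setting $\mathbf{D} = \mathbf{I}$ instead recovers the synthesis case.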
Since $\|\mathbf{D}\mathbf{x}\|_1$ does not have a closed-form proximal operator in general, we employ a variable splitting technique for solving the resulting optimization problem. As mentioned above, many variable splitting methods can be used to solve (2), such as PRS [26], SBM [27], ADMM, DRS [23], and FOPD [32]. In particular, ADMM is a popular member of this class. Therefore, we start by presenting algorithms based on ADMM and then extend them to more general variable splitting methods. In the following, we review variable splitting and IEKS methods before presenting our approach in detail.
I-B. Variable Splitting
The methods we develop in this paper are based on variable splitting [33, 34]. Consider an unconstrained optimization problem in which the objective function is the sum of two functions
(3) $\min_{\mathbf{x}} \; f(\mathbf{x}) + g(\mathbf{D}\mathbf{x}),$
where $f$ and $g$ are given cost functions, and $\mathbf{D}$ is a matrix. Variable splitting refers to the process of introducing an auxiliary constrained variable to separate the components in the cost function. More specifically, we impose the constraint $\mathbf{z} = \mathbf{D}\mathbf{x}$, which transforms the original minimization problem (3) into an equivalent constrained minimization problem given by
(4) $\min_{\mathbf{x},\,\mathbf{z}} \; f(\mathbf{x}) + g(\mathbf{z}) \quad \text{subject to} \quad \mathbf{D}\mathbf{x} = \mathbf{z}.$
The minimization problem (4) can be solved efficiently by classical constrained optimization methods [35]. The rationale of variable splitting is that it may be easier to solve the constrained problem (4) than the unconstrained one (3). PRS, SBM, FOPD, ADMM, and their variants [36] are a few well-known variable splitting methods – see also [37, 38] for a recent historical overview.
ADMM [28] is one of the most popular algorithms for solving (4). ADMM defines an augmented Lagrangian function, and then alternates between the updates of the split variables. Given initial $\mathbf{z}^{0}$ and $\boldsymbol{\eta}^{0}$, its iterative steps (in scaled form) are:
(5a) $\mathbf{x}^{k+1} = \operatorname*{arg\,min}_{\mathbf{x}} \; f(\mathbf{x}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x} - \mathbf{z}^{k} + \boldsymbol{\eta}^{k}\|_2^2,$
(5b) $\mathbf{z}^{k+1} = \operatorname*{arg\,min}_{\mathbf{z}} \; g(\mathbf{z}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x}^{k+1} - \mathbf{z} + \boldsymbol{\eta}^{k}\|_2^2,$
(5c) $\boldsymbol{\eta}^{k+1} = \boldsymbol{\eta}^{k} + \mathbf{D}\mathbf{x}^{k+1} - \mathbf{z}^{k+1},$
where $\boldsymbol{\eta}$ is a (scaled) Lagrange multiplier and $\rho > 0$ is a penalty parameter.
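To illustrate the steps in (5), the following minimal batch sketch (illustrative names, not the smoother-based solver developed in this paper) applies them to the special case $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2$ and $g = \lambda\|\cdot\|_1$, for which the x-update (5a) is a linear solve and the z-update (5b) is elementwise soft thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_analysis(y, H, D, lam, rho, num_iter=200):
    """Batch ADMM sketch for min_x 0.5*||y - H x||^2 + lam*||D x||_1
    using the split z = D x and the scaled dual variable eta."""
    x = np.zeros(H.shape[1])
    z = np.zeros(D.shape[0])
    eta = np.zeros(D.shape[0])
    A = H.T @ H + rho * D.T @ D          # normal-equations matrix for (5a)
    for _ in range(num_iter):
        # x-update (5a): quadratic subproblem, closed form
        x = np.linalg.solve(A, H.T @ y + rho * D.T @ (z - eta))
        # z-update (5b): proximal step, soft thresholding
        z = soft_threshold(D @ x + eta, lam / rho)
        # dual update (5c)
        eta = eta + D @ x - z
    return x
```

Note that the x-update requires solving a linear system in the full stacked variable, which is exactly the cost that the smoother-based methods of this paper avoid.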
The PRS method [25, 26] is similar to ADMM except that it updates the Lagrange multiplier twice per iteration. The typical iterative steps for (3) are
(6a) $\mathbf{x}^{k+1} = \operatorname*{arg\,min}_{\mathbf{x}} \; f(\mathbf{x}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x} - \mathbf{z}^{k} + \boldsymbol{\eta}^{k}\|_2^2,$
(6b) $\boldsymbol{\eta}^{k+1/2} = \boldsymbol{\eta}^{k} + \mathbf{D}\mathbf{x}^{k+1} - \mathbf{z}^{k},$
(6c) $\mathbf{z}^{k+1} = \operatorname*{arg\,min}_{\mathbf{z}} \; g(\mathbf{z}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x}^{k+1} - \mathbf{z} + \boldsymbol{\eta}^{k+1/2}\|_2^2,$
(6d) $\boldsymbol{\eta}^{k+1} = \boldsymbol{\eta}^{k+1/2} + \mathbf{D}\mathbf{x}^{k+1} - \mathbf{z}^{k+1},$
where $\rho > 0$.
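A corresponding sketch of the PRS iteration, for the same quadratic-plus-$\ell_1$ special case as before, differs from the ADMM sketch only in the extra half-step dual update (again with illustrative names, not the paper's method):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prs_l1_analysis(y, H, D, lam, rho, num_iter=200):
    """PRS-style splitting sketch for
    min_x 0.5*||y - H x||^2 + lam*||D x||_1 with the split z = D x.
    The multiplier eta is updated twice per iteration."""
    x = np.zeros(H.shape[1])
    z = np.zeros(D.shape[0])
    eta = np.zeros(D.shape[0])
    A = H.T @ H + rho * D.T @ D
    for _ in range(num_iter):
        x = np.linalg.solve(A, H.T @ y + rho * D.T @ (z - eta))
        eta_half = eta + D @ x - z           # first (half) dual update
        z = soft_threshold(D @ x + eta_half, lam / rho)
        eta = eta_half + D @ x - z           # second dual update
    return x
```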
In SBM [27], we iterate the steps
(7a) $\mathbf{x}^{k+1} = \operatorname*{arg\,min}_{\mathbf{x}} \; f(\mathbf{x}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x} - \mathbf{z}^{k} + \mathbf{b}^{k}\|_2^2,$
(7b) $\mathbf{z}^{k+1} = \operatorname*{arg\,min}_{\mathbf{z}} \; g(\mathbf{z}) + \frac{\rho}{2}\|\mathbf{D}\mathbf{x}^{k+1} - \mathbf{z} + \mathbf{b}^{k}\|_2^2,$
$N$ times, and update the extra (Bregman) variable by
(8) $\mathbf{b}^{k+1} = \mathbf{b}^{k} + \mathbf{D}\mathbf{x}^{k+1} - \mathbf{z}^{k+1}.$
When $N = 1$, this is equivalent to ADMM.
There are also other variable splitting methods which alternate proximal steps for the primal and dual variables. One example is FOPD [32], where the $k$th iteration consists of the following subproblems
(9a) $\mathbf{q}^{k+1} = \operatorname{prox}_{\sigma g^{*}}(\mathbf{q}^{k} + \sigma \mathbf{D}\bar{\mathbf{x}}^{k}),$
(9b) $\mathbf{x}^{k+1} = \operatorname{prox}_{\tau f}(\mathbf{x}^{k} - \tau \mathbf{D}^{\top}\mathbf{q}^{k+1}),$
(9c) $\bar{\mathbf{x}}^{k+1} = \mathbf{x}^{k+1} + \theta(\mathbf{x}^{k+1} - \mathbf{x}^{k}),$
where $g^{*}$ is the convex conjugate of $g$, and $\sigma, \tau > 0$ and $\theta \in [0, 1]$ are parameters.
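The FOPD steps can likewise be sketched for the denoising-type case $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2$ and $g = \lambda\|\cdot\|_1$, where the dual proximal step reduces to clipping and the primal proximal step has a simple closed form; step sizes satisfying $\sigma\tau\|\mathbf{D}\|^2 \le 1$ are assumed (names are ours, not the paper's):

```python
import numpy as np

def fopd_l1_analysis(y, D, lam, tau=0.25, sigma=0.25, theta=1.0, num_iter=500):
    """First-order primal-dual sketch for
    min_x 0.5*||y - x||^2 + lam*||D x||_1."""
    x = y.copy()
    x_bar = x.copy()
    q = np.zeros(D.shape[0])                     # dual variable
    for _ in range(num_iter):
        # dual step: prox of sigma*g^* is projection onto ||q||_inf <= lam
        q = np.clip(q + sigma * (D @ x_bar), -lam, lam)
        # primal step: prox of tau*f in closed form
        x_new = (x - tau * (D.T @ q) + tau * y) / (1.0 + tau)
        # extrapolation step
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x
```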
All these variable splitting algorithms provide simple ways to construct efficient iterative schemes with simpler inner subproblems. However, the subproblems that update $\mathbf{x}$ in (5)–(7) and (9) remain computationally expensive, as they involve large matrix-vector products when the dimensionality of $\mathbf{x}$ is large. We circumvent this problem by combining variable splitting with KS and IEKS.
I-C. The Iterated Extended Kalman Smoother
IEKS [5] is an approximate algorithm for solving nonlinear optimal smoothing problems. However, it can also be seen [5] as an efficient implementation of the Gauss–Newton algorithm for solving the problem
(10) $\mathbf{x}_{1:T}^{*} = \operatorname*{arg\,min}_{\mathbf{x}_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T}\Big(\|\mathbf{y}_k - \mathbf{h}_k(\mathbf{x}_k)\|_{\mathbf{R}_k^{-1}}^{2} + \|\mathbf{x}_k - \mathbf{f}_{k-1}(\mathbf{x}_{k-1})\|_{\mathbf{Q}_{k-1}^{-1}}^{2}\Big).$
That is, it produces the maximum a posteriori (MAP) estimate of the trajectory. The IEKS method works by alternating between linearization of $\mathbf{f}_{k-1}$ and $\mathbf{h}_k$ around a previous estimate $\mathbf{x}_{1:T}^{(i)}$, as follows:
(11a) $\mathbf{f}_{k-1}(\mathbf{x}_{k-1}) \approx \mathbf{f}_{k-1}(\mathbf{x}_{k-1}^{(i)}) + \mathbf{F}_{k-1}(\mathbf{x}_{k-1}^{(i)})\,(\mathbf{x}_{k-1} - \mathbf{x}_{k-1}^{(i)}),$
(11b) $\mathbf{h}_{k}(\mathbf{x}_{k}) \approx \mathbf{h}_{k}(\mathbf{x}_{k}^{(i)}) + \mathbf{H}_{k}(\mathbf{x}_{k}^{(i)})\,(\mathbf{x}_{k} - \mathbf{x}_{k}^{(i)}),$
and solving the linearized problem
(12) $\mathbf{x}_{1:T}^{(i+1)} = \operatorname*{arg\,min}_{\mathbf{x}_{1:T}} \; \frac{1}{2}\sum_{k=1}^{T}\Big(\|\mathbf{y}_k - \mathbf{h}_k(\mathbf{x}_k^{(i)}) - \mathbf{H}_k(\mathbf{x}_k^{(i)})(\mathbf{x}_k - \mathbf{x}_k^{(i)})\|_{\mathbf{R}_k^{-1}}^{2} + \|\mathbf{x}_k - \mathbf{f}_{k-1}(\mathbf{x}_{k-1}^{(i)}) - \mathbf{F}_{k-1}(\mathbf{x}_{k-1}^{(i)})(\mathbf{x}_{k-1} - \mathbf{x}_{k-1}^{(i)})\|_{\mathbf{Q}_{k-1}^{-1}}^{2}\Big),$
where $\mathbf{F}_{k-1}(\cdot)$ and $\mathbf{H}_{k}(\cdot)$ are the Jacobians of $\mathbf{f}_{k-1}$ and $\mathbf{h}_{k}$, respectively. The solution of (12) can in turn be efficiently obtained by the Rauch–Tung–Striebel (RTS) smoother [4], which first computes the filtering means and covariances $\mathbf{m}_k$ and $\mathbf{P}_k$ by alternating between prediction
(13a) $\mathbf{m}_{k}^{-} = \mathbf{f}_{k-1}(\mathbf{x}_{k-1}^{(i)}) + \mathbf{F}_{k-1}(\mathbf{x}_{k-1}^{(i)})\,(\mathbf{m}_{k-1} - \mathbf{x}_{k-1}^{(i)}),$
(13b) $\mathbf{P}_{k}^{-} = \mathbf{F}_{k-1}(\mathbf{x}_{k-1}^{(i)})\,\mathbf{P}_{k-1}\,\mathbf{F}_{k-1}^{\top}(\mathbf{x}_{k-1}^{(i)}) + \mathbf{Q}_{k-1},$
and update
(14a) $\mathbf{S}_{k} = \mathbf{H}_{k}(\mathbf{x}_{k}^{(i)})\,\mathbf{P}_{k}^{-}\,\mathbf{H}_{k}^{\top}(\mathbf{x}_{k}^{(i)}) + \mathbf{R}_{k},$
(14b) $\mathbf{K}_{k} = \mathbf{P}_{k}^{-}\,\mathbf{H}_{k}^{\top}(\mathbf{x}_{k}^{(i)})\,\mathbf{S}_{k}^{-1},$
(14c) $\mathbf{m}_{k} = \mathbf{m}_{k}^{-} + \mathbf{K}_{k}\big(\mathbf{y}_{k} - \mathbf{h}_{k}(\mathbf{x}_{k}^{(i)}) - \mathbf{H}_{k}(\mathbf{x}_{k}^{(i)})(\mathbf{m}_{k}^{-} - \mathbf{x}_{k}^{(i)})\big),$
(14d) $\mathbf{P}_{k} = \mathbf{P}_{k}^{-} - \mathbf{K}_{k}\,\mathbf{S}_{k}\,\mathbf{K}_{k}^{\top},$
where $\mathbf{S}_{k}$ and $\mathbf{K}_{k}$ are the innovation covariance matrix and the Kalman gain at the time step $k$, respectively. The filtering means and covariances are then corrected in a backwards (smoothing) pass
(15a) $\mathbf{G}_{k} = \mathbf{P}_{k}\,\mathbf{F}_{k}^{\top}(\mathbf{x}_{k}^{(i)})\,[\mathbf{P}_{k+1}^{-}]^{-1},$
(15b) $\mathbf{m}_{k}^{s} = \mathbf{m}_{k} + \mathbf{G}_{k}(\mathbf{m}_{k+1}^{s} - \mathbf{m}_{k+1}^{-}),$
(15c) $\mathbf{P}_{k}^{s} = \mathbf{P}_{k} + \mathbf{G}_{k}(\mathbf{P}_{k+1}^{s} - \mathbf{P}_{k+1}^{-})\,\mathbf{G}_{k}^{\top}.$
Now setting $\mathbf{x}_{k}^{(i+1)} = \mathbf{m}_{k}^{s}$ for $k = 1, \dots, T$ gives the solution to (12). When the functions $\mathbf{f}_{k-1}$ and $\mathbf{h}_{k}$ are linear, the above iteration converges in a single step, and the algorithm reduces to the classical RTS smoother, or more briefly, KS [4].
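For the linear case, the recursions (13)–(15) reduce to the following compact NumPy sketch of a Kalman filter followed by an RTS smoothing pass (a generic textbook implementation with our own variable names, not the paper's Algorithm 1):

```python
import numpy as np

def kalman_filter_smoother(y, A, H, Q, R, m0, P0):
    """Linear Kalman filter plus RTS smoother; y has shape (T, n_y)."""
    T, n = y.shape[0], m0.shape[0]
    m_pred = np.zeros((T, n)); P_pred = np.zeros((T, n, n))
    m_filt = np.zeros((T, n)); P_filt = np.zeros((T, n, n))
    m, P = m0, P0
    for k in range(T):
        # prediction step, cf. (13)
        m, P = A @ m, A @ P @ A.T + Q
        m_pred[k], P_pred[k] = m, P
        # update step, cf. (14)
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        m = m + K @ (y[k] - H @ m)
        P = P - K @ S @ K.T
        m_filt[k], P_filt[k] = m, P
    # backwards smoothing pass, cf. (15)
    m_s = m_filt.copy(); P_s = P_filt.copy()
    for k in range(T - 2, -1, -1):
        G = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])   # smoother gain
        m_s[k] = m_filt[k] + G @ (m_s[k + 1] - m_pred[k + 1])
        P_s[k] = P_filt[k] + G @ (P_s[k + 1] - P_pred[k + 1]) @ G.T
    return m_s, P_s
```

The cost of one full forward-backward pass is linear in $T$, which is the property the smoother-based variable splitting methods below exploit.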
In this paper, we use the KS and IEKS algorithms as efficient methods for solving generalized versions of the optimization problems given in (10), which arise within the steps of variable splitting.
II. Linear State Estimation by KS-ADMM
In this section, we present the KS-ADMM algorithm, which is a novel algorithm for solving $\ell_1$-regularized linear Gaussian state estimation problems. In particular, Section II-A describes the batch solution by ADMM. Then, by defining an artificial measurement noise and a pseudo-measurement, we formulate the KS-ADMM algorithm to solve the primal variable update in Section II-B.
II-A. Batch Optimization
Let us assume that the state transition function and the measurement function are linear, so that the model (1) becomes
(16) $\mathbf{x}_{k} = \mathbf{A}_{k-1}\mathbf{x}_{k-1} + \mathbf{q}_{k-1}, \qquad \mathbf{y}_{k} = \mathbf{H}_{k}\mathbf{x}_{k} + \mathbf{r}_{k},$
where $\mathbf{A}_{k-1}$ and $\mathbf{H}_{k}$ are the transition matrix and the measurement matrix, respectively. In order to reduce this problem to (3), we stack the entire state sequence into a vector, which transforms the objective into a batch optimization problem. Thus, we define the following variables
(17a)  
(17b)  
(17c)  
(17d)  
(17e)  
(17f)  
(17g)  
(17h) 
The optimization problem introduced in Section I-A can now be reformulated as the following batch optimization problem
(18) 
which in turn can be seen to be a special case of (3). Here, our algorithm for solving (18) builds upon the batch ADMM [28].
To derive an ADMM algorithm for (18), we introduce an auxiliary variable $\mathbf{z}$ and a linear equality constraint $\mathbf{D}\mathbf{x} = \mathbf{z}$. The resulting equality-constrained problem is formulated mathematically as
(19) 
The main objective here is to find a stationary point of the augmented Lagrangian function associated with (19), which is defined as
(20) 
where $\boldsymbol{\eta}$ is the dual variable and $\rho > 0$ is a penalty parameter. As described in Section I-B, at each iteration of ADMM we perform the updates
(21a)  
(21b)  
(21c) 
The update for the primal variable $\mathbf{x}$ is equivalent to the quadratic optimization problem given by
(22) 
which has the closed-form solution
(23) 
While the optimization problem (22) can be solved in closed form, the direct solution is computationally demanding, especially when the number of time points or the dimensionality of the state is large. However, the problem can be recognized as a special case of optimization problems whose iterations can be solved by KS (see Section I-C), provided that we add pseudo-measurements to the problem. In the following, we present the resulting algorithm.
II-B. The KS-ADMM Solver
The proposed KS-ADMM solver is described in Algorithm 1. To extend the batch ADMM to KS-ADMM, we first define an artificial measurement noise $\boldsymbol{\varepsilon}_{k}$ and a pseudo-measurement $\mathbf{w}_{k}$, and then rewrite (22) as
(25)
The solution to (25) can then be computed by running KS on the state estimation problem
(26a) $\mathbf{x}_{k} = \mathbf{A}_{k-1}\mathbf{x}_{k-1} + \mathbf{q}_{k-1},$
(26b) $\mathbf{y}_{k} = \mathbf{H}_{k}\mathbf{x}_{k} + \mathbf{r}_{k},$
(26c) $\mathbf{w}_{k} = \mathbf{D}\mathbf{x}_{k} + \boldsymbol{\varepsilon}_{k}.$
Here, $\boldsymbol{\varepsilon}_{k}$ is an independent random variable with covariance $\frac{1}{\rho}\mathbf{I}$. The KS-based solution can be described as a four-stage recursive process: prediction, $\mathbf{y}_{k}$-update, $\mathbf{w}_{k}$-update, and the RTS smoother, which should be performed for $k = 1, \dots, T$. First, the prediction step is given by
(27a) $\mathbf{m}_{k}^{-} = \mathbf{A}_{k-1}\mathbf{m}_{k-1},$
(27b) $\mathbf{P}_{k}^{-} = \mathbf{A}_{k-1}\mathbf{P}_{k-1}\mathbf{A}_{k-1}^{\top} + \mathbf{Q}_{k-1},$
where $\mathbf{m}_{k}^{-}$ and $\mathbf{P}_{k}^{-}$ are the predicted mean and covariance at the time step $k$. Secondly, the update steps for $\mathbf{y}_{k}$ are given by
(28a) $\mathbf{S}_{k}^{y} = \mathbf{H}_{k}\mathbf{P}_{k}^{-}\mathbf{H}_{k}^{\top} + \mathbf{R}_{k},$
(28b) $\mathbf{K}_{k}^{y} = \mathbf{P}_{k}^{-}\mathbf{H}_{k}^{\top}[\mathbf{S}_{k}^{y}]^{-1},$
(28c) $\mathbf{m}_{k}^{y} = \mathbf{m}_{k}^{-} + \mathbf{K}_{k}^{y}(\mathbf{y}_{k} - \mathbf{H}_{k}\mathbf{m}_{k}^{-}),$
(28d) $\mathbf{P}_{k}^{y} = \mathbf{P}_{k}^{-} - \mathbf{K}_{k}^{y}\mathbf{S}_{k}^{y}[\mathbf{K}_{k}^{y}]^{\top}.$
Thirdly, the update steps for $\mathbf{w}_{k}$ are
(29a) $\mathbf{S}_{k}^{w} = \mathbf{D}\mathbf{P}_{k}^{y}\mathbf{D}^{\top} + \tfrac{1}{\rho}\mathbf{I},$
(29b) $\mathbf{K}_{k}^{w} = \mathbf{P}_{k}^{y}\mathbf{D}^{\top}[\mathbf{S}_{k}^{w}]^{-1},$
(29c) $\mathbf{m}_{k} = \mathbf{m}_{k}^{y} + \mathbf{K}_{k}^{w}(\mathbf{w}_{k} - \mathbf{D}\mathbf{m}_{k}^{y}),$
(29d) $\mathbf{P}_{k} = \mathbf{P}_{k}^{y} - \mathbf{K}_{k}^{w}\mathbf{S}_{k}^{w}[\mathbf{K}_{k}^{w}]^{\top}.$
Here, $\mathbf{S}_{k}^{y}$ and $\mathbf{S}_{k}^{w}$, $\mathbf{K}_{k}^{y}$ and $\mathbf{K}_{k}^{w}$, $\mathbf{m}_{k}^{y}$ and $\mathbf{m}_{k}$, and $\mathbf{P}_{k}^{y}$ and $\mathbf{P}_{k}$ are the innovation covariances, gain matrices, means, and covariances for the variables $\mathbf{y}_{k}$ and $\mathbf{w}_{k}$ at the time step $k$, respectively. Finally, we run an RTS smoother [4] for $k = T-1, \dots, 1$, which has the steps
(30a) $\mathbf{G}_{k} = \mathbf{P}_{k}\mathbf{A}_{k}^{\top}[\mathbf{P}_{k+1}^{-}]^{-1},$
(30b) $\mathbf{m}_{k}^{s} = \mathbf{m}_{k} + \mathbf{G}_{k}(\mathbf{m}_{k+1}^{s} - \mathbf{m}_{k+1}^{-}),$
(30c) $\mathbf{P}_{k}^{s} = \mathbf{P}_{k} + \mathbf{G}_{k}(\mathbf{P}_{k+1}^{s} - \mathbf{P}_{k+1}^{-})\mathbf{G}_{k}^{\top},$
where $\mathbf{m}_{T}^{s} = \mathbf{m}_{T}$ and $\mathbf{P}_{T}^{s} = \mathbf{P}_{T}$ (see [2] for more details). This gives the update for $\mathbf{x}$ as:
(31) $\mathbf{x}^{j+1} = \big[(\mathbf{m}_{1}^{s})^{\top} \;\cdots\; (\mathbf{m}_{T}^{s})^{\top}\big]^{\top}.$
The remaining updates for $\mathbf{z}$ and $\boldsymbol{\eta}$ are given as
(32) $\mathbf{z}^{j+1} = \mathcal{S}_{\lambda/\rho}\big(\mathbf{D}\mathbf{x}^{j+1} + \boldsymbol{\eta}^{j}\big),$
where $\mathcal{S}_{\kappa}(\cdot)$ denotes the elementwise soft-thresholding operator $\mathcal{S}_{\kappa}(v) = \operatorname{sign}(v)\max(|v| - \kappa, 0)$, and
(33) $\boldsymbol{\eta}^{j+1} = \boldsymbol{\eta}^{j} + \mathbf{D}\mathbf{x}^{j+1} - \mathbf{z}^{j+1}.$
It is useful to note that in Algorithm 1, the covariances and gains are independent of the ADMM iteration number and thus can be precomputed outside the ADMM iterations. Furthermore, when the model is time-invariant, we can often use stationary Kalman filters and smoothers instead of their general counterparts, which can further speed up the computations.
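The pseudo-measurement construction can be sanity-checked numerically on a single time step: minimizing a Gaussian measurement term plus the quadratic ADMM penalty is equivalent to estimation with a stacked measurement model, where the penalty acts as an extra measurement of $\mathbf{D}\mathbf{x}$ with noise covariance $\frac{1}{\rho}\mathbf{I}$ (the dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, rho = 3, 2, 2, 1.5
H = rng.standard_normal((m, n))
R = 0.5 * np.eye(m)
D = rng.standard_normal((p, n))
y = rng.standard_normal(m)
v = rng.standard_normal(p)   # plays the role of the penalty target in (22)

# Direct minimizer of 0.5*||y - H x||_{R^{-1}}^2 + 0.5*rho*||D x - v||^2
Ri = np.linalg.inv(R)
x_direct = np.linalg.solve(H.T @ Ri @ H + rho * D.T @ D,
                           H.T @ Ri @ y + rho * D.T @ v)

# Same problem as Gaussian estimation with the pseudo-measurement model
# [y; v] = [H; D] x + noise, noise covariance blkdiag(R, (1/rho) I),
# so the inverse noise covariance is blkdiag(R^{-1}, rho I).
H_aug = np.vstack([H, D])
R_aug_inv = np.block([[Ri, np.zeros((m, p))],
                      [np.zeros((p, m)), rho * np.eye(p)]])
y_aug = np.concatenate([y, v])
x_pseudo = np.linalg.solve(H_aug.T @ R_aug_inv @ H_aug,
                           H_aug.T @ R_aug_inv @ y_aug)
```

Both solves produce the same estimate, which is why the x-update can be delegated to a smoother running on the augmented model.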
II-C. Convergence of KS-ADMM
In this section, we discuss the convergence of KS-ADMM. If the system (26) is detectable [40], then the objective function (20) is convex. The traditional convergence results for ADMM, such as those in [28, 41], then ensure that the iterates globally converge to a stationary (optimal) point. The result is given in the following.
Theorem 1 (Convergence of KS-ADMM).
III. Nonlinear State Estimation by IEKS-ADMM
When $\mathbf{f}_{k-1}$ and $\mathbf{h}_{k}$ are nonlinear, the subproblem arising in the ADMM iteration cannot be solved in closed form. In the following, we first present a batch solution of the nonlinear case based on a Gauss–Newton (GN) iteration and then show how it can be efficiently implemented using IEKS.
III-A. Batch Optimization
Let us now consider the case where the state transition function and the measurement function in (1) are nonlinear. We now proceed to rewrite the optimization problem (2) in batch form by defining the following variables
(34a)  
(34b) 
Note that the remaining variables have the same definitions as in (17). Using these variables, the problem can be naturally transformed into
(35) 
which is also a special case of (3), similarly to the linear case.
Following ADMM, we define the augmented Lagrangian function associated with (35) as:
(36) 
Since the nonlinear batch solution is based on ADMM, the iteration steps for $\mathbf{z}$ and $\boldsymbol{\eta}$ are the same as in the linear case (see Equations (24) and (21c)). Here, we focus on the solution of the primal variable $\mathbf{x}$.
When updating $\mathbf{x}$, the objective is no longer a quadratic function. However, the optimization problem can be solved with GN [42]. Here, the subproblem is rewritten as
(37) $\min_{\mathbf{x}} \; \Phi(\mathbf{x}) = \frac{1}{2}\|\mathbf{r}(\mathbf{x})\|_2^2,$
where $\mathbf{r}(\mathbf{x})$ stacks the whitened measurement, dynamics, and penalty residuals. Then, the gradient of $\Phi$ is given by
(38) $\nabla\Phi(\mathbf{x}) = \mathbf{J}^{\top}(\mathbf{x})\,\mathbf{r}(\mathbf{x}),$
where $\mathbf{J}(\mathbf{x})$ is the Jacobian of the residual $\mathbf{r}(\mathbf{x})$, and the Hessian is the sum of $\mathbf{J}^{\top}(\mathbf{x})\mathbf{J}(\mathbf{x})$ and a term that involves the second derivatives of the residual. In GN, to avoid the trouble of computing this second-order term, we use the approximation $\nabla^{2}\Phi(\mathbf{x}) \approx \mathbf{J}^{\top}(\mathbf{x})\mathbf{J}(\mathbf{x})$, which amounts to assuming that the residual is small enough. Thus, the primal variable at iteration $i$ is updated by:
(40) $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} - \big[\mathbf{J}^{\top}(\mathbf{x}^{(i)})\,\mathbf{J}(\mathbf{x}^{(i)})\big]^{-1}\nabla\Phi(\mathbf{x}^{(i)}).$
The iterations can be stopped after a maximum number of iterations or when the condition $\|\mathbf{x}^{(i+1)} - \mathbf{x}^{(i)}\| \le \epsilon$ is satisfied, where $\epsilon > 0$ is an error tolerance. If $\epsilon$ is small enough, the above algorithm has then (almost) converged. The rest of the ADMM updates can be implemented similarly to the linear Gaussian case.
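The GN update (40) can be sketched generically as follows, where `r` returns the stacked residual vector and `J` its Jacobian (a minimal illustration with our own names, not the IEKS-based implementation of the next section):

```python
import numpy as np

def gauss_newton(r, J, x0, tol=1e-8, max_iter=50):
    """Gauss-Newton sketch for min_x 0.5*||r(x)||^2.
    The Hessian of the objective is approximated by J^T J."""
    x = x0.astype(float)
    for _ in range(max_iter):
        rx, Jx = r(x), J(x)
        # solve the normal equations (J^T J) step = J^T r for the GN step
        step = np.linalg.solve(Jx.T @ Jx, Jx.T @ rx)
        x = x - step
        if np.linalg.norm(step) < tol:       # stopping criterion
            break
    return x
```

For a scalar residual $r(x) = x^2 - 2$ this reduces to Newton's method for $\sqrt{2}$, which illustrates the fast local convergence that the GN approximation inherits when the residual is small at the solution.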
III-B. The IEKS-ADMM Solver
We now move on to consider the IEKS-ADMM solver. As discussed in Section I-C, IEKS can be seen as an efficient implementation of the GN method, which inspires us to derive an efficient implementation of the batch ADMM.
Now, we rewrite the subproblem (37) as
(41) 
At a modest scale, $\mathbf{x}$ can be directly computed by (40), although the computational cost of this update scales cubically in the dimension of the stacked state vector. When the scale is large, the batch ADMM will have high memory and computational requirements. In this case, the use of IEKS becomes beneficial due to its computational scaling, which is linear in the number of time steps. In this paper, the proposed method incorporates IEKS into ADMM to design the IEKS-ADMM algorithm for solving the nonlinear case.
In the IEKS algorithm, the Gaussian smoother is run several times, with $\mathbf{f}_{k-1}$ and $\mathbf{h}_{k}$ and their Jacobians evaluated at the previous (inner loop) iterate. The detailed iteration steps of IEKS-ADMM are described in Algorithm 2. In particular, following the prediction steps (13) in Section I-C, the update steps for $\mathbf{y}_{k}$ are given by
(42a) $\mathbf{S}_{k}^{y} = \mathbf{H}_{k}(\mathbf{x}_{k}^{(i)})\,\mathbf{P}_{k}^{-}\,\mathbf{H}_{k}^{\top}(\mathbf{x}_{k}^{(i)}) + \mathbf{R}_{k},$
(42b) $\mathbf{K}_{k}^{y} = \mathbf{P}_{k}^{-}\,\mathbf{H}_{k}^{\top}(\mathbf{x}_{k}^{(i)})\,[\mathbf{S}_{k}^{y}]^{-1},$
(42c) $\mathbf{m}_{k}^{y} = \mathbf{m}_{k}^{-} + \mathbf{K}_{k}^{y}\big(\mathbf{y}_{k} - \mathbf{h}_{k}(\mathbf{x}_{k}^{(i)}) - \mathbf{H}_{k}(\mathbf{x}_{k}^{(i)})(\mathbf{m}_{k}^{-} - \mathbf{x}_{k}^{(i)})\big),$
(42d) $\mathbf{P}_{k}^{y} = \mathbf{P}_{k}^{-} - \mathbf{K}_{k}^{y}\mathbf{S}_{k}^{y}[\mathbf{K}_{k}^{y}]^{\top},$
and, for the pseudo-measurement $\mathbf{w}_{k}$, the update steps are the same as in the linear case; they are given in (29).