STATE estimation problems naturally arise in many signal processing applications, including target tracking, smart grids, and robotics [1, 2, 3]. In conventional Bayesian approaches, the estimation task is cast as a statistical inverse problem of restoring the original time series from imperfect measurements, based on a statistical model for the measurements given the signal together with a statistical model for the signal. In linear Gaussian models, this problem admits a closed-form solution, which can be efficiently implemented by the Kalman (or Rauch–Tung–Striebel) smoother (KS) [4, 2]. For nonlinear Gaussian models, we can use various linearization and sigma-point-based methods for approximate inference. In particular, here we use the so-called iterated extended Kalman smoother (IEKS), which is based on analytical linearization of the nonlinear functions. Although the aforementioned smoothers are often used to estimate dynamic signals, they lack a mechanism to promote sparsity in the signals.
One approach for promoting sparsity is to add an $\ell_1$-term to the cost function formulation of the state estimation problem. This approach imposes sparsity on the signal estimate, based on either a synthesis sparse or an analysis sparse signal model. A synthesis sparse model assumes that the signal can be represented as a linear combination of basis vectors, where the coefficients are subject to, for example, an $\ell_1$-penalty, thus promoting sparsity. In the past decade, the use of synthesis sparsity for estimating dynamic signals has drawn a lot of attention [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. For example, a pseudo-measurement technique has been used in the Kalman update equations to encourage sparse solutions. Compressive sensing has been applied to update the Kalman innovations or filtering errors. Based on synthesis sparsity, the estimation problem has also been formulated as an $\ell_1$-regularized least-squares problem. Nevertheless, the previously mentioned methods typically only consider synthesis sparsity of the signal and assume a linear dynamic system.
On the other hand, analysis sparsity, also called cosparsity, assumes that the signal is not sparse itself, but rather that its image under some transform is sparse or compressible, which gives flexibility in the modeling of signals [16, 17, 18, 19, 20]. Analysis sparse models involve an analysis operator, a popular choice being total variation (TV), and have been very successful in image processing. For example, several algorithms [18, 19] have been developed to train an analysis operator, and the trained operators have been used for image denoising. The TV regularizer has also been proposed to improve the quality of image reconstruction. However, these approaches are not ideally suited for reconstructing dynamic signals. In state estimation problems, the available methods for analysis sparse priors are still limited. The main goal of this paper is to introduce these kinds of methods for dynamic state estimation.
Formulating a state estimation problem using synthesis and analysis sparsity leads to a general class of optimization problems, which require minimization of composite functions such as analysis-$\ell_1$-regularized least-squares problems. The difficulty arises from the nonsmooth regularizer. There are various batch optimization methods for addressing this problem, such as the proximal gradient method, Douglas–Rachford splitting (DRS) [23, 24], Peaceman–Rachford splitting (PRS) [25, 26], the split Bregman method (SBM), the alternating direction method of multipliers (ADMM) [28, 29], and the first-order primal-dual (FOPD) method. However, these general methods do not take the inherent temporal nature of the optimization problem into account, which leads to poor computational and memory scaling on large-scale or high-dimensional data. This often renders the existing methods intractable due to their extensive memory and computational requirements.
As a consequence, we propose to combine a Kalman smoother with variable splitting optimization methods, which allows us to exploit the temporal structure of the data in order to speed up the computations. In this paper, we derive novel methods for efficiently estimating dynamic signals with an extra (analysis) $\ell_1$-regularized term. The developed algorithms use the computationally efficient KS and IEKS for solving the subproblems arising within the steps of the optimization methods. Our experiments demonstrate promising performance of the methods in practical applications. The main contributions are as follows:
We formulate the state estimation problem as an optimization problem based on a general sparse model containing an analysis or synthesis prior. The formulation accommodates a large class of popular sparsifying regularizers (e.g., the synthesis $\ell_1$-norm, the analysis $\ell_1$-norm, and the total variation norm) for state estimation.
We present novel practical optimization methods, KS-ADMM and IEKS-ADMM, which are based on combining ADMM with KS and IEKS, respectively.
We also prove the convergence of the KS-ADMM method as well as the local convergence of the IEKS-ADMM method.
We generalize our smoother-based approaches to a general class of variable splitting techniques.
The advantage of the proposed approach is that the computational cost per iteration is much lower than in conventional batch solutions. Our approach is computationally superior to the state-of-the-art in large-scale or high-dimensional state estimation applications.
The rest of the paper is organized as follows. We conclude this section by reviewing variable splitting methods and IEKS. Section II first develops the batch optimization by a classical ADMM method and then presents a new KS-ADMM method for solving a linear dynamic estimation problem. For the nonlinear case, we present an IEKS-ADMM method in Section III and establish its local convergence properties. Section IV introduces a more general smoother-based variable splitting algorithmic framework; in particular, a general IEKS-based optimization method is formulated. Experimental results in Section V demonstrate the effectiveness and accuracy of the methods in simulated linear and nonlinear state estimation problems. The performance of the algorithms is also illustrated in real-world tomographic reconstruction.
I-A Problem Formulation
where $x_k$ denotes the $n$-dimensional state of the system at time step $k$, $y_k$ is an $m$-dimensional noisy measurement signal, $h_k$ is a measurement function (typically with $m < n$), and $f_k$ is a state transition function at time step $k$. The initial state is assumed to have mean $m_0$ and covariance $P_0$. The errors $q_k$ and $r_k$ are assumed to be mutually independent random variables with known positive definite covariance matrices $Q_k$ and $R_k$, respectively. The goal is to estimate the state sequence $x_{1:T}$ from the noisy measurement sequence $y_{1:T}$. In this paper, we focus on estimating $x_{1:T}$ by minimizing the sum of quadratic error terms and an $\ell_1$ sparsity-promoting penalty.
For the sparsity assumption, we add an extra $\ell_1$-penalty on the states and formulate the optimization problem as
where $x^{\star}_{1:T}$ is the optimal state sequence, $L$ is a linear operator, and $\lambda > 0$ is a penalty parameter. The formulation (2) encompasses two particular cases: by setting $L$ to a diagonal matrix (e.g., the identity matrix), a synthesis sparse model is obtained, which assumes that the states themselves are sparse. Such a case arises frequently in state estimation applications [15, 10, 11, 30, 12]. Correspondingly, an analysis sparse model is obtained when a more general $L$ is used. For example, total variation (TV) regularization, which is common in tomographic reconstruction, can be obtained by using a finite-difference matrix as $L$.
More generally, $L$ can be a fixed matrix [31, 20, 16] or a learned matrix [17, 18, 19]. It should be noted that, if the $\ell_1$ term is not used (i.e., when $\lambda = 0$) in (2), the objective can be minimized by using a linear or nonlinear Kalman smoother [4, 5, 2]. However, when $\lambda > 0$, the smoother is no longer directly applicable, and the cost function is non-differentiable.
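As a concrete illustration of the analysis sparse case, the following sketch (an illustrative construction of ours, not taken from the paper) builds a first-order finite-difference matrix, a common choice of analysis operator $L$ for TV regularization, and applies it to a piecewise-constant signal:

```python
import numpy as np

def difference_matrix(n):
    """Return the (n-1) x n forward-difference matrix D with (D x)_i = x[i+1] - x[i]."""
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    return D

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0])  # piecewise-constant signal
D = difference_matrix(len(x))
# D @ x has a single nonzero entry (the one jump), so ||D x||_1 is small:
# the signal is not sparse itself, but it is sparse in the difference domain.
print(D @ x)
```

A piecewise-constant state trajectory is thus heavily penalized only where it jumps, which is exactly the behavior TV regularization promotes.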
Since $\| L x \|_1$ does not have a closed-form proximal operator in general, we employ variable splitting techniques for solving the resulting optimization problem. As mentioned above, many variable splitting methods can be used to solve (2), such as PRS, SBM, ADMM, DRS, and FOPD. In particular, ADMM is a popular member of this class. Therefore, we start by presenting algorithms based on ADMM and then extend them to more general variable splitting methods. In the following, we review variable splitting and IEKS methods before presenting our approach in detail.
I-B Variable Splitting
where $f$ and $g$ are (possibly nonsmooth) functions and $A$ is a matrix. Variable splitting refers to the process of introducing an auxiliary constrained variable to separate the components in the cost function. More specifically, we impose the constraint $w = A x$, which transforms the original minimization problem (3) into an equivalent constrained minimization problem given by
The minimization problem (4) can be solved efficiently by classical constrained optimization methods. The rationale of variable splitting is that it may be easier to solve the constrained problem (4) than the unconstrained one (3). PRS, SBM, FOPD, ADMM, and their variants are a few well-known variable splitting methods; see also [37, 38] for a recent historical overview.
ADMM is one of the most popular algorithms for solving (4). ADMM defines an augmented Lagrangian function and then alternates between the updates of the split variables. Given initial values of the auxiliary and dual variables and a penalty parameter, its iterative steps are:
where $\eta$ is a Lagrange multiplier and $\rho > 0$ is a penalty parameter.
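To make the ADMM steps concrete, here is a minimal self-contained sketch (with illustrative names and a simple quadratic data term, not the paper's state-space objective) for the problem $\min_x \frac{1}{2}\|x - y\|^2 + \lambda \|L x\|_1$ using the split $z = L x$:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_analysis_l1(y, L, lam, rho=1.0, n_iter=200):
    """ADMM sketch for min_x 0.5*||x - y||^2 + lam*||L x||_1 with split z = L x."""
    n = len(y)
    x = y.copy()
    z = L @ x
    u = np.zeros_like(z)                # scaled dual variable
    A = np.eye(n) + rho * L.T @ L       # normal matrix of the x-update
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * L.T @ (z - u))  # primal x-update
        z = soft_threshold(L @ x + u, lam / rho)         # prox of (lam/rho)*||.||_1
        u = u + L @ x - z                                # dual ascent step
    return x

y = np.array([0.1, -0.05, 1.1, 0.95, 1.02])
L = np.eye(5)                           # synthesis sparse special case
x_hat = admm_analysis_l1(y, L, lam=0.2)
```

With $L$ equal to the identity, the limit of the iteration is the elementwise soft thresholding of the data, which is the synthesis sparse special case mentioned above.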
In SBM, we iterate the steps
a fixed number of times, and update the extra variable by
When the number of inner iterations is one, this is equivalent to ADMM.
There are also other variable splitting methods which alternate proximal steps for the primal and dual variables. One example is FOPD, where each iteration consists of the following subproblems
where the step-size and relaxation constants are parameters of the algorithm.
All these variable splitting algorithms provide simple ways to construct efficient iterative schemes with simpler inner subproblems. However, subproblems such as (5), (6), and (9a) remain computationally expensive, as they involve large matrix-vector products when the dimensionality of the problem is large. We circumvent this problem by combining variable splitting with KS and IEKS.
I-C The Iterated Extended Kalman Smoother
IEKS is an approximate algorithm for solving nonlinear optimal smoothing problems. However, it can also be seen as an efficient implementation of the Gauss–Newton algorithm for solving the problem
That is, it produces the maximum a posteriori (MAP) estimate of the trajectory. The IEKS method works by alternating between linearization of the functions $f_k$ and $h_k$ around a previous estimate of the trajectory, as follows:
and solving the linearized problem
where $J[\cdot]$ denotes the Jacobian. The solution of (12) can in turn be efficiently obtained by the Rauch–Tung–Striebel (RTS) smoother, which first computes the filtering means and covariances by alternating between prediction
where $S_k$ and $K_k$ are the innovation covariance matrix and the Kalman gain at time step $k$, respectively. The filtering means and covariances are then corrected in a backward (smoothing) pass
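The prediction, update, and smoothing recursions above can be sketched for a scalar random-walk model as follows (a minimal illustration with unit transition and measurement matrices; all names are ours, not the paper's):

```python
import numpy as np

def rts_smoother(y, Q, R, m0, P0):
    """Kalman filter + RTS smoother for x_k = x_{k-1} + q_k, y_k = x_k + r_k."""
    T = len(y)
    m_f = np.zeros(T); P_f = np.zeros(T)    # filtering means / covariances
    m_p = np.zeros(T); P_p = np.zeros(T)    # predicted means / covariances
    m, P = m0, P0
    for k in range(T):
        m_p[k], P_p[k] = m, P + Q           # prediction step
        S = P_p[k] + R                      # innovation covariance
        K = P_p[k] / S                      # Kalman gain
        m = m_p[k] + K * (y[k] - m_p[k])    # measurement update
        P = P_p[k] - K * S * K
        m_f[k], P_f[k] = m, P
    m_s = m_f.copy(); P_s = P_f.copy()      # backward (smoothing) pass
    for k in range(T - 2, -1, -1):
        G = P_f[k] / P_p[k + 1]             # smoother gain
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G * (P_s[k + 1] - P_p[k + 1]) * G
    return m_s, P_s

y = np.array([0.0, 0.1, 1.2, 1.0, 1.1])
m_s, P_s = rts_smoother(y, Q=0.1, R=0.5, m0=0.0, P0=1.0)
```

The cost of one forward-backward sweep is linear in the number of time steps, which is the property the smoother-based algorithms in this paper exploit.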
In this paper, we use the KS and IEKS algorithms as efficient methods for solving generalized versions of the optimization problems given in (10), which arise within the steps of variable splitting.
II Linear State Estimation by KS-ADMM
In this section, we present KS-ADMM, a novel algorithm for solving $\ell_1$-regularized linear Gaussian state estimation problems. In particular, Section II-A describes the batch solution by ADMM. Then, by defining an artificial measurement noise and a pseudo-measurement, we formulate the KS-ADMM algorithm to solve the primal variable update in Section II-B.
II-A Batch Optimization
Let us assume that the state transition function and the measurement function are linear, denoted by
where $A_k$ and $H_k$ are the transition matrix and the measurement matrix, respectively. In order to reduce this problem to (3), we stack the entire state sequence into a vector, which transforms the objective into a batch optimization problem. Thus, we define the following variables
The optimization problem introduced in Section I-A can now be reformulated as the following batch optimization problem
To derive an ADMM algorithm for (18), we introduce an auxiliary variable and a linear equality constraint between it and the stacked states. The resulting equality-constrained problem is formulated as
The main objective here is to find a stationary point of the augmented Lagrangian function associated with (19), which is given by
where $\eta$ is the dual variable and $\rho > 0$ is a penalty parameter. As described in Section I-B, at each iteration of ADMM we perform the updates
The update for the primal sequence is equivalent to the quadratic optimization problem given by
which has the closed-form solution
While the optimization problem (22) can be solved in closed form, the direct solution is computationally demanding, especially when the number of time points or the dimensionality of the state is large. However, the problem can be recognized as a special case of optimization problems whose iterations can be solved by KS (see Section I-C), provided that we add pseudo-measurements to the problem. In the following, we present the resulting algorithm.
II-B The KS-ADMM Solver
The solution to (25) can then be computed by running KS on the state estimation problem
Here, the artificial measurement noise is an independent random variable with covariance $\frac{1}{\rho} I$. The KS-based solution can be described as a four-stage recursive process: prediction, measurement update, pseudo-measurement update, and the RTS smoother, performed for $k = 1, \ldots, T$. First, the prediction step is given by
where $m_k^-$ and $P_k^-$ are the predicted mean and covariance at time step $k$. Second, the update steps for the measurements are given by
Third, the update steps for the pseudo-measurements are
Here, the above quantities are the innovation covariances, gain matrices, means, and covariances for the measurement and pseudo-measurement updates at time step $k$, respectively. Finally, we run an RTS smoother for $k = T-1, \ldots, 1$, which has the steps
where the smoother gains and auxiliary quantities are defined as in Section I-C (see the references for more details). This gives the update for the primal variable as:
The remaining updates for the auxiliary and dual variables are given as
where the shrinkage (soft-thresholding) operator is applied elementwise.
It is useful to note that in Algorithm 1, the covariances and gains are independent of the iteration number and can thus be pre-computed outside the ADMM iterations. Furthermore, when the model is time-invariant, we can often use stationary Kalman filters and smoothers instead of their general counterparts to further speed up the computations.
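The overall KS-ADMM recursion can be sketched for a scalar random walk with an identity analysis operator per step (a toy illustration under our own naming; the paper's algorithm handles general operators and vector states). The ADMM $x$-update is solved by a Kalman smoother in which $z_k - u_k$ enters as a pseudo-measurement of $x_k$ with noise variance $1/\rho$:

```python
import numpy as np

def soft(v, t):
    """Elementwise soft thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ks_admm(y, Q, R, m0, P0, lam, rho=1.0, n_iter=50):
    T = len(y)
    z = np.zeros(T); u = np.zeros(T)
    x = np.zeros(T)
    for _ in range(n_iter):
        # ---- x-update: Kalman smoother with real and pseudo measurements ----
        m_f = np.zeros(T); P_f = np.zeros(T)
        m_p = np.zeros(T); P_p = np.zeros(T)
        m, P = m0, P0
        for k in range(T):
            m_p[k], P_p[k] = m, P + Q                      # prediction
            S = P_p[k] + R; K = P_p[k] / S                 # measurement update
            m = m_p[k] + K * (y[k] - m_p[k]); P = P_p[k] - K * S * K
            S2 = P + 1.0 / rho; K2 = P / S2                # pseudo-measurement update
            m = m + K2 * ((z[k] - u[k]) - m); P = P - K2 * S2 * K2
            m_f[k], P_f[k] = m, P
        x = m_f.copy()
        for k in range(T - 2, -1, -1):                     # RTS backward pass
            G = P_f[k] / P_p[k + 1]
            x[k] = m_f[k] + G * (x[k + 1] - m_p[k + 1])
        # ---- z- and dual updates ----
        z = soft(x + u, lam / rho)
        u = u + x - z
    return x

y = np.array([0.05, -0.1, 1.0, 1.1, 0.9])
x_hat = ks_admm(y, Q=0.5, R=0.1, m0=0.0, P0=1.0, lam=0.1)
```

Each outer iteration costs one forward-backward smoother sweep, i.e., linear time in the horizon length, instead of solving the batch normal equations.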
II-C Convergence of KS-ADMM
In this section, we discuss the convergence of KS-ADMM. If the system (26) is detectable, then the objective function (20) is convex. The traditional convergence results for ADMM, such as those in [28, 41], then ensure that the iterates converge globally to the stationary (optimal) point. The result is given in the following.
Theorem 1 (Convergence of KS-ADMM).
III Nonlinear State Estimation by IEKS-ADMM
When the state transition and measurement functions are nonlinear, the subproblem arising in the ADMM iteration cannot be solved in closed form. In the following, we first present a batch solution of the nonlinear case based on a Gauss–Newton (GN) iteration and then show how it can be efficiently implemented using IEKS.
III-A Batch Optimization
Let us now consider the case where the state transition function and the measurement function in (1) are nonlinear. We proceed to rewrite the optimization problem (2) in batch form by defining the following variables
Note that the stacked variables have the same definitions as in (17). Using these variables, the subproblem can be naturally transformed into
which is also a special case of (3), similarly to the linear case.
Following the ADMM, we define the augmented Lagrangian function associated with (35) as:
Since the nonlinear batch solution is based on ADMM, the iteration steps for the auxiliary and dual variables are the same as in the linear case (see Equations (24) and (21c)). Here, we focus on the solution of the primal variable update.
When updating the primal variable, the objective is no longer a quadratic function. However, the optimization problem can be solved with GN. Here, the subproblem is rewritten as
Then, the gradient of the objective is given by
and the Hessian is the sum of a first-order (Gauss–Newton) term and a second-order residual term, where
In GN, to avoid the trouble of computing the second-order residual term, we approximate the Hessian by its first-order part, which amounts to assuming that the residual is small enough. Thus, the primal variable at iteration $i$ is updated by:
The iterations can be stopped after a maximum number of iterations or when the change between successive iterates falls below an error tolerance. If the tolerance is small enough, the above algorithm has (almost) converged upon stopping. The rest of the ADMM updates can be implemented similarly to the linear Gaussian case.
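The GN update and stopping rule can be sketched on a generic nonlinear least-squares problem (the toy residual, starting point, and tolerance are illustrative; GN is only locally convergent, so we start near the solution):

```python
import numpy as np

def gauss_newton(r, J, x0, tol=1e-10, max_iter=50):
    """GN for min_x 0.5*||r(x)||^2, dropping the second-order residual term."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Jx, rx = J(x), r(x)
        step = np.linalg.solve(Jx.T @ Jx, Jx.T @ rx)  # GN normal equations
        x = x - step
        if np.linalg.norm(step) < tol:                # stop when update is small
            break
    return x

# Toy problem: fit a, b in y = a * exp(b * t) to noiseless data.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * t)
r = lambda p: p[0] * np.exp(p[1] * t) - y
J = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p_hat = gauss_newton(r, J, x0=[1.8, 0.4])  # converges to approximately (2.0, 0.5)
```

The same structure carries over to the primal update here, except that the normal equations are far too large to solve directly, which is what motivates the IEKS implementation below.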
III-B The IEKS-ADMM Solver
We now move on to consider the IEKS-ADMM solver. As discussed in Section I-C, IEKS can be seen as an efficient implementation of the GN method, which inspires us to derive an efficient implementation of the batch ADMM.
Now, we rewrite the subproblem (37) as
At a modest scale, the primal variable can be computed directly from (40), although the computational cost of this batch solution scales cubically in the total problem dimension. When the problem is large, the batch ADMM will have high memory and computational requirements. In this case, the use of IEKS becomes beneficial due to its computational scaling, which is linear in the number of time steps. In this paper, the proposed method incorporates IEKS into ADMM to design the IEKS-ADMM algorithm for solving the nonlinear case.
In the IEKS algorithm, the Gaussian smoother is run several times, with the functions $f_k$ and $h_k$ and their Jacobians evaluated at the previous (inner loop) iterate. The detailed iteration steps of IEKS-ADMM are described in Algorithm 2. In particular, following the prediction steps (13) in Section I-C, the update steps for the measurements are given by
and for the pseudo-measurements, the update steps are the same as in the linear case; they are given in (29).
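For completeness, the IEKS inner loop (the repeated linearize-and-smooth sweeps described above) can be sketched for a scalar nonlinear model; the toy functions and all names are illustrative, not the paper's:

```python
import numpy as np

def ieks(y, f, df, h, dh, Q, R, m0, P0, n_sweeps=10):
    """IEKS sketch for x_k = f(x_{k-1}) + q_k, y_k = h(x_k) + r_k (scalar case)."""
    T = len(y)
    x_nom = np.full(T, m0, dtype=float)       # nominal (linearization) trajectory
    for _ in range(n_sweeps):
        m_f = np.zeros(T); P_f = np.zeros(T)
        m_p = np.zeros(T); P_p = np.zeros(T)
        m, P = m0, P0
        for k in range(T):
            xp = x_nom[k - 1] if k > 0 else m0
            F = df(xp)                         # linearize f around nominal point
            m_p[k] = f(xp) + F * (m - xp)
            P_p[k] = F * P * F + Q
            H = dh(x_nom[k])                   # linearize h around nominal point
            v = y[k] - (h(x_nom[k]) + H * (m_p[k] - x_nom[k]))  # innovation
            S = H * P_p[k] * H + R
            K = P_p[k] * H / S
            m = m_p[k] + K * v
            P = P_p[k] - K * S * K
            m_f[k], P_f[k] = m, P
        x_s = m_f.copy()                       # RTS backward pass (means only)
        for k in range(T - 2, -1, -1):
            F = df(x_nom[k])
            G = P_f[k] * F / P_p[k + 1]
            x_s[k] = m_f[k] + G * (x_s[k + 1] - m_p[k + 1])
        x_nom = x_s                            # next sweep linearizes here
    return x_nom

f = lambda x: x + 0.1 * np.cos(x)
df = lambda x: 1.0 - 0.1 * np.sin(x)
h = lambda x: x
dh = lambda x: 1.0
y = np.array([0.2, 0.4, 0.5, 0.7, 0.8])
x_hat = ieks(y, f, df, h, dh, Q=0.1, R=0.1, m0=0.0, P0=1.0)
```

Inside IEKS-ADMM, each such sweep additionally incorporates the pseudo-measurement update of (29) after the measurement update, exactly as in the linear KS-ADMM case.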