## 1 Introduction

Tracking a hidden state vector from noisy observations in real time is at the core of many signal processing applications. A leading approach for the task is the Kalman filter (KF) [kalman1960new], which operates with low complexity and achieves the minimum mean-squared error (MMSE) in setups characterized by linear Gaussian state space (SS) models. A key merit of the KF, which is of paramount importance in safety-critical applications, e.g., autonomous driving, aviation, and medicine, is its ability to provide uncertainty alongside state estimation [durbin2012time, Ch. 4]. The KF and its variants are model-based (MB) algorithms, and are therefore sensitive to inaccuracies in modeling the system dynamics using the SS model.

The recent success of deep learning architectures, such as recurrent neural networks (RNNs) [chung2014empirical] and attention mechanisms [vaswani2017attention], in learning from complex time-series data in unstructured environments and in a model-agnostic manner, has stimulated interest in using them for tracking dynamical systems. However, the lack of interpretability of deep architectures and their inability to capture uncertainty [becker2019recurrent], together with the fact that they require many trainable parameters and large data sets even for simple setups [zaheer2017latent], limit their applicability for safety-critical applications in hardware-limited systems.

Characterizing uncertainty in deep neural networks (DNNs) is an active area of research [nguyen2015deep, poggi2017quantitative, osband2021epistemic]. One approach is to use Bayesian DNNs [jospin2020hands], which, when combined with SS models, enable the extraction of uncertainty, see, e.g., [karl2016deep, krishnan2017structured, naesseth2018variational]. However, doing so relies on variational inference, which makes learning more complex and less scalable, and cannot be used directly for state estimation [becker2019recurrent]. Alternatively, one can use deep learning techniques to estimate the SS model parameters and then plug them into a variant of the KF which provides uncertainty, e.g., [abbeel2005discriminative, haarnoja2016backprop, laufer2018hybrid, xu2021ekfnet]. These approaches are limited in accuracy and complexity due to the need to linearize the second-order moments to compute the error covariance as required by the MB KF. The KF-inspired RNN proposed in [becker2019recurrent] was trained to predict both the state and the error. However, [becker2019recurrent] focused on specific factorizable SS models with partially observable states, for which the KF simplifies to parallel scalar operations, and the proposed architecture does not naturally extend to general SS models.

The recently proposed KalmanNet [KalmanNetTSPa] utilizes RNNs to enhance the robustness of the KF for complex and mismatched models as a form of model-based deep learning [shlezinger2020model]. KalmanNet was shown to operate reliably in practical hardware-limited systems [KalmannetICAS21] due to its principled incorporation of partial domain knowledge. In this work we show how KalmanNet can be extended to provide uncertainty measures, by exploiting its interpretable architecture and the fact that it preserves the internal flow of the KF. In particular, we build upon the identification of an internal feature as the estimated Kalman gain (KG), and show that, when combined with partial domain knowledge, it can be used to compute the time-dependent error covariance matrix. We numerically show that the extracted uncertainty indeed reflects the performance of KalmanNet, providing state estimates and error measures similar to those of the KF which knows the SS model, while being notably more reliable in terms of tracking and uncertainty in the presence of mismatched models.

## 2 System Model and Preliminaries

We review the SS model and briefly recall the MB KF, since its operation serves as the baseline for KalmanNet and for its uncertainty extraction scheme detailed in Section 3. We then recap the data-driven filtering problem and the architecture of KalmanNet. For simplicity, we focus on linear SS models, though the derivations can also be applied to non-linear models in the same manner as the extended KF [durbin2012time, Ch. 10].

### 2.1 System Model and Model-Based Kalman Filtering

We consider a dynamical system characterized by a linear, Gaussian SS model in discrete time. For each time instance $t$, this SS model is defined by [bar2004estimation]

$$ \mathbf{x}_t = \mathbf{F}\,\mathbf{x}_{t-1} + \mathbf{w}_t, \qquad \mathbf{w}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \tag{1a} $$

$$ \mathbf{y}_t = \mathbf{H}\,\mathbf{x}_t + \mathbf{v}_t, \qquad\;\, \mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R}). \tag{1b} $$

In (1a), $\mathbf{x}_t$ is the latent state vector of the system at time $t$, which evolves by a linear state evolution matrix $\mathbf{F}$ and an additive white Gaussian noise (AWGN) $\mathbf{w}_t$ with noise covariance $\mathbf{Q}$. In (1b), $\mathbf{y}_t$ is the vector of observations at time $t$, $\mathbf{H}$ is the measurement matrix, and $\mathbf{v}_t$ is an AWGN with measurement noise covariance $\mathbf{R}$. The filtering problem deals with real-time state estimation; i.e., the recovery of $\mathbf{x}_t$ from $\{\mathbf{y}_\tau\}_{\tau \le t}$ for each time instance $t$ [durbin2012time].
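For illustration, the model (1) can be simulated as follows (a minimal NumPy sketch; the matrices in the usage example are hypothetical and not taken from the paper):

```python
import numpy as np

def simulate_lgss(F, H, Q, R, T, seed=None):
    """Draw one trajectory from the linear Gaussian SS model (1):
    x_t = F x_{t-1} + w_t,  y_t = H x_t + v_t."""
    rng = np.random.default_rng(seed)
    m, n = F.shape[0], H.shape[0]
    x = np.zeros(m)                                  # x_0 = 0 for simplicity
    xs, ys = np.zeros((T, m)), np.zeros((T, n))
    for t in range(T):
        x = F @ x + rng.multivariate_normal(np.zeros(m), Q)      # (1a)
        xs[t] = x
        ys[t] = H @ x + rng.multivariate_normal(np.zeros(n), R)  # (1b)
    return xs, ys

# Example: a constant-velocity model with fully observed state.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.eye(2)
xs, ys = simulate_lgss(F, H, 0.1 * np.eye(2), np.eye(2), T=100, seed=0)
```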

The KF is a two-step, low-complexity, recursive algorithm that produces a new estimate $\hat{\mathbf{x}}_{t|t}$ from a new observation $\mathbf{y}_t$, using the previous estimate as a sufficient statistic. In the first step, it predicts the a priori statistical moments based on the previous a posteriori estimates:

$$ \hat{\mathbf{x}}_{t|t-1} = \mathbf{F}\,\hat{\mathbf{x}}_{t-1|t-1}, \tag{2a} $$

$$ \mathbf{\Sigma}_{t|t-1} = \mathbf{F}\,\mathbf{\Sigma}_{t-1|t-1}\,\mathbf{F}^\top + \mathbf{Q}. \tag{2b} $$

In the second step, the a posteriori moments are updated based on the a priori moments. The pivotal computation for this step is the KG $\mathbf{K}_t$:

$$ \mathbf{K}_t = \mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top \left( \mathbf{H}\,\mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top + \mathbf{R} \right)^{-1}. \tag{3} $$

Given the new observation $\mathbf{y}_t$, the state estimate, i.e., the first-order posterior moment, is obtained via

$$ \hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t \left( \mathbf{y}_t - \mathbf{H}\,\hat{\mathbf{x}}_{t|t-1} \right). \tag{4} $$

The second-order posterior, which is the estimation error covariance, is then computed as

$$ \mathbf{\Sigma}_{t|t} = \left( \mathbf{I} - \mathbf{K}_t\,\mathbf{H} \right) \mathbf{\Sigma}_{t|t-1}. \tag{5} $$

When the noise is Gaussian and the SS model parameters in (1) are fully known, the KF is the MMSE estimator.
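As a concrete reference, the recursion (2)-(5) can be implemented in a few lines (a minimal NumPy sketch, not the code used in the experiments):

```python
import numpy as np

def kalman_filter(ys, F, H, Q, R, x0, P0):
    """One Kalman filter pass implementing eqs. (2)-(5)."""
    x, P = x0.copy(), P0.copy()
    I = np.eye(len(x0))
    xs, Ps = [], []
    for y in ys:
        # (2a)-(2b): a priori moments from the previous posterior
        x = F @ x
        P = F @ P @ F.T + Q
        # (3): Kalman gain
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        # (4): a posteriori state estimate
        x = x + K @ (y - H @ x)
        # (5): a posteriori error covariance
        P = (I - K @ H) @ P
        xs.append(x.copy()); Ps.append(P.copy())
    return np.array(xs), np.array(Ps)
```

Note that the covariance recursion (2b), (3), (5) does not depend on the observations, which is why the KF can report its error covariance alongside each estimate.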

### 2.2 Data-Driven Filtering with KalmanNet

In practice, the state evolution model (1a) is determined by the complex dynamics of the underlying system, while the observation model (1b) is dictated by the type and quality of the observations. For instance, $\mathbf{x}_t$ can be the location, velocity, and acceleration of a vehicle, while $\mathbf{y}_t$ are measurements obtained from several sensors. For real-world problems it is often difficult to characterize the SS model accurately. In data-driven filtering, one relies on a labeled dataset, i.e., observations with their corresponding ground-truth states, to fill the information gap.

KalmanNet [KalmanNetTSPa] is a DNN-aided architecture for state estimation and filtering with partial domain knowledge. It considers filtering without knowledge of the noise covariance matrices and with a possibly inaccurate description of the evolution matrix $\mathbf{F}$ and the observation matrix $\mathbf{H}$, obtained from, e.g., an understanding of the system dynamics and the sensing setup. Although $\mathbf{R}$, the observation noise covariance, is not needed for state estimation with KalmanNet, for simplicity of derivation we assume here that it is known (or estimated) and thus set $\mathbf{R} = \mathbf{I}$, since one can always whiten the observations without altering the achievable mean-squared error (MSE).

KalmanNet [KalmanNetTSPa] implements data-driven filtering by augmenting the theoretically grounded flow of the MB KF with an RNN. The latter is designed to learn to estimate the KG, whose MB computation encapsulates the missing domain knowledge, from data. The inherent memory of the RNN allows implicit tracking of the second-order statistical moments without requiring knowledge of the underlying noise statistics. In the same manner as the KF, it predicts the first-order moments $\hat{\mathbf{x}}_{t|t-1}$ and $\hat{\mathbf{y}}_{t|t-1}$, which are then used to estimate $\mathbf{x}_t$ via (4). The overall system, illustrated in Fig. 1, is trained end-to-end to minimize the MSE between $\hat{\mathbf{x}}_{t|t}$ and the true $\mathbf{x}_t$.
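One filtering step of this flow can be sketched as follows. This is a structural illustration only: `gain_fn` is a placeholder for the RNN of [KalmanNetTSPa], whose actual input features are richer than the innovation alone.

```python
import numpy as np

def kalmannet_step(x_post_prev, y, F, H, gain_fn):
    """One KalmanNet-style step: the Kalman gain is produced by a
    learned mapping of input features instead of eqs. (2b)-(3)."""
    x_prior = F @ x_post_prev           # predict the state, as in (2a)
    y_prior = H @ x_prior               # predicted observation
    innovation = y - y_prior            # one feature fed to the learned gain
    K = gain_fn(innovation)             # RNN-produced Kalman gain in practice
    x_post = x_prior + K @ innovation   # update, as in (4)
    return x_post, K
```

Because the update (4) is kept intact, the returned gain `K` is exactly the internal feature exploited for uncertainty extraction in Section 3.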

## 3 Uncertainty in KalmanNet

KalmanNet, detailed in Subsection 2.2, is designed and trained to estimate the state variable $\mathbf{x}_t$. Neither its architecture nor the loss measure used for its training encourages KalmanNet to maintain an estimate of its error covariance, which the KF provides. However, as we show here, the fact that KalmanNet preserves the flow of the KF allows its error covariance to be estimated from its internal features.

### 3.1 Kalman Gain-based Error Covariance

To extend KalmanNet to provide uncertainty, we build on its interpretable internal feature, namely the estimated KG. Combined with the observation model, the KG can be used to compute the time-dependent error covariance matrix $\mathbf{\Sigma}_{t|t}$, thus bypassing the need to explicitly estimate the evolution model, as stated in the following theorem:

###### Theorem 1.

Consider the SS model (1) where $\mathbf{H}$ has full column rank; i.e., $(\mathbf{H}^\top \mathbf{H})^{-1}$ exists. Then, filtering with the KG $\mathbf{K}_t$ results in estimation with error covariance

$$ \mathbf{\Sigma}_{t|t} = \mathbf{K}_t\,\mathbf{H}\left(\mathbf{H}^\top \mathbf{H}\right)^{-1}. \tag{6} $$

###### Proof.

By combining (3) and (5), the error covariance can be written as $\mathbf{\Sigma}_{t|t} = (\mathbf{I} - \mathbf{K}_t\mathbf{H})\mathbf{\Sigma}_{t|t-1}$. Thus, to estimate $\mathbf{\Sigma}_{t|t}$, we express $\mathbf{\Sigma}_{t|t-1}$ using $\mathbf{K}_t$ and the available domain knowledge. To that end, multiplying (3) from the right by $\left(\mathbf{H}\,\mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top + \mathbf{I}\right)$, recalling that $\mathbf{R} = \mathbf{I}$, gives

$$ \mathbf{K}_t \left( \mathbf{H}\,\mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top + \mathbf{I} \right) = \mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top. \tag{7} $$

Next, (7) is multiplied by $\mathbf{H}$ from the right side, which yields

$$ \mathbf{K}_t\,\mathbf{H}\,\mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top\mathbf{H} + \mathbf{K}_t\,\mathbf{H} = \mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top\mathbf{H}. \tag{8} $$

Combining terms in (8) results in

$$ \left(\mathbf{I} - \mathbf{K}_t\,\mathbf{H}\right)\mathbf{\Sigma}_{t|t-1}\,\mathbf{H}^\top\mathbf{H} = \mathbf{K}_t\,\mathbf{H}. \tag{9} $$

Multiplying (9) from the left by $\left(\mathbf{I} - \mathbf{K}_t\mathbf{H}\right)^{-1}$ and from the right by $\left(\mathbf{H}^\top\mathbf{H}\right)^{-1}$, which exists since $\mathbf{H}$ has full column rank, results in

$$ \mathbf{\Sigma}_{t|t-1} = \left(\mathbf{I} - \mathbf{K}_t\,\mathbf{H}\right)^{-1}\mathbf{K}_t\,\mathbf{H}\left(\mathbf{H}^\top\mathbf{H}\right)^{-1}. \tag{10} $$

Substituting (10) into (5) yields (6), concluding the proof of the theorem. ∎

Theorem 1 indicates that when the observation model is known and $\mathbf{H}$ has full column rank, i.e., the number of measurements is not smaller than the number of tracked states, one can extend KalmanNet to predict its error covariance alongside its state estimate. The resulting procedure is summarized as Algorithm 1.
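The identity (6) can be checked numerically against a single MB KF update (a sketch assuming $\mathbf{R} = \mathbf{I}$ and an arbitrary illustrative full-column-rank $\mathbf{H}$, not the paper's experimental setup):

```python
import numpy as np

def error_cov_from_gain(K, H):
    """Eq. (6): posterior error covariance recovered from the Kalman
    gain, valid when H has full column rank and R = I."""
    return K @ H @ np.linalg.inv(H.T @ H)

# One model-based update, eqs. (3) and (5), with R = I.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2))            # 3 measurements, 2 states
P_prior = np.array([[2.0, 0.3], [0.3, 1.0]])
K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + np.eye(3))
P_post = (np.eye(2) - K @ H) @ P_prior     # eq. (5)
assert np.allclose(error_cov_from_gain(K, H), P_post)
```

The check passes exactly: with $\mathbf{R} = \mathbf{I}$, (7) implies $\mathbf{K}_t = \mathbf{\Sigma}_{t|t}\mathbf{H}^\top$, from which (6) follows directly.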

### 3.2 Discussion

KalmanNet [KalmanNetTSPa] was designed for real-time tracking in non-linear SS models with unknown noise statistics. Algorithm 1 extends it to provide not only $\hat{\mathbf{x}}_{t|t}$, an estimate of the state for which it was previously derived and trained, but also $\hat{\mathbf{\Sigma}}_{t|t}$, an estimate of the time-dependent error covariance matrix, as an uncertainty measure. This is achieved since KalmanNet retains the interpretable flow of the MB KF, and by using the KG learned by its RNN. Numerical evaluations presented in Section 4 indicate that KalmanNet recovers the true covariance matrix, as it reflects the true empirical error, in line with the bias-variance decomposition of the MSE [friedman2001elements].

For simplicity and clarity of exposition, Algorithm 1 was derived for linear SS models. In Section 4 we also present results for a non-linear chaotic system, where the extended KF derivation is used, i.e., the SS model matrices are replaced with their respective Jacobians [durbin2012time, Ch. 10]. The additional requirement that the observation noise covariance $\mathbf{R}$ is known, needed only for the purpose of estimating the covariance, is often satisfied, as in many applications the main modelling challenge relates to the state evolution rather than to the measurement noise. Since we assume access to a labeled dataset, one can estimate $\mathbf{R}$ with standard techniques without any assumptions regarding the evolution model. The case where $\mathbf{H}$ is not of full column rank is left for future work.

## 4 Numerical Evaluations

In this section we numerically compare the covariance computed by KalmanNet to that computed by the KF in the linear and non-linear cases.^1

^1 The source code along with additional information on the numerical study can be found online at https://github.com/KalmanNet/ERRCOV_ICASSP22.

We start with the linear Gaussian SS model, for which the KF achieves the MMSE lower bound and is an unbiased estimator [humpherys2012fresh]. We generate data from a scalar SS model with a fixed sequence length, and compare, in Fig. 2, the error predicted by KalmanNet and its deviation from the true error to those computed by the MB KF which knows the SS model. We observe in Fig. 2 that the theoretical error computed by the KF for each time step $t$ and the error predicted by KalmanNet coincide, and that both algorithms have a similar empirical error. In Fig. 3 we demonstrate, for a single realization of a ground-truth trajectory, that both algorithms produce the same uncertainty bounds. Next, we consider the case where both KalmanNet and the MB KF are plugged in with a mismatched model parameter. In Figs. 4 and 5 we observe that under such model mismatch, KalmanNet produces an uncertainty measure similar to its empirical error, while the KF underestimates its empirical error.

Next, we demonstrate the merits of KalmanNet when filtering the Lorenz attractor, a challenging three-dimensional non-linear chaotic system, and compare its performance to the extended KF; see [KalmanNetTSPa] for a detailed description of this setup. The model mismatch in this case stems from sampling a continuous-time system described by differential equations into discrete time. In Fig. 6 we clearly observe that KalmanNet achieves a lower MSE and estimates its error fairly accurately, while the extended KF overestimates it.

## 5 Conclusions

In this work we extended the recently proposed KalmanNet state estimator to predict its error covariance alongside the latent state. This is achieved by exploiting the hybrid model-based/data-driven architecture of KalmanNet, which produces the KG as an internal feature. We prove that one can often utilize the learned KG to predict the error covariance as a measure of uncertainty. Our numerical results demonstrate that this extension allows KalmanNet to accurately predict both the state and its error, improving upon the KF in the presence of model mismatch and non-linearities.
