, and machine learning. In the case of autonomous linear time invariant (LTI) systems driven by Gaussian process and sensor noise:

$$x_{k+1} = A x_k + w_k, \qquad y_k = C x_k + v_k, \qquad (1)$$
the celebrated Kalman Filter (KF) has been the standard method for prediction (Anderson and Moore, 2005). When model (1) is known, the KF minimizes the mean square prediction error. However, in many practical cases of interest (e.g., tracking moving objects, stock price forecasting), the state-space parameters are not known and must be learned from time-series data. This system identification step, based on a finite amount of data, inevitably introduces parametric errors in model (1), which leads to a KF with suboptimal prediction performance (El Ghaoui and Calafiore, 2001).
In this paper, we study this scenario, and provide finite-data estimation guarantees for the Kalman Filtering of an unknown autonomous LTI system (1). We consider a simple two step procedure. In the first step, using system identification tools rooted in subspace methods, we obtain finite-data estimates of the state-space parameters and Kalman gain describing system (1). Then, in the second step, we use these approximate parameters to design a filter which predicts the system state. We provide an end-to-end analysis of this two-step procedure, and characterize the sub-optimality of the resulting filter in terms of the number of samples used during the system identification step, where the sub-optimality is measured in terms of the mean square prediction error of the filter. A key insight that emerges from our analysis is that using a Certainty Equivalent (CE) Kalman Filter, i.e., using a KF computed directly from estimated parameters, can yield poor estimation performance if the resulting CE KF has eigenvalues close to the unit circle. To address this issue, we propose a Robust Kalman Filter that mitigates these effects and that still enjoys provable sub-optimality guarantees.
Our main contributions are that: i) we show that if the system identification step produces sufficiently accurate estimates, or if the underlying true KF is sufficiently robust, then the CE KF has near optimal mean square prediction error, ii) we show that when the CE KF is marginally stable, i.e., when it has eigenvalues close to the unit circle, a Robust KF synthesized by explicitly imposing bounds on the magnitude of certain closed loop maps of the system enjoys similar mean square prediction error bounds as the CE KF, while demonstrating improved stability properties, and iii) we integrate the above results with the finite-data system identification guarantees of Tsiamis and Pappas (2019) to provide, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman Filtering of an unknown system. In particular, we show that the mean square estimation error of both the Certainty Equivalent and Robust Kalman filters produced by the two step procedure described above is, with high probability, bounded by $\tilde{O}(1/\sqrt{N})$, where $N$ is the number of samples collected in the system identification step.
Related work. A similar two step process was studied for the Linear Quadratic (LQ) control of an unknown system in Dean et al. (2017); Mania et al. (2019). While LQ optimal control and Kalman Filtering are known to be dual problems, this duality breaks down when the state-space parameters describing the system dynamics are not known. In particular, the LQ optimal control problem assumes full state information, which makes the system identification step much simpler: it reduces to a least-squares problem. In contrast, in the KF setting, as only partial observations are available, the additional challenge of finding an appropriate system order and state-space realization must be addressed. On the other hand, in the KF problem one can directly estimate the KF gain from data, which makes analyzing the performance of the CE KF simpler than analyzing the performance of the CE LQ optimal controller (Mania et al., 2019).
System identification of autonomous LTI systems (1) is referred to as stochastic system identification (Van Overschee and De Moor, 2012). Classical results consider the asymptotic consistency of stochastic subspace system identification, as in Deistler et al. (1995); Bauer et al. (1999), whereas contemporary results seek to provide finite-data guarantees (Tsiamis and Pappas, 2019; Lee and Lamperski, 2019). Finite-data guarantees for system identification of partially observed systems can also be found in Oymak and Ozay (2018); Simchowitz et al. (2019); Sarkar et al. (2019), but these results focus on learning the non-stochastic part of the system, assuming that a user specified input is used to persistently excite the dynamics.
Classical approaches to robust Kalman Filtering can be found in El Ghaoui and Calafiore (2001); Sayed et al. (2001); Levy and Nikoukhah (2012), where parametric uncertainty is explicitly taken into account during the filter synthesis procedure. Although similar in spirit to our robust KF procedure, these approaches assume fixed parametric uncertainty, and do not characterize the effects of parametric uncertainty on estimation performance, with this latter step being key in providing end-to-end sample complexity bounds. We also note that although not directly comparable to our work, the filtering problem for an unknown LTI system was also recently studied in the adversarial noise setting in Hazan et al. (2018), where a spectral filtering technique is used to directly estimate the latent state, bypassing the system identification step.
Paper structure. In Sec. 2, we formulate the problem, and in Sec. 3 and 4, we derive performance guarantees for the proposed CE and Robust Kalman filters. In Sec. 5, we provide end-to-end sample complexity bounds for our two step procedure, and demonstrate the effectiveness of our pipeline with a numerical example in Sec. 6. We end with a discussion of future work in Sec. 7. All proofs, missing details, and a summary of the system identification results from Tsiamis and Pappas (2019) can be found in the Appendix.
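Notation. We let bold symbols denote the frequency representation of signals. For example, $\mathbf{x}(z) = \sum_{k=0}^{\infty} x_k z^{-k}$. If $M$ is stable with spectral radius $\rho(M) < 1$, then we denote its resolvent by $\mathfrak{R}_M(z) := (zI - M)^{-1}$. The $\mathcal{H}_2$ system norm is defined by $\|\mathbf{G}\|_{\mathcal{H}_2}^2 := \frac{1}{2\pi}\int_{-\pi}^{\pi} \|\mathbf{G}(e^{j\theta})\|_F^2 \, d\theta$, where $\|\cdot\|_F$ is the Frobenius norm. The $\mathcal{H}_\infty$ system norm is defined by $\|\mathbf{G}\|_{\mathcal{H}_\infty} := \sup_{\theta \in [-\pi, \pi]} \|\mathbf{G}(e^{j\theta})\|_2$, where $\|\cdot\|_2$ is the spectral norm. Let $\frac{1}{z}\mathcal{RH}_\infty$ be the set of real rational stable strictly proper transfer matrices.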
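As a concrete illustration of the two system norms, the sketch below evaluates them for a state-space transfer matrix $\mathbf{G}(z) = C(zI - A)^{-1}B$ with $\rho(A) < 1$. It is our own minimal sketch, not code from the paper: the $\mathcal{H}_2$ norm uses the standard identity $\|\mathbf{G}\|_{\mathcal{H}_2}^2 = \operatorname{tr}(CWC^\top)$, where $W = AWA^\top + BB^\top$ is the controllability Gramian, while the $\mathcal{H}_\infty$ norm is only approximated on a frequency grid. The function names and the grid size are illustrative choices.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def h2_norm(A, B, C):
    """H2 norm of G(z) = C (zI - A)^{-1} B, assuming rho(A) < 1."""
    # Controllability Gramian W solves W = A W A^T + B B^T.
    W = solve_discrete_lyapunov(A, B @ B.T)
    return float(np.sqrt(np.trace(C @ W @ C.T)))


def hinf_norm_grid(A, B, C, num_points=2048):
    """Grid approximation of the H-infinity norm: max_theta ||G(e^{j theta})||_2."""
    n = A.shape[0]
    # For real systems, ||G(e^{-j theta})|| = ||G(e^{j theta})||, so [0, pi] suffices.
    thetas = np.linspace(0.0, np.pi, num_points)
    peak = 0.0
    for theta in thetas:
        z = np.exp(1j * theta)
        G = C @ np.linalg.solve(z * np.eye(n) - A, B)
        peak = max(peak, np.linalg.norm(G, 2))  # largest singular value
    return float(peak)


# Scalar example G(z) = 1 / (z - a): ||G||_H2^2 = 1 / (1 - a^2), ||G||_Hinf = 1 / (1 - a) for 0 < a < 1.
a = 0.5
A, B, C = np.array([[a]]), np.array([[1.0]]), np.array([[1.0]])
print(h2_norm(A, B, C) ** 2)    # approx 1.3333
print(hinf_norm_grid(A, B, C))  # approx 2.0, attained at theta = 0
```

The grid approximation can only underestimate the true peak; a finer grid (or a dedicated bisection method) is needed for lightly damped systems.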
2 Problem Formulation
For the remainder of the paper, we consider the Kalman Filter form of system (1):
$$\hat{x}_{k+1} = A \hat{x}_k + K e_k, \qquad y_k = C \hat{x}_k + e_k, \qquad (2)$$

where $\hat{x}_k$ is the prediction (state), $y_k$ is the output, and $e_k$ is the innovation process. The innovations are assumed to be i.i.d. zero mean Gaussians, with positive definite covariance matrix $R$, and the initial state is assumed to be $\hat{x}_0 = 0$. In general, system (1) driven by i.i.d. zero mean Gaussian process and sensor noise is equivalent to system (2) for a suitable gain matrix $K$, as both noise models produce outputs with identical statistical properties (Van Overschee and De Moor, 2012, Chapter 3). We make the following assumption throughout the rest of the paper.
Matrices $A$, $C$, $K$, $R$ are unknown, and the pair $(C, A)$ is observable. Both the matrices $A$ and $A - KC$ have spectral radius less than $1$, i.e., $\rho(A) < 1$ and $\rho(A - KC) < 1$.
The observability assumption is standard, and the stability of $A - KC$ follows from the properties of the Kalman filter (Anderson and Moore, 2005). We note that the filter synthesis procedures we propose can be applied even if $\rho(A) \ge 1$; however, in this case, we are unable to guarantee bounded estimation error for the resulting CE and robust KFs (see Theorem 5 and Lemma 6).
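The equivalence between (1) and (2), and the assumption above, can both be checked numerically for a given model. The sketch below assumes, for illustration only, that the process and sensor noises of (1) are uncorrelated with covariances `Qw` and `Rv` (names not from the paper). `innovation_form` solves the steady-state predictor Riccati equation with SciPy's `solve_discrete_are` applied to the dual pair $(A^\top, C^\top)$ and returns $K = APC^\top R^{-1}$ and $R = CPC^\top + R_v$; the hypothetical helper `check_assumption` tests observability through the rank of the observability matrix and reports $\rho(A)$ and $\rho(A - KC)$. A plain rank test with the default tolerance can be fragile for nearly unobservable pairs.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def innovation_form(A, C, Qw, Rv):
    """(K, R) such that model (1) with uncorrelated noise covariances (Qw, Rv)
    has, in steady state, the same output statistics as the innovation form (2)."""
    # Predictor DARE via duality: P = A P A^T + Qw - A P C^T (C P C^T + Rv)^{-1} C P A^T.
    P = solve_discrete_are(A.T, C.T, Qw, Rv)
    R = C @ P @ C.T + Rv                  # innovation covariance
    K = A @ P @ C.T @ np.linalg.inv(R)    # steady-state predictor (Kalman) gain
    return K, R


def spectral_radius(M):
    return float(max(abs(np.linalg.eigvals(M))))


def check_assumption(A, C, K):
    """Return (observable, rho(A), rho(A - K C))."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)     # C, CA, ..., CA^{n-1}
    observable = bool(np.linalg.matrix_rank(np.vstack(blocks)) == n)
    return observable, spectral_radius(A), spectral_radius(A - K @ C)


A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
K, R = innovation_form(A, C, Qw=np.eye(2), Rv=np.eye(1))
observable, rho_A, rho_AKC = check_assumption(A, C, K)
print(observable, rho_A, rho_AKC)  # observable pair, rho(A) = 0.9, rho(A - KC) < 1
```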
Our goal is to provide end-to-end sample complexity bounds for the two step pipeline illustrated in Fig. 1. First, we collect a trajectory of length $N$ from system (2), use system identification tools with finite-data guarantees to learn estimates $\hat A, \hat C, \hat K, \hat R$ of the parameters, and bound the corresponding parameter uncertainties by $\epsilon_A, \epsilon_C, \epsilon_K$. Second, we use these approximate parameters to synthesize a filter from the following class:
where the filter gains are to be designed, and the filter's mean square prediction error is defined with respect to the optimal KF. Note that the predictor class above includes the CE KF (see Section 3), and that if the true system parameters are known, i.e., if the estimates coincide with the true parameters, then the optimal mean squared prediction error is achieved.
Problem 1 (End-to-end Sample Complexity)
To address Problem 1, we will: i) leverage recent results regarding the sample complexity of stochastic system identification, ii) provide estimation guarantees for certainty equivalent as well as robust Kalman filters designed using the identified system parameters (see Problem 2 below), and iii) provide end-to-end performance guarantees by integrating steps i) and ii) (see Problem 1 above).
Recently, Tsiamis and Pappas (2019) provided a finite sample analysis for stochastic system identification, which bounds the parameter identification errors. Leveraging these results, we focus next on solving the filter synthesis task described below using both a certainty equivalent Kalman filter and a robust Kalman filter.
Problem 2 (Near Optimal Kalman Filtering of an Uncertain System)
Consider system (2). Let $\hat A, \hat C, \hat K, \hat R$ be estimates satisfying $\|A - \hat A\|_2 \le \epsilon_A$, $\|C - \hat C\|_2 \le \epsilon_C$, and $\|K - \hat K\|_2 \le \epsilon_K$.¹ Design a Kalman filter in class (3), defined by its gains, whose mean square prediction error decays with the size of the parameter uncertainty.

¹ We assume throughout that the realization of the estimated parameters is consistent with that of the underlying system $(A, C, K)$, such that the estimation errors are minimal. In practice, estimating the parameters of a partially observed system (2) is ill-posed, in that any similarity transformation can be applied to generate parameters describing the same system; the bounds described hold up to a similarity transformation. All results in this paper apply nearly as is to the general case of a nontrivial similarity transformation under suitable assumptions; see Appendix, Sections D.2 and E.
3 Estimation Guarantees for Certainty Equivalent Kalman Filtering
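For the certainty equivalent Kalman filter, we directly use the estimated state-space parameters from the system identification step. Based on the estimates $\hat A, \hat C, \hat K, \hat R$, we compute the joint covariance of the noise terms $\hat K e_k$ and $e_k$ of the estimated model:

$$\begin{bmatrix} \hat K \hat R \hat K^\top & \hat K \hat R \\ \hat R \hat K^\top & \hat R \end{bmatrix}.$$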
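Then, based on standard Kalman filter theory, we compute the stabilizing solution² $\hat P$ of the following Riccati equation with correlation terms (Kailath et al., 2000):

$$\hat P = \hat A \hat P \hat A^\top + \hat K \hat R \hat K^\top - (\hat A \hat P \hat C^\top + \hat K \hat R)(\hat C \hat P \hat C^\top + \hat R)^{-1}(\hat A \hat P \hat C^\top + \hat K \hat R)^\top. \qquad (4)$$

² A stabilizing solution $\hat P$ to the Riccati equation defines a Kalman gain $\hat L$ such that $\rho(\hat A - \hat L \hat C) < 1$.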
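Then, the CE Kalman filter gain is static and takes the form

$$\hat L = (\hat A \hat P \hat C^\top + \hat K \hat R)(\hat C \hat P \hat C^\top + \hat R)^{-1}. \qquad (5)$$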
Trivially, if $\rho(\hat A - \hat K \hat C) < 1$, then the stabilizing solution of the Riccati equation is $\hat P = 0$ with $\hat L = \hat K$; the solution does not depend on $\hat R$ (see Lemma A in the Appendix). The next result shows that if the underlying true Kalman filter is sufficiently robust, as measured by a spectral decay rate, and the parameter estimation errors are sufficiently small, then the CE Kalman filter achieves near optimal performance.

Theorem 5 (Near Optimal Certainty Equivalent Kalman Filtering). Consider Problem 2 and the CE KF (5). For any , define If the robustness condition is satisfied, then and:
where , and .
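The CE gain (5) can be computed with an off-the-shelf Riccati solver. The sketch below (the function name `ce_kalman_gain` is ours) uses SciPy's `solve_discrete_are`, whose optional cross-term argument `s` accounts for the correlation between $\hat K e_k$ and $e_k$: passing the dual data $(\hat A^\top, \hat C^\top)$ together with $q = \hat K \hat R \hat K^\top$, $r = \hat R$ and $s = \hat K \hat R$ should return the stabilizing solution $\hat P$ of (4). The scalar examples exercise both regimes covered by Lemma A: with $\hat a - \hat k = 0.3$ the gain equals $\hat K$, while with $\hat a - \hat k = -1.5$ the solver returns $\hat P = 1.25$ and a gain that moves the closed-loop eigenvalue to $-2/3$.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def ce_kalman_gain(A_hat, C_hat, K_hat, R_hat):
    """Certainty-equivalent Kalman gain from estimated innovation-form parameters."""
    Q = K_hat @ R_hat @ K_hat.T
    Q = 0.5 * (Q + Q.T)              # symmetrize against round-off
    S = K_hat @ R_hat                # cross-covariance between K e_k and e_k
    # Dual (filtering) DARE with cross term, i.e. equation (4):
    # P = A P A^T + Q - (A P C^T + S)(C P C^T + R)^{-1}(A P C^T + S)^T
    P = solve_discrete_are(A_hat.T, C_hat.T, Q, R_hat, s=S)
    L = (A_hat @ P @ C_hat.T + S) @ np.linalg.inv(C_hat @ P @ C_hat.T + R_hat)
    return L, P


one = np.array([[1.0]])
# rho(A_hat - K_hat C_hat) = 0.3 < 1: expect P = 0 and L = K_hat = 0.2.
L1, P1 = ce_kalman_gain(np.array([[0.5]]), one, np.array([[0.2]]), one)
# A_hat - K_hat C_hat = -1.5: expect P = 1.25, L = 7/6, closed loop 0.5 - 7/6 = -2/3.
L2, P2 = ce_kalman_gain(np.array([[0.5]]), one, np.array([[2.0]]), one)
print(L1[0, 0], L2[0, 0])
```

In the second case the naive choice $\hat L = \hat K$ would leave the filter unstable, which is exactly why the stabilizing solution of (4), rather than $\hat P = 0$, is used.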
The transient behavior of the CE Kalman filter is governed by the closed loop eigenvalues of $\hat A - \hat L \hat C$, with performance degrading as these eigenvalues approach the unit circle. This may occur if the estimation errors are large enough to cause $\rho(\hat A - \hat L \hat C) \approx 1$, even if the true Kalman filter has spectral radius $\rho(A - KC)$ well below one (which is often the case in practice). We show in the next section that this undesirable scenario can be avoided by explicitly constraining the transient response of the resulting Kalman filter to satisfy certain robustness constraints.
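The effect of the closed-loop eigenvalues can be seen in a simple special case. The sketch below assumes a static gain $L$ (the CE filter is of this form) and computes the steady-state mean squared distance between the filter $x'_{k+1} = \hat A x'_k + L(y_k - \hat C x'_k)$ and the optimal predictor $\hat x_k$ of (2). Writing $\delta_k = x'_k - \hat x_k$ gives $\delta_{k+1} = (\hat A - L\hat C)\delta_k + (\hat A - A - L(\hat C - C))\hat x_k + (L - K)e_k$, so the stacked state $(\hat x_k, \delta_k)$ is driven by white noise and its stationary covariance solves a discrete Lyapunov equation. The symbols $x'_k$, $\delta_k$ and the function name are ours, not the paper's.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def static_filter_mse(A, C, K, R, A_hat, C_hat, L):
    """Steady-state E||x'_k - xhat_k||^2 for the static filter with gain L run on
    data from (2). Needs rho(A) < 1 and rho(A_hat - L C_hat) < 1."""
    n = A.shape[0]
    F = A_hat - L @ C_hat                          # filter closed-loop matrix
    # Stacked state (xhat_k, delta_k), driven by the innovation e_k.
    A_big = np.block([[A, np.zeros((n, n))],
                      [A_hat - A - L @ (C_hat - C), F]])
    B_big = np.vstack([K, L - K])
    Sigma = solve_discrete_lyapunov(A_big, B_big @ R @ B_big.T)
    return float(np.trace(Sigma[n:, n:]))          # covariance block of delta_k


A, C, K, R = (np.array([[0.5]]), np.array([[1.0]]),
              np.array([[0.3]]), np.array([[1.0]]))
# Exact model and L = K: no excess error.
print(static_filter_mse(A, C, K, R, A, C, K))
# Exact model, L = 0.5: delta_{k+1} = 0 * delta_k + 0.2 e_k, so the error is 0.04.
print(static_filter_mse(A, C, K, R, A, C, np.array([[0.5]])))
# Perturbed A_hat with L = K: error grows as rho(A_hat - L C_hat) = 0.3, 0.8, 0.95.
for a_hat in (0.6, 1.1, 1.25):
    print(a_hat, static_filter_mse(A, C, K, R, np.array([[a_hat]]), C, K))
```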
4 Estimation Guarantees for Robust Kalman Filtering
To address the possible poor performance of the CE Kalman filter when model uncertainty is large, we propose to search over dynamic filters (3) subject to additional robustness constraints on their transient response. Using the System Level Synthesis (SLS) framework (Wang et al., 2019; Anderson et al., 2019) for Kalman Filtering (Wang et al., 2015), we parameterize the class of dynamic filters (3) subject to additional robustness constraints in a way that leads to convex optimization problems.
For a given dynamic predictor , define the closed loop system responses:
In Wang et al. (2015), it is shown that these responses are in fact the closed loop maps from process and sensor noise to state estimation error, and that the filter gain achieving the desired behavior can be recovered from them, so long as the responses are constrained to lie in an affine space defined by the system dynamics. By expressing the mean squared prediction error of the filters (3) in terms of their system responses, we are able to clearly delineate the effects of parametric uncertainty from the cost of deviating from the CE Kalman filter.

Lemma 6 (Error analysis). Consider system (2). Let , , . Any filter (3) with parameterization (6) has mean squared prediction error given by
Based on the previous lemma, we can upper bound the mean squared prediction error of filters (3) by
where . This upper bound clearly separates the effects of parameter uncertainty, as captured by the first term, from the performance cost incurred by the filter due to its deviation from the CE Kalman gain $\hat L$, as captured by the second. In order to optimally trade off between these two terms, we propose the following robust SLS optimization problem:
where the constant is a regularization parameter, and the affine constraint parameterizes all filters of the form (3) that have bounded mean squared prediction error (see the Appendix or Wang et al. (2015) for more details). As we formalize in the following theorem, for an appropriately selected regularization parameter and sufficiently small estimation errors, the robust KF has near optimal mean square estimation error.

Theorem 7 (Robust Kalman Filter). Consider Problem 2 with Kalman filters from class (3) synthesized using the robust SLS optimization problem (7). If the regularization parameter is chosen such that , and further, the estimation errors are such that
then the robust SLS optimization problem is feasible, and the synthesized robust Kalman filter has mean squared prediction error upper-bounded by
where . We further note that whenever the system responses induced by the CE Kalman filter are a feasible solution to optimization problem (7), they are also optimal, resulting in a filter with performance identical to the CE setting.
5 End-to-End Sample Complexity for the Kalman Filter
Theorems 5 and 7 provide two different solutions to Problem 2. Combining these theorems with the finite data system identification guarantees of Tsiamis and Pappas (2019), we now derive, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman filtering of an unknown system. For both the CE and robust Kalman filter, we show that the mean squared estimation error defined in (3) decreases at rate $O(1/\sqrt{N})$ up to logarithmic terms, where $N$ is the number of samples collected during the system identification step. The formal statement of the following theorem, which addresses Problem 1, can be found in Theorem E.
Theorem (End-to-end guarantees, informal). Fix a failure probability $\delta \in (0, 1)$, and assume that we are given a sample trajectory of length $N$ generated by system (2). Then, as long as $N$ is sufficiently large, we have with probability at least $1 - \delta$ that the identification and filter synthesis pipeline of Fig. 1, with system identification performed as in Tsiamis and Pappas (2019) and filter synthesis performed as in Sections 3 and 4, achieves mean squared prediction error satisfying
The bound derived in the end-to-end theorem above highlights an interesting tension between how easy it is to identify the unknown system and the robustness of the underlying optimal Kalman filter. One constant in the bound captures how robust the underlying open loop system and closed loop Kalman filter are, as measured by their spectral gaps $1 - \rho(A)$ and $1 - \rho(A - KC)$. In particular, we expect this constant to be small for systems that admit optimal KFs with favorable robustness and transient performance. In contrast, a second constant captures how easy it is to identify a system: recent results for the fully observed setting (Simchowitz et al., 2018; Sarkar and Rakhlin, 2018) suggest that systems with larger spectral radius are in fact easier to identify, as they provide more “signal” to the identification algorithm. In this way, our upper bound suggests that systems which properly balance these two properties, robust transient performance and ease of identification, enjoy favorable sample complexity.
We also note that the degradation of our bound with the inverse of the spectral gap appears to be a limitation of the proposed offline two step architecture: indeed, Lemma 6 suggests that any estimation error in the state-space parameters causes an increase in mean squared prediction error as $\rho(A)$ increases. It remains open whether other prediction architectures would suffer from the same limitation.
We perform Monte Carlo simulations of the proposed pipeline for the system
for varying sample lengths $N$. We simulate both the CE and robust Kalman filters, and fix the regularization parameter in the robust SLS optimization problem (7). For each iteration, we first simulate system (2) to obtain $N$ output samples. Then, we perform system identification to obtain the system parameters, after which we synthesize both CE and robust Kalman filters. Finally, we compute the mean squared prediction error of the designed filters.
For the identification scheme, we used a variation of the MOESP algorithm (Qin, 2006), which is more sample efficient in practice than the one analyzed in Tsiamis and Pappas (2019); see Algorithm 1 and Section D.2. The basis of the state-space representation returned by the subspace algorithm is data-dependent and varies with each simulation. For this reason, to compare the performance across different simulations, we compute the mean square error in terms of the original state-space basis. Note that the SLS optimization problem (7) is semi-infinite, since we optimize over infinite-dimensional system responses. To deal with this issue, we optimize over a finite horizon (see, for example, Dean et al. (2018)), which makes the problem finite and tractable.
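The finite-horizon truncation replaces an infinite impulse response by its first $T$ coefficients. The sketch below is our own illustration, not the authors' implementation, of why this is benign for stable responses: for $\mathbf{G}(z) = (zI - F)^{-1}B$ with $\rho(F) < 1$, the truncated $\mathcal{H}_2$ cost $\sum_{t=1}^{T}\|F^{t-1}B\|_F^2$ approaches the exact value (obtained from a Lyapunov equation) geometrically fast in $T$, roughly like $\rho(F)^{2T}$. The matrices and horizons are arbitrary choices.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def truncated_h2_sq(F, B, T):
    """sum_{t=1}^{T} ||F^{t-1} B||_F^2: H2 cost of the length-T FIR truncation of (zI - F)^{-1} B."""
    total, coeff = 0.0, B.copy()
    for _ in range(T):
        total += float(np.sum(coeff ** 2))  # squared Frobenius norm of the current coefficient
        coeff = F @ coeff
    return total


F = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.eye(2)
# Exact H2 cost: trace of W = F W F^T + B B^T.
exact = float(np.trace(solve_discrete_lyapunov(F, B @ B.T)))
for T in (5, 20, 80):
    print(T, exact - truncated_h2_sq(F, B, T))  # truncation gap shrinks geometrically
```

The same reasoning suggests that the horizon should grow as the closed-loop spectral radius approaches one, which is also the regime in which the CE filter performs poorly.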
Figure 2 (a) and (b) show the empirically computed mean squared prediction errors of the CE and Robust Kalman filters, with the mean, 95th, and 97.5th percentiles being shown. Notice that both errors decrease with a rate of $O(1/\sqrt{N})$, and that while the average behavior of both filters is quite similar, there is a noticeable gap in their tail behaviors. We observe that the most significant gap between the CE and Robust Kalman filters occurs when the eigenvalues of the CE closed-loop matrix $\hat A - \hat L \hat C$ are close to the unit circle. Fig. 3 shows the empirical distribution of mean squared prediction errors conditioned on the event that $\rho(\hat A - \hat L \hat C)$ is close to one. In this case, the CE filter can exhibit extremely poor mean squared prediction error, with the worst observed error (not shown in Fig. 3 in the interest of space) approximately equal to 70; in contrast, the worst error exhibited by the robust Kalman filter was approximately equal to 5. Thus, we were able to achieve a 14x reduction in worst-case mean squared error. For some simulations, the robust KF can exhibit worse performance than the CE Kalman filter. However, over all simulations, the mean squared error achieved by the robust Kalman filter was at most 1.64x greater than that achieved by the CE Kalman filter.
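The statistics reported in Fig. 2 (mean, 95th and 97.5th percentiles across Monte Carlo runs) and the empirical rate can be extracted as sketched below. This is a hypothetical post-processing helper, not the authors' code: `errors[N]` is assumed to hold the per-run mean squared prediction errors for sample length `N`, and the rate is estimated by a least-squares fit of the log mean error against $\log N$, so a slope near $-1/2$ corresponds to $O(1/\sqrt{N})$ behavior. The data in the example are synthetic and only illustrate the fit.

```python
import numpy as np


def summarize_errors(errors):
    """errors: dict mapping sample length N -> 1-D array of per-run MSEs.
    Returns {N: (mean, 95th pct, 97.5th pct)} and the fitted log-log slope."""
    Ns = np.array(sorted(errors))
    stats = {int(N): (float(np.mean(errors[N])),
                      float(np.percentile(errors[N], 95)),
                      float(np.percentile(errors[N], 97.5))) for N in Ns}
    means = np.array([stats[int(N)][0] for N in Ns])
    # Fit log(mean error) = slope * log(N) + intercept.
    slope, _ = np.polyfit(np.log(Ns), np.log(means), 1)
    return stats, float(slope)


# Synthetic illustration only: errors scaling like 3 / sqrt(N) with random spread.
rng = np.random.default_rng(0)
synthetic = {N: 3.0 / np.sqrt(N) * rng.uniform(0.5, 1.5, size=200)
             for N in (500, 1000, 2000, 4000, 8000)}
stats, slope = summarize_errors(synthetic)
print(round(slope, 2))  # close to -0.5
```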
7 Conclusions & Future work
In this paper, we proposed and analyzed a system identification and filter synthesis pipeline. Leveraging contemporary finite data guarantees from system identification (Tsiamis and Pappas, 2019), as well as novel parameterizations of robust Kalman filters (Wang et al., 2015), we provided, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman filtering of an unknown autonomous LTI system. Our analysis revealed that, depending on the spectral properties of the CE Kalman filter, a robust Kalman filter approach may lead to improved performance. In future work, we would like to explore how to improve robustness and performance by further exploiting information about system uncertainty, as well as how to integrate our results into an optimal control framework, such as Linear Quadratic Gaussian control.
- Anderson and Moore (2005) B.D.O. Anderson and J.B. Moore. Optimal Filtering. Dover Publications, 2005.
- Anderson et al. (2019) James Anderson, John C Doyle, Steven H Low, and Nikolai Matni. System level synthesis. Annual Reviews in Control, 2019.
- Bauer and Wagner (2002) Dietmar Bauer and Martin Wagner. Estimating cointegrated systems using subspace algorithms. Journal of Econometrics, 111(1):47–84, 2002.
- Bauer et al. (1999) Dietmar Bauer, Manfred Deistler, and Wolfgang Scherrer. Consistency and asymptotic normality of some subspace algorithms for systems without observed inputs. Automatica, 35(7):1243–1254, 1999.
- Chan et al. (1984) Siew Chan, GC Goodwin, and Kwai Sin. Convergence properties of the Riccati difference equation in optimal filtering of nonstabilizable systems. IEEE Transactions on Automatic Control, 29(2):110–118, 1984.
- Dean et al. (2017) Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. On the sample complexity of the linear quadratic regulator. arXiv preprint arXiv:1710.01688, 2017.
- Dean et al. (2018) Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
- Deistler et al. (1995) Manfred Deistler, K Peternell, and Wolfgang Scherrer. Consistency and relative efficiency of subspace methods. Automatica, 31(12):1865–1875, 1995.
- El Ghaoui and Calafiore (2001) Laurent El Ghaoui and Giuseppe Calafiore. Robust filtering for discrete-time systems with bounded noise and parametric uncertainty. IEEE Transactions on Automatic Control, 46(7):1084–1089, 2001.
- Hazan et al. (2018) Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, and Yi Zhang. Spectral filtering for general linear dynamical systems. In Advances in Neural Information Processing Systems, pages 4634–4643, 2018.
- Kailath et al. (2000) Thomas Kailath, Ali H Sayed, and Babak Hassibi. Linear estimation. Prentice Hall, 2000.
- Lee and Lamperski (2019) Bruce Lee and Andrew Lamperski. Non-asymptotic closed-loop system identification using autoregressive processes and Hankel model reduction. arXiv preprint arXiv:1909.02192, 2019.
- Levy and Nikoukhah (2012) Bernard C Levy and Ramine Nikoukhah. Robust state space filtering under incremental model perturbations subject to a relative entropy tolerance. IEEE Transactions on Automatic Control, 58(3):682–695, 2012.
- Mania et al. (2019) Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of LQR is efficient. arXiv preprint arXiv:1902.07826, 2019.
- Oymak and Ozay (2018) Samet Oymak and Necmiye Ozay. Non-asymptotic Identification of LTI Systems from a Single Trajectory. arXiv preprint arXiv:1806.05722, 2018.
- Qin (2006) S Joe Qin. An overview of subspace identification. Computers & Chemical Engineering, 30(10-12):1502–1513, 2006.
- Sarkar and Rakhlin (2018) Tuhin Sarkar and Alexander Rakhlin. Near optimal finite time identification of arbitrary linear dynamical systems. arXiv preprint arXiv:1812.01251, 2018.
- Sarkar et al. (2019) Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite-Time System Identification for Partially Observed LTI Systems of Unknown Order. arXiv preprint arXiv:1902.01848, 2019.
- Sayed et al. (2001) Ali H Sayed et al. A framework for state-space estimation with uncertain models. IEEE Transactions on Automatic Control, 46(7):998–1013, 2001.
- Simchowitz et al. (2018) Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification. arXiv preprint arXiv:1802.08334, 2018.
- Simchowitz et al. (2019) Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning Linear Dynamical Systems with Semi-Parametric Least Squares. arXiv preprint arXiv:1902.00768, 2019.
- Tsiamis and Pappas (2019) Anastasios Tsiamis and George J Pappas. Finite sample analysis of stochastic system identification. In IEEE 58th Conference on Decision and Control (CDC), 2019.
- Van Overschee and De Moor (2012) Peter Van Overschee and Bart De Moor. Subspace identification for linear systems: Theory–Implementation–Applications. Springer Science & Business Media, 2012.
- Wang et al. (2015) Yuh-Shyang Wang, Seungil You, and Nikolai Matni. Localized distributed Kalman filters for large-scale systems. IFAC-PapersOnLine, 48(22):52–57, 2015.
- Wang et al. (2019) Yuh-Shyang Wang, Nikolai Matni, and John C Doyle. A system level approach to controller synthesis. IEEE Transactions on Automatic Control, 2019.
Appendix A Properties of the CE Kalman Filter
The following result, which follows from the theory of non-stabilizable Riccati equations (Chan et al., 1984), describes the form of the certainty equivalent gain.

Lemma A. Consider the assumptions of Problem 2. Assume that $(\hat C, \hat A)$ is observable and $\hat R$ is positive definite. The CE Kalman filter gain (5) has the following properties:
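(i) If $\rho(\hat A - \hat K \hat C) < 1$, then $\hat L = \hat K$ and $\hat A - \hat L \hat C$ is asymptotically stable.
(ii) If $\rho(\hat A - \hat K \hat C) \ge 1$, and $\hat A - \hat K \hat C$ has no eigenvalues on the unit circle, then $\hat A - \hat L \hat C$ is asymptotically stable.
(iii) If $\hat A - \hat K \hat C$ has eigenvalues on the unit circle, then (4) does not admit a stabilizing solution.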
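Substituting $\hat A = (\hat A - \hat K \hat C) + \hat K \hat C$ into (4), the Riccati equation can be equivalently written as $\hat P = (\hat A - \hat K \hat C)\hat P(\hat A - \hat K \hat C)^\top - (\hat A - \hat K \hat C)\hat P \hat C^\top(\hat C \hat P \hat C^\top + \hat R)^{-1}\hat C \hat P(\hat A - \hat K \hat C)^\top$. Notice that there is no noise term in this equivalent algebraic Riccati equation. If $\hat A - \hat K \hat C$ is already stable, then the trivial solution $\hat P = 0$ is the stabilizing one. If $\hat A - \hat K \hat C$ is not asymptotically stable, the results follow from Theorem 3.1 of Chan et al. (1984).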
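A scalar worked example (our own, with $\hat C = 1$, scalars $\hat a, \hat k$ and $\hat r > 0$) makes the three cases of Lemma A explicit. Let $f := \hat a - \hat k$, so that the noise-free equivalent Riccati equation and the gain (5) read:

```latex
\begin{aligned}
p &= f^2 p - \frac{f^2 p^2}{p + \hat r}
\;\Longleftrightarrow\; p\,\bigl(p + \hat r - f^2 \hat r\bigr) = 0
\;\Longleftrightarrow\; p = 0 \ \text{ or } \ p = (f^2 - 1)\,\hat r, \\
\hat L &= \hat k + \frac{f p}{p + \hat r}, \qquad
p = (f^2 - 1)\,\hat r \;\Rightarrow\; \hat L = \hat k + f - \frac{1}{f}, \qquad \hat a - \hat L = \frac{1}{f}.
\end{aligned}
```

Hence, if $|f| < 1$ the only nonnegative solution is $p = 0$, giving $\hat L = \hat k$ and the stable closed loop $f$; if $|f| > 1$ the solution $p = (f^2 - 1)\hat r > 0$ is stabilizing, since the closed-loop eigenvalue becomes $1/f$; and if $|f| = 1$ both solutions collapse to $p = 0$, which leaves the closed-loop eigenvalue $f$ on the unit circle, so no stabilizing solution exists.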
Appendix B SLS preliminaries
Subtracting the two equations and using the fact that , we obtain:
Define the responses to and by and respectively. Then the error obtains the linear representation:
The case of , , can be found in Lemma 6. The following result from Wang et al. (2015) parameterizes the set of stable closed-loop transfer matrices . [Predictor parameterization] Consider system (2). Let denote the set of real rational stable strictly proper transfer matrices. The closed-loop responses from and to can be induced by an internally stable predictor if and only if they belong to the following affine subspace:
Given the responses, we can parameterize the prediction gain as . Let and . The strictly proper condition enforces the constraint The affine constraints simply imply that the system responses should satisfy the linear system recursions:
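For a static gain the system responses have a closed form, which makes the affine recursions easy to check numerically. The sketch below assumes a generic model $x_{k+1} = Ax_k + w_k$, $y_k = Cx_k + v_k$, a predictor $x'_{k+1} = Ax'_k + L(y_k - Cx'_k)$, and the sign convention (ours, chosen for illustration) that the error $x_k - x'_k$ equals $\mathbf{\Phi}_w\mathbf{w} + \mathbf{\Phi}_v\mathbf{v}$ with $\mathbf{\Phi}_w(zI - A) - \mathbf{\Phi}_v C = I$. In the time domain this reads $\Phi_w[1] = I$ and $\Phi_w[t+1] = \Phi_w[t]A + \Phi_v[t]C$; for a static gain, $\Phi_w[t] = (A - LC)^{t-1}$ and $\Phi_v[t] = -(A - LC)^{t-1}L$, so the gain is recovered as $L = -\Phi_v[1]$. The names `sls_responses` and `recursion_residual` are hypothetical.

```python
import numpy as np


def sls_responses(A, C, L, T):
    """Impulse-response coefficients t = 1..T of Phi_w and Phi_v for the static
    predictor with gain L (the t = 0 coefficients are zero: strictly proper)."""
    F = A - L @ C
    Phi_w, Phi_v = [], []
    Fk = np.eye(A.shape[0])          # F^{t-1}, starting at t = 1
    for _ in range(T):
        Phi_w.append(Fk.copy())
        Phi_v.append(-Fk @ L)
        Fk = F @ Fk
    return Phi_w, Phi_v


def recursion_residual(A, C, Phi_w, Phi_v):
    """Max violation of Phi_w[1] = I and Phi_w[t+1] = Phi_w[t] A + Phi_v[t] C."""
    res = np.max(np.abs(Phi_w[0] - np.eye(A.shape[0])))
    for t in range(len(Phi_w) - 1):
        step = Phi_w[t + 1] - (Phi_w[t] @ A + Phi_v[t] @ C)
        res = max(res, np.max(np.abs(step)))
    return float(res)


A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.1]])
Phi_w, Phi_v = sls_responses(A, C, L, T=30)
print(recursion_residual(A, C, Phi_w, Phi_v))  # round-off level: responses satisfy the recursion
print(-Phi_v[0])                               # recovers the gain L
```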
If the predictor is internally stable, then the mean square error is equal to
where $\|\cdot\|_{\mathcal{H}_2}$ is the $\mathcal{H}_2$ system norm. Hence, the error-free Kalman filter synthesis problem can be rewritten as:
Of course, when the model knowledge is perfect, the solution to this problem is trivially , , , .
Appendix C Proofs
Proof of Theorem 5
Let . By adding and subtracting , we obtain the bound:
Hence, from the robustness condition of the theorem it follows that
Now, from Lemma 5 in Mania et al. (2019) it follows that:
Thus, the norm of is upper bounded by
This further implies
Now let and . The proof follows from Lemma 6 and the inequality
Proof of Lemma 6
It is sufficient to show that
then the result follows from the definition of norm and the fact that
Proof of Theorem 7
Step a: First, we prove that when optimization problem (7) is feasible, the mean square error is bounded by:
where we used and optimality of .
Step b: We prove that under condition (8), the static Kalman gain is a feasible gain for (7); equivalently, the responses , and satisfy the constraints of (7). Consider the responses and , which are optimal for the original unknown system. They satisfy the affine relation for the original system:
Adding and subtracting the estimated matrices, we can show that they also satisfy a perturbed affine relation for the estimated system:
If the perturbation is stable, we can multiply both sides from the left, which yields:
where we used the fact that:
Under condition (8), the perturbation has norm bounded by:
which shows that the responses are stable. By construction, they are also strictly proper. What remains to show is that the robustness constraint holds. We have:
Step c: Since is a feasible gain, by suboptimality
where we used .
Appendix D Identification algorithm and analysis
Here we briefly present the results from Tsiamis and Pappas (2019). The stochastic identification algorithm involves two steps. First, we regress future outputs on past outputs to obtain a Hankel-like matrix, which is a product of an observability and a controllability matrix. Second, we perform a realization step, similar to the Ho-Kalman algorithm, to obtain estimates of the state-space parameters. The outline can be found in Algorithm 1.
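The two steps above can be sketched in a few lines. The code below is a simplified illustration written for this summary, not Algorithm 1 itself: with past and future horizons $p$ and $f$ (our names), it stacks past outputs $Y^-_k = (y_{k-p}, \dots, y_{k-1})$ and future outputs $Y^+_k = (y_k, \dots, y_{k+f-1})$, estimates the Hankel-like matrix $G \approx \mathcal{O}_f \mathcal{K}_p$ by least squares, factors its rank-$n$ truncated SVD as $\hat{\mathcal{O}}_f = U_1\Sigma_1^{1/2}$ and $\hat{\mathcal{K}}_p = \Sigma_1^{1/2}V_1^\top$, and reads off $\hat C$ (first block row of $\hat{\mathcal{O}}_f$), $\hat A$ (shift invariance of $\hat{\mathcal{O}}_f$) and $\hat K$ (last block column of $\hat{\mathcal{K}}_p$). It assumes the order $n$ is known and omits the estimation of $R$ and any weighting used by Tsiamis and Pappas (2019). Since the realization is only defined up to a similarity transformation, the check compares the similarity-invariant Markov parameters $CA^jK$.

```python
import numpy as np


def simulate_innovation_form(A, C, K, R, N, rng):
    """Outputs y_0..y_{N-1} of system (2) with xhat_0 = 0 and e_k ~ N(0, R)."""
    n, m = A.shape[0], C.shape[0]
    x = np.zeros(n)
    E = rng.multivariate_normal(np.zeros(m), R, size=N)
    Y = np.empty((N, m))
    for k in range(N):
        Y[k] = C @ x + E[k]
        x = A @ x + K @ E[k]
    return Y


def subspace_identify(Y, n, p, f):
    """Regress future on past outputs, then factor the Hankel-like estimate."""
    N, m = Y.shape
    ks = range(p, N - f + 1)
    Yp = np.array([Y[k - p:k].reshape(-1) for k in ks]).T   # (m p) x samples, oldest first
    Yf = np.array([Y[k:k + f].reshape(-1) for k in ks]).T   # (m f) x samples
    # Least squares: G = Yf Yp^T (Yp Yp^T)^{-1}.
    G = np.linalg.solve(Yp @ Yp.T, Yp @ Yf.T).T
    U, s, Vt = np.linalg.svd(G)
    sqrt_s = np.sqrt(s[:n])
    O_hat = U[:, :n] * sqrt_s              # observability matrix estimate
    K_p = sqrt_s[:, None] * Vt[:n]         # reversed controllability matrix estimate
    C_hat = O_hat[:m]
    A_hat = np.linalg.pinv(O_hat[:-m]) @ O_hat[m:]
    K_hat = K_p[:, -m:]                    # block multiplying y_{k-1}
    return A_hat, C_hat, K_hat


rng = np.random.default_rng(0)
A, C, K, R = (np.array([[0.8]]), np.array([[1.0]]),
              np.array([[0.5]]), np.array([[1.0]]))
Y = simulate_innovation_form(A, C, K, R, N=50_000, rng=rng)
A_hat, C_hat, K_hat = subspace_identify(Y, n=1, p=5, f=5)
for j in range(3):  # Markov parameters C A^j K are similarity invariant
    print(j, (C @ np.linalg.matrix_power(A, j) @ K).item(),
          (C_hat @ np.linalg.matrix_power(A_hat, j) @ K_hat).item())
```

The past horizon $p$ controls a truncation bias proportional to $(A - KC)^p$, which is negligible here because $\rho(A - KC) = 0.3$.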
Definitions. Let , with be two design parameters that define the horizons of the past and the future respectively. Assume that we are given output samples. We define the future outputs and past outputs at time as follows:
The past and future noises are defined similarly:
The (extended) observability matrix and the reversed (extended) controllability matrix are defined as:
respectively. We define the Hankel matrix:
Finally, for any , define the block-Toeplitz matrix: