# Sample Complexity of Kalman Filtering for Unknown Systems

In this paper, we consider the task of designing a Kalman Filter (KF) for an unknown and partially observed autonomous linear time invariant system driven by process and sensor noise. To do so, we propose studying the following two step process: first, using system identification tools rooted in subspace methods, we obtain coarse finite-data estimates of the state-space parameters and Kalman gain describing the autonomous system; and second, we use these approximate parameters to design a filter which produces estimates of the system state. We show that when the system identification step produces sufficiently accurate estimates, or when the underlying true KF is sufficiently robust, that a Certainty Equivalent (CE) KF, i.e., one designed using the estimated parameters directly, enjoys provable sub-optimality guarantees. We further show that when these conditions fail, and in particular, when the CE KF is marginally stable (i.e., has eigenvalues very close to the unit circle), that imposing additional robustness constraints on the filter leads to similar sub-optimality guarantees. We further show that with high probability, both the CE and robust filters have mean prediction error bounded by Õ(1/√(N)), where N is the number of data points collected in the system identification step. To the best of our knowledge, these are the first end-to-end sample complexity bounds for the Kalman Filtering of an unknown system.

## 1 Introduction

Time series prediction is a fundamental problem across control theory (Kailath et al., 2000), economics (Bauer and Wagner, 2002), and machine learning. In the case of autonomous linear time invariant (LTI) systems driven by Gaussian process and sensor noise:

$$z_{k+1} = A z_k + w_k, \qquad y_k = C z_k + v_k \tag{1}$$

the celebrated Kalman Filter (KF) has been the standard method for prediction (Anderson and Moore, 2005) . When model (1) is known, the KF minimizes the mean square prediction error. However, in many practical cases of interest (e.g., tracking moving objects, stock price forecasting), the state-space parameters are not known and must be learned from time-series data. This system identification step, based on a finite amount of data, inevitably introduces parametric errors in model (1), which leads to a KF with suboptimal prediction performance (El Ghaoui and Calafiore, 2001).

In this paper, we study this scenario, and provide finite-data estimation guarantees for the Kalman Filtering of an unknown autonomous LTI system (1). We consider a simple two step procedure. In the first step, using system identification tools rooted in subspace methods, we obtain finite-data estimates of the state-space parameters and Kalman gain describing system (1). Then, in the second step, we use these approximate parameters to design a filter which predicts the system state. We provide an end-to-end analysis of this two-step procedure, and characterize the sub-optimality of the resulting filter in terms of the number of samples used during the system identification step, where the sub-optimality is measured in terms of the mean square prediction error of the filter. A key insight that emerges from our analysis is that using a Certainty Equivalent (CE) Kalman Filter, i.e., using a KF computed directly from estimated parameters, can yield poor estimation performance if the resulting CE KF has eigenvalues close to the unit circle. To address this issue, we propose a Robust Kalman Filter that mitigates these effects and that still enjoys provable sub-optimality guarantees.

Our main contributions are that: i) we show that if the system identification step produces sufficiently accurate estimates, or if the underlying true KF is sufficiently robust, then the CE KF has near optimal mean square prediction error, ii) we show that when the CE KF is marginally stable, i.e., when it has eigenvalues close to the unit circle, a Robust KF synthesized by explicitly imposing bounds on the magnitude of certain closed loop maps of the system enjoys similar mean square prediction error bounds as the CE KF, while demonstrating improved stability properties, and iii) we integrate the above results with the finite-data system identification guarantees of Tsiamis and Pappas (2019), to provide, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman Filtering of an unknown system. In particular, we show that the mean square estimation error of both the Certainty Equivalent and Robust Kalman filter produced by the two step procedure described above is, with high probability, bounded by $\tilde{O}(1/\sqrt{N})$, where $N$ is the number of samples collected in the system identification step.

Related work. A similar two step process was studied for the Linear Quadratic (LQ) control of an unknown system in Dean et al. (2017); Mania et al. (2019). While LQ optimal control and Kalman Filtering are known to be dual problems, this duality breaks down when the state-space parameters describing the system dynamics are not known. In particular, the LQ optimal control problem assumes full state information, making the system identification step much simpler – in particular, it reduces to a simple least-squares problem. In contrast, in the KF setting, as only partial observations are available, the additional challenge of finding an appropriate system order and state-space realization must be addressed. On the other hand, in the KF problem one can directly estimate the KF gain from data, which makes analyzing performance of the CE KF simpler than the performance of the CE LQ optimal controller (Mania et al., 2019).

System identification of autonomous LTI systems (1) is referred to as stochastic system identification (Van Overschee and De Moor, 2012). Classical results consider the asymptotic consistency of stochastic subspace system identification, as in Deistler et al. (1995); Bauer et al. (1999), whereas contemporary results seek to provide finite-data guarantees (Tsiamis and Pappas, 2019; Lee and Lamperski, 2019). Finite-data guarantees for system identification of partially observed systems can also be found in Oymak and Ozay (2018); Simchowitz et al. (2019); Sarkar et al. (2019), but these results focus on learning the non-stochastic part of the system, assuming that a user specified input is used to persistently excite the dynamics.

Classical approaches to robust Kalman Filtering can be found in  El Ghaoui and Calafiore (2001); Sayed et al. (2001); Levy and Nikoukhah (2012), where parametric uncertainty is explicitly taken into account during the filter synthesis procedure. Although similar in spirit to our robust KF procedure, these approaches assume fixed parametric uncertainty, and do not characterize the effects of parametric uncertainty on estimation performance, with this latter step being key in providing end-to-end sample complexity bounds. We also note that although not directly comparable to our work, the filtering problem for an unknown LTI system was also recently studied in the adversarial noise setting in Hazan et al. (2018), where a spectral filtering technique is used to directly estimate the latent state, bypassing the system identification step.

Paper structure. In Sec. 2, we formulate the problem, and in Sec. 3 and 4, we derive performance guarantees for the proposed CE and Robust Kalman filters. In Sec. 5, we provide end-to-end sample complexity bounds for our two step procedure, and demonstrate the effectiveness of our pipeline with a numerical example in Sec. 6. We end with a discussion of future work in Sec. 7. All proofs, missing details, and a summary of the system identification results from Tsiamis and Pappas (2019) can be found in the Appendix.

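Notation. We let bold symbols denote the frequency representation of signals. For example, $\boldsymbol{\Phi}(z) = \sum_{t=0}^{\infty} \Phi_t z^{-t}$. If $M$ is stable with spectral radius $\rho(M) < 1$, then we denote its resolvent by $\mathcal{R}_M \triangleq (zI - M)^{-1}$. The $\mathcal{H}_2$ system norm is defined by $\|\boldsymbol{\Phi}\|_{\mathcal{H}_2}^2 \triangleq \sum_{t=0}^{\infty} \|\Phi_t\|_F^2$, where $\|\cdot\|_F$ is the Frobenius norm. The $\mathcal{H}_\infty$ system norm is defined by $\|\boldsymbol{\Phi}\|_{\mathcal{H}_\infty} \triangleq \sup_{|z|=1} \|\boldsymbol{\Phi}(z)\|_2$, where $\|\cdot\|_2$ is the spectral norm. Let $\frac{1}{z}\mathcal{RH}_\infty$ be the set of real rational stable strictly proper transfer matrices.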

## 2 Problem Formulation

For the remainder of the paper, we consider the Kalman Filter form of system (1):

$$x_{k+1} = A x_k + K e_k, \qquad y_k = C x_k + e_k, \tag{2}$$

where $x_k$ is the prediction (state), $y_k$ is the output, and $e_k$ is the innovation process. The innovations are assumed to be i.i.d. zero mean Gaussians, with positive definite covariance matrix $R$, and the initial state is assumed to be $x_0 = 0$. In general, the system (1) driven by i.i.d. zero mean Gaussian process and sensor noise is equivalent to system (2) for a suitable gain matrix $K$, as both noise models produce outputs with identical statistical properties (Van Overschee and De Moor, 2012, Chapter 3). We make the following assumption throughout the rest of the paper.

###### Assumption 1

Matrices $A, C, K, R$ are unknown, and the pair $(A, C)$ is observable. Both the matrices $A$ and $A - KC$ have spectral radius less than $1$, i.e., $\rho(A) < 1$ and $\rho(A - KC) < 1$.

The observability assumption is standard, and the stability of $A - KC$ follows from the properties of the Kalman filter (Anderson and Moore, 2005). We note that the filter synthesis procedures we propose can be applied even if the stability assumption on $A$ fails; however, in this case, we are unable to guarantee bounded estimation error for the resulting CE and robust KFs (see Theorem 5 and Lemma 6).
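As a rough illustration of the setting, the sketch below simulates the innovation form (2) and checks the conditions of Assumption 1 numerically. The matrices, the helper names (`spectral_radius`, `is_observable`, `simulate_innovation_form`), and the zero initial state are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def is_observable(A, C):
    """Rank test on the observability matrix [C; CA; ...; CA^(n-1)]."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, i) for i in range(n)]
    return bool(np.linalg.matrix_rank(np.vstack(blocks)) == n)

def simulate_innovation_form(A, C, K, R, N, seed=0):
    """Simulate x_{k+1} = A x_k + K e_k, y_k = C x_k + e_k from x_0 = 0."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], C.shape[0]
    e = rng.multivariate_normal(np.zeros(m), R, size=N)  # i.i.d. innovations
    x = np.zeros((N + 1, n))
    y = np.zeros((N, m))
    for k in range(N):
        y[k] = C @ x[k] + e[k]
        x[k + 1] = A @ x[k] + K @ e[k]
    return x[:N], y, e

# Hypothetical 2-state, 1-output system (values chosen for illustration only).
A = np.array([[0.8, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
K = np.array([[0.4], [0.1]])
R = np.array([[1.0]])

assumption_ok = (is_observable(A, C)
                 and spectral_radius(A) < 1
                 and spectral_radius(A - K @ C) < 1)
x, y, e = simulate_innovation_form(A, C, K, R, N=500)
```

Only the output sequence `y` would be available to the identification step; the states and innovations are returned here so that later checks can compare against them.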

Our goal is to provide end-to-end sample complexity bounds for the two step pipeline illustrated in Fig. 1. First, we collect a trajectory of length $N$ from system (2), and use system identification tools with finite-data guarantees to learn estimates $\hat{A}, \hat{C}, \hat{K}, \hat{R}$ of the parameters, and bound the corresponding parameter uncertainties by $\epsilon_A, \epsilon_C, \epsilon_K$. Second, we use these approximate parameters to synthesize a filter from the following class:

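$$\tilde{x}_k = \hat{A}\tilde{x}_{k-1} + \sum_{t=1}^{k} L_t\left(y_{k-t} - \hat{C}\tilde{x}_{k-t}\right), \qquad \tilde{J} \triangleq \sqrt{\lim_{T\to\infty} \frac{1}{T}\sum_{k=0}^{T} \|\tilde{x}_k - x_k\|_2^2} \tag{3}$$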

where the gains $L_t$ are to be designed and $\tilde{J}$ is the filter's mean square prediction error as defined with respect to the optimal KF. Note that the predictor class above includes the CE KF (see Section 3), and that if the true system parameters are known, i.e., if $\hat{A} = A$, $\hat{C} = C$, $\hat{K} = K$, then the optimal mean squared prediction error is achieved.
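The filter class (3) and a finite-horizon surrogate of the cost can be sketched as follows. This is a minimal illustration, assuming a hypothetical scalar system and hypothetical helper names (`run_filter`, `empirical_J`); the limit in (3) is replaced by an average over the simulated horizon.

```python
import numpy as np

def run_filter(A_hat, C_hat, gains, y):
    """Filter class (3): x~_k = A_hat x~_{k-1} + sum_t L_t (y_{k-t} - C_hat x~_{k-t}).

    `gains` is the list [L_1, L_2, ...]; gains beyond its length are zero.
    """
    N, n = y.shape[0], A_hat.shape[0]
    x_tilde = np.zeros((N, n))
    for k in range(1, N):
        acc = A_hat @ x_tilde[k - 1]
        for t, L_t in enumerate(gains, start=1):
            if k - t < 0:
                break
            acc = acc + L_t @ (y[k - t] - C_hat @ x_tilde[k - t])
        x_tilde[k] = acc
    return x_tilde

def empirical_J(x_tilde, x):
    """Finite-T surrogate of J~: root of the time-averaged squared error."""
    return float(np.sqrt(np.mean(np.sum((x_tilde - x) ** 2, axis=1))))

# Hypothetical scalar system in the form (2), used only to exercise the code.
rng = np.random.default_rng(1)
A = np.array([[0.9]])
C = np.array([[1.0]])
K = np.array([[0.5]])
N = 300
e = rng.standard_normal((N, 1))
x = np.zeros((N, 1))
y = np.zeros((N, 1))
for k in range(N):
    y[k] = C @ x[k] + e[k]
    if k + 1 < N:
        x[k + 1] = A @ x[k] + K @ e[k]

J_exact = empirical_J(run_filter(A, C, [K], y), x)                 # true parameters
J_mismatch = empirical_J(run_filter(0.8 * A, C, [0.6 * K], y), x)  # perturbed ones
```

With the true parameters and $L_1 = K$ the filter reproduces the optimal prediction, so `J_exact` is zero up to rounding, while the perturbed parameters give a strictly positive error.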

###### Problem 1 (End-to-end Sample Complexity)

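Fix a failure probability $\delta \in (0, 1)$. Given a single trajectory of system (2), compute system parameter estimates $\hat{A}, \hat{C}, \hat{K}, \hat{R}$, and design a Kalman filter in class (3), defined by gains $L_t$, such that with probability at least $1 - \delta$, the mean square prediction error $\tilde{J}$ falls below a prescribed tolerance, so long as the number of samples $N$ is sufficiently large.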

To address Problem 1, we will: i) leverage recent results regarding the sample complexity of stochastic system identification, ii) provide estimation guarantees for certainty equivalent as well as robust Kalman filters designed using the identified system parameters (see Problem 2 below), and iii) provide end-to-end performance guarantees by integrating steps i) and ii) (see Problem 1 above).

Recently, Tsiamis and Pappas (2019) provided a finite sample analysis for stochastic system identification which provides bounds on the identification error. Leveraging these results, we focus next on solving the Filter Synthesis task described below using both a certainty equivalent Kalman filter as well as a robust Kalman filter.

###### Problem 2 (Near Optimal Kalman Filtering of an Uncertain System)

Consider system (2). Let $\hat{A}, \hat{C}, \hat{K}$ be estimates satisfying $\|A - \hat{A}\|_2 \le \epsilon_A$, $\|C - \hat{C}\|_2 \le \epsilon_C$, $\|K - \hat{K}\|_2 \le \epsilon_K$ (see Footnote 1). Design a Kalman filter in class (3), defined by gains $L_t$, with mean square prediction error decaying with the size of the parameter uncertainty.

Footnote 1. We assume throughout that the realization of the estimated parameters is consistent with that of the underlying system, such that the estimation errors are minimal. In practice, estimating the parameters of a partially observed system (2) is ill-posed, in that any similarity transformation can be applied to generate parameters describing the same system; the bounds described hold up to a similarity transformation. All results in this paper apply nearly as is to the general case of a nontrivial similarity transformation under suitable assumptions; see Appendix, Sections D.2 and E.

## 3 Estimation Guarantees for Certainty Equivalent Kalman Filtering

For the certainty equivalent Kalman filter, we directly use the estimated state-space parameters from the system identification step. Based on the estimated $\hat{K}, \hat{R}$ we compute the covariance:

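$$\begin{bmatrix} \hat{Q} & \hat{S} \\ \hat{S}^* & \hat{R} \end{bmatrix} \triangleq \mathbb{E} \begin{bmatrix} \hat{K} e_k \\ e_k \end{bmatrix} \begin{bmatrix} e_k^* \hat{K}^* & e_k^* \end{bmatrix} = \begin{bmatrix} \hat{K}\hat{R}^{1/2} \\ \hat{R}^{1/2} \end{bmatrix} \begin{bmatrix} \hat{R}^{1/2}\hat{K}^* & \hat{R}^{1/2} \end{bmatrix}.$$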

Then, based on standard Kalman filter theory, we compute the stabilizing solution of the following Riccati equation with correlation terms (Kailath et al., 2000); a stabilizing solution defines a Kalman gain $L_{CE}$ such that $\rho(\hat{A} - L_{CE}\hat{C}) < 1$:

$$P = \hat{A}P\hat{A}^* + \hat{Q} - (\hat{A}P\hat{C}^* + \hat{S})(\hat{C}P\hat{C}^* + \hat{R})^{-1}(\hat{C}P\hat{A}^* + \hat{S}^*). \tag{4}$$

Then, the CE Kalman filter gain is static and takes the form

$$L_1 = L_{CE} \triangleq (\hat{A}P\hat{C}^* + \hat{S})(\hat{C}P\hat{C}^* + \hat{R})^{-1}, \qquad L_t = 0 \ \text{ for } t = 2, \dots \tag{5}$$

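Trivially, if $\rho(\hat{A} - \hat{K}\hat{C}) < 1$, then the stabilizing solution of the Riccati equation is $P = 0$ with $L_{CE} = \hat{K}$; the solution does not depend on $\hat{R}$ (see Lemma A in the Appendix). The next result shows that if the underlying true Kalman filter is sufficiently robust, as measured by a spectral decay rate, and the parameter estimation errors are sufficiently small, then the CE Kalman filter achieves near optimal performance.

Theorem 5 (Near Optimal Certainty Equivalent Kalman Filtering). Consider Problem 2 and the CE KF (5). For any $\rho$ with $\rho(A - KC) \le \rho < 1$, define $\tau(A - KC, \rho) \triangleq \sup_{k \ge 0} \|(A - KC)^k\|_2\, \rho^{-k}$. If the robustness condition

$$\epsilon_A + (\|K\|_2 + \epsilon_K)\epsilon_C + \epsilon_K \|C\|_2 \le \frac{1 - \rho}{2\tau(A - KC, \rho)}$$

is satisfied, then the CE gain is $L_{CE} = \hat{K}$, with $\rho(\hat{A} - \hat{K}\hat{C}) \le (1 + \rho)/2 < 1$, and:

$$\tilde{J} \le \sqrt{3}\, \bar{\mathcal{C}}\, \epsilon \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} R^{1/2} \right\|_{\mathcal{H}_2}$$

where $\epsilon \triangleq \max\{\epsilon_A, \epsilon_C, \epsilon_K\}$, and $\bar{\mathcal{C}} \triangleq \dfrac{2\tau(A - KC, \rho)\,(1 + \|K\|_2 + \epsilon_K)}{1 - \rho}$.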

The transient behavior of the CE Kalman filter is governed by the closed loop eigenvalues of , with performance degrading as eigenvalues approach the unit circle. This may occur if the estimation errors are large enough to cause even if the true system has spectral radius (which is often the case in practice). We show in the next section that this undesirable scenario can be avoided by explicitly constraining the transient response of the resulting Kalman filter to satisfy certain robustness constraints.
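The computation of the CE gain from the Riccati equation (4) and formula (5) can be sketched numerically. The fixed-point iteration, the identity initialization, the helper name `ce_gain`, and the numerical values below are assumptions of this sketch, not the solver used in the paper.

```python
import numpy as np

def ce_gain(A_hat, C_hat, K_hat, R_hat, iters=500):
    """Iterate the Riccati recursion (4) and return (P, L_CE) as in (5)."""
    Q = K_hat @ R_hat @ K_hat.T      # Q_hat = K_hat R_hat K_hat^*
    S = K_hat @ R_hat                # S_hat = K_hat R_hat
    P = np.eye(A_hat.shape[0])       # positive definite start (assumption)
    for _ in range(iters):
        G = A_hat @ P @ C_hat.T + S
        W = C_hat @ P @ C_hat.T + R_hat
        P = A_hat @ P @ A_hat.T + Q - G @ np.linalg.solve(W, G.T)
    W = C_hat @ P @ C_hat.T + R_hat
    L_ce = (A_hat @ P @ C_hat.T + S) @ np.linalg.inv(W)
    return P, L_ce

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Case 1 (hypothetical): A_hat - K_hat C_hat is stable, so P -> 0 and L_CE -> K_hat.
A1 = np.array([[0.8, 0.2], [0.0, 0.5]])
C1 = np.array([[1.0, 0.0]])
K1 = np.array([[0.4], [0.1]])
R1 = np.array([[1.0]])
P1, L1 = ce_gain(A1, C1, K1, R1)

# Case 2 (hypothetical scalar): A_hat - K_hat C_hat = 1.3 is unstable, yet the
# Riccati-based gain differs from K_hat and stabilizes the filter.
A2 = np.array([[1.5]])
C2 = np.array([[1.0]])
K2 = np.array([[0.2]])
R2 = np.array([[1.0]])
P2, L2 = ce_gain(A2, C2, K2, R2)
rho_cl_2 = spectral_radius(A2 - L2 @ C2)
```

In the second case the iteration settles at a nonzero $P$ and the closed-loop eigenvalue is reflected inside the unit circle, which matches the behavior described for the case without unit-circle eigenvalues.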

## 4 Estimation Guarantees for Robust Kalman Filtering

To address the possible poor performance of the CE Kalman filter when model uncertainty is large, we propose to search over dynamic filters (3) subject to additional robustness constraints on their transient response. Using the System Level Synthesis (SLS) framework (Wang et al., 2019; Anderson et al., 2019) for Kalman Filtering (Wang et al., 2015), we parameterize the class of dynamic filters (3) subject to additional robustness constraints in a way that leads to convex optimization problems.

For a given dynamic predictor $\mathbf{L}(z) = \sum_{t=1}^{\infty} L_t z^{-t}$, define the closed loop system responses:

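$$\boldsymbol{\Phi}_w(z) \triangleq (zI - \hat{A} + \mathbf{L}\hat{C})^{-1}, \qquad \boldsymbol{\Phi}_v(z) \triangleq -(zI - \hat{A} + \mathbf{L}\hat{C})^{-1}\mathbf{L}. \tag{6}$$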

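In (Wang et al., 2015), it is shown that these responses are in fact the closed loop maps from process and sensor noise to state estimation error, and that the filter gain achieving the desired behavior can be recovered via $\mathbf{L} = -\boldsymbol{\Phi}_w^{-1}\boldsymbol{\Phi}_v$, so long as the responses are constrained to lie in an affine space defined by the system dynamics. By expressing the mean squared prediction error of the filters (3) in terms of their system responses, we are able to clearly delineate the effects of parametric uncertainty from the cost of deviating from the CE Kalman filter.

Lemma 6 (Error analysis). Consider system (2). Let $\Delta_A \triangleq A - \hat{A}$, $\Delta_C \triangleq C - \hat{C}$, $\Delta_K \triangleq K - \hat{K}$. Any filter (3) with parameterization (6) has mean squared prediction error given by

$$\tilde{J} = \left\| \left[ (\boldsymbol{\Phi}_w \Delta_A + \boldsymbol{\Phi}_v \Delta_C)\mathcal{R}_A K + \boldsymbol{\Phi}_w \Delta_K + \boldsymbol{\Phi}_w \hat{K} + \boldsymbol{\Phi}_v \right] R^{1/2} \right\|_{\mathcal{H}_2}.$$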

Based on the previous lemma, we can upper bound the mean squared prediction error of filters (3) by

 ~J≤

where . This upper bound clearly separates the effects of parameter uncertainty, as captured by the first term, and the performance cost incurred by the filter due to its deviation from the CE Kalman gain , as captured by the second. In order to optimally tradeoff between these two terms, we propose the following robust SLS optimization problem:

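$$\begin{aligned} \min_{\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v} \quad & \left\| \boldsymbol{\Phi}_w \hat{K} + \boldsymbol{\Phi}_v \right\|_{\mathcal{H}_2} \\ \text{s.t.} \quad & \left\| \begin{bmatrix} \boldsymbol{\Phi}_w & \boldsymbol{\Phi}_v \end{bmatrix} \right\|_{\mathcal{H}_2} \le \mathcal{C}, \\ & \boldsymbol{\Phi}_w (zI - \hat{A}) - \boldsymbol{\Phi}_v \hat{C} = I, \quad \boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v \in \tfrac{1}{z}\mathcal{RH}_\infty, \end{aligned} \tag{7}$$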

where the constant $\mathcal{C}$ is a regularization parameter, and the affine constraint parameterizes all filters of the form (3) that have bounded mean squared prediction error (see Appendix or Wang et al. (2015) for more details). As we formalize in the following theorem, for an appropriately selected regularization parameter $\mathcal{C}$ and sufficiently small estimation errors $\epsilon_A, \epsilon_C, \epsilon_K$, the robust KF has near optimal mean square estimation error.

Theorem 7 (Robust Kalman Filter). Consider Problem 2 with Kalman filters from class (3) synthesized using the robust SLS optimization problem (7). If the regularization parameter is chosen such that $\mathcal{C} \ge 2(1 + \|K\|_2)\|\mathcal{R}_{A-KC}\|_{\mathcal{H}_2}$, and further, the estimation errors are such that

$$(\epsilon_A + \epsilon_C \|K\|_2)\, \|\mathcal{R}_{A-KC}\|_{\mathcal{H}_\infty} \le 1/2 \tag{8}$$

then the robust SLS optimization problem is feasible, and the synthesized robust Kalman filter has mean squared prediction error upper-bounded by

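$$\tilde{J} \le \sqrt{3}\, \mathcal{C}\, \epsilon \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} \right\|_{\mathcal{H}_\infty} \|R^{1/2}\|_2 + 2\epsilon\, \|\mathcal{R}_{A-KC}\|_{\mathcal{H}_2}\, \|R^{1/2}\|_2, \tag{9}$$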

where $\epsilon \triangleq \max\{\epsilon_A, \epsilon_C, \epsilon_K\}$. We further note that whenever the system responses induced by the CE Kalman filter are a feasible solution to optimization problem (7), they are also optimal, resulting in a filter with performance identical to the CE setting.
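The last remark can be checked numerically for a static gain: the responses induced by the gain $\hat{K}$ make the objective of (7) vanish, so they are optimal whenever they satisfy the norm constraint. The sketch below truncates the impulse responses to a finite horizon; the helper names, the horizon, and the parameter values are assumptions for illustration.

```python
import numpy as np

def static_gain_responses(A_hat, C_hat, L, T):
    """Impulse responses of (6) for a static gain L, truncated to T terms.

    Phi_w[t-1] = (A_hat - L C_hat)^(t-1), Phi_v[t-1] = -(A_hat - L C_hat)^(t-1) L.
    """
    F = A_hat - L @ C_hat
    Phi_w, Phi_v, Ft = [], [], np.eye(A_hat.shape[0])
    for _ in range(T):
        Phi_w.append(Ft)
        Phi_v.append(-Ft @ L)
        Ft = Ft @ F
    return Phi_w, Phi_v

def h2_norm(terms):
    """Truncated H2 norm: root of the sum of squared Frobenius norms."""
    return float(np.sqrt(sum(np.linalg.norm(M, "fro") ** 2 for M in terms)))

# Hypothetical estimated parameters.
A_hat = np.array([[0.8, 0.2], [0.0, 0.5]])
C_hat = np.array([[1.0, 0.0]])
K_hat = np.array([[0.4], [0.1]])
T = 200

# Responses induced by the gain K_hat: the objective of (7) is zero.
Pw, Pv = static_gain_responses(A_hat, C_hat, K_hat, T)
objective_ce = h2_norm([w @ K_hat + v for w, v in zip(Pw, Pv)])
constraint_ce = h2_norm([np.hstack([w, v]) for w, v in zip(Pw, Pv)])

# Any other static gain (hypothetical) pays a strictly positive objective.
L_other = np.array([[0.6], [0.0]])
Qw, Qv = static_gain_responses(A_hat, C_hat, L_other, T)
objective_other = h2_norm([w @ K_hat + v for w, v in zip(Qw, Qv)])
```

Here `constraint_ce` is the value that the regularization parameter in (7) must exceed for the CE responses to be feasible.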

## 5 End-to-End Sample Complexity for the Kalman Filter

Theorems 5 and 7 provide two different solutions to Problem 2. Combining these theorems with the finite data system identification guarantees of Tsiamis and Pappas (2019), we now derive, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman filtering of an unknown system. For both the CE and robust Kalman filter, we show that the mean squared estimation error defined in (3) decreases with rate $1/\sqrt{N}$ up to logarithmic terms, where $N$ is the number of samples collected during the system identification step. The formal statement of the following theorem which addresses Problem 1 can be found in Theorem E.

Theorem (End-to-end guarantees, informal). Fix a failure probability $\delta$, and assume that we are given a sample trajectory generated by system (2). Then as long as $N$ is sufficiently large, we have with probability at least $1 - \delta$ that the identification and filter synthesis pipeline of Fig. 1, with system identification performed as in Tsiamis and Pappas (2019) and filter synthesis performed as in Sections 3 and 4, achieves mean squared prediction error satisfying

$$\tilde{J} \le C_{ID}\, C_{KF}\, \tilde{O}\!\left(\sqrt{\frac{\log(1/\delta)}{N}}\right), \qquad \text{where } C_{KF} = \frac{1}{1 - \rho(A - KC)} \cdot \frac{1}{1 - \rho(A)}$$

and $C_{ID}$ captures the difficulty of identifying system (2) (see (31) in the Appendix). Here, $\tilde{O}$ hides constants, other system parameters, and logarithmic terms.

The bound derived in Theorem 5 highlights an interesting tension between how easy it is to identify the unknown system, and the robustness of the underlying optimal Kalman filter. The constant $C_{KF}$ captures how robust the underlying open loop system and closed loop Kalman filter are, as measured by their spectral gaps $1 - \rho(A)$ and $1 - \rho(A - KC)$. In particular, we expect $C_{KF}$ to be small for systems that admit optimal KFs with favorable robustness and transient performance. In contrast, the constant $C_{ID}$ captures how easy it is to identify a system: recent results for the fully observed setting (Simchowitz et al., 2018; Sarkar and Rakhlin, 2018) suggest that systems with larger spectral radius are in fact easier to identify, as they provide more "signal" to the identification algorithm. In this way, our upper bound suggests that systems which properly balance between these two properties, robust transient performance and ease of identification, enjoy favorable sample complexity.
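To make the dependence on the spectral gaps concrete, the constant $C_{KF}$ from the informal bound can be evaluated for a few spectral radii. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
def c_kf(rho_closed_loop, rho_open_loop):
    """C_KF = 1 / ((1 - rho(A - KC)) * (1 - rho(A))) from the informal bound."""
    if not (0.0 <= rho_closed_loop < 1.0 and 0.0 <= rho_open_loop < 1.0):
        raise ValueError("both spectral radii must lie in [0, 1)")
    return 1.0 / ((1.0 - rho_closed_loop) * (1.0 - rho_open_loop))

# Hypothetical spectral radii: the constant grows as rho(A) approaches 1
# while the closed-loop spectral radius is held fixed at 0.5.
values = [c_kf(0.5, rho_a) for rho_a in (0.5, 0.9, 0.99)]
```

The three values are roughly 4, 20, and 200, so a tenfold reduction of the open loop spectral gap inflates this factor of the bound by about ten.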

We also note that the degradation of our bound with the inverse of the spectral gap appears to be a limitation of the proposed offline two step architecture – indeed, Lemma 6 suggests that any estimation error in the state-space parameters causes an increase in mean squared prediction error as increases. It remains open as to whether other prediction architectures would suffer from the same limitation.

## 6 Simulations

We perform Monte Carlo simulations of the proposed pipeline for the system

for varying sample lengths . We simulate both the CE and robust Kalman filters, and set the regularization parameter to in the robust SLS optimization problem (7). For each iteration, we first simulate system (2) to obtain output samples. Then, we perform system identification to obtain the system parameters, after which we synthesize both CE and robust Kalman filters. Finally, we compute the mean prediction error of the designed filters.

For the identification scheme, we used the variation of the MOESP algorithm Qin (2006), which is more sample efficient in practice than the one analyzed in Tsiamis and Pappas (2019)–see Algorithm 1 and Section D.2. The basis of the state-space representation returned by the subspace algorithm is data-dependent and varies with each simulation. For this reason, to compare the performance across different simulations, we compute the mean square error in terms of the original state space basis. Note that the SLS optimization problem (7) is semi-infinite since we optimize over the infinite variables and . To deal with this issue, we optimize over a finite horizon –see for example Dean et al. (2018), which makes the problem finite and tractable. Here, we selected .

Figure 2 (a) and (b) show the empirically computed mean squared prediction errors of the CE and Robust Kalman filters, with the mean, 95th, and 97.5th percentiles being shown. Notice that both errors decrease with a rate of $1/\sqrt{N}$, and that while the average behavior of both filters is quite similar, there is a noticeable gap in their tail behaviors. We observe that the most significant gap between the CE and Robust Kalman filters occurs when the eigenvalues of the CE closed-loop matrix are close to the unit circle. Fig. 3 shows the empirical distribution of mean squared prediction errors conditioned on this event. In this case, the CE filter can exhibit extremely poor mean squared prediction error, with the worst observed error (not shown in Fig. 3 in the interest of space) approximately equal to 70; in contrast, the worst error exhibited by the robust Kalman filter was approximately equal to 5. Thus, we were able to achieve a 14x reduction in worst-case mean squared error. For some simulations the robust KF can exhibit worse performance compared to the CE Kalman filter. However, over all simulations, the mean squared error achieved by the robust Kalman filter was at most 1.64x greater than that achieved by the CE Kalman filter.

## 7 Conclusions & Future work

In this paper, we proposed and analyzed a system identification and filter synthesis pipeline. Leveraging contemporary finite data guarantees from system identification (Tsiamis and Pappas, 2019), as well as novel parameterizations of robust Kalman filters (Wang et al., 2015), we provided, to the best of our knowledge, the first end-to-end sample complexity bounds for the Kalman filtering of an unknown autonomous LTI system. Our analysis revealed that, depending on the spectral properties of the CE Kalman filter, a robust Kalman filter approach may lead to improved performance. In future work, we would like to explore how to improve robustness and performance by further exploiting information about system uncertainty, as well as how to integrate our results into an optimal control framework, such as Linear Quadratic Gaussian control.

## Appendix A Properties of the CE Kalman Filter

The following result, which follows from the theory of non-stabilizable Riccati equations (Chan et al., 1984), describes the form of the certainty equivalent gain.

Lemma A. Consider the assumptions of Problem 2. Assume that $(\hat{A}, \hat{C})$ is observable and $\hat{R}$ is positive definite. The CE Kalman filter gain $L_{CE}$ (5) has the following properties:

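• If $\rho(\hat{A} - \hat{K}\hat{C}) < 1$, then $L_{CE} = \hat{K}$ and $\hat{A} - L_{CE}\hat{C}$ is asymptotically stable.

• If $\hat{A} - \hat{K}\hat{C}$ is not asymptotically stable, and has no eigenvalues on the unit circle, then $\hat{A} - L_{CE}\hat{C}$ is asymptotically stable.

• If $\hat{A} - \hat{K}\hat{C}$ has eigenvalues on the unit circle, then (4) does not admit a stabilizing solution.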

After some algebraic manipulations (see also Kailath et al. (2000)), the Riccati equation (4) can be rewritten as:

$$P = (\hat{A} - \hat{K}\hat{C})P(\hat{A} - \hat{K}\hat{C})^* - (\hat{A} - \hat{K}\hat{C})P\hat{C}^*(\hat{C}P\hat{C}^* + \hat{R})^{-1}\hat{C}P(\hat{A} - \hat{K}\hat{C})^*$$

Notice that there is no constant term in the equivalent algebraic Riccati equation. If $\hat{A} - \hat{K}\hat{C}$ is already stable then the trivial solution $P = 0$ is the stabilizing one. If $\hat{A} - \hat{K}\hat{C}$ is not asymptotically stable the results follow from Theorem 3.1 of Chan et al. (1984).
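The equivalence between the two forms of the Riccati equation can be sanity-checked numerically by evaluating both right-hand sides at an arbitrary positive semidefinite $P$. The random test matrices below are assumptions of this sketch; the check relies only on $\hat{Q} = \hat{K}\hat{R}\hat{K}^*$ and $\hat{S} = \hat{K}\hat{R}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A_hat = rng.standard_normal((n, n))
C_hat = rng.standard_normal((m, n))
K_hat = rng.standard_normal((n, m))
M = rng.standard_normal((m, m))
R_hat = M @ M.T + np.eye(m)          # positive definite covariance
Z = rng.standard_normal((n, n))
P = Z @ Z.T                          # arbitrary positive semidefinite P

# Right-hand side of (4), with Q_hat = K R K^* and S_hat = K R.
Q_hat, S_hat = K_hat @ R_hat @ K_hat.T, K_hat @ R_hat
W = C_hat @ P @ C_hat.T + R_hat
G = A_hat @ P @ C_hat.T + S_hat
rhs_original = A_hat @ P @ A_hat.T + Q_hat - G @ np.linalg.solve(W, G.T)

# Right-hand side of the rewritten equation, in terms of F = A_hat - K_hat C_hat.
F = A_hat - K_hat @ C_hat
H = F @ P @ C_hat.T
rhs_rewritten = F @ P @ F.T - H @ np.linalg.solve(W, H.T)

max_gap = float(np.max(np.abs(rhs_original - rhs_rewritten)))
```

The two expressions agree up to floating point error, and the rewritten form makes it visible that $P = 0$ is always a solution.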

## Appendix B SLS preliminaries

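For this subsection, we assume that the parameters are known exactly, i.e., $\hat{A} = A$, $\hat{C} = C$, $\hat{K} = K$. Using bold symbols to denote the frequency representation of signals, we can rewrite the original system equation (2) and the predictor equation (3) as:

$$(zI - A + KC)\mathbf{x} = K\mathbf{y}, \qquad (zI - A + \mathbf{L}C)\tilde{\mathbf{x}} = \mathbf{L}\mathbf{y}.$$

Subtracting the two equations and using the fact that $\mathbf{y} = C\mathbf{x} + \mathbf{e}$, we obtain:

$$\mathbf{x} - \tilde{\mathbf{x}} = (zI - A + \mathbf{L}C)^{-1}K\mathbf{e} - (zI - A + \mathbf{L}C)^{-1}\mathbf{L}\mathbf{e}$$

Define the responses to process and sensor noise by $\boldsymbol{\Phi}_w \triangleq (zI - A + \mathbf{L}C)^{-1}$ and $\boldsymbol{\Phi}_v \triangleq -(zI - A + \mathbf{L}C)^{-1}\mathbf{L}$ respectively. Then the error obtains the linear representation:

$$\mathbf{x} - \tilde{\mathbf{x}} = (\boldsymbol{\Phi}_w K + \boldsymbol{\Phi}_v)\mathbf{e}$$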

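The case of $\hat{A} \ne A$, $\hat{C} \ne C$, $\hat{K} \ne K$ can be found in Lemma 6. The following result from Wang et al. (2015) parameterizes the set of stable closed-loop transfer matrices $\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v$.

Predictor parameterization. Consider system (2). Let $\frac{1}{z}\mathcal{RH}_\infty$ denote the set of real rational stable strictly proper transfer matrices. The closed-loop responses $\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v$ from process and sensor noise to the estimation error can be induced by an internally stable predictor if and only if they belong to the following affine subspace:

$$\begin{bmatrix} \boldsymbol{\Phi}_w & \boldsymbol{\Phi}_v \end{bmatrix} \begin{bmatrix} zI - A \\ -C \end{bmatrix} = I, \qquad \boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v \in \tfrac{1}{z}\mathcal{RH}_\infty. \tag{10}$$

Given the responses, we can parameterize the prediction gain as $\mathbf{L} = -\boldsymbol{\Phi}_w^{-1}\boldsymbol{\Phi}_v$. Let $\boldsymbol{\Phi}_w = \sum_{t=0}^{\infty} \Phi_{w,t} z^{-t}$ and $\boldsymbol{\Phi}_v = \sum_{t=0}^{\infty} \Phi_{v,t} z^{-t}$. The strictly proper condition enforces the constraint $\Phi_{w,0} = 0$, $\Phi_{v,0} = 0$. The affine constraints simply imply that the system responses should satisfy the linear system recursions:

$$\Phi_{w,t+1} = \Phi_{w,t}A + \Phi_{v,t}C, \quad t \ge 1, \qquad \Phi_{w,1} = I$$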

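Assuming that the predictor is internally stable, the mean square error is equal to

$$\tilde{J} = \left\| (\boldsymbol{\Phi}_w K + \boldsymbol{\Phi}_v) R^{1/2} \right\|_{\mathcal{H}_2},$$

where $\|\cdot\|_{\mathcal{H}_2}$ is the $\mathcal{H}_2$ system norm. Hence, the error-free Kalman filter synthesis problem could be re-written as:

$$\min_{\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v} \left\| (\boldsymbol{\Phi}_w K + \boldsymbol{\Phi}_v) R^{1/2} \right\|_{\mathcal{H}_2}, \quad \text{s.t. (10)}$$

Of course, when the model knowledge is perfect, the solution to this problem is trivially $\mathbf{L} = K$, $\boldsymbol{\Phi}_w = \mathcal{R}_{A-KC}$, $\boldsymbol{\Phi}_v = -\mathcal{R}_{A-KC}K$, $\tilde{J} = 0$.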
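For a static gain, the impulse responses, the recursion implied by (10), and the $\mathcal{H}_2$ cost can all be evaluated directly. The sketch below uses a hypothetical scalar system with a deliberately mismatched gain so that the cost has a simple closed form; the truncation horizon is an assumption of the sketch.

```python
import numpy as np

# Hypothetical scalar system and a mismatched static gain L != K.
A = np.array([[0.9]])
C = np.array([[1.0]])
K = np.array([[0.5]])
R = np.array([[1.0]])
L = np.array([[0.3]])
F = A - L @ C                  # closed-loop matrix A - LC = 0.6
T = 400                        # truncation horizon (assumption of the sketch)

# Impulse responses: Phi_{w,t} = F^(t-1), Phi_{v,t} = -F^(t-1) L for t >= 1.
Phi_w = [np.linalg.matrix_power(F, t - 1) for t in range(1, T + 1)]
Phi_v = [-Pw @ L for Pw in Phi_w]

# Recursion implied by the affine constraint (10).
recursion_ok = bool(np.allclose(Phi_w[0], np.eye(1))) and all(
    np.allclose(Phi_w[t + 1], Phi_w[t] @ A + Phi_v[t] @ C) for t in range(T - 1))

# H2 cost; any factor with R_half R_half^T = R gives the same Frobenius norms.
R_half = np.linalg.cholesky(R)
J = float(np.sqrt(sum(np.linalg.norm((Pw @ K + Pv) @ R_half, "fro") ** 2
                      for Pw, Pv in zip(Phi_w, Phi_v))))

# Scalar geometric series: J^2 = (K - L)^2 R / (1 - (A - LC)^2).
J_closed_form = abs(0.5 - 0.3) * np.sqrt(1.0 / (1.0 - 0.6 ** 2))
```

Setting `L = K` instead makes every term of the sum vanish, which is the trivial optimal solution mentioned above.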

## Appendix C Proofs

### Proof of Theorem 5

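Let $\Delta_{A_{cl}} \triangleq (A - KC) - (\hat{A} - \hat{K}\hat{C})$. By adding and subtracting $\hat{K}C$, we obtain the bound:

$$\|\Delta_{A_{cl}}\|_2 \le \epsilon_A + \|\hat{K}\|_2\epsilon_C + \epsilon_K\|C\|_2 \le \epsilon_A + (\|K\|_2 + \epsilon_K)\epsilon_C + \epsilon_K\|C\|_2$$

Hence, from the robustness condition of the theorem it follows that

$$2\tau(A - KC, \rho)\,\|\Delta_{A_{cl}}\|_2 \le 1 - \rho \tag{11}$$

Now, from Lemma 5 in Mania et al. (2019) it follows that:

$$\|(\hat{A} - \hat{K}\hat{C})^k\|_2 = \|(A - KC - \Delta_{A_{cl}})^k\|_2 \le \tau(A - KC, \rho)\left(\tau(A - KC, \rho)\,\|\Delta_{A_{cl}}\|_2 + \rho\right)^k \tag{12}$$

Combining (11), (12), we finally obtain:

$$\|(\hat{A} - \hat{K}\hat{C})^k\|_2 \le \tau(A - KC, \rho)\left(\frac{1 + \rho}{2}\right)^k.$$

Thus, the $\mathcal{H}_\infty$ norm of $\mathcal{R}_{\hat{A} - \hat{K}\hat{C}}$ is upper bounded by

$$\left\|\mathcal{R}_{\hat{A} - \hat{K}\hat{C}}\right\|_{\mathcal{H}_\infty} \le \sum_{t=0}^{\infty} \|(\hat{A} - \hat{K}\hat{C})^t\|_2 \le \tau(A - KC, \rho)\sum_{k=0}^{\infty}\left(\frac{1 + \rho}{2}\right)^k = \frac{2\tau(A - KC, \rho)}{1 - \rho}$$

This further implies

$$\left\| \begin{bmatrix} \mathcal{R}_{\hat{A} - \hat{K}\hat{C}} & -\mathcal{R}_{\hat{A} - \hat{K}\hat{C}}\hat{K} \end{bmatrix} \right\|_{\mathcal{H}_\infty} \le (1 + \|K\|_2 + \epsilon_K)\left\|\mathcal{R}_{\hat{A} - \hat{K}\hat{C}}\right\|_{\mathcal{H}_\infty} \le (1 + \|K\|_2 + \epsilon_K)\frac{2\tau(A - KC, \rho)}{1 - \rho}.$$

Now let $\boldsymbol{\Phi}_w = \mathcal{R}_{\hat{A} - \hat{K}\hat{C}}$ and $\boldsymbol{\Phi}_v = -\mathcal{R}_{\hat{A} - \hat{K}\hat{C}}\hat{K}$, so that $\boldsymbol{\Phi}_w\hat{K} + \boldsymbol{\Phi}_v = 0$. The proof follows from Lemma 6 and the inequality

$$\begin{aligned} \left\| \begin{bmatrix} \boldsymbol{\Phi}_w & \boldsymbol{\Phi}_v \end{bmatrix} \begin{bmatrix} \Delta_A & \Delta_K \\ \Delta_C & 0 \end{bmatrix} \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} R^{1/2} \right\|_{\mathcal{H}_2} &\le \left\| \begin{bmatrix} \boldsymbol{\Phi}_w & \boldsymbol{\Phi}_v \end{bmatrix} \right\|_{\mathcal{H}_\infty} \left\| \begin{bmatrix} \Delta_A & \Delta_K \\ \Delta_C & 0 \end{bmatrix} \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} R^{1/2} \right\|_{\mathcal{H}_2} \\ &\le \sqrt{3}\,\epsilon\,(1 + \|K\|_2 + \epsilon_K)\frac{2\tau(A - KC, \rho)}{1 - \rho} \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} R^{1/2} \right\|_{\mathcal{H}_2} \end{aligned}$$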
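The perturbation argument behind (11) and (12) can be illustrated numerically. The sketch below approximates $\tau(M, \rho)$ by a finite maximum and the $\mathcal{H}_\infty$ norm of the resolvent by a frequency grid; the matrices, the perturbation `D`, and the helper names are hypothetical choices for this example.

```python
import numpy as np

def tau(M, rho, kmax=200):
    """tau(M, rho) = sup_k ||M^k||_2 rho^(-k), approximated over k = 0..kmax."""
    best, Mk = 0.0, np.eye(M.shape[0])
    for k in range(kmax + 1):
        best = max(best, np.linalg.norm(Mk, 2) / rho ** k)
        Mk = Mk @ M
    return best

def hinf_resolvent(M, grid=2000):
    """Grid approximation of sup_{|z|=1} ||(zI - M)^{-1}||_2."""
    I = np.eye(M.shape[0])
    zs = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, grid, endpoint=False))
    return max(np.linalg.norm(np.linalg.inv(z * I - M), 2) for z in zs)

# Hypothetical true closed loop M = A - KC and perturbation D playing the
# role of the closed-loop error, so that the perturbed closed loop is M - D.
M = np.array([[0.4, 0.2], [-0.1, 0.5]])
D = np.array([[0.1, 0.0], [0.0, -0.1]])
rho = 0.7

t = tau(M, rho)
condition_11 = 2.0 * t * np.linalg.norm(D, 2) <= 1.0 - rho

# Decay bound from combining (11) and (12), checked for the first 50 powers.
M_pert = M - D
decay_ok = all(
    np.linalg.norm(np.linalg.matrix_power(M_pert, k), 2)
    <= t * ((1.0 + rho) / 2.0) ** k + 1e-12
    for k in range(50))

hinf_value = hinf_resolvent(M_pert)
hinf_bound = 2.0 * t / (1.0 - rho)
```

In this example the supremum defining `tau` is attained at $k = 0$, so the bound on the resolvent norm reduces to $2/(1-\rho)$.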

### Proof of Lemma 6

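It is sufficient to show that

$$\mathbf{x} - \tilde{\mathbf{x}} = \left\{ (\boldsymbol{\Phi}_w\Delta_A + \boldsymbol{\Phi}_v\Delta_C)\mathcal{R}_A K + \boldsymbol{\Phi}_w\Delta_K + \boldsymbol{\Phi}_w\hat{K} + \boldsymbol{\Phi}_v \right\}\mathbf{e},$$

then the result follows from the definition of the $\mathcal{H}_2$ norm and the fact that $R^{-1/2}\mathbf{e}$ is white noise with unit variance.

In frequency domain, equations (2), (3) can be rewritten as

$$(zI - \hat{A})\tilde{\mathbf{x}} = \mathbf{L}(\mathbf{y} - \hat{C}\tilde{\mathbf{x}}), \qquad (zI - A)\mathbf{x} = K(\mathbf{y} - C\mathbf{x})$$

Subtracting the two equations yields:

$$(zI - \hat{A} + \mathbf{L}\hat{C})(\mathbf{x} - \tilde{\mathbf{x}}) + (-\Delta_A - \mathbf{L}\hat{C} + KC)\mathbf{x} = (K - \mathbf{L})\mathbf{y}$$

Using the fact that $\mathbf{y} = C\mathbf{x} + \mathbf{e}$, we obtain:

$$(zI - \hat{A} + \mathbf{L}\hat{C})(\mathbf{x} - \tilde{\mathbf{x}}) = (\Delta_A - \mathbf{L}[C - \hat{C}])\mathbf{x} + (K - \mathbf{L})\mathbf{e}.$$

Multiplying from the left by $\boldsymbol{\Phi}_w = (zI - \hat{A} + \mathbf{L}\hat{C})^{-1}$ and using the fact that $\boldsymbol{\Phi}_v = -\boldsymbol{\Phi}_w\mathbf{L}$, we obtain:

$$\mathbf{x} - \tilde{\mathbf{x}} = (\boldsymbol{\Phi}_w\Delta_A + \boldsymbol{\Phi}_v\Delta_C)\mathbf{x} + (\boldsymbol{\Phi}_w K + \boldsymbol{\Phi}_v)\mathbf{e}$$

The result follows from adding and subtracting $\boldsymbol{\Phi}_w\hat{K}\mathbf{e}$ and the fact that $\mathbf{x} = \mathcal{R}_A K\mathbf{e}$.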

### Proof of Theorem 7

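Step a: First we prove that when optimization problem (7) is feasible, the mean square error is bounded by:

$$\tilde{J} \le \sqrt{3}\,\mathcal{C}\,\epsilon \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} \right\|_{\mathcal{H}_\infty} \|R^{1/2}\|_2 + \mathrm{opt}(\mathcal{C})\,\|R^{1/2}\|_2, \tag{13}$$

where $\mathrm{opt}(\mathcal{C})$ denotes the optimal value of (7). Assume that $\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v$ is an optimal solution to (7). From Lemma 6:

$$\begin{aligned} \tilde{J} &\le \left\| \begin{bmatrix} \boldsymbol{\Phi}_w & \boldsymbol{\Phi}_v \end{bmatrix} \right\|_{\mathcal{H}_2} \left\| \begin{bmatrix} \Delta_A & \Delta_K \\ \Delta_C & 0 \end{bmatrix} \right\|_2 \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} \right\|_{\mathcal{H}_\infty} \|R^{1/2}\|_2 + \left\| \boldsymbol{\Phi}_w\hat{K} + \boldsymbol{\Phi}_v \right\|_{\mathcal{H}_2} \|R^{1/2}\|_2 \\ &\le \sqrt{3}\,\mathcal{C}\,\epsilon \left\| \begin{bmatrix} \mathcal{R}_A K \\ I \end{bmatrix} \right\|_{\mathcal{H}_\infty} \|R^{1/2}\|_2 + \mathrm{opt}(\mathcal{C})\,\|R^{1/2}\|_2, \end{aligned}$$

where we used $\left\| \begin{bmatrix} \Delta_A & \Delta_K \\ \Delta_C & 0 \end{bmatrix} \right\|_2 \le \sqrt{3}\,\epsilon$ and optimality of $\boldsymbol{\Phi}_w, \boldsymbol{\Phi}_v$.

Step b: We prove that under condition (8), the static Kalman gain $K$ is a feasible gain for (7); equivalently, the responses $\tilde{\boldsymbol{\Phi}}_w = \mathcal{R}_{\hat{A} - K\hat{C}}$ and $\tilde{\boldsymbol{\Phi}}_v = -\mathcal{R}_{\hat{A} - K\hat{C}}K$ satisfy the constraints of (7). Consider the responses $\boldsymbol{\Phi}_{w,opt} = \mathcal{R}_{A-KC}$ and $\boldsymbol{\Phi}_{v,opt} = -\mathcal{R}_{A-KC}K$, which are optimal for the original unknown system. They satisfy the affine relation for the original system:

$$\begin{bmatrix} \boldsymbol{\Phi}_{w,opt} & \boldsymbol{\Phi}_{v,opt} \end{bmatrix} \begin{bmatrix} zI - A \\ -C \end{bmatrix} = I$$

Adding and subtracting the estimated matrices, we can show that they also satisfy a perturbed affine relation for the estimated system:

$$\begin{bmatrix} \boldsymbol{\Phi}_{w,opt} & \boldsymbol{\Phi}_{v,opt} \end{bmatrix} \begin{bmatrix} zI - \hat{A} \\ -\hat{C} \end{bmatrix} = I + \underbrace{\boldsymbol{\Phi}_{w,opt}\delta_A + \boldsymbol{\Phi}_{v,opt}\delta_C}_{\boldsymbol{\Delta}}$$

where $\delta_A = A - \hat{A}$ and $\delta_C = C - \hat{C}$. If the perturbation $(I + \boldsymbol{\Delta})^{-1}$ is stable, we can multiply both sides from the left by it, which yields:

$$\begin{bmatrix} \tilde{\boldsymbol{\Phi}}_w & \tilde{\boldsymbol{\Phi}}_v \end{bmatrix} \begin{bmatrix} zI - \hat{A} \\ -\hat{C} \end{bmatrix} = I$$

where we used the fact that:

$$(I + \boldsymbol{\Delta})^{-1}\boldsymbol{\Phi}_{w,opt} = \tilde{\boldsymbol{\Phi}}_w, \qquad (I + \boldsymbol{\Delta})^{-1}\boldsymbol{\Phi}_{v,opt} = \tilde{\boldsymbol{\Phi}}_v$$

Under condition (8), the perturbation has norm bounded by:

$$\|\boldsymbol{\Delta}\|_{\mathcal{H}_\infty} \le (\epsilon_A + \epsilon_C\|K\|_2)\,\|\mathcal{R}_{A-KC}\|_{\mathcal{H}_\infty} \le 1/2$$

Hence:

$$\left\|(I + \boldsymbol{\Delta})^{-1}\right\|_{\mathcal{H}_\infty} \le \sum_{t=0}^{\infty} \|\boldsymbol{\Delta}\|_{\mathcal{H}_\infty}^t = \frac{1}{1 - \|\boldsymbol{\Delta}\|_{\mathcal{H}_\infty}} \le 2$$

which shows that the responses $\tilde{\boldsymbol{\Phi}}_w, \tilde{\boldsymbol{\Phi}}_v$ are stable. By construction, they are also strictly proper. What remains to show is that the robustness constraint holds. We have:

$$\left\| \begin{bmatrix} \tilde{\boldsymbol{\Phi}}_w & \tilde{\boldsymbol{\Phi}}_v \end{bmatrix} \right\|_{\mathcal{H}_2} \le \left\| (I + \boldsymbol{\Delta})^{-1} \begin{bmatrix} \boldsymbol{\Phi}_{w,opt} & \boldsymbol{\Phi}_{v,opt} \end{bmatrix} \right\|_{\mathcal{H}_2} \le \left\|(I + \boldsymbol{\Delta})^{-1}\right\|_{\mathcal{H}_\infty}(1 + \|K\|_2)\,\|\mathcal{R}_{A-KC}\|_{\mathcal{H}_2} \le 2(1 + \|K\|_2)\,\|\mathcal{R}_{A-KC}\|_{\mathcal{H}_2} \le \mathcal{C}$$

Step c: Since $K$ is a feasible gain, by suboptimality

$$\mathrm{opt}(\mathcal{C}) \le \left\| \tilde{\boldsymbol{\Phi}}_w\hat{K} + \tilde{\boldsymbol{\Phi}}_v \right\|_{\mathcal{H}_2} \le \left\|(I + \boldsymbol{\Delta})^{-1}\right\|_{\mathcal{H}_\infty} \left\| \boldsymbol{\Phi}_{w,opt}\hat{K} + \boldsymbol{\Phi}_{v,opt} \right\|_{\mathcal{H}_2} \le 2\epsilon\,\|\mathcal{R}_{A-KC}\|_{\mathcal{H}_2}$$

where we used $\boldsymbol{\Phi}_{w,opt}\hat{K} + \boldsymbol{\Phi}_{v,opt} = -\mathcal{R}_{A-KC}\Delta_K$ and $\|\Delta_K\|_2 \le \epsilon$.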
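The Neumann series step in the proof of Theorem 7 has a simple static-matrix analogue, which the sketch below checks numerically. A constant matrix `D` stands in for the transfer matrix perturbation and the spectral norm stands in for the $\mathcal{H}_\infty$ norm; both substitutions are simplifying assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((4, 4))
D *= 0.5 / np.linalg.norm(D, 2)      # rescale so that ||D||_2 = 1/2

inverse = np.linalg.inv(np.eye(4) + D)
inv_norm = float(np.linalg.norm(inverse, 2))

# Neumann series (I + D)^{-1} = sum_t (-D)^t, truncated after 200 terms.
neumann = sum(np.linalg.matrix_power(-D, t) for t in range(200))

# Geometric bound sum_t ||D||^t = 1 / (1 - ||D||), which is at most 2 here.
geometric_bound = 1.0 / (1.0 - np.linalg.norm(D, 2))
```

The inverse exists, matches the truncated series, and has norm no larger than `geometric_bound`, mirroring the factor of 2 used in Steps b and c.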

## Appendix D Identification algorithm and analysis

Here we briefly present the results from Tsiamis and Pappas (2019). The stochastic identification algorithm involves two steps. First, we regress future outputs to past outputs to obtain a Hankel-like matrix, which is a product of an observability and a controllability matrix. Second, we perform a realization step, similar to the Ho-Kalman algorithm, to obtain estimates for $A, C, K$. The outline can be found in Algorithm 1.
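The regression step can be sketched with ordinary least squares, stacking past and future outputs as in the definitions that follow. This is a minimal illustration on a hypothetical scalar system, not the exact estimator analyzed by Tsiamis and Pappas (2019); the function name, horizons, and sample size are assumptions.

```python
import numpy as np

def regress_future_on_past(y, p, f):
    """Least-squares estimate of the Hankel-like matrix from outputs y (N x m)."""
    N, m = y.shape
    ks = range(p, N - f + 1)
    Y_past = np.column_stack([y[k - p:k].reshape(-1) for k in ks])     # (p*m, #k)
    Y_future = np.column_stack([y[k:k + f].reshape(-1) for k in ks])   # (f*m, #k)
    # Solve min_G ||Y_future - G Y_past||_F via least squares.
    G_hat_T, *_ = np.linalg.lstsq(Y_past.T, Y_future.T, rcond=None)
    return G_hat_T.T

# Hypothetical scalar system in the form (2), used only to exercise the code.
a, c, k_gain = 0.9, 1.0, 0.5
N, p, f = 20000, 6, 2
rng = np.random.default_rng(0)
e = rng.standard_normal(N)
y = np.zeros((N, 1))
x = 0.0
for t in range(N):
    y[t, 0] = c * x + e[t]
    x = a * x + k_gain * e[t]

G_hat = regress_future_on_past(y, p, f)

# Reference: entry (i, j) of the observability-times-controllability product.
G_true = np.array([[c * a ** i * (a - k_gain * c) ** (p - 1 - j) * k_gain
                    for j in range(p)] for i in range(f)])
max_err = float(np.max(np.abs(G_hat - G_true)))
```

The estimate carries a small bias from truncating the past horizon at `p`, which decays geometrically with the closed-loop dynamics, plus a statistical error that shrinks as more samples are used.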

Definitions. Let , with be two design parameters that define the horizons of the past and the future respectively. Assume that we are given output samples. We define the future outputs and past outputs at time as follows:

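$$Y_k^+ \triangleq \begin{bmatrix} y_k \\ \vdots \\ y_{k+f-1} \end{bmatrix}, \qquad Y_k^- \triangleq \begin{bmatrix} y_{k-p} \\ \vdots \\ y_{k-1} \end{bmatrix}, \qquad k \ge p$$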

The past and future noises are defined similarly:

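$$E_k^+ \triangleq \begin{bmatrix} e_k \\ \vdots \\ e_{k+f-1} \end{bmatrix}, \qquad E_k^- \triangleq \begin{bmatrix} e_{k-p} \\ \vdots \\ e_{k-1} \end{bmatrix}, \qquad k \ge p$$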

The (extended) observability matrix and the reversed (extended) controllability matrix are defined as:

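$$\mathcal{O}_k \triangleq \begin{bmatrix} C^* & A^*C^* & \cdots & (A^*)^{k-1}C^* \end{bmatrix}^*, \tag{14}$$

$$\mathcal{K}_k \triangleq \begin{bmatrix} (A - KC)^{k-1}K & \cdots & (A - KC)K & K \end{bmatrix} \tag{15}$$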

respectively. We define the Hankel matrix:

$$G \triangleq \mathcal{O}_f \mathcal{K}_p. \tag{16}$$
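The matrices in (14) to (16) are straightforward to assemble numerically. The sketch below builds them for a hypothetical 2-state system; the function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def observability_matrix(A, C, k):
    """O_k = [C; CA; ...; C A^(k-1)], cf. (14)."""
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(k)])

def reversed_controllability_matrix(A, C, K, k):
    """K_k = [(A-KC)^(k-1) K, ..., (A-KC) K, K], cf. (15)."""
    F = A - K @ C
    return np.hstack([np.linalg.matrix_power(F, i) @ K
                      for i in reversed(range(k))])

# Hypothetical 2-state, 1-output system and horizons.
A = np.array([[0.8, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
K = np.array([[0.4], [0.1]])
f, p = 4, 4

O_f = observability_matrix(A, C, f)
K_p = reversed_controllability_matrix(A, C, K, p)
G = O_f @ K_p                         # Hankel matrix (16), shape (f*m, p*m)
rank_G = int(np.linalg.matrix_rank(G))
```

Because $G$ factors through the $n$-dimensional state, its rank is at most $n$; here it equals 2, which is what the realization step exploits to recover a state-space model.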

Finally, for any , define block-Toeplitz matrix:

 Ts≜⎡⎢ ⎢ ⎢ ⎢⎣Im00CKIm⋯0⋮⋮⋮CAs