
# Online greedy identification of linear dynamical systems

This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.


## 1 Introduction

System identification is a problem of great interest in many fields, such as econometrics, robotics, aeronautics, mechanical engineering and reinforcement learning [12, 17, 5, 6, 16]. The task consists in estimating the parameters of an unknown system by sampling trajectories from it, as fast as possible. To this end, inputs must be chosen so as to yield maximally informative trajectories. We focus on linear time-invariant (LTI) systems. Let A ∈ R^{d×d} and B ∈ R^{d×m} be two matrices; we consider the following discrete-time dynamics:

 x_0 = 0,  x_{t+1} = A x_t + B u_t + w_t,  0 ≤ t ≤ T−1, (1)

where x_t ∈ R^d is the state, w_t is a normally distributed isotropic noise with known variance σ², and the control variables u_t ∈ R^m are chosen by the controller under the following power constraint:

 (1/T) ∑_{t=0}^{T−1} ‖u_t‖² ≤ γ². (2)

The system parameters θ⋆ = (A⋆, B⋆) are unknown initially and are to be estimated from observed trajectories (x_0, u_0, …, x_T). The goal of system identification is to choose the inputs u_t so as to drive the system to the most informative states for the estimation of θ⋆. It may happen that the controller knows B⋆, in which case θ⋆ = A⋆ and only A⋆ needs to be estimated.
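As a minimal numpy sketch of the dynamics (1) under the power constraint (2), the rollout below simulates the system; the matrices, dimensions and noise level are illustrative values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(A, B, U, sigma, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t from x_0 = 0, eq. (1)."""
    d = A.shape[0]
    T = U.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        w = sigma * rng.standard_normal(d)    # isotropic Gaussian noise
        X[t + 1] = A @ X[t] + B @ U[t] + w
    return X

# toy 2-state, 1-input system (illustrative values)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
T, gamma, sigma = 10, 1.0, 0.1
U = rng.standard_normal((T, 1))
U *= gamma * np.sqrt(T) / np.linalg.norm(U)   # enforce (1/T) sum_t ||u_t||^2 = gamma^2
X = simulate(A, B, U, sigma, rng)
```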

System identification is a primary field in control theory. It has been widely studied in the field of optimal design of experiments [2, 11]. For LTI systems, classical optimal design approaches provided results for single-input single-output (SISO) systems [5, 10], or for multi-input multi-output (MIMO) systems in the frequency domain or with randomized time-domain inputs [15]. More recently, system identification has received considerable attention in the machine learning community, with the aim of obtaining finite-time bounds on the estimation error [8, 9, 20]. In [24] and [25], the inputs are optimized in the frequency domain to maximize an optimal design objective, with theoretical estimation rate guarantees. In our approach, we directly optimize deterministic inputs in the time domain for MIMO LTI systems. An important aspect of system identification is the amount of computational resources and the number of observations needed to reach a given performance. We study the computational complexity of our algorithms and compare their performance against each other and against an oracle, both on average and on real-life dynamical systems.

### 1.1 Notations

In the rest of this work, we note θ⋆ = (A⋆, B⋆) the unknown parameter underlying the dynamics. We suppose that the pair (A⋆, B⋆) is controllable: the matrix (B⋆, A⋆B⋆, …, A⋆^{d−1}B⋆) has rank d. Adopting the notations of [25], we define a policy π as a mapping from the past trajectory to the next input. The set of policies meeting the power constraint (2) is noted Π. We note τ = (x_0, u_0, …, x_T) a trajectory, and we extend this notation to τ(π, t) when the trajectory is obtained using the policy π up to time t. We denote by E_θ the average for a dynamical system given by (1) with parameter θ (where the randomness comes from the noise and possibly from the policy inducing the control u_t).

Fix an estimator θ̂, yielding an estimate of the parameters from a given trajectory. Our objective is to play the inputs of a policy π so that the resulting trajectory gives a good estimate of θ⋆. We measure this performance by the mean squared error:

 MSE(π) = (1/2) E_{θ⋆}[ ‖θ̂(τ(π, T)) − θ⋆‖²_F ]. (3)

Of course, this quantity depends on the true parameter θ⋆ of the system, which is unknown. A natural way of proceeding is to estimate θ⋆ sequentially, as follows.

###### Definition 1 (Adaptive system identification).

Given an estimate of θ⋆, the policy for the next sequence of inputs can be chosen so as to minimize a cost function F approximating the MSE (3), using the current estimate as an approximation of θ⋆. Then, these inputs are played and θ⋆ is re-estimated with the resulting trajectory, and so on. We call planning the process of minimizing F.

This approach is summarized in Algorithm 1, which takes as inputs a first guess for the parameters to estimate, a policy, the problem parameters T, γ and σ, a schedule (T_0, T_1, …), a cost functional F and an estimator θ̂.

An adaptive identification algorithm is hence determined by a triplet (schedule, cost functional, estimator). A natural estimator is the least squares estimator, which we define in Section 2.1. In the rest of this work, we set θ̂ to the least squares estimator.
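The adaptive loop of Algorithm 1 can be sketched as follows, with the least squares estimator of Section 2.1 and, as a placeholder, a planner that simply returns random inputs of maximal energy; the system matrices, schedule and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, sigma, gamma = 2, 1, 0.05, 1.0
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

def ols(X, U):
    """OLS estimate of theta = (A, B): regress x_{t+1} on z_t = (x_t, u_t)."""
    Z = np.hstack([X[:-1], U])
    Y = X[1:]
    theta_T, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta_T.T

def plan(theta, T_i):
    # placeholder planner: random inputs of norm gamma; a real planner would
    # minimize the cost F built from the current estimate theta
    U = rng.standard_normal((T_i, m))
    return gamma * U / np.linalg.norm(U, axis=1, keepdims=True)

x = np.zeros(d)
X, U_all = [x], []
theta = np.zeros((d, d + m))              # initial guess for (A, B)
for T_i in [5, 10, 20]:                   # schedule (T_0, T_1, T_2)
    for u in plan(theta, T_i):
        x = A_true @ x + B_true @ u + sigma * rng.standard_normal(d)
        X.append(x)
        U_all.append(u)
    theta = ols(np.array(X), np.array(U_all))   # re-estimate with all data so far

err = np.linalg.norm(theta - np.hstack([A_true, B_true]))
```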

###### Example 1 (Random policy).

A naive strategy for system identification consists in playing random inputs with maximal energy at each time step: the planning step is trivial and returns inputs of norm γ drawn at random.

###### Example 2 (Task-optimal pure exploration).

In [25], the authors propose the cost function

 F(π; θ, T) = tr[ Γ_T(τ(π); θ)^{−1} ], (4)

where Γ_T is defined in equation (8) below. As we will see in Section 2.2, this corresponds to A-optimal experimental design. The authors show that this cost function approximates the MSE at an optimal rate in the long time limit T → ∞. In their identification algorithm, planning is performed over epochs of doubling length, starting from some initial epoch length.

###### Example 3 (Oracle).

An oracle is a controller who is assumed to choose their policy with the knowledge of the true parameter θ⋆. It can hence perform a single, offline optimization of the MSE (3) over the inputs. By definition, the inputs played by the oracle are the optimal inputs for our problem of mean squared error system identification.

### 1.3 Contributions

In practice, systems have complex dynamics and can only be approximated locally by linear systems. We still believe that understanding the identification of linear systems is a prerequisite for understanding complex systems: on short time scales, a complex system can be approximated by a linear one. To be practical, our identification algorithm must interact as little as possible with the true system and take decisions as fast as possible. With the previous notations, we are interested in cases where T is small (to ensure that in practice the dynamics remains time-invariant and linear) and where the estimation and planning steps need to be very fast in order to run the algorithm online.

In this work, we explore a setting for linear system identification with hard constraints on the number of interactions with the real system and on the computing resources used for planning and estimation. To the best of our knowledge, finite-time system identification guarantees are only available in the large-T limit, which makes the hypothesis of linear dynamics quite unrealistic. Using a framework based on experimental design, we propose a greedy online algorithm requiring minimal computing resources. The resulting policy gives a control that maximizes the amount of information collected at the next step. We show empirically that for short interactions with the system, this simple approach can actually outperform more sophisticated gradient-based methods. We also propose a method to compute an oracle optimal control, against which we compare the different identification algorithms.

### 1.4 Related work

System identification has been studied extensively in the last decades [3, 12]. The question of choosing the maximally informative input can be tackled in the framework of classical experimental design [2, 18]. Several methods have been proposed for the particular case of dynamical systems [14, 10, 15]. A comprehensive study can be found in [5], with a focus on SISO systems.

In the machine learning community, the last few years have seen an increasing interest in finite-time system identification [20, 9, 19, 22]. These works typically derive theoretical error rates for linear system estimation, producing high-probability bounds guaranteeing that the estimation error is smaller than ε with probability greater than 1 − δ after a certain number of samples. The question of designing optimal inputs is tackled in [24, 25], where the authors derive an asymptotically optimal algorithm by computing the control in the frequency domain. In [13], an approach to control partially nonlinear systems is proposed.

## 2 Background

It is convenient to describe the structure of the state as a function of the inputs and the noise. By integrating the dynamics (1), we obtain the following result.

###### Proposition 1.

The state can be expressed as x_t = x̄_t + x̃_t with

 x̄_t = ∑_{s=0}^{t−1} A^{t−1−s} B u_s,  x̃_t = ∑_{s=0}^{t−1} A^{t−1−s} w_s. (5)

Note that x̄_t solves the deterministic dynamics x̄_{t+1} = A x̄_t + B u_t, while x̃_t has zero mean and is independent of the control. The two terms x̄_t and x̃_t depend linearly on the u_s and on the w_s respectively.

The data-generating distribution knowing the parameter θ can be computed using the probability chain rule with the dynamics (1):

 p(τ | θ) = (2πσ²)^{−dT/2} exp[ −(1/2σ²) ∑_{t=0}^{T−1} ‖x_{t+1} − A x_t − B u_t‖²₂ ]. (6)

We define the log-likelihood (up to a constant):

 ℓ(τ, θ) = −(1/2σ²) ∑_{t=0}^{T−1} ‖x_{t+1} − A x_t − B u_t‖²₂ = −(1/2σ²) ‖Y − Zθ⊤‖²_F, (7)

where we have noted Y = (x_1, …, x_T)⊤ the observations and Z = (z_0, …, z_{T−1})⊤ the covariates associated to the parameter θ ∈ R^{d×q}. If θ = (A, B), then z_t = (x_t, u_t) and q = d + m. If θ = A, then z_t = x_t and q = d. We also note U = (u_0, …, u_{T−1})⊤ the input matrix and X = (x_0, …, x_{T−1})⊤ the state matrix. We define the moment matrix M_t = ∑_{s=0}^{t} z_s z_s⊤ and the Gramians of the system at time t:

 Γ_t(τ; θ) = (1/t) E_θ[M_{t−1}] (8)

and G_t(A) = ∑_{s=0}^{t−1} A^s (A^s)⊤. Note that Γ_t is symmetric positive semidefinite.

### 2.1 Ordinary least squares

Given a trajectory, a natural estimator for the matrix is the least squares estimator. The theory of least squares provides us with a formula for the mean squared error with respect to the ground truth, which can be used as a measure of the quality of a control.

###### Proposition 2 (Ordinary least squares estimator).

Given inputs U and noise W = (w_0, …, w_{T−1})⊤, the ordinary least squares (OLS) estimator associated to the resulting trajectory τ is

 θ̂(τ) = ((Z⊤Z)^{−1} Z⊤Y)⊤, (9)

and its difference to θ⋆ is given by

 (θ̂(τ) − θ⋆)⊤ = (Z⊤Z)^{−1} Z⊤W = Z⁺W, (10)

where Z⁺ denotes the pseudo-inverse of Z. Noting θ_t the least squares estimator obtained from the trajectory up to time t, we recall the recursive update formula

 θ_{t+1}⊤ = M_{t+1}^{−1}(M_t θ_t⊤ + z_t y_t⊤). (11)
###### Proof.

The least squares estimator minimizes the quadratic loss θ ↦ (1/2)‖Y − Zθ⊤‖²_F, which writes

 (1/2)‖Y − Zθ⊤‖²_F = (1/2) ∑_{j=1}^{d} ‖Y_j − Zθ_j‖²₂, (12)

with Y_j the j-th column of Y and θ_j the j-th row of θ. The terms of the sum can be minimized independently, each θ_j minimizing the least squares objective of the vectorial relation Y_j ≈ Zθ_j. The solution is θ_j = (Z⊤Z)^{−1}Z⊤Y_j (see e.g. [1]). By concatenating the columns, we obtain θ̂⊤ = (Z⊤Z)^{−1}Z⊤Y, which proves (9). Substituting Y = Zθ⋆⊤ + W yields (10). Note here that the controllability assumption on (A⋆, B⋆) ensures that Z can be made full rank, and hence that the moment matrix is invertible. ∎

###### Definition 2 (OLS mean squared error).

For a given trajectory generated with a matrix Z and noise W, the Euclidean mean squared error (MSE) is

 ‖θ̂_LS − θ⋆‖²_F = ‖((Z⊤Z)^{−1}Z⊤W)⊤‖²_F = tr[ Z(Z⊤Z)^{−2}Z⊤ WW⊤ ]. (13)

If the noise and the covariates were independent, then the expected error would reduce to the A-optimal design objective σ² tr[(Z⊤Z)^{−1}]. This is not the case in our framework, since Z is generated with the noise W.

### 2.2 Classical optimal design

The correlation between Z and W makes the derivation of a tractable expression for the expectation of (13) complicated. In this section, we show how a more tractable objective can be computed by applying the theory of optimal experimental design [2, 21]. In the classical theory of optimal design, the informativeness of an experiment is measured by the size of the expected Fisher information.

###### Definition 3 (Fisher information matrix).

Let ℓ(τ, θ) denote the log-likelihood of the data-generating distribution knowing the parameter θ. The Fisher information matrix is defined as

 I(θ) = −E_θ[ ∂²ℓ(τ, θ)/∂θ² ] ∈ R^{qd×qd}. (14)
###### Proposition 3.

For the LTI system (1),

 I(θ) = (T/σ²) diag(Γ_T, …, Γ_T), (15)

the number of blocks being d. Furthermore, Γ_T can be computed as

 Γ_T = (1/T) ∑_{t=0}^{T−1} [ z̄_t z̄_t⊤ + σ² G_t(A) ]. (16)
###### Proof.

The log-likelihood (7) can be separated into a sum over the rows θ_j as in (12). The quadratic term in θ_j is −(1/2σ²) θ_j⊤ M_{T−1} θ_j, and the other terms are constant or linear in θ_j. Differentiating twice and taking the expectation gives a block-diagonal matrix with diagonal blocks E_θ[M_{T−1}]/σ², which yields the desired result after dividing by T. Following the decomposition (5), z_t = z̄_t + z̃_t. Taking the expectation, we obtain E_θ[z_t z_t⊤] = z̄_t z̄_t⊤ + σ² G_t(A). Summing over t yields the result. Note that the first term is deterministic and depends on the control, whereas the second term comes from the noise and does not depend on the control. Therefore, the expected moment matrix is the sum of a noise term and of a deterministic control term. ∎
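A numerical sanity check of (16) for the case θ = A, where z_t = x_t: the deterministic formula is compared with a Monte-Carlo average of M_{T−1}/T over rollouts of the dynamics (1); all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, T, sigma = 2, 1, 8, 0.3
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
U = rng.standard_normal((T, m))

def xbar(t):
    # deterministic part of the state, eq. (5)
    return sum(np.linalg.matrix_power(A, t - 1 - s) @ B @ U[s] for s in range(t)) if t else np.zeros(d)

def G(t):
    # noise Gramian G_t(A) = sum_{s < t} A^s (A^s)'
    return sum(np.linalg.matrix_power(A, s) @ np.linalg.matrix_power(A, s).T for s in range(t)) + 0.0 * np.eye(d)

# eq. (16): Gamma_T = (1/T) sum_t [ xbar_t xbar_t' + sigma^2 G_t(A) ]
Gamma = sum(np.outer(xbar(t), xbar(t)) + sigma**2 * G(t) for t in range(T)) / T

# Monte-Carlo estimate of E[M_{T-1}] / T over rollouts
n_mc = 5000
acc = np.zeros((d, d))
for _ in range(n_mc):
    x = np.zeros(d)
    M = np.zeros((d, d))
    for t in range(T):
        M += np.outer(x, x)        # M_{T-1} sums z_t z_t' for t = 0..T-1
        x = A @ x + B @ U[t] + sigma * rng.standard_normal(d)
    acc += M / T
Gamma_mc = acc / n_mc
```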

###### Definition 4.

In classical optimal design, the size of the information matrix is measured by some criterion Φ, which is a functional of its eigenvalues. The quantity Φ[I(θ)] represents the amount of information brought by the experiment and should be maximized.

###### Example 4.

Some of the usual criteria are presented in Table 1.

The criteria are required to satisfy properties such as homogeneity, monotonicity and concavity in the sense of the Loewner ordering, which can be interpreted in terms of information theory: monotonicity means that a larger information matrix brings a greater amount of information; concavity means that information cannot be increased by interpolation between experiments. We refer to [18] for more details.

The theory of classical optimal design leads to the definition of the following optimal design informativeness functional.

###### Definition 5 (Optimal design functional).

Let Φ denote an optimal design criterion. Then the associated cost is defined as

 F_Φ(π; θ, t) = −Φ[ Γ_t(τ(π); θ) ] = −Φ[ (1/t) ∑_{s=0}^{t−1} ( z̄_s z̄_s⊤ + σ² G_s(A) ) ], (17)

where the z̄_s depend on the inputs through (5).

###### Remark 1.

We note from equation (5) that z̄_s is affine in the inputs. Hence, Γ_t is quadratic in the inputs, and optimizing (17) efficiently is challenging even with concavity assumptions on Φ.

### 2.3 Small noise regime

The optimal design functional (17) can be related to the MSE in the small noise regime σ/γ → 0.

###### Proposition 4.

The A-optimal design functional (17) approximates the MSE (3) up to O(σ/γ):

 MSE(π) = (1/2) F_A(π; θ⋆, T) + O(σ/γ). (18)
###### Proof.

We introduce the rescaled variables ζ = Z/γ and ω = W/σ, which are of order 1. Extending the notations of equation (5), ζ = ζ̄ + ζ̃, where the first term is of order 1 and the second is of order σ/γ. Therefore, ζ = ζ̄ + O(σ/γ). By Proposition 7, the pseudo-inverse is differentiable at ζ̄, so ζ⁺ = ζ̄⁺ + O(σ/γ). Taking the squared norm and using the Cauchy-Schwarz inequality, we obtain

 ‖ζ⁺ω‖² = ‖ζ̄⁺ω‖² + O(σ/γ). (19)

Furthermore,

 E[ ‖ζ̄⁺ω‖² ] = E[ tr( ζ̄(ζ̄⊤ζ̄)^{−2}ζ̄⊤ ωω⊤ ) ] = tr[ (ζ̄⊤ζ̄)^{−1} ]. (20)

Gathering (19) and (20), we obtain

 (1/2) E[ ‖ζ⁺ω‖² ] = (1/2) tr[ (ζ̄⊤ζ̄)^{−1} ] + O(σ/γ). ∎ (21)

###### Remark 2.

In classical least squares regression, the covariates Z are independent of the noise W. As a consequence, the minimization of the mean squared estimation error leads to the classical A-optimality criterion. This does not hold in general in our framework, because the signal and the noise are coupled by the dynamics (1). However, Proposition 4 shows that this criterion does hold in the small noise regime, at first order in σ/γ. Indeed, when σ ≪ γ the contribution of the noise to the signal is negligible, because the deterministic part of the signal is of order γ.

###### Remark 3.

From Proposition 4 and the definition of A-optimality, we see that the MSE approximately scales like 1/T when the number of observations increases. This is confirmed by our experiments.

## 3 Online greedy identification

A simple, natural approach to system identification consists in choosing a decision sequentially at each time step. At each time t, the control u_t is chosen with energy γ² so as to maximize a one-step-ahead objective. Then, a new observation is collected and the process repeats. Following Section 2.2, u_t can be chosen to maximize the value of Φ at the next step. This corresponds to the choice of the functional F_Φ and to the one-step schedule T_i = 1.

Upon choosing Φ, the policy should select u so as to maximize the design criterion Φ applied to the one-step-ahead, u-dependent information matrix, the past trajectory being fixed. The one-step-ahead information matrix is M̄_t + z(u)z(u)⊤: when θ = (A, B) is estimated, the next u-dependent covariate is z_t = (x_t, u); when B⋆ is known and θ = A, the next covariate is x_{t+1}, whose expectation given u is A_t x_t + B⋆u. Therefore, one-step-ahead planning yields the following optimization problem:

 max_{u∈R^m} Φ( M̄_t + z(u)z(u)⊤ )  such that ‖u‖² ≤ γ², (22)

with

 M̄_t = M_{t−1} + σ²G_t(A_t)  if θ = (A, B),   M̄_t = M_t + σ²G_{t+1}(A_t)  if θ = A, (23)

and

 z(u) = (x_t, u)  if θ = (A, B),   z(u) = A_t x_t + B⋆u  if θ = A. (24)
###### Remark 4.

With this greedy policy, the energy constraint imposed on each input ensures that the global power constraint (2) is met.

The corresponding identification process is detailed in Algorithm 2. We will see in Section 3.2 that problem (22) can be solved accurately and at a cheap cost. Moreover, Algorithm 2 offers the advantage of improving the knowledge of θ at each time step, using all the available information on the parameter to plan. This way, the bias affecting planning due to the uncertainty about θ is minimized. When planning is performed over longer time sequences, a large bias could impair the identification of the system.
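The one-step objective of (22)-(24) can be sketched as follows for the case θ = A, with a generic positive definite matrix standing in for M̄_t of (23) and a coarse scan of the sphere in place of the exact solver of Section 3.2; all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, gamma = 3, 2, 1.0
A_hat = 0.3 * rng.standard_normal((d, d))   # current estimate A_t
B_star = rng.standard_normal((d, m))        # known input matrix B*
x_t = rng.standard_normal(d)                # current state
M0 = rng.standard_normal((d, d))
Mbar = M0 @ M0.T + d * np.eye(d)            # stands in for Mbar_t of eq. (23)
Minv = np.linalg.inv(Mbar)

def z(u):
    # next covariate when theta = A, eq. (24)
    return A_hat @ x_t + B_star @ u

def gain(u):
    # one-step D-optimality gain log det(Mbar + zz') - log det Mbar, by eq. (26)
    zz = z(u)
    return np.log1p(zz @ Minv @ zz)

# greedy input: coarse scan of the sphere ||u|| = gamma (the exact solution is
# given by the quadratic program of Section 3.2)
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
cands = gamma * np.stack([np.cos(angles), np.sin(angles)], axis=1)
u_best = max(cands, key=gain)
```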

### 3.2 Solving the one-step optimal design problem

We show that one-step-ahead planning for online system identification is equivalent to a quadratic program over the sphere, which can be solved efficiently.

###### Proposition 5.

For D-optimality and A-optimality, there exist a symmetric matrix Q ∈ R^{m×m} and a vector b ∈ R^m such that problem (22) is equivalent to

 min_{u∈R^m} u⊤Qu − 2b⊤u  such that ‖u‖² ≤ γ². (25)
###### Proof.

From Proposition 8, we find that

 log det( M̄_t + z(u)z(u)⊤ ) = log det M̄_t + log( 1 + z(u)⊤M̄_t^{−1}z(u) ). (26)

Similarly, from Corollary 1,

 −tr[ (M̄_t + z(u)z(u)⊤)^{−1} ] = −tr[ M̄_t^{−1} ] + z(u)⊤M̄_t^{−2}z(u) / ( 1 + z(u)⊤M̄_t^{−1}z(u) ). (27)

Maximizing these quantities with respect to u amounts to maximizing functions of quadratic forms in z(u). Given the affine dependence of z(u) on u and the (possible) block structure of M̄_t and z(u), the objective takes the form u⊤Qu − 2b⊤u, up to a constant. The matrix Q is symmetric because M̄_t^{−1} and M̄_t^{−2} are symmetric, and so are their diagonal submatrices. We provide an explicit formula for Q and b in the case where θ = A in Remark 5. ∎

We now characterize the minimizers of problem (25). If a minimizer lies in the interior of the constraint sphere, then the problem can be tackled using unconstrained optimization. We thus consider the equality-constrained problem

 min_{u∈R^m} u⊤Qu − 2b⊤u  such that ‖u‖² = γ². (28)
###### Proposition 6.

Note α_1, …, α_m the eigenvalues of Q, and u_i and b_i the coordinates of u and b in a corresponding orthonormal basis of eigenvectors. Then a minimizer satisfies the following equations for some nonzero scalar μ:

 u_i = b_i/(α_i + μ)  and  ∑_i b_i²/(α_i + μ)² = γ². (29)
###### Proof.

By the Lagrange multiplier theorem there exists a nonzero scalar  such that , where can be scaled such that is nonsingular. Inverting the optimal condition and expanding the equality constraint gives the two conditions. ∎

Problem (25) can hence be solved at the cost of a scalar root-finding and an eigenvalue decomposition. In [7], bounds are provided so as to initialize the root-finding method efficiently.
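A numpy-only sketch of this procedure, with plain bisection in place of a dedicated root-finder and no handling of the degenerate "hard case" where b is orthogonal to the bottom eigenvector of Q:

```python
import numpy as np

def min_quadratic_on_sphere(Q, b, gamma, tol=1e-12):
    """Minimize u'Qu - 2 b'u subject to ||u|| = gamma (Proposition 6).

    Stationarity gives (Q + mu*I) u = b; in the eigenbasis of Q this reads
    u_i = b_i/(alpha_i + mu), with mu solving sum_i b_i^2/(alpha_i + mu)^2 = gamma^2.
    The global minimizer has Q + mu*I positive semidefinite, so we root-find on
    mu > -alpha_min, where the secular function is strictly decreasing."""
    alpha, V = np.linalg.eigh(Q)               # eigenvalues in increasing order
    bt = V.T @ b
    def phi(mu):
        return np.sum(bt**2 / (alpha + mu)**2) - gamma**2
    lo = -alpha[0] + 1e-10                     # just above -alpha_min: phi(lo) > 0
    hi = -alpha[0] + max(1.0, np.linalg.norm(b) / gamma)
    while phi(hi) > 0:                         # safeguard; phi(hi) <= 0 by construction
        hi = -alpha[0] + 2.0 * (hi + alpha[0])
    while hi - lo > tol * max(1.0, abs(hi)):   # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return V @ (bt / (alpha + mu))
```

With the expressions of Remark 5 for Q and b, the returned vector is the greedy input of problem (28).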

###### Remark 5.

In the case where θ = A (i.e. B⋆ is known), Q and b have the following expressions:

 Q = −B⋆⊤M̄_t^{−1}B⋆,  b = B⋆⊤M̄_t^{−1}A_t x_t. (30)

## 4 Gradient-based planning

In this section, we propose a gradient-based approach to planning. In the sequential identification scheme of Algorithm 1, the cost functions (3) and (17) can be optimized by projected gradient descent. This builds on the following remark.

###### Remark 6 (Differentiability of the functionals).

The functionals (3) and (17) are differentiable functions of the inputs. Indeed, z̄ is an affine function of the inputs as shown in Proposition 1, and the controllability of (A, B) guarantees that Γ can be made positive definite. Furthermore, the pseudo-inverse (see Proposition 7) and the optimal design criteria of Table 1 are differentiable over the set of positive definite matrices.

The gradients with respect to the inputs can either be derived analytically (see [5], Section 6, for the derivation of an adjoint equation) or computed in an automatic differentiation framework. We rescale the inputs at each step to ensure that the power constraint is met; the initial inputs are chosen arbitrarily. The computational complexity of the algorithm is linear in the planning horizon: each gradient step backpropagates through the planning time interval.

We hence optimize the functional (17) by performing gradient descent directly on the inputs U. Note that we optimize the inputs directly in the time domain, whereas other approaches such as [25] perform the optimization in the frequency domain by restricting the control to periodic inputs.

### 4.2 Gradient through the oracle MSE

Given the true parameters θ⋆, the optimal control minimizes the MSE cost (3), as explained in Example 3. However, the dependency between Z and W makes this functional complicated to evaluate and to minimize with respect to the inputs, even when the true parameters are known. We propose a numerical method to minimize (3) using automatic differentiation and Monte-Carlo sampling. Given one realization of the noise W and inputs U, the gradient of the squared error (13) can be computed in an automatic differentiation framework. One can then sample a batch of noise matrices W_1, …, W_b and approximate the gradient of (3) by

 ∇MSE(U) ≃ (1/b) ∑_{i=1}^{b} ∇_U tr[ Z(Z⊤Z)^{−2}Z⊤ W_i W_i⊤ ]. (31)

Although we do not have convergence guarantees due to the lack of structure of the objective function, the gradient descent does converge in practice, to a control that outperforms the adaptive controls.
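The Monte-Carlo objective behind (31) can be sketched in plain numpy; since no automatic differentiation is used here, a central finite difference with common random numbers stands in for one coordinate of the gradient, and all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, T, sigma = 2, 1, 15, 0.2
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])

def squared_error(U, W):
    """Realized squared error tr[Z (Z'Z)^{-2} Z' W W'] of eq. (13), theta = (A, B)."""
    x = np.zeros(d)
    Z = []
    for t in range(T):
        Z.append(np.concatenate([x, U[t]]))    # covariate z_t = (x_t, u_t)
        x = A @ x + B @ U[t] + W[t]
    Z = np.array(Z)
    P = np.linalg.inv(Z.T @ Z)
    return np.trace(Z @ P @ P @ Z.T @ W @ W.T)

U = rng.standard_normal((T, m))
Ws = sigma * rng.standard_normal((16, T, d))   # batch of noise matrices W_1..W_b

# Monte-Carlo estimate of the MSE objective; eq. (31) averages per-sample gradients
mse_hat = float(np.mean([squared_error(U, W) for W in Ws]))

# finite-difference stand-in for one gradient coordinate, with common random numbers
eps = 1e-5
Up, Um = U.copy(), U.copy()
Up[0, 0] += eps
Um[0, 0] -= eps
g00 = float(np.mean([(squared_error(Up, W) - squared_error(Um, W)) / (2 * eps) for W in Ws]))
```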

## 5 Performance study

### 5.1 Complexity analysis

###### Definition 6 (Performance).

Let θ̂ denote the estimate produced by the learning algorithm at the end of identification. The performance of the policy is measured by the average error over the experiments on the true system: ε = E[ ‖θ̂ − θ⋆‖²_F ]. We study the performance of our algorithms as a function of the number of observations T and of the computational cost C. We also introduce the computational rate c = C/T.

Algorithm 2 and the gradient identification algorithm have linear time complexity, hence a computational rate independent of T. In practice, we find that the rate needed for the gradient descent to converge is much larger than that of the greedy algorithm. As pointed out in Remark 3, the squared error essentially scales like 1/T, which is verified experimentally. Given the previous observations, we postulate that the performance of our algorithms takes the form

 ε(C,T)=η(c)/T. (32)

We build an experimental diagram where we plot the average estimation error as a function of the two types of resource, C and T, for the gradient algorithm. Increasing C allows for more gradient steps. We run trials with random matrices A⋆ of fixed size and fixed problem parameters T, γ and σ. The gradient algorithm optimizes the A-optimality functional (17). The obtained performances are to be compared with those of the greedy algorithm (with the A-optimality cost function), which has a fixed, small computational rate. Our diagrams are plotted in Fig. 1.

Our diagrams show that the greedy algorithm is preferable in the phase of low computational rate, as suggested by (32). The phase separation corresponds to a relatively high number of gradient steps. Indeed, the iso-performance lines are almost vertical there, meaning that the gradient descent has almost converged. Furthermore, the maximum performance gain of the gradient algorithm relative to the greedy algorithm is about 10%.

### 5.2 Average estimation error

We now test the performances of our algorithms on random matrices, with the same settings as in the previous experiment. For the gradient algorithm, the number of gradient iterations is chosen large enough to reach maximum performance. For each matrix A⋆, we also compute an oracle optimal control using Algorithm 3, and we run a random input baseline (see Example 1) and the TOPLE algorithm of [25].

Both the gradient algorithm and the greedy algorithm closely approach the oracle, the former performing slightly better on average. However, the computational cost of the gradient algorithm is far larger, as Table 2 shows: a large number of gradient steps is required to reach convergence in this setting, and the same holds for the number of sub-gradient steps of the TOPLE algorithm.

### 5.3 Identification of an aircraft system

We now study a more realistic setting from the field of aeronautics: we apply system identification to an aircraft system, using the numerical values issued in a NASA report [6]. The lateral motion of a Lockheed JetStar is described by the sideslip and roll angles and by the roll and yaw rates. The control variables are the aileron and rudder angles. The linear dynamics for an aircraft flying at 573.7 meters/sec at an altitude of 6,096 meters are given by the following matrices, obtained after discretization and normalization of the continuous-time system [6]:

 A⋆ = [ .955  −.0113  0       −.0284
        0      1      .0568    0
       −.25    0     −.963     .00496
        .168   0     −.00476  −.993 ], (33)

 B⋆ = 0.1 × [ 0     .0116
              0     0
              1.62  .789
              0    −.87 ], (34)

with the noise level σ and the input amplitude γ expressed in degrees.

We apply our algorithms on this LTI system. Our results are summarized in Table 3.

As we can see, the greedy algorithm outperforms the gradient-based algorithms, both in estimation performance and in computational cost. This could be explained by the fact that the signal-to-noise ratio in this system is of order 1: the estimation bias in planning is then large, and it is more effective to plan one step ahead than to plan over large epochs. We obtain similar results for the longitudinal system of a C-8 Buffalo aircraft [6].

## 6 Conclusion

In this work, we explore a setting for linear system identification with hard constraints on the number of interactions with the real system and on the computing resources used for planning and estimation. We introduce a greedy online algorithm requiring minimal computing resources and show empirically that for a small number of interactions with the system, it can actually outperform more sophisticated gradient-based methods. Extending this approach to optimal control for the LQR is an interesting direction for future research.

## 7 Matrix calculus

###### Proposition 7.

On a domain where X has linearly independent columns, the pseudo-inverse X⁺ is a differentiable function of X and

 dX⁺ = −X⁺ dX X⁺ + X⁺X⁺⊤ dX⊤ (I − XX⁺). (35)

See [4]. ∎

###### Lemma 1.

Let A ∈ R^{k×n} and B ∈ R^{n×k}. Then

 det(I_k + AB) = det(I_n + BA). (36)
###### Proposition 8.

Let M ∈ R^{d×d} be a nonsingular matrix and x, y ∈ R^d. Then

 det(M + xy⊤) = det M × (1 + y⊤M^{−1}x). (37)
###### Proof.
 M + xy⊤ = M(I_d + M^{−1}xy⊤). (38)

Apply Lemma 1:

 det(M + xy⊤) = det M × det(I_d + M^{−1}xy⊤) = det M × det(I_1 + y⊤M^{−1}x) = det M × (1 + y⊤M^{−1}x). (39)

See [23]. ∎

###### Proposition 9.

Let A ⪯ B be positive definite matrices of R^{d×d}, and let x ∈ R^d. Then

 log det(A + xx⊤) − log det A ≥ log det(B + xx⊤) − log det B. (40)
###### Proof.

By Proposition 8,

 log det(A + xx⊤) − log det A = log(1 + x⊤A^{−1}x). (41)

Since 0 ≺ A ⪯ B, both matrices are nonsingular and A^{−1} ⪰ B^{−1}. Hence,

 log(1 + x⊤A^{−1}x) ≥ log(1 + x⊤B^{−1}x) = log det(B + xx⊤) − log det B. ∎ (42)

Proposition 8 admits the following generalization.

###### Proposition 10.

Let M ∈ R^{d×d} be a nonsingular matrix and let U, V ∈ R^{d×k}. Then

 det(M + UV⊤) = det M × det(I_k + V⊤M^{−1}U). (43)

See [23]. ∎

###### Proposition 11.

Let M ∈ R^{d×d} be a nonsingular matrix and x, y ∈ R^d with 1 + y⊤M^{−1}x ≠ 0. Then M + xy⊤ is nonsingular and

 (M + xy⊤)^{−1} = M^{−1} − ( M^{−1}xy⊤M^{−1} ) / ( 1 + y⊤M^{−1}x ). (44)
###### Corollary 1.

Let M ∈ R^{d×d} be a nonsingular matrix and x, y ∈ R^d with 1 + y⊤M^{−1}x ≠ 0. Then

 tr[ (M + xy⊤)^{−1} ] = tr[ M^{−1} ] − ( y⊤M^{−2}x ) / ( 1 + y⊤M^{−1}x ). (45)
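The rank-one identities above are easy to check numerically; the well-conditioned test matrix below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
M = 10.0 * np.eye(d) + rng.standard_normal((d, d))   # well-conditioned nonsingular matrix
x, y = rng.standard_normal(d), rng.standard_normal(d)
Minv = np.linalg.inv(M)

# matrix determinant lemma, eq. (37)
det_lhs = np.linalg.det(M + np.outer(x, y))
det_rhs = np.linalg.det(M) * (1 + y @ Minv @ x)

# Sherman-Morrison formula, eq. (44)
inv_lhs = np.linalg.inv(M + np.outer(x, y))
inv_rhs = Minv - (Minv @ np.outer(x, y) @ Minv) / (1 + y @ Minv @ x)

# trace identity of Corollary 1, eq. (45)
tr_lhs = np.trace(inv_lhs)
tr_rhs = np.trace(Minv) - (y @ Minv @ Minv @ x) / (1 + y @ Minv @ x)
```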

## References

• [1] S. Boyd and L. Vandenberghe (2018)

Introduction to applied linear algebra: vectors, matrices, and least squares

.
Cambridge university press. Cited by: §2.1.
• [2] V.V. Fedorov (1972) Theory of optimal experiments. Translated and edited by W.J. Studden and E.M. Klimko. Academic Press. External Links: ISBN 9780122507502, LCCN 76182610, Link Cited by: §1.4, §1, §2.2.
• [3] M. Gevers, X. Bombois, R. Hildebrand, and G. Solari (2011) Optimal Experiment Design for Open and Closed-loop System Identification. Communications in Information and Systems (CIS) 11 (3), pp. 197–224. Note: Special Issue Dedicated to Brian Anderson on the Occasion of His 70th Birthday: Part II External Links: Link Cited by: §1.4.
• [4] G. H. Golub and V. Pereyra (1973) The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on numerical analysis 10 (2), pp. 413–432. Cited by: §7.
• [5] G.C. Goodwin and R.L. Payne (1977) Dynamic system identification: experiment design and data analysis. Academic Press. External Links: ISBN 9780122897504, LCCN lc76050396, Link Cited by: §1.4, §1, §1, §4.
• [6] N. Gupta, R. Mehra, and W. Hall Jr (1976) Application of optimal input synthesis to aircraft parameter identification. Cited by: §1, §5.3, §5.3.
• [7] W. W. Hager (2001) Minimizing a quadratic over a sphere. SIAM Journal on Optimization 12 (1), pp. 188–208. Cited by: §3.2.
• [8] Y. Jedra and A. Proutiere (2020) Finite-time identification of stable linear systems optimality of the least-squares estimator. In 2020 59th IEEE Conference on Decision and Control (CDC), pp. 996–1001. Cited by: §1.
• [9] Y. Jedra and A. Proutiere (2020) Finite-time identification of stable linear systems: optimality of the least-squares estimator. External Links: 2003.07937 Cited by: §1.4, §1.
• [10] L. Keviczky (1975) "Design of experiments" for the identification of linear dynamic systems. Technometrics 17 (3), pp. 303–308. External Links: ISSN 00401706, Link Cited by: §1.4, §1.
• [11] K. Lindqvist and H. Hjalmarsson (2001) Identification for control: adaptive input design using convex optimization. In Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228), Vol. 5, pp. 4326–4331 vol.5. External Links: Document Cited by: §1.
• [12] L. Ljung (1998) System identification. In Signal analysis and prediction, pp. 163–173. Cited by: §1.4, §1.
• [13] H. Mania, M. I. Jordan, and B. Recht (2020) Active learning for nonlinear system identification with guarantees. External Links: 2006.10277 Cited by: §1.4.
• [14] R. Mehra (1974) Optimal inputs for linear system identification. IEEE Transactions on Automatic Control 19 (3), pp. 192–200. External Links: Document Cited by: §1.4.
• [15] R. K. Mehra (1976) Synthesis of optimal inputs for multi-input multi-output (MIMO) systems with process noise. Part I: frequency-domain synthesis; Part II: time-domain synthesis. In System Identification Advances and Case Studies, R. K. Mehra and D. G. Lainiotis (Eds.), Mathematics in Science and Engineering, Vol. 126, pp. 211–249. External Links: Document, ISSN 0076-5392, Link Cited by: §1.4, §1.
• [16] T. M. Moerland, J. Broekens, and C. M. Jonker (2021) Model-based reinforcement learning: a survey. External Links: 2006.16712 Cited by: §1.
• [17] H.G. Natke (1992) System identification: Torsten Söderström and Petre Stoica. Automatica 28 (5), pp. 1069–1071. External Links: ISSN 0005-1098, Document, Link Cited by: §1.
• [18] F. Pukelsheim (2006) Optimal design of experiments. SIAM. Cited by: §1.4, §2.2.
• [19] T. Sarkar and A. Rakhlin (2019) Near optimal finite time identification of arbitrary linear dynamical systems. External Links: 1812.01251 Cited by: §1.4.
• [20] M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht (2018) Learning without mixing: towards a sharp analysis of linear system identification. External Links: 1802.08334 Cited by: §1.4, §1.
• [21] D. M. Steinberg and W. G. Hunter (1984) Experimental design: review and comment. Technometrics 26 (2), pp. 71–97. Cited by: §2.2.
• [22] A. Tsiamis and G. J. Pappas (2019) Finite sample analysis of stochastic system identification. External Links: 1903.09122 Cited by: §1.4.
• [23] R. Vrabel (2016) A note on the matrix determinant lemma. International Journal of Pure and Applied Mathematics 111 (4), pp. 643–646. Cited by: §7, §7.
• [24] A. Wagenmaker and K. Jamieson (2020) Active learning for identification of linear dynamical systems. External Links: 2002.00495 Cited by: §1.4, §1, Table 2, Table 3.
• [25] A. Wagenmaker, M. Simchowitz, and K. Jamieson (2021) Task-optimal exploration in linear dynamical systems. External Links: 2102.05214 Cited by: §1.1, §1.4, §1, §4.1, §5.2, Example 2.