# Finite Time Adaptive Stabilization of LQ Systems

Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of the system parameters can destabilize the closed-loop system, an adaptive stabilization procedure is needed prior to regulation, and it has to be completed in finite time. Asymptotic approaches are of little help toward this goal; only a few non-asymptotic results exist, and a full treatment of the problem is not currently available. In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings, as we carefully choose a minimal set of assumptions: stabilizability of the underlying system, and a restriction on the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address the regularity and instability of the closed-loop matrix.


## I Introduction

We consider finite time stabilization of the following linear system. Given the initial state x(0), for t = 0, 1, 2, … we have

 x(t+1) = A0 x(t) + B0 u(t) + w(t+1), (1)

where at time t, the vector x(t) ∈ R^p corresponds to the state (and output) of the system, u(t) ∈ R^r is the control action, and {w(t)} is a sequence of noise (i.e. random disturbance) vectors. The dynamics of the system, i.e. both the transition matrix A0 ∈ R^{p×p} and the input matrix B0 ∈ R^{p×r}, are fixed but unknown.
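As a concrete illustration of the dynamics (1), the following minimal sketch simulates one open-loop trajectory; the matrices A0 and B0 below are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 2, 1                       # state and input dimensions (placeholders)
A0 = np.array([[1.1, 0.3],
               [0.0, 0.9]])       # transition matrix, unknown to the controller
B0 = np.array([[0.0],
               [1.0]])            # input matrix, unknown to the controller

def step(x, u):
    """One step of the dynamics x(t+1) = A0 x(t) + B0 u(t) + w(t+1)."""
    w = rng.normal(size=p)        # mean-zero noise with full rank covariance
    return A0 @ x + B0 @ u + w

x = np.zeros(p)                   # initial state x(0)
for t in range(10):
    x = step(x, np.zeros(r))      # open-loop rollout (no control)
```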

In order to stabilize the system, we need to design an adaptive procedure and establish finite time theoretical guarantees of stabilization. Once the system has been stabilized, we can use an adaptive regulation policy to minimize the cost function determined by the application. Hence, the stabilization procedure needs to be completed in a relatively short time period. There is an extensive literature providing infinite time analyses to adaptively stabilize a Linear-Quadratic (LQ) system [1, 2, 3, 4, 5], whereas finite time results are scarce and rather incomplete. This work aims to contribute to this limited literature on the subject.

The evolution of LQ systems is governed by linear dynamics, while the operating cost is a quadratic function of the state and the control signal. To deal with the uncertainty about the true matrices guiding the system’s dynamics, a standard scheme is Certainty Equivalence (CE) [6]. Its prescription is to assume that the estimated parameters coincide with the true dynamics matrices. However, it is shown that inconsistency occurs with positive probability [7, 8, 9], which can lead to instability. This motivated modifying CE to Optimism in the Face of Uncertainty (OFU) [10, 11, 12, 13]. OFU prescribes to act as if an optimistic approximation of the true parameter is the one guiding the evolution of the system.

Recent finite time analyses consider a restricted setting [14, 15], where the proposed adaptive stabilization procedure heavily relies on the following strong conditions. First, controllability and observability of the true dynamics matrices of the system are assumed. Second, the closed-loop transition matrix is required to have operator norm less than one. Note that the former does not imply the latter [16]. Third, uncertainty about the true dynamics matrices is restricted to an a priori known bounded box in the space of matrices. Finally, the noise vectors are assumed to have a sub-Gaussian distribution with uncorrelated coordinates.

We introduce our stabilization algorithm and establish finite time guarantees for it in Section IV. Leveraging the novel method of random linear feedbacks, we address the four aforementioned limitations. Indeed, the first assumption (which imposes a computationally intractable constraint [17]), as well as the third one (which requires possibly unavailable information), are not needed. Further, we relax the operator norm condition to the minimal assumption of stabilizability. Finally, the noise process is generalized to the considerably larger class of heavy-tailed sub-Weibull distributions with possibly correlated coordinates. Note that unlike contraction in operator norm, stability of matrices is not preserved by multiplication (i.e. the product of stable matrices can be unstable). This means that the existing theoretical techniques [13, 14] used in addressing the stabilization problem fail to work when the operator norm is not less than one.

To derive finite time guarantees of stabilization, new concepts and technical tools are needed to address the following issues:

1. Because of the unbounded growth of the state vectors [18], the classical results of persistent excitation [19] are not applicable.

2. Since the system is not fully stabilized yet, the closed-loop matrix can have eigenvalues both inside and outside the unit circle. Thus, the smallest (largest) eigenvalue of the Gram matrix scales linearly (exponentially) with time [20, 21]. This leads to the failure of the existing approaches which do not need the persistent excitation condition [14, 22].

3. For unstable systems, it is shown that the normalized empirical covariance of the state vector is a random matrix [23, 24]. So, in order to obtain reliable identification results, anti-concentration properties of random matrices need to be carefully examined [25].

4. For accurate identification, one needs to ensure that the important condition of closed-loop regularity holds (see Definition 2). It is a necessary condition, imposed on the eigenvalues of magnitude larger than one [26].

The remainder of the paper is organized as follows. The problem is rigorously formulated in Section II. Then, in Section III we study the key identification results for unstable closed-loop dynamics as the cornerstone of the stabilization algorithm presented later on. Subsequently in Section IV, results regarding the properties of random linear feedback are established. Finally, we propose the adaptive stabilization Algorithm 1, and show that it is guaranteed to return a high probability stabilizing set.

### I-A Notation

The following notation is used throughout this paper. For a matrix A, the transpose is denoted by A′. When A is square, the smallest (respectively largest) eigenvalue of A in magnitude is denoted by λmin(A) (respectively λmax(A)), and the trace of A is denoted by tr(A). For γ ≥ 1, the γ-norm of a vector v is ||v||γ = (Σi |vi|^γ)^{1/γ}. Further, when γ = ∞, the norm is defined according to ||v||∞ = maxi |vi|.

We also use the following notation for the operator norm of matrices. For γ ≥ 1, and a matrix A, define

 |||A|||γ = sup_{v≠0} ||Av||γ / ||v||γ.

Whenever γ = 2, we simply write |||A|||. To denote the dimension of a manifold M over the field F, we use dim_F(M). Finally, the sigma-field generated by the random vectors v1, …, vn is denoted by σ(v1, …, vn). The notations θ, K(θ), L(θ), and ~L(θ) are defined in Remark 1, equations (3), (4), and Remark 2, respectively.

## II Problem Formulation

We start by discussing the adaptive stabilization problem that constitutes the primary focus of this work. As mentioned above, the corresponding adaptive policy for regulating the system (i.e. cost minimization) can be employed, once the stabilization is guaranteed. Results of this work can be used in the finite time analysis of adaptive regulation for LQ systems. Further, stabilization of linear systems is intimately related to a Riccati equation for the corresponding LQ system. Therefore, we comprehensively discuss the necessary preliminaries here.

The stochastic evolution of the system is governed by the linear dynamics (1), where {w(t)} are independent mean-zero noise vectors with full rank covariance matrix C:

 E[w(t)] = 0, E[w(t)w(t)′] = C, |λmin(C)| > 0.

Generalizations of the established results to dependent noise vectors (i.e. martingale difference sequences) are rather straightforward. The true dynamics matrices A0, B0 are assumed to be stabilizable, as defined below.

###### Definition 1 (Stabilizability [16]).

The pair (A, B) is stabilizable if there exists L ∈ R^{r×p} such that |λmax(A + BL)| < 1. The linear feedback matrix L is called a stabilizer for (A, B).
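A numerical check of Definition 1 is immediate: L is a stabilizer exactly when the spectral radius of the closed-loop matrix is below one. The matrices below are hypothetical placeholders for illustration.

```python
import numpy as np

A0 = np.array([[1.1, 0.3],
               [0.0, 0.9]])      # placeholder dynamics (open loop unstable)
B0 = np.array([[0.0],
               [1.0]])

def is_stabilizer(L, A=A0, B=B0):
    """True iff |lambda_max(A + B L)| < 1, per Definition 1."""
    eigs = np.linalg.eigvals(A + B @ L)
    return np.max(np.abs(eigs)) < 1.0

zero_gain = np.array([[0.0, 0.0]])     # no feedback: leaves eigenvalue 1.1
good_gain = np.array([[-0.5, -0.5]])   # a stabilizer for this placeholder pair
```

For the placeholder pair above, `zero_gain` fails the test while `good_gain` passes it; any L passing the test is, by definition, a stabilizer.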

###### Remark 1.

For notational convenience, henceforth for A ∈ R^{p×p}, B ∈ R^{p×r}, we use θ to denote [A, B]. Clearly, θ ∈ R^{p×q}, where q = p + r.

We assume perfect observations, i.e. the operator can fully observe the sequence of state vectors. Next, suppose that ct is the quadratic instantaneous cost function at time t:

 ct = x(t)′Qx(t)+u(t)′Ru(t), (2)

which is defined according to the known positive definite cost matrices Q ∈ R^{p×p}, R ∈ R^{r×r}. An adaptive policy is a mapping which designs the control action according to the cost matrices, and the history of the system. That is, for all t, the operator needs to determine u(t) according to Q, R, and x(0), u(0), …, x(t−1), u(t−1), x(t).

The following proposition shows that in order to stabilize a linear system, one can solve a Riccati equation. A solution K(θ) for the parameter θ = [A, B] is a positive semidefinite matrix satisfying

 K(θ) = Q + A′K(θ)A − A′K(θ)B(B′K(θ)B + R)−1B′K(θ)A, (3)

which induces the linear feedback matrix

 L(θ) = −(B′K(θ)B + R)−1B′K(θ)A. (4)

For this purpose, we introduce a notation that simplifies certain expressions throughout this work.

###### Remark 2.

For arbitrary stabilizable θ = [A, B], let ~L(θ) = [Ip, L(θ)′]′. So, θ~L(θ) = A + BL(θ).

###### Proposition 1.

If θ is stabilizable, then (3) has a unique solution. Conversely, if (3) has a solution, then L(θ) defined by (4) is a stabilizer for the dynamics parameter θ; i.e. |λmax(θ~L(θ))| < 1.

The proof of Proposition 1 is provided in Appendix A, where the following cost minimization property of the Riccati equations (3), (4) is established as well. Assuming the system evolves according to (1), the linear feedback u(t) = L(θ0)x(t) minimizes the expected average cost of the system of dynamics parameter θ0. Namely, letting ct be as in (2), in general it holds that

 lim sup_{T→∞} (1/T) Σ_{t=1}^{T} E[ct] ≥ tr(K(θ0)C),

where the linear feedback u(t) = L(θ0)x(t) attains the equality. An adaptive stabilization procedure is ignorant of the true parameter θ0, and needs to estimate it. The following lemma addresses stability when the actual system evolution parameter is θ0, while the linear feedback is designed according to the approximation θ. The proof of Lemma 1 can be found in Appendix B.

###### Lemma 1 (Stabilizing neighborhood).

There is ε0(θ0) > 0, such that for every stabilizable θ, if |||θ − θ0|||2 ≤ ε0(θ0), then θ0~L(θ) is stable.

## III Closed-loop Identification

When applying the linear feedback u(t) = Lx(t), the dynamics take the form x(t+1) = Dx(t) + w(t+1), where D = A0 + B0L is the (possibly unstable) closed-loop transition matrix. Subsequently, we present results for the accurate identification of D through the least-squares estimator. Observing the state vectors x(0), …, x(n), for an arbitrary matrix E ∈ R^{p×p} define the sum-of-squares loss function

 Ln(E) = Σ_{t=0}^{n−1} ||x(t+1) − Ex(t)||2^2.

Then, the true closed-loop transition matrix D is estimated by ^Dn, which is a minimizer of the loss function; ^Dn = argmin_{E∈R^{p×p}} Ln(E). To analyze the finite time behavior of the aforementioned identification procedure, the following is assumed for the tail-behavior of every coordinate of the noise vector.

###### Assumption 1 (Sub-Weibull distribution [21]).

There are positive reals b1, b2, and α, such that for all i = 1, …, p, all t, and all y > 0,

 P(|wi(t)| > y) ≤ b1 exp(−y^α / b2).

Intuitively, smaller values of the exponent α correspond to heavier tails for the noise distribution, and vice versa. Note that whenever α < 1, the noise coordinates do not need to have a moment generating function. Further, the noise coordinates can be either discrete or continuous random variables, and are not assumed to have a probability density function (pdf). Henceforth, the special case of bounded noise can be obtained from the presented results by letting α → ∞.

Next, we define an important property of unstable transition matrices which is required in order to obtain accurate estimation results.

###### Definition 2 (Regularity [26]).

The matrix D ∈ R^{p×p} is regular if for any eigenvalue λ of D such that |λ| > 1, the geometric multiplicity of λ is one.

Regularity implies that the eigenspace corresponding to every eigenvalue λ with |λ| > 1 is one dimensional, and vice versa. There are other equivalent formulations of regularity. Indeed, D is regular if and only if for any eigenvalue λ such that |λ| > 1, the Jordan decomposition of D contains only one block corresponding to λ, regardless of its algebraic multiplicity. Another equivalent formulation is that D is regular if and only if rank(D − λIp) ≥ p − 1, for all λ ∈ C with |λ| > 1. For example, let P1, P2 be arbitrary invertible matrices, and assume

 D1 = P1^{−1} [ρ 1; 0 ρ] P1,  D2 = P2^{−1} [ρ 0; 0 ρ] P2,

are real matrices, where ρ satisfies |ρ| > 1. Then, D1 is regular, but D2 is not.

In order to examine the accuracy of the least-squares estimation, we leverage existing finite time identification results for unstable dynamics [21]. First, if the empirical covariance matrix Σ_{t=0}^{n−1} x(t)x(t)′ is non-singular, one can write ^Dn − D = (Σ_{t=0}^{n−1} w(t+1)x(t)′)(Σ_{t=0}^{n−1} x(t)x(t)′)^{−1}. Hence, the behavior of the empirical covariance governs the estimation accuracy. For unstable D, an appropriately normalized version of Σ_{t=0}^{n−1} x(t)x(t)′ is shown to converge to a random matrix [21]. Thus, letting Φ denote this normalized matrix, the accuracy of ^Dn depends on stochastic lower bounds for |λmin(Φ)|. Let ψ(δ) be a high probability lower bound of |λmin(Φ)|; i.e. it is sufficiently small to satisfy P(|λmin(Φ)| < ψ(δ)) ≤ δ. The following statement studies ψ(δ) based on anti-concentration results for sequences of random matrices [25].

###### Proposition 2.

[21] Suppose that D is regular. In general, regularity implies that |λmin(Φ)| > 0 holds almost surely, so that ψ(δ) > 0 for all δ > 0. Further, if some coordinate of the noise vector has a bounded pdf, then for all δ > 0 we have ψ(δ) ≥ cδ, where c is a fixed constant.

Theorem 1 determines the time length for which the user should interact with the system, in order to collect sufficiently many observations for accurate identification of the unstable matrix D. The sample size is based on the constant ρ, for which the exact dependence on the noise parameters b1, b2, α, and the closed-loop matrix D is available [21]. Moreover, letting λ1, …, λk1 (respectively μ1, …, μk2) be the distinct eigenvalues of D outside (respectively inside) the unit circle, ρ depends on these eigenvalues and their multiplicities [21]. The constant ρ depends on the upper bound of the pdf of the noise coordinates as well. The explicit specification of these dependencies is fully presented in [21] and hence omitted. Next, let N(ε, δ) be large enough, such that n ≥ N(ε, δ) implies

 n (log n)^{−4/α} ≥ ρ ε^{−2} ((−log δ)^{1+4/α} − log ψ(δ)). (5)
###### Theorem 1 (Unstable identification [21]).

Suppose that D is regular, and has no eigenvalue of unit size. As long as n ≥ N(ε, δ), we have

 P(|||^Dn − D|||2 ≤ ε) ≥ 1 − δ.

Hence, by (5), the probability of having an identification error of magnitude ε decays exponentially fast as n grows. In the next section, we show that one can satisfy the assumptions of Theorem 1 by applying random linear feedbacks to a stabilizable system with unknown dynamics parameters.

## IV Stabilization Algorithm

Although the true parameter θ0 is unknown, according to Lemma 1, a stabilizing linear feedback can be designed if one can find a stabilizing neighborhood Ω containing θ0, such that

 sup_{θ∈Ω} |||θ − θ0|||2 ≤ ε0(θ0). (6)

Using Theorem 1, we establish that θ0 can be estimated if one applies a random linear feedback to the system. Since in Theorem 1 the closed-loop transition matrix needs to be regular with no eigenvalue of unit size, first we need to show that these conditions can be satisfied. Lemma 2 and Lemma 3 accomplish this, with no knowledge beyond stabilizability of θ0. Based on the properties of the distribution of a random linear feedback matrix L, the above lemmas provide general statements which hold almost surely. Then, we present a finite time stabilizing algorithm, and prove that it provides us with the desired stabilizing neighborhood. To proceed, we define the following classes of probability distributions over real valued vectors and matrices.

###### Definition 3 (Full rank distributions).

Let v be a random vector in R^d. Then, v has a linearly full rank distribution if for any arbitrary hyperplane H ⊂ R^d, it holds that P(v ∈ H) = 0. Further, v has a general full rank distribution, if for every manifold M ⊂ R^d such that dim_R(M) < d, it holds that P(v ∈ M) = 0.

The following example illustrates the difference between the two types of full rank distributions defined above.

###### Example 1.

Let v ∼ N(μ, Σ) be a Gaussian random vector in R^d, with arbitrary mean μ, and positive definite covariance matrix Σ. Then, v has a general full rank distribution. Letting u = v/||v||2, the random vector u has a linearly full rank distribution, but since it lives on the unit sphere, u does not have a general full rank distribution.
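A quick numerical illustration of Definition 3: since a standard Gaussian vector has a general full rank distribution, any fixed lower-dimensional set carries probability zero, so stacking d independent draws yields a full rank matrix with probability one.

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials = 3, 200

# Each trial stacks d independent Gaussian vectors; rank deficiency would
# require a draw to land on a lower-dimensional manifold (probability zero).
full_rank = [np.linalg.matrix_rank(rng.normal(size=(d, d))) == d
             for _ in range(trials)]
```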

Random linear feedbacks with full rank distributions induce the desired properties in the closed-loop transition matrix, as we rigorously establish below.

###### Lemma 2 (Closed-loop Regularity).

Assume θ0 = [A0, B0] is stabilizable. Let the columns of L be independent (but not necessarily identically distributed) random vectors with linearly full rank distributions. Then, the matrix A0 + B0L is regular, with probability one.

###### Proof of Lemma 2.

Let the event be that is irregular. We prove that for all , , with probability one, . Note that according to the discussion after Definition 2, this implies .

First, let have linearly full rank distributions. Define , and let be a matrix, with all coordinates being real polynomials of . Let be a real polynomial of as well. We show that

 (7)

If , letting , two of the vectors , such as , can be written as linear combinations of the others. There are finitely many values of for which is a linear combination of , since for every such a , , where is the square matrix whose columns are , removing an arbitrary row. Note that is a polynomial of , divided by , and .

Note that is a deterministic function of . For every such , the dimension of the subspace spanned by is at most . Because is independent of , and has a linearly full rank distribution, ; i.e. (7) holds.

Now, let . If , applying the above argument to , we have , since full rankness of implies linearly full rank distributions for all columns of . If , there is a permutation matrix , and , such that , where is full rank. Let be a stabilizer, , and , to get . Writing , we have

 rank(A0+B0L−λIp) = = rank([D1+~B(L−L0)−λJ1[−K,Ip−m]J(D0−λIp)]).

Denote the last matrix above by . Since , for the matrix is full rank. Therefore, because of , we have .

Rearrange the columns of matrix to get , such that . In other words, linearly independent columns of have been put together to form . If is not regular,

 p−2 ≥ rank(~X)=rank(X) = rank(X[Im0m×(p−m)−X−122X21Ip−m]) = rank([X11−X12X−122X21X120(p−m)×mX22]).

Hence, . Recall that columns of are exactly the same as , and all coordinates of are polynomials of (since all coordinates of are polynomials of the coordinates of ). Taking , by (7), since full rankness of implies linearly full rank distributions for all columns of , we have , which is the desired result since . ∎

If the distribution of the linear feedback L is generally full rank, the following result shows that A0 + B0L has no eigenvalue on the unit circle of the complex plane.

###### Lemma 3 (Closed-loop Eigenvalues).

Assume θ0 is stabilizable, and let L have a general full rank distribution over R^{r×p}. With probability one, A0 + B0L has no unit size eigenvalue.

###### Proof of Lemma 3.

Assume has a unit-root eigenvalue, denoted by . Further, assume that , and let the permutation matrix and the matrix be such that

 JB0=[~BK~B]=[ImK]~B,

where is full rank. Letting be a stabilizer, , and , note that has a general full rank distribution, thanks to the full-rankness of . Since is stable, , and

 0 = det(A0+B0L−λIp) = det(JD0+[ImK]X−λJ) = det((D0−λIp)−1J−1[ImK]X+Ip) = det(X(D0−λIp)−1J−1[ImK]+Im),

where the last equality above is implied by Sylvester’s determinant identity. Denote the complex conjugate of by , and define the real matrix

 M(λ)=M(¯λ)=(D0−¯λIp)−1(D0−λIp)−1J−1[ImK].

Further, define the space of eigenvectors in

as follows. First, consider the relation on , defined as

 x∼y, if x=cy for some c∈C,c≠0.

Since is an equivalence relation, for the set of equivalence classes denoted by (which is the direction space in ) we have ; i.e. .

Note that for every matrix and every vector , if and only if for every . Thus, implies that there is , such that

 (X(D0−¯λIp)M(λ)+Im)v=0 (8)

Denote the set of all matrices satisfying (8) by . Separating the real () and imaginary () parts, we get , , where for , the vectors are defined as

 a(v) = M(λ)R(¯λv)−D0M(λ)R(v), b(v) = M(λ)I(¯λv)−D0M(λ)I(v).

Next, we partition to ; i.e. , where

 S1 = {v∈S:a(v),b(v) are in-line }, S2 = {v∈S:a(v),b(v) are not in-line }.

Whenever , for , the -th row of needs to be in the intersection of two nonparallel hyperplanes , where

 P1 = {y∈Rp:y′a(v)=R(vj)}, P2 = {y∈Rp:y′b(v)=I(vj)}.

Since , , and , we have . Therefore, for , we have . Since , using we have

 dimR(Z1)≤1+2m−2+m(p−2)=mp−1, (9)

where .

On the other hand, for , there is a real number, say , such that . Then,

 I(v)=Xb(v)=φ(v)Xa(v)=φ(v)R(v), (10)

i.e. whenever , the vectors are in-line. So, , and for , we have , i.e. . After doing some algebra, we obtain

 0 = φ(v)a(v)−b(v) = φ(v)(R(λ)Ip+φ(v)I(λ)Ip−D0)M(λ)R(v) − (φ(v)R(λ)Ip−I(λ)Ip−φ(v)D0)M(λ)R(v) = (1+φ(v)2)I(λ)M(λ)R(v),

i.e. either , or . According to the definition of , the latter case implies , which due to (10) leads to , and is impossible. So, by , we have

 dimR(Z2)≤m−1+m(p−1)=mp−1, (11)

where . Writing , according to (9), (11) we have , and by general full-rankness of the distribution of , the desired result holds: . ∎

Subsequently, an algorithmic procedure to find a stabilizing neighborhood is presented, based on the random linear feedbacks discussed above. First, letting k be large enough that kp ≥ q, draw the columns of the feedback matrices L1, …, Lk from independent standard Gaussian distributions. Note that because of independence, for all i = 1, …, k, the random feedback Li has a general full rank distribution. Lemma 2 and Lemma 3 then show that the conditions of Theorem 1 hold. Therefore, every closed-loop transition matrix A0 + B0Li can be estimated arbitrarily accurately. We show how to find a high probability confidence set for θ0, using the accurate estimates of the closed-loop matrices.

Letting ε0(θ0) be as in Lemma 1, define the precision ~ε and the matrix M containing all feedback matrices by

 M = [Ip ⋯ Ip; L1 ⋯ Lk] ∈ R^{q×kp}, (12)
 ~ε = (13)

As a matter of fact, since the feedback matrices L1, …, Lk are drawn from continuous full rank distributions, M is of full row rank, almost surely; i.e. σmin(M) > 0. Further, if σmin(M) became too small once L1, …, Lk are drawn, one can repeatedly redraw the random feedbacks to avoid pathologically small values of σmin(M).

Then, let τ0 = 0, and for i = 1, …, k define the following:

 τi = τ(i−1) + N(~ε, δ/k), (14)
 ^D(i) = argmin_{E∈R^{p×p}} Σ_{t=τ(i−1)}^{τ(i)−1} ||x(t+1) − Ex(t)||2^2, (15)
 Ω(i) = {θ = [A, B] : |||A + BLi − ^D(i)|||2 ≤ ~ε}, (16)

where the sample size N(~ε, δ/k) is given in (5). Conceptually, τi is the time point when the control action changes, ^D(i) is the least-squares estimate, and Ω(i) is a confidence set for θ0. In fact, for each i, Algorithm 1 applies the linear feedback u(t) = Li x(t) during the time period τ(i−1) ≤ t < τi. Then, observing the corresponding state vectors, the algorithm uses (15) to estimate the true closed-loop matrix A0 + B0Li. Finally, the high probability confidence set Ω(i) is constructed for the true parameter θ0, according to (16). Iterating the above procedure for all i = 1, …, k, the algorithm constructs Ω(1), …, Ω(k), and returns Ω = ∩_{i=1}^{k} Ω(i) as a stabilizing set. Below, we show that it satisfies (6).
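The epochs (14)–(16) can be sketched end to end: apply several linear feedbacks, estimate each closed-loop matrix by least squares as in Section III, and recover θ0 = [A0, B0] from the relation A0 + B0Li = θ0[Ip; Li]. Everything below is an illustrative simplification: A0 and B0 are placeholders, the three feedbacks are hardcoded draws (the paper samples them from a general full rank distribution), each epoch restarts from x = 0, and the explicit confidence radius (13) is replaced by a direct error check.

```python
import numpy as np

rng = np.random.default_rng(3)
A0 = np.array([[1.1, 0.3],       # placeholder true dynamics, unknown below
               [0.0, 0.9]])
B0 = np.array([[0.0],
               [1.0]])
p, r, n = 2, 1, 2000
feedbacks = [np.array([[-1.0, -0.8]]),   # hardcoded feedback draws; these
             np.array([[-1.0, -1.4]]),   # happen to give stable closed loops,
             np.array([[-0.5, -0.2]])]   # though the theory allows unstable ones

D_hats = []
for L in feedbacks:
    D = A0 + B0 @ L                      # closed-loop matrix for this epoch
    X = np.zeros((n + 1, p))
    for t in range(n):
        X[t + 1] = D @ X[t] + rng.normal(size=p)
    Y, Z = X[1:], X[:-1]
    D_hats.append(np.linalg.solve(Z.T @ Z, Z.T @ Y).T)   # epoch estimate (15)

# Stack D_i = theta0 [Ip; L_i] across epochs and invert via M of (12).
M = np.hstack([np.vstack([np.eye(p), L]) for L in feedbacks])  # (p+r) x kp
theta_hat = np.hstack(D_hats) @ np.linalg.pinv(M)
err = np.linalg.norm(theta_hat - np.hstack([A0, B0]), 2)
```

Once θ̂ lands inside the stabilizing neighborhood of Lemma 1, the certainty-equivalent feedback L(θ̂) of (4) stabilizes the true system.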

By Theorem 1, (14) implies that |||^D(i) − (A0 + B0Li)|||2 ≤ ~ε holds with probability at least 1 − δ/k. So, by (16), we have θ0 ∈ Ω(i); i.e. θ0 ∈ Ω. To show that Ω is a stabilizing set, let θ ∈ Ω be arbitrary. On the event θ0 ∈ Ω, for all i = 1, …, k we have |||(θ − θ0)~Li|||2 ≤ 2~ε. Using the definition of M in (12), the latter result leads to an upper bound for |||(θ − θ0)M|||2. Thus, (13) implies |||θ − θ0|||2 ≤ ε0(θ0), which is the desired inequality of (6). Note that since θ0 ∈ Ω(i) with probability at least 1 − δ/k, the failure probability of Algorithm 1 is at most δ. This completes the proof of the following result.

###### Theorem 2 (Stabilization).

Let Ω be the stabilizing set provided by Algorithm 1. For arbitrary θ ∈ Ω, we have

 P(|λmax(θ0~L(θ))| < 1) ≥ 1 − δ.

In other words, the probability of failing to stabilize the system decays exponentially as the time of interaction with the system grows (see (5)). Obviously, the normal distribution used in Algorithm 1 is not the unique choice, and can be substituted by any general full rank distribution over R^{r×p}.

## V Conclusion

We studied an adaptive stabilization scheme for linear dynamical systems, focusing on finite time analysis. Tailoring a novel procedure based on random linear feedbacks, we established non-asymptotic results under mild assumptions, namely those of system stabilizability and a fairly general noise process that encompasses heavy-tailed distributions.

There are a number of interesting extensions of the current work. First, finite time analysis of stabilization given noisy observations of the state vector is an interesting topic for future investigation. Second, studying the stabilization problem in a high-dimensional setting (assuming sparsity or some other low dimensional structure) is also an interesting subject to be addressed in the future.

## Appendix A Proof of Proposition 1

###### Proof.

For convenience, let and . First, assume is stabilizable, is a stabilizer, , and . For arbitrary fixed PSD matrix P0, define recursively,

 Pt(P0) = Q + A′0Pt−1(P0)A0 − A′0Pt−1(P0)B0(B′0Pt−1(P0)B0+R)−1B′0Pt−1(P0)A0.

Letting be as defined in (2), the optimal control policy for minimizing the finite horizon cumulative cost , is