Robust Online Control with Model Misspecification

07/16/2021 ∙ by Xinyi Chen, et al.

We study online control of an unknown nonlinear dynamical system that is approximated by a time-invariant linear system with model misspecification. Our study focuses on robustness, which measures how much deviation from the assumed linear approximation can be tolerated while maintaining a bounded ℓ_2-gain compared to the optimal control in hindsight. Some models cannot be stabilized even with perfect knowledge of their coefficients: the robustness is limited by the minimal distance between the assumed dynamics and the set of unstabilizable dynamics. Therefore it is necessary to assume a lower bound on this distance. Under this assumption, and with full observation of the d dimensional state, we describe an efficient controller that attains Ω(1/√(d)) robustness together with an ℓ_2-gain whose dimension dependence is near optimal. We also give an inefficient algorithm that attains constant robustness independent of the dimension, with a finite but sub-optimal ℓ_2-gain.


1 Introduction

The control of linear dynamical systems is well studied and understood. Classical algorithms such as LQR and LQG are known to be optimal for stochastic control, while robust control is optimal in the worst case, assuming quadratic costs. Recent advancements gave rise to efficient online control methods based on convex relaxation that can minimize regret in the presence of adversarial perturbations. However, the problem of efficient control for general nonlinear systems is intractable.

In this paper we revisit a natural and well studied approach for nonlinear control: that of linear dynamics with model misspecification. The deviation of the nonlinear dynamics from a linear system is captured by an adversarial disturbance term that can scale with the system state history. The amount of such deviation that can be tolerated while maintaining system stability is called the robustness of the system.

Our study is motivated by a long-standing research direction. The field of adaptive control has addressed the problem of controlling a linear dynamical system with uncertain parameters, providing guarantees of asymptotic optimality of adaptive control algorithms. However, these algorithms were shown to lack robustness under model misspecification (e.g. Rohrs et al. (1982)).

In this paper, we show that a properly designed adaptive control algorithm can exhibit a significant degree of robustness to unmodeled dynamics, even though the associated closed-loop gain grows rapidly (the gain has to grow rapidly regardless of robustness, as per the lower bounds of Chen and Hazan (2021)). We explore the limits of robust control of a linear dynamical system with adversarial perturbations whose magnitude can depend on the state history. We show that it is indeed possible to achieve constant robustness that depends only on the system dimension and is independent of its other natural parameters.

The controller that achieves this performance is computationally efficient. It is based on recent system identification techniques from non-stochastic control, whose main component is active, large-magnitude deterministic exploration. This technique deviates from the classical approach of estimating the system via least squares and then solving for the optimal controller.

1.1 Our contributions

We consider the setting of a linear dynamical system with time-invariant dynamics, together with model misspecification, as illustrated in Fig. 1.

Figure 1: Diagram of the system, where w_t represents the model misspecification.

The system evolves according to the following rule:

x_{t+1} = A x_t + B u_t + w_t,

where (A, B) is the (unknown) linear approximation to the system, and u_t, x_t, w_t are the control, state and adversarial perturbation respectively. The perturbation w_t represents the deviation of the nonlinear system from the nominal system (A, B), and it crucially satisfies the following assumption:

‖w_t‖ ≤ ε ‖x_{1:t}‖ + W for all t. (1)

The parameter ε is a measure of the robustness of the system, and is the main object of study. The larger ε is, the more model misspecification can be tolerated in the system, and our goal is to study the limits of stabilizability of the system with robustness ε as large as possible. The measure of stability we use is taken from classical control theory, and is called the ℓ_2-gain of a closed-loop system with control algorithm 𝒜 in the feedback loop,

ℓ_2-gain(𝒜) = sup_{‖w_{1:T}‖ ≠ 0} (‖x_{1:T}‖² + ‖u_{1:T}‖²) / ‖w_{1:T}‖², (2)

where x_{1:T}, w_{1:T} are concatenations of x_t and w_t, respectively. This notion is closely related to the competitive ratio of the control algorithm 𝒜, as we show in App. B. With this notation, we can formally state our main question:

How large can ε be for the system to allow a control algorithm which yields a finite ℓ_2-gain?
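To make definitions (1) and (2) concrete, the sketch below simulates a misspecified LDS under a fixed linear controller and measures the empirical ratio in (2). The matrices, the controller, and the disturbance draw are illustrative assumptions for exposition, not the constructions used in our analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eps, W = 2, 500, 0.05, 1.0

A = np.array([[1.1, 0.2], [0.0, 0.9]])  # nominal dynamics (illustrative)
B = np.eye(d)                            # non-degenerate control matrix
K = np.linalg.solve(B, A)                # deadbeat controller: A - B K = 0

x = np.ones(d)
xs, us, ws = [], [], []
for t in range(T):
    u = -K @ x
    # misspecification in the spirit of (1): ||w_t|| <= eps * ||x_{1:t}|| + W
    hist = np.sqrt(sum(np.linalg.norm(v) ** 2 for v in xs) + np.linalg.norm(x) ** 2)
    g = rng.standard_normal(d)
    w = (eps * hist + W) * g / np.linalg.norm(g)  # saturate the allowed magnitude
    xs.append(x); us.append(u); ws.append(w)
    x = A @ x + B @ u + w

num = sum(np.linalg.norm(v) ** 2 for v in xs + us)
den = sum(np.linalg.norm(v) ** 2 for v in ws)
print("empirical lower bound on the l2-gain:", num / den)
```

Any single disturbance sequence only lower bounds the supremum in (2); the analysis below must control the gain over all admissible sequences.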

Our study initiates an answer to this question from both lower and upper bound perspectives. Specifically, for any system with a non-degenerate control matrix B,

  • We give an efficient algorithm that is able to control the system with robustness ε = Ω(1/√d), where d is the system dimension, independently of the other system parameters.

    In addition, we show that this algorithm achieves a finite ℓ_2-gain of κ^{Õ(d)}, where κ is an upper bound on the spectral norm of the system. While the exponential dependence on the dimension may seem daunting, it is known to be necessary as per the lower bound of Chen and Hazan (2021), which is 2^{Ω(d)}.

  • We give an (inefficient) control algorithm with a finite ℓ_2-gain and constant robustness ε = Ω(1), independent of the other system parameters.

We also consider the limits of finite and robust control. Clearly, if the system is not stabilizable, then no positive robustness can be guaranteed. The distance of the assumed dynamics from the set of unstabilizable dynamics is thus an upper bound on the robustness, and we provide a proof for completeness in App. A.

For our main results, we use an active explore-then-commit method, together with a doubling strategy to handle unknown disturbance levels. We also study system identification using online least squares, and prove that it gives constant robustness and finite ℓ_2-gain bounds for one-dimensional systems in App. C. We explain why this methodology is hard to generalize to higher dimensions, and motivate our use of the active exploration technique.

1.2 Related work

Adaptive Control.

The most relevant field to our work is adaptive control, see for example a survey by Tao (2014). This field has addressed the problem of controlling a linear dynamical system with uncertain parameters, providing, in the 70s, guarantees of asymptotic optimality of adaptive control algorithms. However, reports of lack of robustness of such algorithms to unmodeled dynamics (as in the Rohrs et al. (1982) example) have emerged. One can argue that this lack of robustness was due to poor noise rejection transient performance of such controllers, which can be measured in terms of the induced norm (gain) of the overall system. The general task of designing adaptive controllers with finite closed-loop gain was solved, in the abstract, by Cusumano and Poolla (1988), but the gain bounds obtained there grow very fast with the size of the parameter uncertainty, and are therefore only good to guarantee a negligible amount of robustness. It has been confirmed by Megretski and Rantzer (2002) that even in the case of one-dimensional linear models, the minimal achievable gain grows very fast with the size of the parameter uncertainty.

Nonlinear Control.

Recent research has studied provable guarantees in various complementary (but incomparable) models for nonlinear control. These include planning regret in nonlinear control (Agarwal et al., 2021), adaptive nonlinear control under linearly-parameterized uncertainty (Boffi et al., 2020), online model-based control with access to non-convex planning oracles (Kakade et al., 2020), control with nonlinear observation models (Mhammedi et al., 2020), system identification for nonlinear systems (Mania et al., 2020) and nonlinear model-predictive control with feedback controllers (Sinha et al., 2021).

Robustness and ℓ_2-gain in Control.

The achievability of a finite ℓ_2-gain for systems with an unknown level of disturbance has been studied in control theory. Cusumano and Poolla (1988) gives a claim on the level of disturbance needed for a finite ℓ_2-gain. Megretski and Rantzer (2002) gives a lower bound on the closed-loop ℓ_2-gain of adaptive controllers that achieve a finite ℓ_2-gain for all systems with bounded spectral norm. However, the systems studied in these works do not contain any model misspecification.

Competitive Analysis for Control.

Yu et al. (2020) gives a control algorithm with constant competitive ratio for the setting of delayed feedback and imperfect future disturbance predictions. Shi et al. (2021) proposes algorithms whose competitive ratios are dimension-free in the setting of optimization with memory, with connections to control of a known, input-disturbed system under adversarial disturbances.

System Identification for Linear Dynamical Systems.

For an LDS with stochastic perturbations, the least squares method can be used to identify the dynamics in the partially observable and fully observable settings (Oymak and Ozay, 2019; Simchowitz et al., 2018; Sarkar and Rakhlin, 2019; Faradonbeh et al., 2019). However, least squares can lead to inconsistent solutions under adversarial disturbances. The algorithms by Simchowitz et al. (2019) and Ghai et al. (2020) tolerate adversarial disturbances, but the guarantees only hold for stable or marginally stable systems. If the adversarial disturbances are bounded, Hazan et al. (2020) and Chen and Hazan (2021) give system identification algorithms for any unknown system, stable or not, with and without knowledge of a stabilizing controller, respectively.

2 Definitions and Preliminaries

Notation.

We use the notation Õ(·) to hide constant and logarithmic terms in the relevant parameters. We use ‖·‖ to denote the spectral norm for matrices, and the Euclidean norm for vectors. We use x_{1:t} to denote the concatenation of x_1, …, x_t, and similar notations are used for u_{1:t}, w_{1:t}. We denote a δ-net as N_δ, defined as:

Definition 1.

We define N_δ to be a δ-net of S^{d−1}, the unit sphere with the Euclidean metric, if for any x ∈ S^{d−1}, there exists y ∈ N_δ such that ‖x − y‖ ≤ δ.
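For intuition, a δ-net of the unit sphere can be built greedily; the routine below is a naive illustrative construction (its size is not optimized), not the net used in our analysis. Its size grows exponentially in d, which is the source of the inefficiency in our second algorithm.

```python
import numpy as np

def delta_net(d: int, delta: float, n_samples: int = 20_000, seed: int = 0):
    """Greedy net from random unit vectors: keep a sample only if it is
    farther than delta from every vector kept so far."""
    rng = np.random.default_rng(seed)
    pts = rng.standard_normal((n_samples, d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    net = []
    for p in pts:
        if all(np.linalg.norm(p - q) > delta for q in net):
            net.append(p)
    return np.array(net)

for d in (2, 3, 4):
    print(d, len(delta_net(d, delta=0.5)))  # size scales as (1/delta)^{Theta(d)}
```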

Goal.

Given access to a black-box LDS as in Section 1.1 satisfying the assumptions below, and without the ability to restart the system, obtain the best possible ℓ_2-gain. First, we make the assumption on the disturbances in Section 1.1 formal.

Assumption 1.

We treat the model misspecification component of the system, w_t, as an adversarial disturbance sequence. The w_t are arbitrary functions of past states such that for all t (notice that w_t can depend on the actual trajectory of states, and not only on their magnitude; this is important to capture misspecification of the dynamics):

‖w_t‖ ≤ ε ‖x_{1:t}‖ + W.

The component of the disturbance not explained by the states is arbitrary, and W bounds its magnitude. Without loss of generality, let W ≥ 1.

Further, we assume the system is bounded and the control matrix B is invertible.

Assumption 2.

The magnitude of the dynamics is bounded by a known constant κ: ‖A‖, ‖B‖ ≤ κ, where κ ≥ 1.

B's minimum singular value is also lower bounded as σ_min(B) ≥ β, where β > 0.

ℓ_2-gain and Competitive Ratio.

The competitive ratio of a controller is a concept that is closely related to the ℓ_2-gain, but is more widely studied in the machine learning community. Informally, for any sequence of cost functions, the competitive ratio is the ratio between the cost of a given controller and the cost of the optimal controller, which has access to the disturbances a priori. Importantly, the notion of competitive ratio is counterfactual: it allows for different state trajectories as a function of the control inputs. Under some assumptions that our algorithm satisfies, ℓ_2-gain bounds can be converted to competitive ratio bounds (see App. B). We choose to present our results in terms of the ℓ_2-gain for simplicity.

3 Algorithm and Results

In this section we describe our algorithm. The main algorithm, Alg. 1, is run in epochs, each with a proposed upper bound W̄ on the disturbance magnitude W. A new epoch starts whenever the controller discovers that W̄ is not sufficiently large, and the upper bound is then increased. The key to this doubling strategy is identifying when, and by how much, the upper bound should increase.

The algorithm uses an exploration set S for system identification, and then executes the stabilizing controller of the estimated system. If the upper bound indeed exceeds W, the algorithm is guaranteed to find a stabilizing controller. The efficient version of the algorithm uses the standard basis vectors as the exploration set, but attains robustness degrading with the dimension d. The inefficient version of the algorithm achieves dimension-free robustness, but uses a δ-net for exploration, resulting in an exponential number of large controls for system estimation.
The theorems below present the main guarantees of our algorithm.

Theorem 1.

For robustness ε = O(1/√d) and exploration set S equal to the standard basis, there exist parameter settings δ, γ such that Alg. 1 attains a finite ℓ_2-gain of κ^{Õ(d)}.

Theorem 2.

For constant robustness ε = O(1) and exploration set S equal to a δ-net of the unit sphere, there exist parameter settings δ, γ such that Alg. 1 attains a finite ℓ_2-gain.

Remark 1.

We note that when S is the standard basis, the estimator of Line 16 in Alg. 2 has a closed form, and each column of Â can be read off from the response to the corresponding probe. (With small modifications to the analysis, the constrained optimization can be replaced by a failure check if ‖Â‖ > κ, as this would indicate that our disturbance budget is too small.) When S is a δ-net, the objective of Line 16 is a maximum of convex functions, and hence a convex function.

1:  Input: System upper bound κ, control matrix singular value lower bound β, system identification parameter δ, threshold parameter γ, and exploration set S.
2:  Set t = 1, initial budget W̄ = 1, and K = 0.
3:  while t ≤ T do
4:     Observe x_t.
5:     if ‖x_t‖ > γ W̄ then
6:        Update W̄ ← 2 W̄.
7:        Call Alg 2 with parameters (W̄, κ, β, δ, γ, S), obtain updated controller K and budget W̄.
8:     else
9:        Execute u_t = −K x_t.
10:        Set t ← t + 1.
11:     end if
12:  end while
Algorithm 1 ℓ_2-gain control algorithm
1:  Input: disturbance budget W̄, system upper bound κ, control matrix singular value lower bound β, system identification parameter δ, threshold parameter γ, and exploration set S.
2:  Define S = {s_0, …, s_{N−1}} with N = |S|.
3:  Call Alg. 3 with parameters (W̄, κ, β, δ, γ), obtain estimator B̂ and updated budget W̄. Suppose the system evolves to time t_0.
4:  Set t ← t_0.
5:  for i = 0, 1, …, 2N − 1 do
6:     Observe x_{t+i}.
7:     if ‖x_{t+i}‖ > γ W̄ then
8:        Restart SysID from Line 2 with W̄ ← 2 W̄.
9:     end if
10:     if i is even then
11:        Play u_{t+i} = η_i B̂^{−1} s_{i/2}, with η_i chosen large enough to dominate the current state.
12:     else
13:        Play u_{t+i} = 0.
14:     end if
15:  end for
16:  Observe x_{t+2N}, compute the estimate Â of A by minimizing the worst-case residual of the zero-control transitions over the directions in S, subject to ‖Â‖ ≤ κ.
17:  Return controller K = B̂^{−1} Â and budget W̄.
Algorithm 2 Adversarial System ID on Budget
1:  Input: disturbance budget W̄, system upper bound κ, control matrix singular value lower bound β, system identification parameter δ, threshold parameter γ.
2:  for i = 0, 1, …, d − 1 do
3:     Observe x_{t+i}.
4:     if ‖x_{t+i}‖ > γ W̄ then
5:        Restart SysID with W̄ ← 2 W̄.
6:     end if
7:     Play u_{t+i} = η_i e_{i+1}, a scaled standard basis vector with η_i chosen large enough to dominate the current state.
8:  end for
9:  Observe x_{t+d}, compute the estimate B̂ column by column from the probe responses.
10:  if ‖B̂‖ > κ or σ_min(B̂) < β then
11:     Restart SysID with W̄ ← 2 W̄.
12:  end if
13:  Return B̂ and budget W̄.
Algorithm 3 Adversarial Control Matrix ID on Budget

3.1 Proof sketch

The algorithm has three components: exploration to estimate B, exploration to estimate A, and controlling the system with the linear controller K. We first sketch out the analysis when the upper bound on the disturbance magnitude is correct, i.e., W ≤ W̄. In this case, the algorithm will not start a new epoch and we are guaranteed to obtain a stabilizing controller. Note that in both exploration stages, the state can grow exponentially, so exploratory controls must also grow to keep up.

Identifying B (see App. D.2).

Alg. 3 works by probing the system with scaled standard basis vectors. With sufficiently large scaling η_i, the control term dominates A x_t + w_t, so x_{t+1}/η_i ≈ B e_i. This allows us to estimate one column at a time.
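The following sketch mimics this column-by-column recovery in simulation. The scaling rule and the recovery formula x_{t+1}/η_i ≈ B e_i follow the sketch above; the matrices and disturbances are illustrative assumptions, and the budget checks of Alg. 3 are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = 0.3 * rng.standard_normal((d, d)) + np.eye(d)  # unknown dynamics (illustrative)
B = rng.standard_normal((d, d)) + 2 * np.eye(d)    # unknown, well-conditioned control matrix

x = np.zeros(d)
B_hat = np.zeros((d, d))
for i in range(d):
    eta = 1e4 * max(1.0, np.linalg.norm(x))  # probe large enough to dominate A x_t + w_t
    u = eta * np.eye(d)[:, i]                # scaled standard basis vector
    x = A @ x + B @ u + rng.uniform(-1, 1, d)
    B_hat[:, i] = x / eta                    # x_{t+1} / eta = B e_i + O(1/eta)
print("||B_hat - B|| =", np.linalg.norm(B_hat - B, 2))
```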

Identifying A (see App. D.3).

Once we have an accurate estimate B̂, identification of A in Alg. 2 works by applying controls u = η B̂^{−1} s every other iteration, where s ∈ S and η is a large constant such that the control term dominates the current state, giving x_{t+1} ≈ η s. One more time evolution with zero control gives x_{t+2} ≈ η A s. By Assumption 1, the disturbances incurred along the way are lower order relative to η. As a result, we obtain an accurate estimate of A s for every s ∈ S. By the definition of Â in Line 16, the residual of Â is small in every direction of S, so Â is close to A. Exploratory controls are preconditioned with B̂^{−1} to achieve robustness independent of κ.
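Continuing the same illustrative simulation (reusing A, B, B_hat, x, and rng from the sketch above), the two-step probe reads off A s for each direction s; here S is taken to be the standard basis:

```python
# Probe each direction s in S with a preconditioned control, then one zero-control step.
S = [np.eye(d)[:, j] for j in range(d)]  # exploration set (standard basis here)
A_hat = np.zeros((d, d))
for j, s in enumerate(S):
    eta = 1e4 * max(1.0, np.linalg.norm(x))
    u = eta * np.linalg.solve(B_hat, s)        # B @ B_hat^{-1} @ s ~ s
    x = A @ x + B @ u + rng.uniform(-1, 1, d)  # state after the probe: ~ eta * s
    x = A @ x + rng.uniform(-1, 1, d)          # zero control: state ~ eta * A s
    A_hat[:, j] = x / eta
print("||A_hat - A|| =", np.linalg.norm(A_hat - A, 2))
```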

Exploration on the standard basis and on a δ-net (see Lem. 14 and Lem. 15).

If we explore with the standard basis, then we ensure that each column of Â is accurate, so the error in an arbitrary direction can be larger by a factor of √d. Because we use a Frobenius-norm analysis, we only produce a sufficiently accurate estimate of A for ε = O(1/√d). Exploration using a δ-net guarantees accuracy in all directions, providing an accurate estimate for constant ε.

Stabilizing the system (see Lem. 13).

Once exploration is complete, the system is stabilized by the linear controller K = B̂^{−1} Â. By controlling the accuracy of Â and B̂, we guarantee that the closed-loop system satisfies ‖A − BK‖ < 1. We can obtain an end-to-end ℓ_2-gain bound by bounding ‖x_{1:T}‖ in terms of ‖w_{1:T}‖ using our exploration analysis.
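With both estimates in hand, the commit-phase controller can be formed and the closed loop checked numerically (continuing the same sketch; the choice K = B̂^{−1}Â is our reading of the commit step):

```python
K = np.linalg.solve(B_hat, A_hat)  # commit controller u_t = -K x_t, aiming for A - B K ~ 0
print("||A - B K|| =", np.linalg.norm(A - B @ K, 2))  # well below 1: closed loop contracts
```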

Handling a changing disturbance budget (see App. D.7).

We now sketch out the extension to unknown disturbance magnitude. In Alg. 1, W̄ is the proposed upper bound on W. There are a variety of failure conditions in the algorithms (i.e., events establishing that W̄ was not a valid upper bound) which trigger re-exploration and the start of a new epoch. If W̄ is indeed an upper bound, the above steps all work without triggering a failure, and we have ‖x_{1:T}‖ ≤ c W̄ for some constant c. On the other hand, when a failure is detected, it is proof that W > W̄. We can relate the penultimate budget to the final budget by bounding the state growth from a single time evolution in which the budget is exceeded. Combining the upper bound on ‖x_{1:T}‖ and the lower bound on ‖w_{1:T}‖ produces an ℓ_2-gain bound.
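Schematically, the epoch structure of Alg. 1 reduces to a few lines; `env` and `explore_and_commit` below are hypothetical interfaces standing in for the system and for Algs. 2–3, so this is an outline of the doubling logic rather than a full implementation.

```python
import numpy as np

def run_epochs(env, explore_and_commit, gamma: float, T: int):
    """Double the disturbance budget and re-explore whenever the state
    exceeds the threshold gamma * budget; otherwise commit to u = -K x."""
    budget, K, t = 1.0, None, 0
    while t < T:
        x = env.observe()
        if K is None or np.linalg.norm(x) > gamma * budget:
            budget *= 2.0                              # old budget is provably invalid
            K, budget, t = explore_and_commit(env, budget, t)
        else:
            env.play(-K @ x)                           # commit phase
            t += 1
```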

4 Conclusions

We have shown, contrary to common wisdom in control theory, that it is possible to control a misspecified LDS with robustness that is independent of the system magnitude. In addition, our control algorithm has near-optimal dimension dependence in its ℓ_2-gain. The most immediate open question is whether an efficient algorithm can be derived that obtains constant robustness, independent of the dimension, with a tighter bound on the ℓ_2-gain in terms of the system magnitude. Other future directions include systems with partial observability and degenerate control matrices. It is also interesting to explore whether the same results can be obtained when the system inputs, not only the states, are subject to noise and misspecification.

References

  • N. Agarwal, E. Hazan, A. Majumdar, and K. Singh (2021) A regret minimization approach to iterative learning control. arXiv preprint arXiv:2102.13478. Cited by: §1.2.
  • N. M. Boffi, S. Tu, and J. E. Slotine (2020) Regret bounds for adaptive nonlinear control. External Links: 2011.13101 Cited by: §1.2.
  • X. Chen and E. Hazan (2021) Black-box control for linear dynamical systems. External Links: 2007.06650 Cited by: 1st item, §1.2, footnote 1.
  • S. J. Cusumano and K. Poolla (1988) Adaptive control of uncertain systems: a new approach. In Proceedings of the American Automatic Control Conference, pp. 355–359. Cited by: §1.2.
  • S.J. Cusumano and K. Poolla (1988) Nonlinear feedback vs. linear feedback for robust stabilization. In Proceedings of the 27th IEEE Conference on Decision and Control, pp. 1776–1780. Cited by: §1.2.
  • M. K. S. Faradonbeh, A. Tewari, and G. Michailidis (2019) Finite-time adaptive stabilization of linear systems. IEEE Transactions on Automatic Control 64 (8), pp. 3498–3505. Cited by: §1.2.
  • U. Ghai, H. Lee, K. Singh, C. Zhang, and Y. Zhang (2020) No-regret prediction in marginally stable systems. In Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy and S. Agarwal (Eds.), Proceedings of Machine Learning Research, Vol. 125, , pp. 1714–1757. Cited by: §1.2.
  • E. Hazan, S. Kakade, and K. Singh (2020) The nonstochastic control problem. In Algorithmic Learning Theory, pp. 408–421. Cited by: §1.2.
  • S. Kakade, A. Krishnamurthy, K. Lowrey, M. Ohnishi, and W. Sun (2020) Information theoretic regret bounds for online nonlinear control. External Links: 2006.12466 Cited by: §1.2.
  • H. Mania, M. I. Jordan, and B. Recht (2020) Active learning for nonlinear system identification with guarantees. External Links: 2006.10277 Cited by: §1.2.
  • A. Megretski and A. Rantzer (2002) Lower and upper bounds for optimal l2 gain nonlinear robust control of first order linear system. Technical report Technical Report No. 41, Institut Mittag-Leffler. Cited by: §1.2, §1.2.
  • Z. Mhammedi, D. J. Foster, M. Simchowitz, D. Misra, W. Sun, A. Krishnamurthy, A. Rakhlin, and J. Langford (2020) Learning the linear quadratic regulator from nonlinear observations. arXiv preprint arXiv:2010.03799. Cited by: §1.2.
  • S. Oymak and N. Ozay (2019) Non-asymptotic identification of LTI systems from a single trajectory. In 2019 American Control Conference (ACC), pp. 5655–5661. Cited by: §1.2.
  • C. E. Rohrs, L. Valavani, M. Athans, and G. Stein (1982) Robustness of adaptive control algorithms in the presence of unmodeled dynamics. Massachusetts Institute of Technology. External Links: LIDS-P-1240 Cited by: §1.2, §1.
  • T. Sarkar and A. Rakhlin (2019) Near optimal finite time identification of arbitrary linear dynamical systems. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 5610–5618. Cited by: §1.2.
  • G. Shi, Y. Lin, S. Chung, Y. Yue, and A. Wierman (2021) Online optimization with memory and competitive control. External Links: 2002.05318 Cited by: §1.2.
  • M. Simchowitz, R. Boczar, and B. Recht (2019) Learning linear dynamical systems with semi-parametric least squares. In Proceedings of the Thirty-Second Conference on Learning Theory, A. Beygelzimer and D. Hsu (Eds.), Proceedings of Machine Learning Research, Vol. 99, Phoenix, USA, pp. 2714–2802. Cited by: §1.2.
  • M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht (2018) Learning without mixing: towards a sharp analysis of linear system identification. In Proceedings of the 31st Conference On Learning Theory, S. Bubeck, V. Perchet, and P. Rigollet (Eds.), Proceedings of Machine Learning Research, Vol. 75, , pp. 439–473. Cited by: §1.2.
  • R. Sinha, J. Harrison, S. M. Richards, and M. Pavone (2021) Adaptive robust model predictive control with matched and unmatched uncertainty. External Links: 2104.08261 Cited by: §1.2.
  • G. Tao (2014) Multivariable adaptive control: a survey. Automatica 50, pp. 2737–2764. Cited by: §1.2.
  • R. Vershynin (2011) Introduction to the non-asymptotic analysis of random matrices. External Links: 1011.3027 Cited by: §D.6.
  • C. Yu, G. Shi, S. Chung, Y. Yue, and A. Wierman (2020) Competitive control with delayed imperfect information. External Links: 2010.11637 Cited by: §1.2.

Appendix A Limits on robustness in online control

In this section we give a simple example exhibiting the limitation of robustness, in particular showing that in the case of an unstabilizable system, it is impossible to obtain constant robustness.

Definition 2 (Strong Controllability).

Given a linear time-invariant dynamical system (A, B), let C_k denote the matrix

C_k = [B, AB, …, A^{k−1}B].

Then (A, B) is strongly controllable with parameters (k, κ) if C_k has full row-rank, and ‖C_k^+‖ ≤ κ, where C_k^+ denotes the Moore–Penrose pseudo-inverse.

Lemma 3.

In general, a system with strong controllability parameters (k, κ) cannot be controlled with robustness larger than its distance from the set of unstabilizable systems.

Proof.

Consider the two-dimensional system given by the matrices A, B, in which the control reaches the unstable first coordinate only through a coupling of magnitude ν > 0.

The Kalman matrix for this system is given by C = [B, AB].

For ν > 0, this matrix is full rank, and the system is strongly controllable with parameters depending on ν. However, for ν = 0, it can be seen that the system becomes uncontrollable even without any noise, since the first coordinate has no control which can cancel it, i.e.

for adversarial noise with robustness ε ≥ ν, we can convert the system A to a system A′ with ν = 0, rendering it uncontrollable. The noise sequence will simply be w_t = (A′ − A) x_t, which satisfies ‖w_t‖ ≤ ε ‖x_t‖.

This happens with a robustness parameter ε which is exactly the distance of A from the set of uncontrollable, and hence unstabilizable, systems. ∎
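The rank-drop phenomenon behind this proof is easy to check numerically. The matrices below are a hypothetical instance chosen to illustrate the argument, not necessarily the exact construction above:

```python
import numpy as np

def kalman_rank(A, B):
    """Rank of the controllability (Kalman) matrix [B, AB] of a 2-d system."""
    return np.linalg.matrix_rank(np.hstack([B, A @ B]))

nu = 0.1
A = np.array([[2.0, nu], [0.0, 1.0]])  # control reaches coordinate 1 only through nu
B = np.array([[0.0], [1.0]])
print(kalman_rank(A, B))               # 2: controllable

E = np.array([[0.0, -nu], [0.0, 0.0]]) # perturbation of norm nu, realizable as w_t = E x_t
print(kalman_rank(A + E, B))           # 1: the unstable coordinate is now uncontrollable
```

Since ‖E‖ = ν, a robustness level ε ≥ ν suffices for the adversary to realize this perturbation, matching the claim that robustness is bounded by the distance to the unstabilizable set.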

Appendix B Relating competitive ratio to gain

Here we relate the ℓ_2-gain to the competitive ratio. We begin with a formal definition.

Definition 3.

(Competitive Ratio) Consider a sequence of cost functions c_1, …, c_T. Let J_𝒜(w) denote the cost of controller 𝒜 given the disturbance sequence w, and let J_*(w) denote the cost of the offline optimal controller with full knowledge of w. Both costs are worst case under any model misspecification that satisfies (1) subject to a fixed ε. The competitive ratio of a control algorithm 𝒜, for w satisfying Assumption 1, is defined as:

CR(𝒜) = sup_w J_𝒜(w) / J_*(w).

The ℓ_2-gain bounds the ratio between ‖x_{1:T}‖² + ‖u_{1:T}‖² and ‖w_{1:T}‖², while under the time-invariant cost function c(x, u) = ‖x‖² + ‖u‖², the competitive ratio bounds the ratio of J_𝒜(w) to J_*(w). Here we show that J_*(w) = Θ(‖w_{1:T}‖²), treating κ and β as constants. Assuming J_𝒜(w) is bounded by a constant multiple of ‖w_{1:T}‖², then CR(𝒜) = O(ℓ_2-gain(𝒜)).

Theorem 4.

Under the time-invariant cost function c(x, u) = ‖x‖² + ‖u‖², for any system satisfying Assumptions 1 and 2 with ε sufficiently small,

‖w_{1:T}‖² / (6κ²) ≤ J_*(w) ≤ (1 + κ²/β²) ‖w_{1:T}‖².

Proof.

We start by bounding ‖w_{1:T}‖² using J_*(w). Since w_t = x_{t+1} − A x_t − B u_t, we have ‖w_t‖² ≤ 3κ² (‖x_{t+1}‖² + ‖x_t‖² + ‖u_t‖²).

Summing over t, we have ‖w_{1:T}‖² ≤ 6κ² (‖x_{1:T}‖² + ‖u_{1:T}‖²).

The lower bound follows after applying this inequality to the trajectory of the offline optimal controller.

For the upper bound, consider u_t = −B^{−1} A x_t, which produces closed-loop dynamics x_{t+1} = w_t and hence ‖x_{t+1}‖ = ‖w_t‖. Summing over t, we have ∑_t ‖x_t‖² ≤ ‖w_{1:T}‖².

Noting that ‖u_t‖ ≤ ‖B^{−1} A‖ ‖x_t‖ ≤ (κ/β) ‖x_t‖, we have ∑_t ‖u_t‖² ≤ (κ²/β²) ‖w_{1:T}‖².

Noting that J_*(w) is at most the cost of this controller, we have J_*(w) ≤ (1 + κ²/β²) ‖w_{1:T}‖². ∎

Remark 2.

Dependence on κ is required in Theorem 4. Consider the one-dimensional system with a = κ and b = 1, where u_t = −κ x_t for all t and w_t alternates between 1 and −1. As a result, x_t oscillates between 1 and −1 for an average cost of 1 + κ², while J_*(w) is on average 1 per step, since the offline optimal controller sets u_t = −w_t and keeps x_t = 0.
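A quick numerical check of this (reconstructed) example, with the offline controller canceling the disturbance in advance while the state-feedback controller pays the full κ² factor:

```python
import numpy as np

kappa, T = 3.0, 1000
w = (-1.0) ** np.arange(T)                      # alternating disturbance

# State feedback u_t = -kappa * x_t gives x_{t+1} = w_t.
x = np.concatenate([[0.0], w[:-1]])
cost_feedback = np.sum(x**2 + (kappa * x) ** 2)  # average 1 + kappa^2 per step

# Offline optimal keeps x_t = 0 via u_t = -w_t, paying only the control cost.
cost_offline = np.sum(w**2)
print(cost_feedback / cost_offline)              # ~ 1 + kappa^2
```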

Appendix C Online Linear Regression

In this section, we provide an algorithm with a bounded ℓ_2-gain for any disturbance sequence that satisfies Assumption 1 with a sufficiently small constant ε, for the 1-d system x_{t+1} = a x_t + u_t + w_t.

C.1 Algorithm and Analysis

1:  Input: time horizon T, system upper bound parameter κ.
2:  Initialize â_1 = 0.
3:  for t = 1, …, T do
4:     Observe x_t and define y_t = x_t − u_{t−1}.
5:     Compute â_t = argmin_{|a′| ≤ κ} ∑_{s<t} (y_{s+1} − a′ x_s)².
6:     Compute u_t = −â_t x_t.
7:     Execute u_t.
8:  end for
Algorithm 4 Online Least Squares Control
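A short simulation of this scheme under the reconstruction above; the closed-form running least squares, the clipping to [−κ, κ], and the disturbance model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, a, eps, kappa = 2000, 1.5, 0.02, 2.0   # unknown a; eps, kappa illustrative

x, prev_x, prev_u = 1.0, None, 0.0
num = den = hist2 = 0.0                   # running least squares sums, ||x_{1:t}||^2
max_x = 0.0
for t in range(T):
    if prev_x is not None:
        y = x - prev_u                    # y_t = x_t - u_{t-1} = a * x_{t-1} + w_{t-1}
        num += y * prev_x
        den += prev_x ** 2
    a_hat = float(np.clip(num / den, -kappa, kappa)) if den > 0 else 0.0
    u = -a_hat * x                        # certainty-equivalent control
    hist2 += x ** 2
    w = eps * np.sqrt(hist2) * rng.choice([-1.0, 1.0]) + rng.uniform(-1, 1)
    prev_x, prev_u = x, u
    x = a * x + u + w
    max_x = max(max_x, abs(x))
print(f"a_hat = {a_hat:.3f}, max |x_t| = {max_x:.1f}")
```

The estimate converges to a, after which the state magnitude tracks the disturbance level, as the analysis below predicts.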

Proof Sketch

The key idea of the analysis is that if the algorithm estimates a inaccurately, strong convexity of the one-dimensional least squares objective implies that the magnitude of the disturbances is a nontrivial fraction of the magnitude of the states up to that point (see (3)). On the other hand, if â_t is an accurate estimate of a, we can bound the states using the stability of the closed-loop dynamics. The result follows from stitching these regimes together. While we would like to extend this analysis to high dimensions, we note that (3) does not have a natural high-dimensional extension. In particular, the estimation error can be large in a direction where the disturbances are small relative to the magnitude of the state.

Theorem 5.

Given Assumptions 1 and 2, for ε sufficiently small, Algorithm 4 has an ℓ_2-gain bounded by a constant depending only on κ.

Proof.

Suppose |â_t − a| ≥ Δ; then the least squares objective certifies large disturbances. By definition, â_t is the (constrained) minimizer of f_t(a′) = ∑_{s<t} (y_{s+1} − a′ x_s)², where y_{s+1} − a x_s = w_s. Furthermore, since f_t is strongly convex with modulus 2 ∑_{s<t} x_s² and f_t(a) = ∑_{s<t} w_s², we have

Δ² ∑_{s<t} x_s² ≤ ∑_{s<t} w_s². (3)

Now suppose t_0 is the first time such that the dynamics are stable for the remainder of the time horizon, i.e., |a − â_t| ≤ 1/2 for all t ≥ t_0. If t_0 > 1, then |a − â_{t_0 − 1}| > 1/2. Using Assumption 1, we have a bound on the disturbances up to t_0 in terms of the states up to t_0.

Applying (3), we have ∑_{s < t_0} x_s² ≤ 4 ∑_{s < t_0} w_s².

Beyond t_0, we can bound the states using stability of the dynamics, but we first need to bound the cost from t_0 to t_0 + 2. Note, if t_0 = 1, we do not need to use (3), and x_{t_0 + 2} is appropriately bounded by unrolling the dynamics via Lem. 6. Applying Lem. 6, we have a bound on |x_{t_0 + 2}| in terms of |x_{t_0}| and the disturbances.

We complete our bound using Lem. 7, yielding an ℓ_2-gain bounded in terms of κ. ∎

Lemma 6.

If Assumptions 1 and 2 hold, then for any sequence of w_t's and any t, the iterate x_{t+2} produced by Algorithm 4 satisfies |x_{t+2}| ≤ O(κ²) (|x_t| + ε ‖x_{1:t+1}‖ + W).

Proof.

We first note that |â_t| ≤ κ, so unrolling the dynamics once, we have

|x_{t+1}| = |(a − â_t) x_t + w_t| ≤ 2κ |x_t| + |w_t|.

Applying Assumption 1 with |w_t| ≤ ε ‖x_{1:t}‖ + W, we have, combining,

|x_{t+1}| ≤ 2κ |x_t| + ε ‖x_{1:t}‖ + W.

Unrolling one more time yields the claimed bound on |x_{t+2}|. ∎

Lemma 7.

Suppose for all t ≥ t_0, Alg. 4 produces â_t such that |a − â_t| ≤ 1/2. Then for any sequence of w_t's and t ≥ t_0:

|x_{t+1}| ≤ 2^{−(t − t_0 + 1)} |x_{t_0}| + ∑_{s = t_0}^{t} 2^{−(t − s)} |w_s|,

where T is the time horizon and t ≤ T.

Proof.

For t ≥ t_0, we will first prove the bound above by induction. For the base case, we have x_{t_0 + 1} = (a − â_{t_0}) x_{t_0} + w_{t_0}, and hence |x_{t_0 + 1}| ≤ |x_{t_0}|/2 + |w_{t_0}|. Now note that

|x_{t+1}| = |(a − â_t) x_t + w_t| ≤ |x_t|/2 + |w_t|.

Applying the inductive hypothesis, we have the claimed bound. ∎