Minimal Expected Regret in Linear Quadratic Control

09/29/2021
by Yassir Jedra, et al.

We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices A and B may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time T is upper bounded (i) by O((d_u + d_x)√(d_x T)) when both A and B are unknown, (ii) by O(d_x^2 log(T)) if only A is unknown, and (iii) by O(d_x(d_u + d_x) log(T)) if only B is unknown, under a mild non-degeneracy condition (d_x and d_u denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in T, d_x and d_u: they match existing lower bounds in scenario (i) when d_x ≤ d_u [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii), for which no lower bound is known. Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on A and B, and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of A and B and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly varying control policy on the performance of these estimates and on the regret is one of the technical challenges tackled in this paper.
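To make the certainty-equivalence idea concrete, here is a minimal sketch for a scalar system (d_x = d_u = 1) in which the least-squares estimates of A and B, and the resulting gain, are refreshed at every step. It is an illustration only and not the algorithm analyzed in the paper: the function names, the warm-up length, the exploration-noise schedule, the gain clipping, and the cap inside the Riccati iteration are all assumptions chosen for readability.

```python
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=200):
    """Iterate the scalar discrete-time Riccati recursion and return the
    gain k of the certainty-equivalence policy u = -k * x."""
    p = q
    for _ in range(iters):
        # Algebraically equal to q + a^2 p - (a b p)^2 / (r + b^2 p),
        # written without the subtraction for numerical stability.
        p = q + a * a * p * r / (r + b * b * p)
        # Assumed safeguard: cap p if the estimated pair looks
        # non-stabilizable (b close to 0 and |a| >= 1).
        p = min(p, 1e9)
    return a * b * p / (r + b * b * p)


def least_squares(sxx, sxu, suu, sxy, suy, reg=1e-6):
    """Solve the 2x2 regularized normal equations for (a_hat, b_hat)."""
    m11, m12, m22 = sxx + reg, sxu, suu + reg
    det = m11 * m22 - m12 * m12
    a_hat = (m22 * sxy - m12 * suy) / det
    b_hat = (m11 * suy - m12 * sxy) / det
    return a_hat, b_hat


def run(a=0.9, b=0.5, horizon=2000, warmup=50, k_max=2.0, seed=0):
    """Simulate x_{t+1} = a x_t + b u_t + w_t with unknown (a, b)."""
    rng = random.Random(seed)
    sxx = sxu = suu = sxy = suy = 0.0
    x, k = 0.0, 0.0
    a_hat, b_hat = 0.0, 0.0
    for t in range(1, horizon + 1):
        # Assumed exploration schedule: unit-variance noise during the
        # warm-up, then variance decaying like 1 / sqrt(t).
        sigma = 1.0 if t <= warmup else t ** -0.25
        u = -k * x + rng.gauss(0.0, sigma)
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        # Running sufficient statistics for least squares.
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        a_hat, b_hat = least_squares(sxx, sxu, suu, sxy, suy)
        if t >= warmup:
            # Policy refreshed at every step (no epochs); clipping the
            # gain is an assumed safeguard against early bad estimates.
            k = max(-k_max, min(k_max, riccati_gain(a_hat, b_hat)))
        x = x_next
    return a_hat, b_hat, k


if __name__ == "__main__":
    a_hat, b_hat, k = run()
    print(f"a_hat={a_hat:.3f}  b_hat={b_hat:.3f}  gain={k:.3f}")
```

The point of the sketch is the update pattern: unlike epoch-based schemes, the gain is recomputed from the current estimates after each observation, so the data used for estimation is generated by a policy that keeps changing. In more than one dimension the scalar recursion would be replaced by a matrix Riccati solver.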
