Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

by Yuwei Luo, et al.

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics {A_t, B_t}. The sequence of dynamics matrices can be arbitrary, but its total variation, V_T, is assumed to be o(T) and is unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of Õ(V_T^{2/5} T^{3/5}). With piecewise constant dynamics, our algorithm achieves the optimal regret of Õ(√(ST)), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting, or sliding-window learning with a static window size) may not be regret-optimal for the LQR problem, even when the window size is optimally tuned with knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is, in spirit, a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and we therefore believe our results should find wider application.
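The abstract's concern that OLS picks up bias when the parameter to be estimated changes, and that a fixed window can straddle a change, can be illustrated with a small simulation. This is a minimal sketch, not the authors' algorithm: it assumes a scalar system x_{t+1} = a_t x_t + b_t u_t + w_t with a single switch (S = 1), a hand-picked stabilizing gain k, unit-variance Gaussian exploration on the input, and hypothetical helper names `simulate` and `ols_window`.

```python
import random


def simulate(a_seq, b_seq, k, noise_std, seed=0):
    """Simulate a scalar system x_{t+1} = a_t x_t + b_t u_t + w_t.

    Hypothetical setup: the input is u_t = -k x_t + e_t, i.e. a fixed
    stabilizing feedback gain k plus unit-variance Gaussian exploration e_t.
    Returns the states xs (length T + 1) and inputs us (length T).
    """
    rng = random.Random(seed)
    x = 0.0
    xs, us = [x], []
    for a, b in zip(a_seq, b_seq):
        u = -k * x + rng.gauss(0.0, 1.0)               # feedback + exploration
        x = a * x + b * u + rng.gauss(0.0, noise_std)  # process noise w_t
        us.append(u)
        xs.append(x)
    return xs, us


def ols_window(xs, us, start, end):
    """OLS estimate of (a, b) from the transitions t = start, ..., end - 1.

    Solves the 2x2 normal equations of min sum_t (x_{t+1} - a x_t - b u_t)^2.
    """
    sxx = sxu = suu = sxy = suy = 0.0
    for t in range(start, end):
        x, u, y = xs[t], us[t], xs[t + 1]
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


if __name__ == "__main__":
    # Piecewise-constant dynamics with one switch at t = 1000.
    # With k = 0.3 the closed loop a - b*k is 0.2 before and 0.75 after: stable.
    a_seq = [0.5] * 1000 + [0.9] * 1000
    b_seq = [1.0] * 1000 + [0.5] * 1000
    xs, us = simulate(a_seq, b_seq, k=0.3, noise_std=0.1, seed=1)
    print("before switch:", ols_window(xs, us, 0, 1000))
    print("after switch: ", ols_window(xs, us, 1000, 2000))
    print("whole horizon:", ols_window(xs, us, 0, 2000))
```

Each half-window estimate should land close to its own regime's (a, b), while the whole-horizon estimate blends the two regimes. A window that straddles a switch is biased toward neither regime, which is the failure mode an adaptive non-stationarity detector is meant to avoid.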

