Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

06/17/2022
by   Taylan Kargin, et al.
31

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that Õ(√(T)) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains Õ(√(T)) regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.

READ FULL TEXT
07/23/2020

Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regr...
05/23/2018

Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

We consider adaptive control of the Linear Quadratic Regulator (LQR), wh...
10/31/2021

Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees

We study the adaptive control of an unknown linear system with a quadrat...
01/31/2020

Regret Minimization in Partially Observable Linear Quadratic Control

We study the problem of regret minimization in partially observable line...
01/25/2022

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

We consider the problem of controlling a stochastic linear system with q...
11/02/2020

Exact Asymptotics for Linear Quadratic Adaptive Control

Recent progress in reinforcement learning has led to remarkable performa...
12/31/2019

Optimistic robust linear quadratic dual control

Recent work by Mania et al. has proved that certainty equivalent control...