Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

by Taylan Kargin, et al.

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, in which an action is sampled from a carefully prescribed distribution that is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that Õ(√(T)) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control (TSAC), that attains Õ(√(T)) regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require an a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.
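To make the ingredients named above concrete (sampling a model from a data-dependent distribution, early exploration, and a periodic policy update), the following is a minimal toy sketch for a scalar system x_{t+1} = a x_t + b u_t + w_t. It is not the authors' TSAC algorithm: the Gaussian perturbation of a ridge-regression estimate, the function names (`lqr_gain`, `sample_model`, `run_ts`), and all constants (`warmup`, `update_every`, `scale`, unit costs, unit regularization) are assumptions chosen for illustration only.

```python
import math
import random


def lqr_gain(a, b, q=1.0, r=1.0, iters=300):
    """Scalar discrete-time LQR gain k (control u = -k * x) via Riccati iteration."""
    p = q
    for _ in range(iters):
        # Scalar Riccati recursion, rearranged to avoid a subtraction;
        # the cap guards against blow-up when the sampled b is near zero.
        p = min(q + a * a * p * r / (r + b * b * p), 1e12)
    return a * b * p / (r + b * b * p)


def sample_model(v, s, scale, rng):
    """Draw (a, b) from a Gaussian centred at the ridge estimate V^{-1} s."""
    (v11, v12), (_, v22) = v
    det = v11 * v22 - v12 * v12
    i11, i12, i22 = v22 / det, -v12 / det, v11 / det  # entries of V^{-1}
    a_hat = i11 * s[0] + i12 * s[1]
    b_hat = i12 * s[0] + i22 * s[1]
    # Cholesky factor L of V^{-1}, so that L * eta has covariance V^{-1}.
    l11 = math.sqrt(i11)
    l21 = i12 / l11
    l22 = math.sqrt(max(i22 - l21 * l21, 0.0))
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a_s = a_hat + scale * l11 * e1
    b_s = b_hat + scale * (l21 * e1 + l22 * e2)
    return a_s, b_s, a_hat, b_hat


def run_ts(a_true=1.1, b_true=0.5, noise=0.1, horizon=500,
           warmup=50, update_every=5, scale=0.2, seed=0):
    """Simulate the toy TS controller; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    v = [[1.0, 0.0], [0.0, 1.0]]  # V = I + sum of z z^T, with z = (x, u)
    s = [0.0, 0.0]                # running sum of z * x_next
    x, k, xs = 0.0, 0.0, []
    a_hat = b_hat = 0.0
    for t in range(horizon):
        if t % update_every == 0:
            # Policy update: sample a model, act optimally for the sample.
            a_s, b_s, a_hat, b_hat = sample_model(v, s, scale, rng)
            k = lqr_gain(a_s, b_s)
        u = -k * x
        if t < warmup:
            u += rng.gauss(0.0, 1.0)  # extra exploration noise early on
        x_next = a_true * x + b_true * u + rng.gauss(0.0, noise)
        z = (x, u)
        for i in range(2):
            s[i] += z[i] * x_next
            for j in range(2):
                v[i][j] += z[i] * z[j]
        xs.append(x)
        x = x_next
    return a_hat, b_hat, xs
```

Calling `run_ts()` returns the last ridge estimates of (a, b) together with the state trajectory. The open-loop system in this toy setup is unstable (a = 1.1), so the sketch starts without a stabilizing controller and relies on the early exploration noise to identify the dynamics. The multidimensional case studied in the paper would replace the scalar Riccati recursion with a matrix Riccati solver.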

