Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

by   Taylan Kargin, et al.

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that Õ(√(T)) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains Õ(√(T)) regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.


Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regr...

Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

We study the problem of adaptive control in partially observable linear ...

Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

We consider adaptive control of the Linear Quadratic Regulator (LQR), wh...

Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees

We study the adaptive control of an unknown linear system with a quadrat...

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

We study the problem of adaptive control of a high dimensional linear qu...

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

We consider the problem of controlling a stochastic linear system with q...

Exact Asymptotics for Linear Quadratic Adaptive Control

Recent progress in reinforcement learning has led to remarkable performa...

Please sign up or login with your details

Forgot password? Click here to reset