A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

08/19/2021
by Mukul Gagrani, et al.

We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system, recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of that algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon) allows this assumption on the induced norm to be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of Õ(√T), where T is the time horizon and the Õ(·) notation hides logarithmic terms in T.
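To see why a spectral-radius condition is milder than an induced-norm condition, the following minimal sketch compares the two for a single matrix. The matrix `closed_loop` is a made-up example and does not come from the paper. The induced infinity-norm (maximum absolute row sum) is used here only because it is easy to compute by hand; the paper's assumption may be stated for a different induced norm. The helper names are hypothetical.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude of a 2x2 matrix, via trace and determinant."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def induced_inf_norm(m):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_norms(m, steps):
    """Induced infinity-norms of m, m^2, ..., m^steps."""
    norms, p = [], m
    for _ in range(steps):
        norms.append(induced_inf_norm(p))
        p = matmul(p, m)
    return norms


# Hypothetical closed-loop matrix: stable (spectral radius < 1),
# but a single step can still amplify the state (induced norm > 1).
closed_loop = [[0.5, 2.0], [0.0, 0.5]]

rho = spectral_radius_2x2(closed_loop)
norm = induced_inf_norm(closed_loop)
norms = power_norms(closed_loop, 12)

print("spectral radius:", rho)
print("induced infinity-norm:", norm)
print("norms of successive powers:", [round(x, 4) for x in norms])
```

For this matrix the one-step norm exceeds one while the norms of higher powers eventually fall below one. This is one way to read the intuition behind keeping episodes from ending too soon: under a spectral-radius condition, contraction is only guaranteed over sufficiently many steps, not at every step.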
