Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

01/25/2022
by Akshay Mete, et al.

We consider the problem of controlling a stochastic linear system with quadratic costs when its system parameters are not known to the agent, which is called the adaptive LQG control problem. We re-examine an approach called "Reward-Biased Maximum Likelihood Estimate" (RBMLE), which was proposed more than forty years ago and predates both the "Upper Confidence Bound" (UCB) method and the definition of "regret". It simply added a term favoring parameters with larger rewards to the estimation criterion. We propose an augmented approach that combines the penalty of the RBMLE method with the constraint of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We first establish that, theoretically, this method retains 𝒪(√T) regret, the best known so far. We then show through a comprehensive simulation study that this augmented RBMLE method considerably outperforms the UCB and Thompson sampling approaches, with a regret that is typically less than 50% of the better of their two regrets. The simulation study includes all examples from earlier papers as well as a large collection of randomly generated systems.
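The abstract describes the method only at a high level: RBMLE adds a reward-favoring term to the likelihood criterion, and the augmented version also restricts the search to a UCB-style confidence set. The Python sketch below illustrates that general idea on a toy scalar system x_{t+1} = a x_t + b u_t + w_t with stage cost x_t² + u_t² and known unit noise variance. It is not the authors' algorithm: the function names, the bias weight `alpha` (which in RBMLE would grow with t), the confidence radius `beta`, the ridge parameter `lam`, and the grid search over (a, b) are all hypothetical choices made for illustration.

```python
import numpy as np


def riccati_cost(a, b, q=1.0, r=1.0, sigma2=1.0, iters=10_000, tol=1e-10):
    """Optimal average cost J(theta) = sigma2 * P of the scalar LQ problem
    x_{t+1} = a x_t + b u_t + w_t with stage cost q x^2 + r u^2, where P solves
    the scalar discrete-time Riccati equation (found here by value iteration).
    Returns inf when the iteration does not converge (e.g. b = 0, |a| >= 1)."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if not np.isfinite(p_next) or p_next > 1e12:
            return np.inf
        if abs(p_next - p) < tol:
            return sigma2 * p_next
        p = p_next
    return np.inf


def augmented_rbmle(xs, us, alpha, beta, grid_a, grid_b, lam=1.0, sigma2=1.0):
    """Hypothetical grid-search sketch: minimize NLL(theta) + alpha * J(theta)
    over theta = (a, b) restricted to the ellipsoid
    (theta - theta_ls)^T V (theta - theta_ls) <= beta^2."""
    z = np.column_stack([xs[:-1], us])      # regressors (x_t, u_t)
    y = xs[1:]                              # targets x_{t+1}
    V = lam * np.eye(2) + z.T @ z           # regularized Gram matrix
    theta_ls = np.linalg.solve(V, z.T @ y)  # ridge least-squares estimate
    best, best_val = theta_ls, np.inf       # fall back to theta_ls if no grid point qualifies
    for a in grid_a:
        for b in grid_b:
            theta = np.array([a, b])
            d = theta - theta_ls
            if d @ V @ d > beta ** 2:       # UCB-style confidence-set constraint
                continue
            J = riccati_cost(a, b, sigma2=sigma2)
            if not np.isfinite(J):          # skip parameters with no finite optimal cost
                continue
            nll = np.sum((y - z @ theta) ** 2) / (2 * sigma2)  # Gaussian NLL up to a constant
            val = nll + alpha * J           # reward bias: reward = -cost, so favor low J
            if val < best_val:
                best, best_val = theta, val
    return best
```

In an adaptive loop, one would typically recompute the estimate periodically and apply the certainty-equivalent control u_t = -K x_t for the selected (a, b), with K = abP/(r + b²P). The grid search keeps the example short but does not scale beyond a couple of parameters.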
