Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

03/24/2013
by Morteza Ibrahimi, et al.

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O(√(T)) was shown, apart from logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of O(p √(T)), apart from logarithmic factors. In particular, our algorithm has an average cost of (1+ε) times the optimum cost after T = polylog(p) O(1/ε^2). This is in comparison to previous work on the dense dynamics where the algorithm requires time that scales exponentially with dimension in order to achieve regret of ε times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.
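
To make the notion of regret in the average cost LQ problem concrete, here is a minimal Python sketch. It is not the adaptive scheme from the paper. It assumes a scalar system x_{t+1} = a x_t + b u_t + w_t with stage cost q x_t^2 + r u_t^2 and a fixed linear controller u_t = k x_t. The function names (`riccati_scalar`, `optimal_gain`, `regret`) and all parameter values are hypothetical and chosen only for illustration. Regret is computed as the accumulated cost minus T times the optimal average cost.

```python
import random


def riccati_scalar(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati recursion toward its fixed point."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def optimal_gain(a, b, q, r):
    """Gain k of the optimal linear controller u = k * x for the scalar system."""
    p = riccati_scalar(a, b, q, r)
    return -(a * b * p) / (r + b * b * p)


def regret(a, b, q, r, k, horizon, noise_std=1.0, seed=0):
    """Accumulated cost of u = k * x minus horizon times the optimal average cost."""
    rng = random.Random(seed)
    # Optimal average cost for noise variance noise_std**2 (scalar case).
    j_star = riccati_scalar(a, b, q, r) * noise_std ** 2
    x, total = 0.0, 0.0
    for _ in range(horizon):
        u = k * x
        total += q * x * x + r * u * u
        x = a * x + b * u + rng.gauss(0.0, noise_std)
    return total - horizon * j_star


if __name__ == "__main__":
    a, b, q, r, T = 1.0, 1.0, 1.0, 1.0, 5000
    k_opt = optimal_gain(a, b, q, r)
    print("regret with optimal gain:", regret(a, b, q, r, k_opt, T))
    print("regret with a weaker gain:", regret(a, b, q, r, -0.1, T))
```

In this sketch the controller is fixed in advance, so regret under a suboptimal gain grows linearly in T. An adaptive scheme of the kind the abstract describes would instead learn the unknown dynamics online so that regret grows sublinearly.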

research
03/12/2020

Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

We study the problem of adaptive control in partially observable linear ...
research
09/11/2019

Logarithmic Regret for Online Control

We study optimal regret bounds for control in linear dynamical systems u...
research
02/03/2023

Pseudonorm Approachability and Applications to Regret Minimization

Blackwell's celebrated approachability theory provides a general framewo...
research
03/19/2021

Towards a Dimension-Free Understanding of Adaptive Linear Control

We study the problem of adaptive control of the linear quadratic regulat...
research
07/23/2020

Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regr...
research
06/17/2022

Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

Thompson Sampling (TS) is an efficient method for decision-making under ...
research
02/21/2023

Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version

In this paper, we propose and analyze a new method for online linear qua...
