Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical model of stochastic linear-quadratic systems. We show that the randomized certainty equivalent policy addresses the exploration-exploitation dilemma for minimizing quadratic costs in linear dynamical systems that evolve according to stochastic differential equations. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy learns optimal control actions quickly from a single state trajectory. Further, the regret is shown to scale linearly with the number of parameters. The presented analysis introduces novel technical approaches and sheds light on fundamental challenges of continuous-time reinforcement learning.
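To make the setting concrete, the following is a minimal sketch of a randomized certainty equivalent loop for a scalar system dx = (a x + b u) dt + dW with running cost q x^2 + r u^2. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the scalar model, the Euler-Maruyama discretization, the doubling update schedule, the t^(-1/4) perturbation scale, the ridge term, the stabilizing initial guess, and the lower bound of 0.1 on the input coefficient are all hypothetical choices, as are the function names.

```python
import math
import random


def solve_riccati(a, b, q=1.0, r=1.0):
    """Scalar algebraic Riccati equation 2*a*p - (b*p)**2 / r + q = 0.

    Returns the positive root p and the feedback gain k = b*p/r (u = -k*x).
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return p, b * p / r


def least_squares(sxx, sxu, suu, sxd, sud, ridge=1e-3):
    """Ridge-regularized least squares for (a, b) from trajectory integrals.

    sxx, sxu, suu approximate the integrals of x*x, x*u, u*u over time;
    sxd, sud approximate the integrals of x dx and u dx.
    """
    m11, m12, m22 = sxx + ridge, sxu, suu + ridge
    det = m11 * m22 - m12 * m12
    a_hat = (m22 * sxd - m12 * sud) / det
    b_hat = (m11 * sud - m12 * sxd) / det
    return a_hat, b_hat


def simulate(a=0.5, b=1.0, horizon=50.0, dt=0.01, seed=0):
    """Run the sketch once; returns (a_hat, b_hat, regret) with q = r = 1."""
    rng = random.Random(seed)
    x, cost = 0.0, 0.0
    sxx = sxu = suu = sxd = sud = 0.0
    a_hat, b_hat = 0.0, 1.0            # assumed initial guess (stabilizing here)
    _, k = solve_riccati(a_hat, b_hat)
    next_update = 1.0                  # assumed doubling update schedule
    steps = round(horizon / dt)
    for i in range(steps):
        u = -k * x
        # Euler-Maruyama step of the true dynamics with unit-intensity noise
        dx = (a * x + b * u) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        cost += (x * x + u * u) * dt
        sxx += x * x * dt
        sxu += x * u * dt
        suu += u * u * dt
        sxd += x * dx
        sud += u * dx
        x += dx
        t = (i + 1) * dt
        if t >= next_update:
            a_hat, b_hat = least_squares(sxx, sxu, suu, sxd, sud)
            scale = t ** -0.25         # assumed decaying randomization scale
            a_tilde = a_hat + scale * rng.gauss(0.0, 1.0)
            # assumption: b is known to be positive and at least 0.1
            b_tilde = max(b_hat + scale * rng.gauss(0.0, 1.0), 0.1)
            # certainty equivalence: treat the perturbed estimate as the truth
            _, k = solve_riccati(a_tilde, b_tilde)
            next_update *= 2.0
    p_star, _ = solve_riccati(a, b)    # optimal average cost is p_star here
    regret = cost - p_star * steps * dt
    return a_hat, b_hat, regret
```

Calling `simulate()` with increasing `horizon` values and comparing the returned regret against the square root of the horizon gives a rough, single-trajectory feel for the growth rate discussed in the abstract. A single run is noisy and proves nothing about the bound.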


