Square-root regret bounds for continuous-time episodic Markov decision processes

10/03/2022
by Xuefeng Gao, et al.

We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. We present a learning algorithm based on the methods of value iteration and upper confidence bounds. We derive an upper bound on the worst-case expected regret of the proposed algorithm and establish a worst-case lower bound; both bounds are of square-root order in the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
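The abstract gives no pseudocode, so the following is a minimal sketch of the kind of optimistic value iteration it describes, written for a discrete-time tabular analogue rather than the authors' continuous-time setting. The function name, the `bonus_scale` parameter, and the `1/sqrt(count)` bonus form are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def ucb_value_iteration(P_hat, R_hat, counts, horizon, bonus_scale=1.0):
    """One planning pass: optimistic Q-values from an empirical model plus
    UCB exploration bonuses (a discrete-time sketch, not the paper's method).

    P_hat:  (S, A, S) empirical transition probabilities
    R_hat:  (S, A)    empirical mean rewards, assumed in [0, 1]
    counts: (S, A)    visit counts used to size the confidence bonus
    """
    S, A = R_hat.shape
    Q = np.zeros((horizon, S, A))
    V = np.zeros((horizon + 1, S))  # V[horizon] = 0 terminal values
    for h in reversed(range(horizon)):
        # Bonus shrinks as 1/sqrt(visit count), the usual UCB rate
        bonus = bonus_scale / np.sqrt(np.maximum(counts, 1))
        # Optimistic backup: reward + bonus + expected next-step value
        Q[h] = R_hat + bonus + P_hat @ V[h + 1]
        # Clip at the maximum achievable remaining return to bound optimism
        Q[h] = np.minimum(Q[h], horizon - h)
        V[h] = Q[h].max(axis=1)
    return Q
```

In a full episodic loop one would act greedily with respect to `Q[h, s]`, update the visit counts and the empirical model after each episode, and replan; in analyses of this type it is the shrinking bonus that yields regret growing as the square root of the number of episodes.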
