Logarithmic regret bounds for continuous-time average-reward Markov decision processes

05/23/2022
by Xuefeng Gao, et al.

We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
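To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm, of a continuous-time MDP in which each state-action pair has an exponential holding time and a transition distribution, together with the empirical mean holding time that a learner could form from observed sojourns. All names (`simulate_ctmdp`, `mean_holding_estimates`, `P`, `rates`, `policy`) and the two-state example are assumptions introduced here for illustration.

```python
import random


def simulate_ctmdp(P, rates, policy, s0, n_steps, seed=0):
    """Run a fixed stationary policy for n_steps decision epochs.

    P[s][a]     : list of transition probabilities to each next state (hypothetical layout)
    rates[s][a] : rate of the exponential holding time in state s under action a
    policy[s]   : action chosen in state s
    Returns {(s, a): [visit_count, total_holding_time]}.
    """
    rng = random.Random(seed)
    s = s0
    stats = {}
    for _ in range(n_steps):
        a = policy[s]
        # Holding time is exponential with the given rate (mean 1 / rate).
        tau = rng.expovariate(rates[s][a])
        record = stats.setdefault((s, a), [0, 0.0])
        record[0] += 1
        record[1] += tau
        # Jump to the next state according to the transition probabilities.
        s = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    return stats


def mean_holding_estimates(stats):
    """Empirical mean holding time for every visited state-action pair."""
    return {sa: total / count for sa, (count, total) in stats.items()}


# Hypothetical two-state, one-action example.
P = [[[0.0, 1.0]], [[1.0, 0.0]]]   # state 0 -> 1, state 1 -> 0
rates = [[2.0], [0.5]]             # true mean holding times: 0.5 and 2.0
policy = [0, 0]

stats = simulate_ctmdp(P, rates, policy, s0=0, n_steps=20000, seed=1)
estimates = mean_holding_estimates(stats)
```

With enough visits, each entry of `estimates` should be close to the reciprocal of the corresponding rate. A confidence-based learner would additionally attach a confidence interval to each estimate, which this sketch omits.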

research
10/03/2022

Square-root regret bounds for continuous-time episodic Markov decision processes

We study reinforcement learning for continuous-time Markov decision proc...
research
06/03/2018

Exploration in Structured Reinforcement Learning

We address reinforcement learning problems with finite state and action ...
research
06/29/2020

Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

We present two elegant solutions for modeling continuous-time dynamics, ...
research
06/30/2020

Continuous-Time Multi-Armed Bandits with Controlled Restarts

Time-constrained decision processes have been ubiquitous in many fundame...
research
01/18/2022

A sojourn-based approach to semi-Markov Reinforcement Learning

In this paper we introduce a new approach to discrete-time semi-Markov d...
research
03/16/2023

Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

Continuous-time Markov decision processes (CTMDPs) are canonical models ...
research
05/21/2018

Online Learning in Kernelized Markov Decision Processes

We consider online learning for minimizing regret in unknown, episodic M...