Near-continuous time Reinforcement Learning for continuous state-action spaces

09/06/2023
by   Lorenzo Croissant, et al.

We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework, for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of rigid dynamics and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency ε^-1, which captures arbitrary time scales: from discrete (ε=1) to continuous time (ε↓0). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on ℝ^d. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order 𝒪̃(ε^1/2 T+√(T)). As the frequency of interactions blows up, the approximation error ε^1/2 T vanishes, showing that 𝒪̃(√(T)) is attainable in near-continuous time.
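The interaction model described above can be illustrated with a short simulation: interaction times arrive according to a Poisson clock of frequency ε^-1 (i.e., exponential inter-arrival times with mean ε), and the state evolves on ℝ^d by jumps at those times. The Gaussian jump kernel and the bounded reward used below are illustrative stand-ins chosen for the sketch, not the paper's actual transition kernel or reward function.

```python
import numpy as np

def simulate_jump_process(eps, horizon, d=2, seed=0):
    """Sketch of near-continuous-time interaction: a Poisson clock of
    frequency 1/eps triggers state jumps on R^d. The Gaussian kernel and
    the reward exp(-|x|^2) are toy choices, not the paper's model."""
    rng = np.random.default_rng(seed)
    t = 0.0
    x = np.zeros(d)
    times, rewards = [], []
    while t < horizon:
        t += rng.exponential(eps)                       # inter-arrival ~ Exp(rate 1/eps)
        x = x + np.sqrt(eps) * rng.standard_normal(d)   # toy jump: small Gaussian step
        times.append(t)
        rewards.append(float(np.exp(-x @ x)))           # bounded reward in (0, 1]
    return np.array(times), np.array(rewards)

times, rewards = simulate_jump_process(eps=0.01, horizon=1.0)
print(len(times))  # on the order of horizon/eps = 100 interactions
```

As ε↓0, the number of interactions per unit time blows up and the small Gaussian jumps (of size √ε) accumulate into diffusive behaviour, which is the intuition behind the diffusive limit approximation used for planning.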


