Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

01/29/2022
by Chao Qu, et al.
We consider a sequential decision-making problem in which the agent faces an environment characterized by stochastic discrete events and seeks an optimal intervention policy that maximizes its long-term reward. This problem arises widely in social media, finance, and health informatics but is rarely investigated in conventional reinforcement learning research. To this end, we present a novel framework for model-based reinforcement learning in which the agent's actions and observations are asynchronous stochastic discrete events occurring in continuous time. We model the dynamics of the environment with a Hawkes process that includes an external intervention control term, and we develop an algorithm that embeds this process in the Bellman equation, which guides the direction of the value gradient. We demonstrate the superiority of our method on both a synthetic simulator and a real-world problem.
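
To make the idea of a Hawkes process with an intervention term concrete, the following is a minimal sketch and not the paper's formulation. It assumes an exponential excitation kernel and a constant additive control offset, and it simulates event times with Ogata's thinning algorithm. The function names `intensity` and `simulate`, and the parameters `mu`, `alpha`, `beta`, and `control`, are all hypothetical choices made for illustration; the Bellman-equation part of the method is not shown.

```python
import math
import random


def intensity(t, events, mu=0.5, alpha=0.8, beta=1.5, control=0.0):
    """Conditional intensity at time t given strictly earlier event times.

    Assumed form (illustrative only): mu + sum(alpha * exp(-beta * (t - s))) + control,
    clipped at zero so that a negative control cannot make the rate negative.
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return max(mu + excitation + control, 0.0)


def simulate(horizon, mu=0.5, alpha=0.8, beta=1.5, control=0.0, seed=0):
    """Sample event times on [0, horizon) by Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays (the control is constant here),
        # so its value at t plus one possible jump of size alpha is an upper bound.
        lam_bar = intensity(t, events, mu, alpha, beta, control) + alpha
        if lam_bar <= 0.0:
            break  # no further events are possible
        t += rng.expovariate(lam_bar)  # candidate event time
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar < intensity(t, events, mu, alpha, beta, control):
            events.append(t)
    return events


if __name__ == "__main__":
    baseline = simulate(50.0, control=0.0, seed=1)
    damped = simulate(50.0, control=-0.3, seed=1)
    print(len(baseline), "events without control;", len(damped), "with control=-0.3")
```

In this sketch the control enters only as a fixed shift of the base rate. A policy that chooses the intervention as a function of the event history would replace the constant `control` with a time-varying term, in which case the upper bound used for thinning would need to be recomputed accordingly.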

research
05/23/2018

Deep Reinforcement Learning of Marked Temporal Point Processes

In a wide variety of applications, humans interact with a complex enviro...
research
11/10/2021

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Safety has become one of the main challenges of applying deep reinforcem...
research
08/12/2023

Value-Distributional Model-Based Reinforcement Learning

Quantifying uncertainty about a policy's long-term performance is import...
research
07/18/2023

Online Learning with Costly Features in Non-stationary Environments

Maximizing long-term rewards is the primary goal in sequential decision-...
research
07/25/2022

Meta Neural Ordinary Differential Equations For Adaptive Asynchronous Control

Model-based Reinforcement Learning and Control have demonstrated great p...
research
06/04/2022

Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning

The quintessential model-based reinforcement-learning agent iteratively ...
research
09/20/2018

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcem...
