Model-free Reinforcement Learning for Branching Markov Decision Processes

06/12/2021
by Ernst Moritz Hahn, et al.

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy for an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
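To make the BMDP setting concrete, the sketch below encodes a small, entirely hypothetical BMDP and computes the best expected total payoff per entity type. It is an illustration only: it uses plain model-based value iteration on a known model, not the model-free technique of the paper, and it assumes an undiscounted total-payoff objective. The type names, action names, payoffs, and probabilities are invented for the example. It relies on the fact that entities evolve independently, so the value of a population is the sum of the values of its entities.

```python
# Hypothetical toy BMDP (all names and numbers are assumptions for illustration).
# MODEL[entity_type][action] = (payoff, [(probability, offspring_types), ...])
MODEL = {
    "A": {
        "safe": (1.0, [(1.0, ())]),                      # collect 1 and die out
        "spawn": (0.5, [(0.5, ("B", "B")), (0.5, ())]),  # collect 0.5, maybe spawn two Bs
    },
    "B": {
        "stop": (1.0, [(1.0, ())]),
        "retry": (0.0, [(0.5, ("B",)), (0.5, ())]),
    },
}


def action_value(values, payoff, outcomes):
    """Immediate payoff plus the expected summed value of the offspring."""
    return payoff + sum(
        prob * sum(values[child] for child in children)
        for prob, children in outcomes
    )


def value_iteration(model, sweeps=100):
    """Synchronous value iteration over entity types (maximising controller)."""
    values = {entity_type: 0.0 for entity_type in model}
    for _ in range(sweeps):
        values = {
            entity_type: max(
                action_value(values, *spec) for spec in actions.values()
            )
            for entity_type, actions in model.items()
        }
    return values


def greedy_policy(model, values):
    """Pick, for each entity type, the action with the highest value."""
    return {
        entity_type: max(
            actions, key=lambda a: action_value(values, *actions[a])
        )
        for entity_type, actions in model.items()
    }


values = value_iteration(MODEL)
policy = greedy_policy(MODEL, values)
```

On this toy model the iteration settles at a value of 1.5 for type A and 1.0 for type B, with the controller choosing "spawn" for A and "stop" for B. Replacing `max` with `min` would give the worst-case behaviour instead.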


research
12/07/2019

From Reinforcement Learning to Optimal Control: A unified framework for sequential decisions

There are over 15 distinct communities that work in the general area of ...
research
09/26/2018

Omega-Regular Objectives in Model-Free Reinforcement Learning

We provide the first solution for model-free reinforcement learning of ω...
research
11/20/2017

Is prioritized sweeping the better episodic control?

Episodic control has been proposed as a third approach to reinforcement ...
research
10/21/2022

Deep Reinforcement Learning for Stabilization of Large-scale Probabilistic Boolean Networks

The ability to direct a Probabilistic Boolean Network (PBN) to a desired...
research
04/19/2023

Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

In this work we propose a coverage planning control approach which allow...
research
11/05/2020

Mixed Nondeterministic-Probabilistic Interfaces

Interface theories are powerful frameworks supporting incremental and co...
research
12/29/2017

Characterizing optimal hierarchical policy inference on graphs via non-equilibrium thermodynamics

Hierarchies are of fundamental interest in both stochastic optimal contr...
