New inference strategies for solving Markov Decision Processes using reversible jump MCMC

by Matthias Hoffman, et al.

In this paper we build on previous work that uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break the strong correlations between the policy parameters and the sampled trajectories in order to sample more freely. Finally, we show how to combine these techniques in a principled manner to obtain estimates of the optimal policy.
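To make the setting concrete, the snippet below is a minimal sketch of the baseline "control as inference" scheme this line of work builds on: the policy parameters and a sampled trajectory are treated jointly as a random variable whose target density is proportional to the trajectory's reward, and a Metropolis-Hastings chain is run over both. Everything in the snippet (the toy one-dimensional MDP, the Gaussian policy, and all constants) is an illustrative assumption and not taken from the paper, which instead develops reversible jump moves and the improvements listed above.

```python
# A minimal sketch, not the paper's algorithm: it illustrates the baseline
# "control as inference" idea in which the pair (policy parameters theta,
# trajectory tau) is sampled from a target proportional to trajectory reward,
#     p(theta, tau) ∝ R(tau) p(tau | theta) p(theta),
# using plain Metropolis-Hastings.  The toy 1-D MDP, the Gaussian policy, and
# every constant below are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
HORIZON, GOAL = 20, 5.0

def rollout(theta):
    """Simulate the toy MDP: each action is drawn from the Gaussian policy
    N(theta, 1) and added to the scalar state."""
    s, traj = 0.0, []
    for _ in range(HORIZON):
        s += rng.normal(theta, 1.0)
        traj.append(s)
    return np.array(traj)

def reward(traj):
    """Positive, bounded reward: largest when the final state is near GOAL."""
    return np.exp(-0.5 * (traj[-1] - GOAL) ** 2)

def log_weight(theta, traj):
    """log of R(tau) * p(theta).  The p(tau | theta) factor is omitted because
    it cancels in the acceptance ratio below, where trajectories are proposed
    directly from p(tau | theta')."""
    return np.log(reward(traj) + 1e-300) - 0.5 * theta ** 2 / 100.0

# Metropolis-Hastings over (theta, tau): a symmetric random walk on theta,
# followed by resampling the whole trajectory from the proposed policy.
theta, traj, samples = 0.0, rollout(0.0), []
for _ in range(5000):
    theta_prop = theta + rng.normal(0.0, 0.2)
    traj_prop = rollout(theta_prop)
    if np.log(rng.uniform()) < log_weight(theta_prop, traj_prop) - log_weight(theta, traj):
        theta, traj = theta_prop, traj_prop
    samples.append(theta)

# High-reward (theta, tau) pairs are visited more often, so the chain's mean
# concentrates near the reward-maximizing parameter, GOAL / HORIZON = 0.25.
print("estimated policy parameter:", np.mean(samples[1000:]))
```

Note that in a naive sampler like this the trajectory must be regenerated wholesale whenever theta moves, so the policy parameters and the sampled trajectory remain tightly coupled and only the current trajectory's reward enters each accept/reject decision; breaking exactly this kind of coupling, and using more of the reward information from sampled trajectories, is what the modifications described in the abstract target.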

