Bayesian Policy Search for Stochastic Domains

10/01/2020
by   David Tolpin, et al.
0

AI planning can be cast as inference in probabilistic models, and probabilistic programming was shown to be capable of policy search in partially observable domains. Prior work introduces policy search through Markov chain Monte Carlo in deterministic domains, as well as adapts black-box variational inference to stochastic domains, however not in the strictly Bayesian sense. In this work, we cast policy search in stochastic domains as a Bayesian inference problem and provide a scheme for encoding such problems as nested probabilistic programs. We argue that probabilistic programs for policy search in stochastic domains should involve nested conditioning, and provide an adaption of Lightweight Metropolis-Hastings (LMH) for robust inference in such programs. We apply the proposed scheme to stochastic domains and show that policies of similar quality are learned, despite a simpler and more general inference algorithm. We believe that the proposed variant of LMH is novel and applicable to a wider class of probabilistic programs with nested conditioning.

READ FULL TEXT
research
10/01/2020

Probabilistic Programs with Stochastic Conditioning

We tackle the problem of conditioning probabilistic programs on distribu...
research
01/08/2020

Stochastic probabilistic programs

We introduce the notion of a stochastic probabilistic program and presen...
research
06/14/2016

Spreadsheet Probabilistic Programming

Spreadsheet workbook contents are simple programs. Because of this, prob...
research
11/14/2018

Composing Modeling and Inference Operations with Probabilistic Program Combinators

Probabilistic programs with dynamic computation graphs can define measur...
research
03/27/2013

Inference Policies

It is suggested that an AI inference system should reflect an inference ...
research
01/10/2021

Stabilized Nested Rollout Policy Adaptation

Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorith...
research
03/22/2020

Generalized Nested Rollout Policy Adaptation

Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorith...

Please sign up or login with your details

Forgot password? Click here to reset