Universal Policies for Software-Defined MDPs

12/21/2020
by Daniel Selsam, et al.

We introduce a new programming paradigm called oracle-guided decision programming, in which a program specifies a Markov Decision Process (MDP) and the language provides a universal policy. We prototype a new programming language, Dodona, that manifests this paradigm using a primitive 'choose' representing nondeterministic choice. The Dodona interpreter returns either a value or a choicepoint that includes a lossless encoding of all information necessary in principle to make an optimal decision. Meta-interpreters query Dodona's (neural) oracle on these choicepoints to get policy and value estimates, which they can use to perform heuristic search on the underlying MDP. We demonstrate Dodona's potential for zero-shot heuristic guidance by meta-learning over hundreds of synthetic tasks that simulate basic operations over lists, trees, Church data structures, polynomials, first-order terms, and higher-order terms.
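
To make the interpreter/meta-interpreter split described in the abstract concrete, here is a minimal Python sketch. It is not Dodona and does not reproduce the paper's implementation. Every name in it (`run`, `search`, `Choicepoint`, `Value`, `Fail`, `uniform_oracle`, `pair_program`) is hypothetical. It assumes a choicepoint can be encoded as the sequence of choices made so far, which together with a deterministic program determines the state. It also assumes a uniform oracle as a stand-in for the neural one, and it treats the task as plain success or failure rather than a general MDP reward.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Tuple


class Fail(Exception):
    """Raised by a program to signal a dead-end branch."""


class _NeedChoice(Exception):
    """Internal signal: the program asked for a choice that is not in the prefix."""
    def __init__(self, n_options):
        self.n_options = n_options


@dataclass(frozen=True)
class Value:
    result: object


@dataclass(frozen=True)
class Choicepoint:
    prefix: Tuple[int, ...]  # choices made so far (assumed state encoding)
    n_options: int           # number of alternatives at this point


def run(program, prefix):
    """Replay `program` along `prefix`; return Value, Choicepoint, or None on failure."""
    remaining = iter(prefix)

    def choose(n_options):
        decision = next(remaining, None)
        if decision is None:
            raise _NeedChoice(n_options)
        return decision

    try:
        return Value(program(choose))
    except _NeedChoice as need:
        return Choicepoint(tuple(prefix), need.n_options)
    except Fail:
        return None


def uniform_oracle(cp):
    """Stand-in oracle: uniform policy, uninformative value estimate."""
    return [1.0 / cp.n_options] * cp.n_options, 0.0


def search(program, oracle, max_steps=10_000):
    """Best-first search ordered by cumulative negative log policy probability."""
    frontier = [(0.0, ())]
    for _ in range(max_steps):
        if not frontier:
            return None
        cost, prefix = heapq.heappop(frontier)
        outcome = run(program, prefix)
        if outcome is None:
            continue  # dead end
        if isinstance(outcome, Value):
            return outcome.result
        policy, _value = oracle(outcome)  # value estimate unused in this sketch
        for option, prob in enumerate(policy):
            if prob > 0:
                heapq.heappush(frontier, (cost - math.log(prob), prefix + (option,)))
    return None


def pair_program(choose):
    """Toy task: pick x < y in 0..4 with x * y == 6."""
    x = choose(5)
    y = choose(5)
    if x * y != 6 or x >= y:
        raise Fail()
    return (x, y)
```

Calling `search(pair_program, uniform_oracle)` explores choice prefixes until the program returns a value. Replacing `uniform_oracle` with a learned model that maps a choicepoint to policy and value estimates is where the guidance described in the abstract would enter.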


