Universal Policies for Software-Defined MDPs

12/21/2020
by   Daniel Selsam, et al.
0

We introduce a new programming paradigm called oracle-guided decision programming in which a program specifies a Markov Decision Process (MDP) and the language provides a universal policy. We prototype a new programming language, Dodona, that manifests this paradigm using a primitive 'choose' representing nondeterministic choice. The Dodona interpreter returns either a value or a choicepoint that includes a lossless encoding of all information necessary in principle to make an optimal decision. Meta-interpreters query Dodona's (neural) oracle on these choicepoints to get policy and value estimates, which they can use to perform heuristic search on the underlying MDP. We demonstrate Dodona's potential for zero-shot heuristic guidance by meta-learning over hundreds of synthetic tasks that simulate basic operations over lists, trees, Church datastructures, polynomials, first-order terms and higher-order terms.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

04/01/2019

Elaboration Tolerant Representation of Markov Decision Process via Decision-Theoretic Extension of Probabilistic Action Language pBC+

We extend probabilistic action language pBC+ with the notion of utility ...
01/16/2013

PEGASUS: A Policy Search Method for Large MDPs and POMDPs

We propose a new approach to the problem of searching a space of policie...
07/13/2018

On the Complexity of Value Iteration

Value iteration is a fundamental algorithm for solving Markov Decision P...
09/10/2020

A Markov Decision Process Approach to Active Meta Learning

In supervised learning, we fit a single statistical model to a given dat...
02/25/2021

Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Current work in explainable reinforcement learning generally produces po...
05/23/2019

Where to Find Next Passengers on E-hailing Platforms? - A Markov Decision Process Approach

Vacant taxi drivers' passenger seeking process in a road network generat...
08/09/2020

Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

This paper investigates the optimization problem of an infinite stage di...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.