Inductive Policy Selection for First-Order MDPs

12/12/2012
by Sung Wook Yoon, et al.

We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations are either impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies from training data, which we construct by using PGraphplan (Blum & Langford, 1999) to solve small problem instances. Our policies are represented as ensembles of decision lists over a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
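The policy representation named above (an ensemble of decision lists) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy blocks-world encoding in which a state is a set of ground facts, uses plain Python functions (the hypothetical `unstack_clear` and `move_clear_onto_clear`) as stand-ins for rules written in the taxonomic concept language, and combines the decision lists by simple majority vote, which is an assumed aggregation scheme.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Optional, Tuple

# Assumed encoding: a state is a set of ground facts such as ("on", "a", "b").
Fact = Tuple[str, ...]
State = FrozenSet[Fact]
Action = Tuple[str, ...]
# Hypothetical rule type: returns the actions a rule proposes in a state
# (an empty list means the rule does not fire). This is a stand-in for a
# taxonomic-syntax rule, not the paper's representation.
Rule = Callable[[State], List[Action]]


def unstack_clear(state: State) -> List[Action]:
    """Propose moving any clear block that sits on another block to the table."""
    return [("move-to-table", f[1]) for f in state
            if f[0] == "on" and ("clear", f[1]) in state]


def move_clear_onto_clear(state: State) -> List[Action]:
    """Propose moving any clear block onto a different clear block."""
    clear = sorted(f[1] for f in state if f[0] == "clear")
    return [("move", x, y) for x in clear for y in clear if x != y]


def decision_list_policy(rules: List[Rule], state: State) -> Optional[Action]:
    """Apply the first rule that fires; break ties among its actions deterministically."""
    for rule in rules:
        candidates = rule(state)
        if candidates:
            return min(candidates)
    return None  # no rule fires: this list abstains


def ensemble_policy(lists: List[List[Rule]], state: State) -> Optional[Action]:
    """Majority vote over the decision lists (assumed aggregation scheme)."""
    votes: Counter = Counter()
    for rules in lists:
        action = decision_list_policy(rules, state)
        if action is not None:
            votes[action] += 1
    if not votes:
        return None
    best = max(votes.values())
    # Deterministic tie-break among equally voted actions.
    return min(a for a, n in votes.items() if n == best)


# Toy state: a is on b, b is on c, and c and d are on the table.
example_state: State = frozenset({
    ("on", "a", "b"), ("on", "b", "c"), ("ontable", "c"),
    ("ontable", "d"), ("clear", "a"), ("clear", "d"),
})
list_a = [unstack_clear]
list_b = [move_clear_onto_clear]
list_c = [unstack_clear, move_clear_onto_clear]

print(ensemble_policy([list_a, list_b, list_c], example_state))
# -> ('move-to-table', 'a')
```

Rule order matters within a list: `list_c` tries `unstack_clear` first, so it falls back to `move_clear_onto_clear` only in states where every block is on the table. Here two of the three lists choose to unstack `a`, so that action wins the vote.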

research
09/09/2011

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Dec...
research
06/26/2013

Scaling Up Robust MDPs by Reinforcement Learning

We consider large-scale Markov decision processes (MDPs) with parameter ...
research
07/11/2012

Exploiting First-Order Regression in Inductive Policy Selection

We consider the problem of computing optimal generalised policies for re...
research
01/30/2013

Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems

This paper presents two new approaches to decomposing and solving large ...
research
02/08/2023

Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration

To generalize across tasks, an agent should acquire knowledge from past ...
research
12/12/2018

Transition Tensor Markov Decision Processes: Analyzing Shot Policies in Professional Basketball

In this paper we model basketball plays as episodes from team-specific n...
research
06/30/2020

Verification of indefinite-horizon POMDPs

The verification problem in MDPs asks whether, for any policy resolving ...