PG3: Policy-Guided Planning for Generalized Policy Generation

04/21/2022
by   Ryan Yang, et al.

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study methods based on generalized policy search, with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
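To make the central idea concrete, the following is a minimal sketch of scoring a candidate policy by letting it guide planning on training problems. It is not the authors' implementation (see the linked repository for that). It assumes one simple form of guidance: a uniform-cost search in which following the policy's suggested action costs 0 and any other action costs 1, so the score is the smallest number of deviations from the policy needed to reach a goal. The function names, the toy one-dimensional domain, and the expansion limit are all hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


def policy_guided_score(init, is_goal, successors, policy, max_expansions=10_000):
    """Return the minimum number of policy-deviating steps needed to reach a goal.

    successors(state) yields (action, next_state) pairs.
    policy(state) returns a suggested action, or None if it has no suggestion.
    Lower is better; 0 means the policy alone solves the problem.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(0, next(tie), init)]
    best = {init: 0}
    expansions = 0
    while frontier:
        cost, _, state = heapq.heappop(frontier)
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry
        if is_goal(state):
            return cost
        expansions += 1
        if expansions > max_expansions:
            break
        preferred = policy(state)
        for action, nxt in successors(state):
            # Assumed cost model: agreeing with the policy is free.
            new_cost = cost + (0 if action == preferred else 1)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt))
    return float("inf")  # no plan found within the expansion limit


def make_line_problem(start, goal, size=10):
    """Hypothetical toy domain: walk along positions 0..size to reach `goal`."""
    def successors(pos):
        if pos > 0:
            yield "left", pos - 1
        if pos < size:
            yield "right", pos + 1
    return start, (lambda pos: pos == goal), successors


def score_on_training_set(policy, problems):
    """Sum the per-problem scores over a set of training problems."""
    return sum(policy_guided_score(*p, policy) for p in problems)


def always_right(pos):
    return "right"


def always_left(pos):
    return "left"


def right_on_even(pos):
    # A partially correct policy: only gives advice on even positions.
    return "right" if pos % 2 == 0 else None


training = [make_line_problem(0, 3), make_line_problem(2, 5)]
scores = {
    "always_right": score_on_training_set(always_right, training),
    "right_on_even": score_on_training_set(right_on_even, training),
    "always_left": score_on_training_set(always_left, training),
}
```

Under this assumed cost model, a partially correct policy receives a score between that of a fully correct and a fully wrong one, which gives a policy search procedure a graded signal rather than a plain solved/unsolved outcome.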

research
02/26/2020

Policy Evaluation Networks

Many reinforcement learning algorithms use value functions to guide the ...
research
02/18/2020

Generalized Neural Policies for Relational MDPs

A Relational Markov Decision Process (RMDP) is a first-order representat...
research
12/15/2020

General Policies, Serializations, and Planning Width

It has been observed that in many of the benchmark planning domains, ato...
research
08/24/2017

Learning Generalized Reactive Policies using Deep Neural Networks

We consider the problem of learning for planning, where knowledge acquir...
research
08/04/2023

Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks

Programmatically Interpretable Reinforcement Learning (PIRL) encodes pol...
research
04/21/2021

Exploiting Learned Policies in Focal Search

Recent machine-learning approaches to deterministic search and domain-in...
research
11/27/2019

Learning Neural Search Policies for Classical Planning

Heuristic forward search is currently the dominant paradigm in classical...
