Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

09/30/2021
by Clement Gehring, et al.
MIT
IBM

Recent advances in reinforcement learning (RL) have led to growing interest in applying RL to classical planning domains and, conversely, in applying classical planning methods to complex RL domains. However, the long-horizon, goal-based problems found in classical planning yield sparse rewards, which makes direct application of RL inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators that alleviate the sparse-reward issue, and they let our RL agent learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires reconciling the discounted metric used in RL with the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL improves sample efficiency over sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain.
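
To make the idea of a heuristic acting as a dense reward generator concrete, here is a minimal sketch based on standard potential-based reward shaping. It is not taken from the paper and may differ from the authors' exact formulation. The function names, the unit step reward of -1, and the choice of potential (the negated discounted cost of a plan with `h` unit-cost steps) are all assumptions made for illustration.

```python
def discounted_cost(h: int, gamma: float) -> float:
    """Discounted cost of h unit-cost steps: sum of gamma**t for t = 0..h-1.

    Assumption: this is one way to map a non-discounted heuristic estimate
    onto the discounted scale used by RL.
    """
    if gamma == 1.0:
        return float(h)
    return (1.0 - gamma ** h) / (1.0 - gamma)


def potential(h: int, gamma: float) -> float:
    """Hypothetical potential: negated discounted cost-to-go estimate."""
    return -discounted_cost(h, gamma)


def shaped_reward(reward: float, h_s: int, h_next: int, gamma: float) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(h_next, gamma) - potential(h_s, gamma)


if __name__ == "__main__":
    gamma = 0.9
    # Step that the heuristic considers progress (estimate drops from 5 to 4).
    print(shaped_reward(-1.0, 5, 4, gamma))
    # Step that the heuristic considers a detour (estimate rises from 5 to 6).
    print(shaped_reward(-1.0, 5, 6, gamma))
```

Under these assumptions, a step along which the heuristic estimate drops by exactly one receives a shaped reward of zero, and a step that moves away from the goal receives a negative one. The value function left to learn is then only the residual between the true value and the heuristic's potential, which is zero wherever the heuristic is exact.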

