Compositional Reinforcement Learning from Logical Specifications

by   Kishor Jothimurugan, et al.

We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning. First, DiRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.


page 1

page 2

page 3

page 4


A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control poli...

Abstract Value Iteration for Hierarchical Reinforcement Learning

We propose a novel hierarchical reinforcement learning framework for con...

From semantics to execution: Integrating action planning with reinforcement learning for robotic tool use

Reinforcement learning is an appropriate and successful method to robust...

Verifiable and Compositional Reinforcement Learning Systems

We propose a novel framework for verifiable and compositional reinforcem...

Learning Compositional Neural Programs for Continuous Control

We propose a novel solution to challenging sparse-reward, continuous con...

Disentangled Planning and Control in Vision Based Robotics via Reward Machines

In this work we augment a Deep Q-Learning agent with a Reward Machine (D...

Driving Style Encoder: Situational Reward Adaptation for General-Purpose Planning in Automated Driving

General-purpose planning algorithms for automated driving combine missio...