DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

by   Mohammadhosein Hasanbeig, et al.

We propose a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires the attainment of an unknown sequence of high-level objectives. Our method employs a recently-published algorithm for synthesis of compact automata to uncover this sequential structure. We synthesise an automaton from trace data generated through exploration of the environment by the deep RL agent. A product construction is then used to enrich the state space of the environment so that generation of an optimal control policy by deep RL is guided by the discovered structure encoded in the automaton. Our experiments show that our method is able to achieve training results that are otherwise difficult with state-of-the-art RL techniques unaided by external guidance.


DeepSynth: Program Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

We propose a method for efficient training of deep Reinforcement Learnin...

Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search

Local policy search is performed by most Deep Reinforcement Learning (D-...

Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning

Designing optimal reward functions has been desired but extremely diffic...

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

In this paper we present ISA, an approach for learning and exploiting su...

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control

Deep reinforcement learning for high dimensional, hierarchical control t...

Deep reinforcement learning for guidewire navigation in coronary artery phantom

In percutaneous intervention for treatment of coronary plaques, guidewir...

PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

In this work, we present a reinforcement learning (RL) based approach to...