
Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search

by Jakob J. Hollenstein, et al.

Most Deep Reinforcement Learning (D-RL) methods perform local policy search, which increases the risk of getting trapped in a local minimum. Furthermore, even in simulation-based training, D-RL does not fully exploit the availability of a simulation model, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner into the exploration strategy and to learn a control policy offline from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings, including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. The resulting training data helps PPS discover better policies.
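As a rough illustration of why planning-guided exploration can reach more of the state space than local, action-noise exploration, the sketch below grows a kinodynamic RRT on a torque-limited (underactuated) pendulum. It relies on the simulator's ability to restart from any stored state, then compares the grid coverage of the collected transitions against random-action rollouts with the same transition budget. Everything here is a hypothetical stand-in and not the PPS implementation: the pendulum model and its constants, the helpers `kinodynamic_rrt`, `random_rollouts` and `coverage`, the distance weighting, and the coverage metric. The offline policy-learning step that PPS applies to the collected transitions is not shown.

```python
import math
import random

# Hypothetical torque-limited pendulum (unit mass); U_MAX < G, so it cannot
# swing straight up and must pump energy. All constants are illustrative.
G, L, DAMP, U_MAX, DT, W_MAX = 9.81, 1.0, 0.1, 2.0, 0.05, 8.0

def wrap(angle):
    """Map an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def step(state, u):
    """One semi-implicit Euler step of the assumed simulator; theta = 0 hangs down."""
    theta, omega = state
    u = max(-U_MAX, min(U_MAX, u))
    alpha = -G / L * math.sin(theta) - DAMP * omega + u / (L * L)
    omega = max(-W_MAX, min(W_MAX, omega + DT * alpha))
    return (wrap(theta + DT * omega), omega)

def dist(a, b):
    """Weighted state distance; the 0.25 velocity weight is an arbitrary choice."""
    return math.hypot(wrap(a[0] - b[0]), 0.25 * (a[1] - b[1]))

def kinodynamic_rrt(start, iters, rng, n_controls=5, horizon=4):
    """Grow a tree by extending the nearest node toward random targets.

    Restarting from any stored node is what exploits the simulation model.
    Returns every (state, action, next_state) transition along tree edges.
    """
    nodes, transitions = [start], []
    for _ in range(iters):
        target = (rng.uniform(-math.pi, math.pi), rng.uniform(-W_MAX, W_MAX))
        near = min(nodes, key=lambda s: dist(s, target))
        best_path, best_d = None, float("inf")
        for _ in range(n_controls):  # try a few random constant controls
            u = rng.uniform(-U_MAX, U_MAX)
            s, path = near, []
            for _ in range(horizon):
                s_next = step(s, u)
                path.append((s, u, s_next))
                s = s_next
            d = dist(s, target)
            if d < best_d:
                best_path, best_d = path, d
        transitions.extend(best_path)
        nodes.append(best_path[-1][2])  # keep the endpoint of the best edge
    return transitions

def random_rollouts(start, n_transitions, rng, episode_len=100):
    """Baseline: episodes of uniformly random torques from a fixed start state."""
    transitions, s = [], start
    for i in range(n_transitions):
        if i % episode_len == 0:
            s = start  # reset at the beginning of each episode
        u = rng.uniform(-U_MAX, U_MAX)
        s_next = step(s, u)
        transitions.append((s, u, s_next))
        s = s_next
    return transitions

def coverage(transitions, bins=20):
    """Fraction of (theta, omega) grid cells visited by the start states."""
    cells = set()
    for s, _, _ in transitions:
        i = min(bins - 1, int((s[0] + math.pi) / (2.0 * math.pi) * bins))
        j = min(bins - 1, int((s[1] + W_MAX) / (2.0 * W_MAX) * bins))
        cells.add((i, j))
    return len(cells) / (bins * bins)

if __name__ == "__main__":
    rng = random.Random(0)
    planned = kinodynamic_rrt((0.0, 0.0), iters=500, rng=rng)
    baseline = random_rollouts((0.0, 0.0), len(planned), rng)
    print(f"planner coverage: {coverage(planned):.3f}")
    print(f"random coverage:  {coverage(baseline):.3f}")
```

Both explorers spend the same number of simulator steps. The planner can branch from any previously reached state, so its data is not confined to the neighborhood of a single start state. That dataset would then be passed to an offline learner.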

