Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search

10/24/2020
by Jakob J. Hollenstein, et al.

Most Deep Reinforcement Learning (D-RL) methods perform local policy search, which increases the risk of getting trapped in a local minimum. Furthermore, even when training takes place in simulation, D-RL does not fully exploit the available simulation model, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner into the exploration strategy and to learn a control policy offline from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings, including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space, generating training data that helps PPS discover better policies.
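
As a rough illustration of planner-guided exploration, the sketch below grows an RRT-style kinodynamic tree through a toy simulator and records the resulting transitions for later offline learning. It is not the authors' implementation: the double-integrator dynamics, the state bounds, the random-control extension, and the names step, rrt_explore, and coverage are all assumptions made for this example, and the offline policy-learning step itself is left out.

```python
import random

DT = 0.1  # integration step of the toy simulator (assumed value)


def step(state, action):
    """Toy double-integrator simulator (assumed): state = (position, velocity)."""
    pos, vel = state
    vel_next = vel + action * DT
    pos_next = pos + vel_next * DT
    return (pos_next, vel_next)


def rrt_explore(start, n_iters, rng, bounds=((-2.0, 2.0), (-2.0, 2.0))):
    """Grow an RRT-style tree and return the transitions it generated."""
    nodes = [start]
    transitions = []  # (state, action, next_state) tuples for offline learning
    for _ in range(n_iters):
        # 1. Sample a random target state inside the bounded state space.
        target = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        # 2. Pick the tree node closest to the target (squared Euclidean distance).
        nearest = min(
            nodes,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(s, target)),
        )
        # 3. Extend the tree by simulating one random control from that node.
        action = rng.uniform(-1.0, 1.0)
        new_state = step(nearest, action)
        nodes.append(new_state)
        transitions.append((nearest, action, new_state))
    return transitions


def coverage(states, cell=0.25):
    """Count distinct grid cells visited: a crude proxy for state-space coverage."""
    return len({(int(p // cell), int(v // cell)) for p, v in states})


# Usage: collect 500 transitions and report how many grid cells they reach.
data = rrt_explore((0.0, 0.0), 500, random.Random(0))
print(coverage([s_next for _, _, s_next in data]))
```

Because new states are expanded from the node nearest to a uniformly sampled target, the tree tends to be pulled toward unvisited regions, which is the intuition behind the wider state-space coverage reported above. The recorded transitions could then be passed to any offline learner.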

research
11/22/2019

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

We propose a method for effective training of deep Reinforcement Learnin...
research
05/22/2019

COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration

Data efficiency and robustness to task-irrelevant perturbations are long...
research
06/15/2018

Improving width-based planning with compact policies

Optimal action selection in decision problems characterized by sparse, d...
research
02/28/2020

Mixed Reinforcement Learning with Additive Stochastic Uncertainty

Reinforcement learning (RL) methods often rely on massive exploration da...
research
09/15/2019

Model Based Planning with Energy Based Models

Model-based planning holds great promise for improving both sample effic...
research
09/29/2020

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

Existing exploration strategies in reinforcement learning (RL) often eit...
research
06/16/2018

Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

We investigate the task of learning to follow natural language instructi...
