Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search

10/24/2020
by Jakob J. Hollenstein, et al.

Most Deep Reinforcement Learning (D-RL) methods perform local policy search, which increases the risk of getting trapped in a local minimum. Furthermore, even when training is simulation-based, D-RL does not fully exploit the available simulation model, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner into the exploration strategy and to learn a control policy offline from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings, including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. This yields training data that helps PPS discover better policies.
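
The idea of planner-guided exploration can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy double-integrator simulator, an RRT-style kinodynamic planner, and hypothetical names and constants (`step`, `rrt_explore`, `path_dataset`, `DT`, `MAX_FORCE`). The planner grows a tree through the state space by calling the simulator, and the (state, action) pairs along one tree branch form a dataset that a policy could later be fitted to offline. The policy-fitting step itself is left out.

```python
import math
import random

DT = 0.1          # integration step (assumed value)
MAX_FORCE = 1.0   # control bound (assumed value)


def step(state, action):
    """Toy simulator: double integrator with state = (position, velocity)."""
    x, v = state
    v_new = v + action * DT
    x_new = x + v_new * DT
    return (x_new, v_new)


def rrt_explore(start, n_nodes, bounds, rng, n_candidates=5):
    """Grow an RRT-style tree; return nodes, parent indices, and actions."""
    nodes, parents, actions = [start], [-1], [0.0]
    (x_lo, x_hi), (v_lo, v_hi) = bounds
    for _ in range(n_nodes):
        # Sample a random target state inside the bounded region.
        target = (rng.uniform(x_lo, x_hi), rng.uniform(v_lo, v_hi))
        # Select the tree node nearest to the target (Euclidean distance).
        near = min(range(len(nodes)),
                   key=lambda i: math.dist(nodes[i], target))
        # Simulate a few random controls; keep the one ending closest.
        best = None
        for _ in range(n_candidates):
            a = rng.uniform(-MAX_FORCE, MAX_FORCE)
            nxt = step(nodes[near], a)
            d = math.dist(nxt, target)
            if best is None or d < best[0]:
                best = (d, a, nxt)
        nodes.append(best[2])
        parents.append(near)
        actions.append(best[1])
    return nodes, parents, actions


def path_dataset(nodes, parents, actions, goal):
    """Return (state, action) pairs on the branch ending nearest to goal."""
    idx = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], goal))
    pairs = []
    while parents[idx] != -1:
        pairs.append((nodes[parents[idx]], actions[idx]))
        idx = parents[idx]
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    region = ((-1.0, 1.0), (-1.0, 1.0))
    nodes, parents, actions = rrt_explore((0.0, 0.0), 200, region, rng)
    data = path_dataset(nodes, parents, actions, goal=(0.5, 0.0))
    print(len(nodes), "tree nodes;", len(data), "state-action pairs")
```

Because targets are sampled uniformly over the region, the tree is pulled toward unvisited states, which is the intuition behind the wider state-space coverage described above. A supervised learner could consume `data` directly as input-target pairs.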

11/22/2019

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

We propose a method for effective training of deep Reinforcement Learnin...
05/22/2019

COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration

Data efficiency and robustness to task-irrelevant perturbations are long...
06/15/2018

Improving width-based planning with compact policies

Optimal action selection in decision problems characterized by sparse, d...
02/28/2020

Mixed Reinforcement Learning with Additive Stochastic Uncertainty

Reinforcement learning (RL) methods often rely on massive exploration da...
06/16/2018

Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

We investigate the task of learning to follow natural language instructi...
09/15/2019

Model Based Planning with Energy Based Models

Model-based planning holds great promise for improving both sample effic...
09/29/2020

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

Existing exploration strategies in reinforcement learning (RL) often eit...