Multi-Path Policy Optimization

11/11/2019
by Ling Pan, et al.

Recent years have witnessed tremendous progress in deep reinforcement learning. However, agents can still suffer from inefficient exploration, particularly with on-policy methods. Previous exploration methods either rely on complex structures to estimate the novelty of states, or introduce sensitive hyper-parameters that cause instability. In this paper, we propose an efficient exploration method, Multi-Path Policy Optimization (MPPO), which does not incur high computation cost and ensures stability. MPPO uses an ensemble of diverse policies to enable better exploration, especially in sparse-reward environments. We build our scheme upon two widely adopted on-policy methods, the Trust-Region Policy Optimization (TRPO) algorithm and the Proximal Policy Optimization (PPO) algorithm. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MPPO significantly outperforms state-of-the-art exploration methods and ensemble methods in terms of both sample efficiency and final performance.
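The abstract does not describe MPPO's ensemble mechanism in detail, so the sketch below covers only the background it builds on: the standard clipped surrogate objective of PPO. It is a minimal illustration in plain Python, not the authors' implementation; the function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for this example.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy. All names here are
    illustrative assumptions, not identifiers from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 2 and a positive advantage, the clipped term caps the contribution at `1 + clip_eps` times the advantage, so the policy gains nothing from moving further from the data-collecting policy on that sample. With a negative advantage the unclipped term is smaller and is kept, so the penalty is not capped.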

research
01/29/2019

Trust Region-Guided Proximal Policy Optimization

Model-free reinforcement learning relies heavily on a safe yet explorato...
research
11/11/2020

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement lea...
research
03/09/2020

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization...
research
07/13/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

We propose the k-Shortest-Path (k-SP) constraint: a novel constraint on ...
research
10/29/2021

Generalized Proximal Policy Optimization with Sample Reuse

In real-world decision making tasks, it is critical for data-driven rein...
research
05/31/2021

Efficient Hierarchical Exploration with Stable Subgoal Representation Learning

Goal-conditioned hierarchical reinforcement learning (HRL) serves as a s...
research
06/19/2020

NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration

Deep reinforcement learning has been applied more and more widely nowada...
