Diverse Exploration via Conjugate Policies for Policy Gradient Methods

02/10/2019
by Andrew Cohen et al.

We address the challenge of achieving effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse exploration (DE) via conjugate policies. DE learns and deploys a set of conjugate policies, which can be conveniently generated as a byproduct of conjugate gradient descent. We provide both theoretical and empirical results showing the effectiveness of DE at achieving exploration and improving policy performance, as well as its advantage over exploration by random policy perturbations.
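The abstract does not give implementation details, but a minimal sketch can illustrate the core idea: the conjugate gradient (CG) solver already used by TRPO-style policy gradient methods produces a set of mutually conjugate search directions as a byproduct, and each direction can be used to perturb the base policy's parameters into a distinct "conjugate policy." In the sketch below, the toy matrix A stands in for the Fisher information matrix, and base_params and perturbation_scale are hypothetical names introduced for illustration only; this is not the authors' code.

import numpy as np

def conjugate_gradient(Avp, b, iters=10, tol=1e-10):
    """Solve A x = b with CG, returning x and the conjugate directions p_k."""
    x = np.zeros_like(b)
    r = b.copy()          # residual
    p = b.copy()          # current search direction
    directions = []
    rdotr = r @ r
    for _ in range(iters):
        Ap = Avp(p)
        alpha = rdotr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        directions.append(p.copy())   # successive directions are A-conjugate
        new_rdotr = r @ r
        if new_rdotr < tol:
            break
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x, directions

# Toy usage: A stands in for the Fisher information matrix, b for the policy gradient.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
step, dirs = conjugate_gradient(lambda v: A @ v, b)

base_params = np.array([0.5, -0.3])   # hypothetical flattened policy weights
perturbation_scale = 0.1
conjugate_policies = [base_params + perturbation_scale * d / np.linalg.norm(d)
                      for d in dirs]  # one perturbed policy per conjugate direction

Deploying the perturbed parameter vectors in conjugate_policies (rather than random perturbations of base_params) is, under these assumptions, the kind of diverse exploration the abstract describes.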


Related research

02/22/2018
Diverse Exploration for Fast and Safe Policy Improvement
We study an important yet under-addressed problem of quickly and safely ...

07/14/2020
Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
Policy gradient methods have shown success in learning control policies ...

04/28/2019
Learning walk and trot from the same objective using different types of exploration
In quadruped gait learning, policy search methods that scale high dimens...

05/31/2019
Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies
Standard reinforcement learning methods aim to master one way of solving...

02/03/2022
ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search
A tree-based online search algorithm iteratively simulates trajectories ...

06/29/2021
Curious Explorer: a provable exploration strategy in Policy Learning
Having access to an exploring restart distribution (the so-called wide c...

09/26/2020
Neurosymbolic Reinforcement Learning with Formally Verified Exploration
We present Revel, a partially neural reinforcement learning (RL) framewo...
