Diverse Exploration via Conjugate Policies for Policy Gradient Methods

by Andrew Cohen, et al.
Binghamton University

We address the challenge of achieving effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse exploration (DE) via conjugate policies. DE learns and deploys a set of conjugate policies, which can be conveniently generated as a byproduct of conjugate gradient descent. We provide both theoretical and empirical results showing that DE achieves effective exploration, improves policy performance, and outperforms exploration by random policy perturbations.
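The abstract notes that conjugate directions fall out of conjugate gradient descent as a byproduct. As a minimal sketch of that idea (not the paper's actual implementation; the function name and perturbation scheme here are hypothetical), the search directions p_k produced while solving a symmetric positive-definite system A x = b are mutually A-conjugate, and could serve as perturbation directions for a policy's parameter vector:

```python
import numpy as np

def conjugate_directions(A, b, n_dirs):
    """Run conjugate gradient on A x = b and collect the search
    directions p_k, which are mutually A-conjugate: p_i^T A p_j = 0
    for i != j. In a DE-style setup (hypothetical sketch), each p_k
    could perturb the mean policy parameters to form a diverse policy."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # first search direction
    dirs = []
    for _ in range(n_dirs):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)   # exact line-search step along p
        x = x + alpha * p
        dirs.append(p.copy())
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        p = r_new + beta * p              # next direction, A-conjugate to the rest
        r = r_new
    return dirs
```

Any pair of the returned directions satisfies p_i^T A p_j ≈ 0 up to floating-point error, which is the property that makes them natural candidates for a mutually diverse set of policy perturbations.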


Diverse Exploration for Fast and Safe Policy Improvement

We study an important yet under-addressed problem of quickly and safely ...

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Policy gradient methods have shown success in learning control policies ...

Learning walk and trot from the same objective using different types of exploration

In quadruped gait learning, policy search methods that scale high dimens...

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Standard reinforcement learning methods aim to master one way of solving...

ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search

A tree-based online search algorithm iteratively simulates trajectories ...

Curious Explorer: a provable exploration strategy in Policy Learning

Having access to an exploring restart distribution (the so-called wide c...

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

We present Revel, a partially neural reinforcement learning (RL) framewo...