Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

05/31/2019
by Muhammad A. Masood, et al.

Standard reinforcement learning methods aim to master one way of solving a task, even though multiple near-optimal policies may exist. Identifying this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies do not directly address the problem of finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distributions of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.
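The title names Maximum Mean Discrepancy (MMD) as the measure of how far apart two policies' trajectory distributions are. The abstract does not specify the kernel, the trajectory representation, or the estimator, so the sketch below is only an illustration under stated assumptions: each trajectory is a fixed-length list of (state, action) feature vectors, flattened into one vector; the kernel is a Gaussian RBF with a hypothetical `bandwidth` parameter; and MMD² is computed with the biased (V-statistic) estimator. The helper `diversity_augmented_objective` and its `weight` parameter are hypothetical as well: they show one way a diversity bonus could sit next to an expected-return term, not the paper's actual gradient derivation.

```python
import math
from typing import List, Sequence

# Assumption: a trajectory is a fixed-length sequence of per-step (state, action)
# feature vectors, so flattened trajectories all have the same dimension.
Trajectory = Sequence[Sequence[float]]


def flatten(traj: Trajectory) -> List[float]:
    """Concatenate per-step (state, action) features into a single vector."""
    return [x for step in traj for x in step]


def rbf_kernel(a: List[float], b: List[float], bandwidth: float = 1.0) -> float:
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(trajs_p: List[Trajectory], trajs_q: List[Trajectory],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 between two trajectory samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with x ~ policy p, y ~ policy q.
    """
    xs = [flatten(t) for t in trajs_p]
    ys = [flatten(t) for t in trajs_q]

    def mean_kernel(us: List[List[float]], vs: List[List[float]]) -> float:
        # Average kernel value over all pairs drawn from the two sample sets.
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def diversity_augmented_objective(returns_p: List[float], returns_q: List[float],
                                  trajs_p: List[Trajectory], trajs_q: List[Trajectory],
                                  weight: float = 1.0, bandwidth: float = 1.0) -> float:
    """Hypothetical objective: average of the two policies' mean returns
    plus a weighted MMD^2 bonus that rewards differing trajectory distributions."""
    mean_p = sum(returns_p) / len(returns_p)
    mean_q = sum(returns_q) / len(returns_q)
    return (mean_p + mean_q) / 2.0 + weight * mmd_squared(trajs_p, trajs_q, bandwidth)


# Usage: two-step trajectories with a 1-D state and a 1-D action per step.
left = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.1], [0.0, 0.0]]]
right = [[[1.0, 1.0], [2.0, 1.0]], [[1.0, 0.9], [2.0, 1.0]]]
print(mmd_squared(left, left))         # identical samples give 0.0
print(mmd_squared(left, right) > 0.0)  # True: the two samples visit different states/actions
```

Because the comparison is made on whole trajectories, two policies can score as different either by visiting different states or by choosing different actions in the same states, which matches the abstract's motivation. In a policy gradient setting, the MMD term would additionally need to be differentiated with respect to each policy's parameters; that step is not shown here.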
