DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

07/12/2022
by Wenze Chen, et al.

Most recent reinforcement learning algorithms focus on finding a single optimal solution. In many practical applications, however, it is important to train agents that exhibit diverse strategies for the same task. In this paper, we propose Diversity-Guided Policy Optimization (DGPO), an on-policy framework for discovering multiple strategies for the same task. Our algorithm uses diversity objectives to guide a latent-code-conditioned policy toward a set of diverse strategies within a single training procedure. Specifically, we formalize the algorithm as the combination of a diversity-constrained optimization problem and an extrinsic-reward-constrained optimization problem. We solve this constrained optimization as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently finds diverse strategies across a wide variety of reinforcement learning tasks. We further show that DGPO achieves a higher diversity score than other baselines while matching their sample complexity and performance.
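The abstract describes guiding a latent-code-conditioned policy with a diversity objective under an extrinsic-reward constraint. A common way to realize such a diversity signal (as in prior skill-discovery work) is an intrinsic reward based on how well a discriminator can infer the latent code from the state. The sketch below is a minimal illustration of that idea, not DGPO's actual implementation: the linear discriminator, the uniform prior, and the threshold-based switching rule (`delta`, `combined_reward`) are all simplifying assumptions for exposition.

```python
import numpy as np

NUM_Z = 4       # number of latent strategy codes z
STATE_DIM = 3   # toy state dimensionality

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminator_probs(state, W):
    """q(z|s): a linear stand-in for a learned discriminator network."""
    return softmax(state @ W)

def diversity_reward(state, z, W):
    """Intrinsic diversity reward log q(z|s) - log p(z), uniform prior p(z).

    Positive when the discriminator identifies z from the state better
    than chance, i.e. when strategies are distinguishable.
    """
    probs = discriminator_probs(state, W)
    return np.log(probs[z] + 1e-8) - np.log(1.0 / NUM_Z)

def combined_reward(r_ext, r_div, delta):
    """Constraint-style switch: pursue the extrinsic reward while the
    diversity constraint r_div >= delta holds, otherwise pursue diversity."""
    return r_ext if r_div >= delta else r_div

# Toy usage: one state, one latent code, a random discriminator.
rng = np.random.default_rng(0)
W = rng.normal(size=(STATE_DIM, NUM_Z))
s = rng.normal(size=STATE_DIM)
z_star = int(np.argmax(discriminator_probs(s, W)))
r = combined_reward(r_ext=1.0, r_div=diversity_reward(s, z_star, W), delta=0.0)
```

Because the softmax's largest probability is always at least the uniform prior 1/NUM_Z, the diversity reward for the discriminator's top code is non-negative, so in this toy run the switch passes the extrinsic reward through.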

research
08/23/2023

Diverse Policies Converge in Reward-free Markov Decision Processes

Reinforcement learning has achieved great success in many decision-makin...
research
11/10/2018

Diversity-Driven Extensible Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) has recently shown promising a...
research
03/12/2021

Discovering Diverse Solutions in Deep Reinforcement Learning

Reinforcement learning (RL) algorithms are typically limited to learning...
research
02/16/2020

First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

In reinforcement learning, an agent attempts to learn high-performing be...
research
02/16/2022

How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity

Although traditional optimization methods focus on finding a single opti...
research
05/13/2021

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

In Goal-oriented Reinforcement learning, relabeling the raw goals in pas...
research
05/24/2019

InfoRL: Interpretable Reinforcement Learning using Information Maximization

Recent advances in reinforcement learning have proved that given an envi...
