Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

04/04/2022
by Zihan Zhou, et al.

We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards via a trajectory-based novelty measurement during the optimization process. When a sampled trajectory is sufficiently distinct, RSPO performs standard policy optimization with extrinsic rewards. For trajectories with high likelihood under existing policies, RSPO utilizes an intrinsic diversity reward to promote exploration. Experiments show that RSPO discovers a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraft II challenges.
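
The switching rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `switch_rewards`, the `threshold` and `intrinsic_scale` parameters, the use of mean negative log-likelihood as the trajectory novelty measure, and the stand-in intrinsic reward (per-step negative log-probability under the closest existing policy) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def switch_rewards(
    extrinsic: Sequence[float],
    ref_log_probs: Sequence[Sequence[float]],
    threshold: float,
    intrinsic_scale: float = 1.0,
) -> List[float]:
    """Pick the per-step rewards used to optimize on one sampled trajectory.

    extrinsic:        environment rewards, one per step.
    ref_log_probs:    ref_log_probs[k][t] is the log-probability that existing
                      policy k assigns to the action taken at step t.
    threshold:        hypothetical novelty threshold (assumption).
    intrinsic_scale:  hypothetical weight on the intrinsic reward (assumption).
    """
    horizon = len(extrinsic)
    # With no existing policies (or an empty trajectory) there is nothing to
    # be different from, so fall back to the extrinsic rewards.
    if horizon == 0 or not ref_log_probs:
        return list(extrinsic)

    # Assumed novelty measure: mean negative log-likelihood of the trajectory
    # under each existing policy. Low values mean "likely under that policy".
    novelties = [-sum(log_probs) / horizon for log_probs in ref_log_probs]
    closest = min(range(len(novelties)), key=novelties.__getitem__)

    if novelties[closest] >= threshold:
        # Sufficiently distinct from every existing policy: extrinsic rewards.
        return list(extrinsic)

    # Too similar to an existing policy: switch to an intrinsic diversity
    # reward. Here it is a stand-in that pays more for actions the closest
    # existing policy finds unlikely.
    return [-intrinsic_scale * lp for lp in ref_log_probs[closest]]
```

For example, with `threshold=2.0`, a trajectory whose actions have log-probability `-3.0` per step under the only existing policy keeps its extrinsic rewards, while one with log-probabilities `[-0.5, -1.5]` is rewarded `[0.5, 1.5]` instead.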
