Learning Diverse Risk Preferences in Population-based Self-play

05/19/2023
by Yuhua Jiang, et al.

Self-play algorithms have played an essential role in the great successes of Reinforcement Learning (RL) in competitive games. Current self-play algorithms optimize the agent to maximize its expected win rate against its current or historical copies, which often leaves the agent stuck in a local optimum with a simple, homogeneous strategy style. A possible solution is to improve the diversity of policies, which helps the agent break the stalemate and makes it more robust against different opponents. However, enhancing diversity in self-play algorithms is not trivial. In this paper, we introduce diversity from the perspective that agents can have diverse risk preferences in the face of uncertainty. Specifically, we design a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning and allows for policy learning with a desired risk preference. By seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experience gathered from playing against diverse opponents. Empirical results show that our method achieves comparable or superior performance in competitive games and that diverse modes of behavior emerge. Our code is publicly available at <https://github.com/Jackory/RPBT>.
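The abstract does not spell out how the interpolation between worst-case and best-case learning is realized. As a rough illustration only, one standard statistic with this property is the expectile: for a risk level tau in (0, 1), it equals the mean at tau = 0.5, moves toward the minimum as tau approaches 0, and toward the maximum as tau approaches 1. The sketch below assumes this choice; the function name `expectile`, the parameter `tau`, and the sample returns are hypothetical and are not taken from the RPPO code.

```python
def expectile(samples, tau, iters=100):
    """Illustrative tau-expectile of a list of returns (assumed stand-in
    for a risk-sensitive target, not the authors' implementation).

    The tau-expectile m satisfies
        tau * sum(max(x - m, 0)) == (1 - tau) * sum(max(m - x, 0)),
    which is solved here by repeated asymmetric weighted averaging.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(samples) / len(samples)  # start from the ordinary mean
    for _ in range(iters):
        # Returns above the current estimate get weight tau,
        # returns at or below it get weight 1 - tau.
        weights = [tau if x > m else 1.0 - tau for x in samples]
        m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return m


# Hypothetical episode returns for one agent.
returns = [-1.0, 0.0, 0.0, 3.0]
risk_averse = expectile(returns, 0.01)   # pulled toward the worst return
risk_neutral = expectile(returns, 0.5)   # the ordinary mean
risk_seeking = expectile(returns, 0.99)  # pulled toward the best return
```

Under this assumption, giving each agent in the population a different `tau` would be one simple way to obtain the range of risk preferences the abstract describes.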


research
02/10/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Self-play, where the algorithm learns by playing against itself without ...
research
05/16/2023

An Empirical Study on Google Research Football Multi-agent Scenarios

Few multi-agent reinforcement learning (MARL) research on Google Researc...
research
07/13/2022

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) ...
research
02/06/2020

Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning

Recent research on reinforcement learning in pure-conflict and pure-comm...
research
05/26/2017

Risk-Sensitive Cooperative Games for Human-Machine Systems

Autonomous systems can substantially enhance a human's efficiency and ef...
research
12/17/2018

Malthusian Reinforcement Learning

Here we explore a new algorithmic framework for multi-agent reinforcemen...
research
06/08/2020

A Comparison of Self-Play Algorithms Under a Generalized Framework

Throughout scientific history, overarching theoretical frameworks have a...
