Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games

03/14/2023
by   Ziyi Liu, et al.
0

In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents' learning process, which require too strong assumptions. In this paper, we demonstrate that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. However, there typically exists a risky action that leads to higher rewards in the long run only if agents cooperate, e.g., cooperate-cooperate in ISH. To overcome this, we propose using action value distribution to characterize the decision's risk and corresponding potential payoffs. Specifically, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distributions over agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter corresponding strategies during execution. Empirically, agents trained via ARSP can achieve stable coordination during training without accessing opponent's rewards or learning process, and can adapt to non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents both in iterated prisoner's dilemma (IPD) and iterated stag hunt (ISH) without shaping opponents or rewards, and can adapt to opponents with distinct strategies during execution. Furthermore, we show that ARSP can be scaled to high-dimensional settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/31/2022

Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games

While various multi-agent reinforcement learning methods have been propo...
research
05/03/2022

Model-Free Opponent Shaping

In general-sum games, the interaction of self-interested learning agents...
research
04/13/2023

Power-seeking can be probable and predictive for trained agents

Power-seeking behavior is a key source of risk from advanced AI, but our...
research
02/16/2021

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Current value-based multi-agent reinforcement learning methods optimize ...
research
08/18/2023

Learning in Cooperative Multiagent Systems Using Cognitive and Machine Models

Developing effective Multi-Agent Systems (MAS) is critical for many appl...
research
09/08/2017

Prosocial learning agents solve generalized Stag Hunts better than selfish ones

Deep reinforcement learning has become an important paradigm for constru...
research
04/21/2022

Path-Specific Objectives for Safer Agent Incentives

We present a general framework for training safe agents whose naive ince...

Please sign up or login with your details

Forgot password? Click here to reset