Status-quo policy gradient in Multi-Agent Reinforcement Learning

by Pinkesh Badjatiya et al.

Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility, mutually harmful equilibrium. In contrast, humans evolve useful strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss learn high-utility policies in several social dilemma matrix games (Prisoner's Dilemma, a matrix variant of Stag Hunt, and the Chicken Game). We show that SQLoss outperforms existing state-of-the-art methods at obtaining high-utility policies in visual-input non-matrix games (the Coin Game and a visual-input variant of Stag Hunt) using pre-trained cooperation and defection oracles. Finally, we show that SQLoss extends to a 4-agent setting by demonstrating the emergence of cooperative behavior in the well-known Braess' paradox.
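The claim that reward-maximizing RL agents settle on a mutually harmful equilibrium can be illustrated on a one-shot Prisoner's Dilemma. The sketch below is not SQLoss and not the authors' code. It assumes illustrative payoff values (not taken from this abstract) and replaces sampled REINFORCE updates with exact gradient ascent on each agent's own expected reward, so the result is noise-free. The names `PAYOFF`, `best_response`, and `train_independent_pg` are hypothetical. Because defecting is the best response to either opponent action, each independent learner's gradient always pushes its cooperation probability down, even though mutual cooperation pays both agents more than mutual defection.

```python
import math
from typing import Tuple

# Illustrative Prisoner's Dilemma payoffs (assumption, not from the abstract).
# Actions: 0 = cooperate, 1 = defect. PAYOFF[(my_action, other_action)] -> my reward.
PAYOFF = {
    (0, 0): -1.0,  # both cooperate
    (0, 1): -3.0,  # I cooperate, the other defects
    (1, 0): 0.0,   # I defect, the other cooperates
    (1, 1): -2.0,  # both defect
}


def best_response(other_action: int) -> int:
    """Action that maximizes my own payoff against a fixed opponent action."""
    return max((0, 1), key=lambda a: PAYOFF[(a, other_action)])


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(p_me: float, p_other: float) -> float:
    """My expected payoff when each agent cooperates with the given probability."""
    total = 0.0
    for a, pa in ((0, p_me), (1, 1.0 - p_me)):
        for b, pb in ((0, p_other), (1, 1.0 - p_other)):
            total += pa * pb * PAYOFF[(a, b)]
    return total


def train_independent_pg(steps: int = 2000, lr: float = 0.1) -> Tuple[float, float]:
    """Two independent learners, each doing gradient ascent on its OWN expected
    reward w.r.t. the logit of its cooperation probability (hypothetical helper)."""
    theta = [0.0, 0.0]  # logits; sigmoid(0) = 0.5 chance of cooperating
    for _ in range(steps):
        p = [sigmoid(t) for t in theta]
        grads = []
        for i in range(2):
            p_me, p_other = p[i], p[1 - i]
            # dE[r]/dp_me = E[r | cooperate] - E[r | defect]; negative for every p_other here
            d_dp = expected_reward(1.0, p_other) - expected_reward(0.0, p_other)
            # chain rule through the sigmoid parameterization
            grads.append(d_dp * p_me * (1.0 - p_me))
        theta = [t + lr * g for t, g in zip(theta, grads)]
    return sigmoid(theta[0]), sigmoid(theta[1])


if __name__ == "__main__":
    print(best_response(0), best_response(1))  # prints "1 1": defection is dominant
    p1, p2 = train_independent_pg()
    print(f"P(cooperate) after training: {p1:.4f}, {p2:.4f}")
```

Both cooperation probabilities end close to zero, i.e. the learners reach mutual defection (payoff -2 each) instead of mutual cooperation (-1 each). This is the failure mode that the abstract says SQLoss is designed to avoid. The paper itself studies iterated and visual-input games, which this one-shot toy does not model.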


