Prosocial learning agents solve generalized Stag Hunts better than selfish ones

09/08/2017
by Alexander Peysakhovich, et al.

Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training: applying standard RL methods while treating other agents as part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents choose either a risky cooperative policy (which yields high payoffs if both agents choose it but a low payoff to an agent who attempts it alone) or a safe one (which yields a moderate payoff no matter what the partner does). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that the group converges to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase its long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
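
The core mechanism is reward shaping: a prosocial agent optimizes its own reward plus a weighted share of its partner's reward. The sketch below is illustrative only and is not the paper's code; the payoff numbers, the prosocial_weight coefficient, and the use of simple independent Q-learners on the matrix-form game are assumptions made for the example.

import random

# Stag Hunt stage game: action 0 = Stag (risky, cooperative), 1 = Hare (safe).
# Hunting the stag together pays 4; hunting it alone pays 0; hare always pays 3.
PAYOFF = {
    (0, 0): (4.0, 4.0),
    (0, 1): (0.0, 3.0),
    (1, 0): (3.0, 0.0),
    (1, 1): (3.0, 3.0),
}

def train(prosocial_weight=(0.0, 0.0), lr=0.1, eps=0.1, episodes=5000, seed=0):
    """Two independent (reactive) Q-learners on the repeated stage game.

    prosocial_weight[i] is how much agent i values its partner's reward:
    shaped reward for agent i = own reward + prosocial_weight[i] * partner reward.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]

    for _ in range(episodes):
        # Epsilon-greedy action choice; each agent treats the other as part of its environment.
        acts = tuple(
            rng.randrange(2) if rng.random() < eps
            else max((0, 1), key=lambda a: q[i][a])
            for i in range(2)
        )
        rewards = PAYOFF[acts]
        for i in range(2):
            shaped = rewards[i] + prosocial_weight[i] * rewards[1 - i]
            q[i][acts[i]] += lr * (shaped - q[i][acts[i]])

    # Each agent's greedy action after training (0 = Stag, 1 = Hare).
    return [max((0, 1), key=lambda a: q[i][a]) for i in range(2)]

if __name__ == "__main__":
    print("both selfish:     ", train(prosocial_weight=(0.0, 0.0)))
    print("agent 0 prosocial:", train(prosocial_weight=(1.0, 0.0)))

The example only changes the reward signal each learner sees, not the learning algorithm itself; presumably the same shaping idea carries over to the paper's deep RL settings by feeding the shaped reward to an otherwise standard learner.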
