Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas

05/01/2023
by Udari Madhushani, et al.

In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themselves and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse policies in settings that resemble the incentive structure of the Prisoner's Dilemma. Our work extends this body of results and demonstrates that (1) heterogeneous SVO leads to meaningfully diverse policies across a range of incentive structures in sequential social dilemmas, as measured by task-specific diversity metrics; and (2) learning a best response to such policy diversity leads to better zero-shot generalization in some situations. We show that these best-response agents learn policies that are conditioned on their co-players, which we posit explains the improved zero-shot generalization.
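
To make the idea of SVO as a reward remapping concrete, here is a minimal sketch. The abstract does not give the formula, so this assumes the angle-based formulation used in earlier SVO reinforcement learning work: the agent's own reward and the mean reward of its co-players define a "reward angle", and the agent is penalized in proportion to how far that angle lies from its target SVO angle. The function names and the `weight` parameter are hypothetical and chosen for illustration only.

```python
import math


def reward_angle(own_reward, others_rewards):
    """Angle between the own-reward axis and the (own, mean-others) reward vector."""
    mean_others = sum(others_rewards) / len(others_rewards)
    # atan2 handles own_reward == 0 without a division error.
    return math.atan2(mean_others, own_reward)


def svo_utility(own_reward, others_rewards, target_angle, weight):
    """Assumed SVO remapping: extrinsic reward minus a weighted angular penalty.

    target_angle = 0       -> selfish target (all reward to self)
    target_angle = pi / 4  -> egalitarian target (equal split with others)
    """
    deviation = abs(target_angle - reward_angle(own_reward, others_rewards))
    return own_reward - weight * deviation


if __name__ == "__main__":
    # A heterogeneous group: each agent has its own target angle (in radians).
    targets = [0.0, math.pi / 8, math.pi / 4]
    rewards = [1.0, 1.0, 1.0]  # everyone received the same extrinsic reward
    for i, target in enumerate(targets):
        others = rewards[:i] + rewards[i + 1:]
        print(i, svo_utility(rewards[i], others, target, weight=2.0))
```

Under this assumed formulation, an equal split of reward leaves the egalitarian agent's utility unchanged while the selfish agent is penalized, so agents with different target angles face different effective incentives in the same environment.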


research
08/09/2022

Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution

Generating agents that can achieve Zero-Shot Coordination (ZSC) with uns...
research
05/31/2023

Adaptive Coordination in Social Embodied Rearrangement

We present the task of "Social Rearrangement", consisting of cooperative...
research
02/06/2020

Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning

Recent research on reinforcement learning in pure-conflict and pure-comm...
research
01/16/2023

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

Zero-shot human-AI coordination holds the promise of collaborating with ...
research
05/13/2019

Diversity and Exploration in Social Learning

In consumer search, there is a set of items. An agent has a prior over h...
research
04/25/2023

Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Acquiring driving policies that can transfer to unseen environments is c...
research
07/25/2017

Dynamic Policies for Cooperative Networked Systems

A set of economic entities embedded in a network graph collaborate by op...
