"Other-Play" for Zero-Shot Coordination

03/06/2020
by   Hengyuan Hu, et al.

We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results, we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.

research
02/10/2023

Improving Zero-Shot Coordination Performance Based on Policy Similarity

Over these years, multi-agent reinforcement learning has achieved remark...
research
01/28/2022

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

Cooperative artificial intelligence with human or superhuman proficiency...
research
07/14/2022

K-level Reasoning for Zero-Shot Coordination in Hanabi

The standard problem setting in cooperative multi-agent settings is self...
research
06/11/2021

A New Formalism, Method and Open Issues for Zero-Shot Coordination

In many coordination problems, independently reasoning humans are able t...
research
08/20/2023

Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Ze...
research
10/21/2022

Equivariant Networks for Zero-Shot Coordination

Successful coordination in Dec-POMDPs requires agents to adopt robust st...
research
04/07/2021

On the Critical Role of Conventions in Adaptive Human-AI Collaboration

Humans can quickly adapt to new partners in collaborative tasks (e.g. pl...
