Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

12/04/2019
by Hengyuan Hu, et al.

In recent years we have seen fast progress on a number of benchmark problems in AI, with modern methods achieving near-human or superhuman performance in Go, Poker and Dota. One common aspect of all of these challenges is that they are by design adversarial or, technically speaking, zero-sum. In contrast to these settings, success in the real world commonly requires humans to collaborate and communicate with others, in settings that are at least partially cooperative. In the last year, the card game Hanabi has been established as a new benchmark environment for AI to fill this gap. In particular, Hanabi is interesting to humans since it is entirely focused on theory of mind, i.e., the ability to effectively reason over the intentions, beliefs and point of view of other agents when observing their actions. Learning to be informative when observed by others is an interesting challenge for Reinforcement Learning (RL): fundamentally, RL requires agents to explore in order to discover good policies. However, when done naively, this randomness inherently makes their actions less informative to others during training. We present a new deep multi-agent RL method, the Simplified Action Decoder (SAD), which resolves this contradiction by exploiting the centralized training phase. During training, SAD allows other agents to observe not only the (exploratory) action chosen, but also the greedy action of their teammates. By combining this simple intuition with best practices for multi-agent learning, SAD establishes a new state of the art (SOTA) for learning methods for 2-5 players on the self-play part of the Hanabi challenge. Our ablations show the contributions of SAD compared with the best practice components. All of our code and trained agents are available at https://github.com/facebookresearch/Hanabi_SAD.
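
The core idea in the abstract, that teammates see both the exploratory action and the greedy action during training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`select_actions`, `teammate_input`), the one-hot encoding, and the flat feature layout are hypothetical choices made for illustration, and epsilon-greedy exploration is assumed as the source of randomness.

```python
import random
from typing import List, Sequence, Tuple


def select_actions(
    q_values: Sequence[float], epsilon: float, rng: random.Random
) -> Tuple[int, int]:
    """Return (executed_action, greedy_action) under epsilon-greedy exploration.

    Hypothetical helper: the greedy action is always computed, even when
    exploration overrides it for the action actually sent to the environment.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if rng.random() < epsilon:
        executed = rng.randrange(len(q_values))  # exploratory action
    else:
        executed = greedy
    return executed, greedy


def one_hot(index: int, size: int) -> List[float]:
    """Encode an action index as a one-hot vector."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def teammate_input(
    env_obs: Sequence[float], executed: int, greedy: int, num_actions: int
) -> List[float]:
    """Build the feature vector a teammate receives (assumed layout).

    The executed action is what the environment reveals anyway; the greedy
    action is the extra signal made available by centralized training.
    """
    return (
        list(env_obs)
        + one_hot(executed, num_actions)
        + one_hot(greedy, num_actions)
    )


# Usage: with exploration, the two actions can differ; the teammate sees both.
rng = random.Random(0)
q = [0.1, 0.7, 0.2]
executed, greedy = select_actions(q, epsilon=0.5, rng=rng)
features = teammate_input([1.0, 0.0], executed, greedy, num_actions=len(q))
```

In this sketch, setting `epsilon` to 0 makes the executed and greedy actions identical, so the extra input carries no information beyond what the environment already reveals.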


02/11/2019

The StarCraft Multi-Agent Challenge

In the last few years, deep multi-agent reinforcement learning (RL) has ...
11/04/2018

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

When observing the actions of others, humans carry out inferences about ...
03/04/2021

Continuous Coordination As a Realistic Scenario for Lifelong Learning

Current deep reinforcement learning (RL) algorithms are still highly tas...
12/05/2019

Improving Policies via Search in Cooperative Partially Observable Games

Recent superhuman results in games have largely been achieved in a varie...
10/05/2021

Thinking Fast and Slow in AI: the Role of Metacognition

AI systems have seen dramatic advancement in recent years, bringing many...
04/27/2021

SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Building embodied autonomous agents capable of participating in social i...
10/31/2020

FireCommander: An Interactive, Probabilistic Multi-agent Environment for Joint Perception-Action Tasks

The purpose of this tutorial is to help individuals use the FireCommande...

Code Repositories

Hanabi_SPARTA

Research code implementing the search AI agent for Hanabi, as well as a web server so people can play against it



hanabi_SAD

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

