Unbiased Self-Play

06/06/2021
by Shohei Ohsawa, et al.

We present a general optimization framework for learning emergent belief-state representations without any supervision. We adopt the common setting of multiagent reinforcement learning with communication, which improves exploration coverage of an environment by leveraging the knowledge of each agent. We find that recurrent neural networks (RNNs) with shared weights are highly biased in partially observable environments because of their noncooperativity. To address this, we design an unbiased version of self-play via mechanism design, also known as reverse game theory, so that unbiased knowledge is obtained at the Bayesian Nash equilibrium. The key idea is to add imaginary rewards using the peer prediction mechanism, i.e., a mechanism by which agents mutually criticize each other's information in a decentralized environment. Numerical analyses, including StarCraft exploration tasks with up to 20 agents and off-the-shelf RNNs, demonstrate state-of-the-art performance.
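To make the idea of imaginary rewards from peer prediction more concrete, here is a minimal sketch in Python. It is not the paper's method: it assumes a simplified setting in which each agent sends a discrete report and also states a probability distribution over what a peer will report, and it scores that prediction with the logarithmic proper scoring rule. The function names (`log_score`, `peer_prediction_bonuses`, `shaped_rewards`), the choice of the next agent as the peer, and the `weight` parameter are all hypothetical and chosen only for illustration.

```python
import math


def log_score(predicted_probs, outcome):
    """Logarithmic scoring rule: log-probability given to the realised outcome."""
    eps = 1e-12  # guard against log(0)
    return math.log(max(predicted_probs[outcome], eps))


def peer_prediction_bonuses(predictions, reports, weight=0.1):
    """Return one bonus per agent (illustrative, not the paper's mechanism).

    predictions[i] is agent i's distribution over its peer's report.
    reports[i] is the discrete report (an index) actually sent by agent i.
    Assumption: agent i's peer is simply the next agent, (i + 1) % n.
    """
    n = len(reports)
    if n < 2:
        raise ValueError("peer prediction needs at least two agents")
    bonuses = []
    for i in range(n):
        peer = (i + 1) % n
        bonuses.append(weight * log_score(predictions[i], reports[peer]))
    return bonuses


def shaped_rewards(env_rewards, predictions, reports, weight=0.1):
    """Add the peer-prediction bonus to each agent's environment reward."""
    bonuses = peer_prediction_bonuses(predictions, reports, weight)
    return [r + b for r, b in zip(env_rewards, bonuses)]


if __name__ == "__main__":
    predictions = [[0.9, 0.1], [0.5, 0.5]]  # each agent's forecast of its peer
    reports = [0, 0]                        # what each agent actually reported
    print(shaped_rewards([1.0, 1.0], predictions, reports))
```

Because the logarithmic rule is a proper scoring rule, an agent in this toy setting maximizes its expected bonus by stating the distribution it actually believes, which is the kind of incentive a peer prediction mechanism is meant to create.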

research
05/28/2018

Memory Augmented Self-Play

Self-play is an unsupervised training procedure which enables the reinfo...
research
03/03/2016

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Many real-world applications can be described as large-scale games of im...
research
01/22/2021

Theory of Mind for Deep Reinforcement Learning in Hanabi

The partially observable card game Hanabi has recently been proposed as ...
research
11/15/2022

Linear Convergent Distributed Nash Equilibrium Seeking with Compression

Information compression techniques are often employed to reduce communic...
research
10/26/2019

Implicit Posterior Variational Inference for Deep Gaussian Processes

A multi-layer deep Gaussian process (DGP) model is a hierarchical compos...
research
07/05/2020

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

This paper seeks to establish a framework for directing a society of sim...
research
01/06/2019

Recurrent Control Nets for Deep Reinforcement Learning

Central Pattern Generators (CPGs) are biological neural circuits capable...
