Learning to Share and Hide Intentions using Information Regularization

08/06/2018
by DJ Strouse, et al.

Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agent to reveal or hide their intentions using an information-theoretic regularizer. We consider both the mutual information between goal and action given state, as well as the mutual information between goal and state. We show how to stochastically optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning. Finally, we demonstrate that cooperative (competitive) policies learned with our approach lead to more (less) reward for a second agent in two simple asymmetric information games.
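
As a rough illustration of the first regularizer mentioned above, the mutual information between goal and action given state, the sketch below computes that quantity exactly for one state of a small tabular policy. It is not the stochastic estimator the abstract refers to. The function name, the array layout, and the example policies are assumptions made for illustration only.

```python
import numpy as np

def action_goal_information(policy, goal_prior):
    """Exact I(A; G | S = s) in nats for a single state s.

    Hypothetical layout (an assumption, not taken from the paper):
      policy[g, a]  = pi(a | s, g), each row sums to 1
      goal_prior[g] = p(g | s), sums to 1
    """
    policy = np.asarray(policy, dtype=float)
    goal_prior = np.asarray(goal_prior, dtype=float)

    # Goal-averaged action distribution p(a | s) = sum_g p(g | s) pi(a | s, g).
    marginal = goal_prior @ policy

    # Joint p(g, a | s); entries with zero mass contribute nothing (0 log 0 = 0).
    joint = goal_prior[:, None] * policy
    mask = joint > 0

    # Ratio pi(a | s, g) / p(a | s), left at 1 (log = 0) where the joint is zero.
    ratio = np.ones_like(policy)
    np.divide(policy, marginal[None, :], out=ratio, where=mask)

    return float(np.sum(joint * np.log(ratio)))

# Two goals, two actions, uniform prior over goals.
prior = [0.5, 0.5]
revealing = [[1.0, 0.0], [0.0, 1.0]]  # the action identifies the goal
hiding = [[0.5, 0.5], [0.5, 0.5]]     # the action is independent of the goal

info_revealing = action_goal_information(revealing, prior)  # log(2) nats
info_hiding = action_goal_information(hiding, prior)        # 0 nats
```

A policy whose action fully determines the goal attains the maximum here, log 2 nats for two equally likely goals, while a policy that acts identically under both goals attains zero. Adding this quantity to the reward objective with a positive weight would favor revealing behavior, and a negative weight would favor hiding; the weighting scheme is likewise an assumption of this sketch.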

research
10/12/2019

Influence-Based Multi-Agent Exploration

Intrinsically motivated reinforcement learning aims to address the explo...
research
01/20/2022

Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming

Information sharing is key in building team cognition and enables coordi...
research
12/30/2020

Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients

As reinforcement learning techniques are increasingly applied to real-wo...
research
09/18/2019

Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games

This paper presents an algorithmic framework for learning robust policie...
research
11/21/2016

Memory Lens: How Much Memory Does an Agent Use?

We propose a new method to study the internal memory used by reinforceme...
research
01/16/2020

MIME: Mutual Information Minimisation Exploration

We show that reinforcement learning agents that learn by surprise (surpr...
research
05/01/2020

Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

Smart modular freight containers – as propagated in the Physical Interne...
