Learning in two-player games between transparent opponents

12/04/2020
by Adrian Hutter, et al.

We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents' decision-making is transparent to each other, which allows each agent to predict how its opponent will play against it. To prevent an infinite regress in which the agents predict each other recursively, each agent is required to give an opponent-independent response with probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agent's gradient step, i.e. to move to regions of parameter space in which the opponent's gradient points in a direction favourable to itself. We study the resulting dynamics experimentally, using two opponent-aware learning algorithms from the literature (LOLA and SOS). We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their own preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
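The contrast between the two games can be illustrated with a minimal sketch that enumerates pure-strategy Nash equilibria of a 2x2 matrix game. The payoff numbers below are illustrative assumptions, not values taken from the paper, and the names `PRISONERS_DILEMMA`, `CHICKEN` and `pure_nash_equilibria` are hypothetical. The sketch only covers the underlying one-shot games; it does not model transparency, the epsilon rule, LOLA, or SOS.

```python
from itertools import product

# Assumed, illustrative payoffs: payoffs[(row_action, col_action)] = (row_payoff, col_payoff).
# Action 0 = cooperate (or swerve), action 1 = defect (or dare).
PRISONERS_DILEMMA = {
    (0, 0): (-1, -1), (0, 1): (-3, 0),
    (1, 0): (0, -3), (1, 1): (-2, -2),
}
CHICKEN = {
    (0, 0): (0, 0), (0, 1): (-1, 1),
    (1, 0): (1, -1), (1, 1): (-10, -10),
}


def pure_nash_equilibria(payoffs):
    """Return all action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product((0, 1), repeat=2):
        # Row player compares its payoff against every alternative row action.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in (0, 1))
        # Column player compares its payoff against every alternative column action.
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in (0, 1))
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    print("Prisoner's dilemma:", pure_nash_equilibria(PRISONERS_DILEMMA))
    print("Chicken:", pure_nash_equilibria(CHICKEN))
```

Under these assumed payoffs, the prisoner's dilemma has mutual defection as its only pure equilibrium, while chicken has two asymmetric pure equilibria, each preferred by a different player. The second case is the equilibrium selection problem the abstract refers to, and mutual daring is the worst outcome for both players.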

research
11/26/2022

Similarity-based Cooperation

As machine learning agents act more autonomously in the world, they will...
research
05/26/2022

Logit-Q Learning in Markov Games

We present new independent learning dynamics provably converging to an e...
research
09/13/2017

Learning with Opponent-Learning Awareness

Multi-agent settings are quickly gathering importance in machine learnin...
research
04/09/2023

Higher-Order Uncoupled Dynamics Do Not Lead to Nash Equilibrium – Except When They Do

The framework of multi-agent learning explores the dynamics of how indiv...
research
03/11/2021

A Reinforcement Learning Based Approach to Play Calling in Football

With the vast amount of data collected on football and the growth of com...
research
12/23/2021

Should transparency be (in-)transparent? On monitoring aversion and cooperation in teams

Many modern organisations employ methods which involve monitoring of emp...
research
03/27/2021

Dynamic Information Sharing and Punishment Strategies

In this paper we study the problem of information sharing among rational...
