DeepAI AI Chat
Log In Sign Up

Learning with Opponent-Learning Awareness

by   Jakob N. Foerster, et al.
University of Oxford

Multi-agent settings are quickly gathering importance in machine learning. Beyond a plethora of recent work on deep multi-agent reinforcement learning, hierarchical reinforcement learning, generative adversarial networks and decentralized optimization can all be seen as instances of this setting. However, the presence of multiple learning agents in these settings renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method that reasons about the anticipated learning of the other agents. The LOLA learning rule includes an additional term that accounts for the impact of the agent's policy on the anticipated parameter update of the other agents. We show that the LOLA update rule can be efficiently calculated using an extension of the likelihood ratio policy gradient update, making the method suitable for model-free RL. This method thus scales to large parameter and input spaces and nonlinear function approximators. Preliminary results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma (IPD), while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to infinitely repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents can successfully shape the learning of a range of multi-agent learning algorithms from literature, resulting in the highest average returns on the IPD. We also apply LOLA to a grid world task with an embedded social dilemma using deep recurrent policies. Again, by considering the learning of the other agent, LOLA agents learn to cooperate out of selfish interests.


page 1

page 2

page 3

page 4


Meta-Value Learning: a General Framework for Learning with Learning Awareness

Gradient-based learning in multi-agent systems is difficult because the ...

Status-quo policy gradient in Multi-Agent Reinforcement Learning

Individual rationality, which involves maximizing expected individual re...

Learning in two-player games between transparent opponents

We consider a scenario in which two reinforcement learning agents repeat...

COLA: Consistent Learning with Opponent-Learning Awareness

Learning in general-sum games can be unstable and often leads to sociall...

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a ...

Iterative Update and Unified Representation for Multi-Agent Reinforcement Learning

Multi-agent systems have a wide range of applications in cooperative and...