Model-Free Opponent Shaping

05/03/2022
by Chris Lu, et al.

In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
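The meta-game structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the payoff values, the memory-one policy parameterisation (five cooperation probabilities), and the names `play_episode`, `meta_rollout`, `fixed_tft_meta_policy`, and `frozen_opponent` are all assumptions introduced here. The meta-policy shown is a fixed placeholder; in M-FOS it would instead be learned by a generic model-free optimiser that maximises the meta-return.

```python
import random

# Assumed IPD payoffs, keyed by (action_a, action_b); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (-1, -1), (0, 1): (-3, 0), (1, 0): (0, -3), (1, 1): (-2, -2)}

# Assumed inner-policy format: cooperation probabilities for the states
# [start, after CC, after CD, after DC, after DD], own action listed first.
TIT_FOR_TAT = [1.0, 1.0, 0.0, 1.0, 0.0]
ALWAYS_COOPERATE = [1.0] * 5
ALWAYS_DEFECT = [0.0] * 5


def play_episode(pol_a, pol_b, steps, rng):
    """Play one inner-game episode and return both agents' summed payoffs."""
    ret_a = ret_b = 0.0
    idx_a = idx_b = 0  # index 0 is the start-of-episode state
    for _ in range(steps):
        act_a = 0 if rng.random() < pol_a[idx_a] else 1
        act_b = 0 if rng.random() < pol_b[idx_b] else 1
        r_a, r_b = PAYOFF[(act_a, act_b)]
        ret_a += r_a
        ret_b += r_b
        # Each agent indexes the last joint action from its own perspective.
        idx_a = 1 + 2 * act_a + act_b
        idx_b = 1 + 2 * act_b + act_a
    return ret_a, ret_b


def meta_rollout(meta_policy, opponent_update, init_a, init_b,
                 meta_steps, inner_steps, seed=0):
    """Run the meta-game: each meta-step is one full inner episode."""
    rng = random.Random(seed)
    pol_a, pol_b = list(init_a), list(init_b)
    meta_return = 0.0
    for _ in range(meta_steps):
        # Meta-state: the current inner policies. Meta-action: a new inner policy.
        pol_a = meta_policy(pol_a, pol_b)
        ret_a, _ = play_episode(pol_a, pol_b, inner_steps, rng)
        meta_return += ret_a
        # The opponent applies its own learning rule between episodes.
        pol_b = opponent_update(pol_a, pol_b, rng)
    return meta_return, pol_a, pol_b


def fixed_tft_meta_policy(pol_a, pol_b):
    """Placeholder meta-policy: ignores the meta-state and plays tit-for-tat."""
    return list(TIT_FOR_TAT)


def frozen_opponent(pol_a, pol_b, rng):
    """Placeholder opponent that does not learn; swap in any update rule."""
    return list(pol_b)


if __name__ == "__main__":
    total, _, _ = meta_rollout(fixed_tft_meta_policy, frozen_opponent,
                               ALWAYS_COOPERATE, ALWAYS_COOPERATE,
                               meta_steps=3, inner_steps=4)
    print("meta-return:", total)
```

Keeping `meta_policy` and `opponent_update` as plain callables mirrors the model-free setting: the shaper only observes inner policies and returns, and never differentiates through the opponent's update rule.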

research
03/14/2023

Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games

In general-sum games, the interaction of self-interested learning agents...
research
07/17/2023

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Gradient-based learning in multi-agent systems is difficult because the ...
research
03/08/2022

COLA: Consistent Learning with Opponent-Learning Awareness

Learning in general-sum games can be unstable and often leads to sociall...
research
06/29/2022

Visual Foresight With a Local Dynamics Model

Model-free policy learning has been shown to be capable of learning mani...
research
01/11/2021

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

Agents that interact with other agents often do not know a priori what t...
research
10/30/2021

One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning

Self-tuning algorithms that adapt the learning process online encourage ...
