Stable Opponent Shaping in Differentiable Games

11/20/2018
by Alistair Letcher, et al.

A growing number of learning methods are actually games which optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL. Opponent shaping is a powerful approach to improve learning dynamics in such games, accounting for the fact that the 'environment' includes agents adapting to one another's updates. Learning with Opponent-Learning Awareness (LOLA) is a recent algorithm which exploits this dynamic response and encourages cooperation in settings like the Iterated Prisoner's Dilemma. Although experimentally successful, we show that LOLA can exhibit 'arrogant' behaviour directly at odds with convergence. In fact, remarkably few algorithms have theoretical guarantees applying across all differentiable games. In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. We prove that LookAhead locally converges and avoids strict saddles in all differentiable games, the strongest results in the field so far. SOS inherits these desirable guarantees, while also shaping the learning of opponents and consistently either matching or outperforming LOLA experimentally.
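The interpolation between LookAhead and LOLA described above can be illustrated on a toy problem. The sketch below is a minimal example under stated assumptions, not the authors' implementation: the game (a zero-sum bilinear game with losses L1 = x*y and L2 = -x*y), the first-order form of the LookAhead and shaping terms, the function names, and the fixed interpolation weight `p` are all chosen here for illustration. The abstract does not say how SOS picks the interpolation, so `p` is simply held constant, with `p = 0` giving the LookAhead-style update and `p = 1` the LOLA-style update.

```python
import math

# Hypothetical two-player game used only for illustration:
#   player 1 controls x and minimises L1(x, y) =  x * y
#   player 2 controls y and minimises L2(x, y) = -x * y

def naive_step(x, y, alpha):
    """Simultaneous gradient descent: each player ignores the other's update."""
    return x - alpha * y, y + alpha * x

def shaped_step(x, y, alpha, p):
    """One interpolated step (assumed first-order form, fixed weight p)."""
    # First derivatives of each loss.
    d1_dx, d1_dy = y, x
    d2_dx, d2_dy = -y, -x
    # Mixed second derivatives (constant for a bilinear game).
    d1_dxdy = 1.0
    d2_dxdy = -1.0

    # LookAhead-style gradient: anticipate the opponent's step, treating it as fixed.
    la_x = d1_dx - alpha * d1_dxdy * d2_dy
    la_y = d2_dy - alpha * d2_dxdy * d1_dx

    # Shaping term: how our own parameter changes the opponent's step.
    chi_x = d1_dy * d2_dxdy
    chi_y = d2_dx * d1_dxdy

    gx = la_x - p * alpha * chi_x
    gy = la_y - p * alpha * chi_y
    return x - alpha * gx, y - alpha * gy

def run(step, steps=200, **kwargs):
    """Iterate from (1, 1) and return the final distance from the origin."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, **kwargs)
    return math.hypot(x, y)

if __name__ == "__main__":
    print("naive     :", run(naive_step, alpha=0.1))
    print("p = 0 (LA):", run(shaped_step, alpha=0.1, p=0.0))
    print("p = 1     :", run(shaped_step, alpha=0.1, p=1.0))
```

In this particular game the naive update spirals away from the equilibrium at the origin, while both interpolated variants contract towards it. This toy game does not reproduce the 'arrogant' LOLA behaviour the abstract mentions; it only shows how the two terms combine.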


