A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

by Marc Lanctot, et al.

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
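As a rough illustration of how one best-response loop can cover several of the algorithms named above, the sketch below runs it on rock-paper-scissors. It is not the paper's implementation. Exact best responses stand in for the deep reinforcement learning oracle, and the names `PAYOFF`, `uniform_meta`, `last_only_meta`, and `best_response_loop` are hypothetical. Under these assumptions, a meta-strategy that puts all weight on the newest policy gives iterated best response, and a uniform meta-strategy over all past policies gives fictitious play.

```python
import numpy as np

# Row player's payoffs for rock-paper-scissors (zero-sum: the column
# player receives the negation). Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)


def uniform_meta(n):
    """Hypothetical meta-solver: mix uniformly over the n policies so far."""
    return np.full(n, 1.0 / n)


def last_only_meta(n):
    """Hypothetical meta-solver: all weight on the most recent policy."""
    sigma = np.zeros(n)
    sigma[-1] = 1.0
    return sigma


def best_response_loop(meta_solver, iterations=50):
    """Grow each player's policy set with best responses to the
    opponent's meta-strategy mixture. Policies here are pure actions."""
    row_pop, col_pop = [0], [0]  # both players start with 'rock'
    for _ in range(iterations):
        row_sigma = meta_solver(len(row_pop))
        col_sigma = meta_solver(len(col_pop))
        # Turn each weighted policy set into a distribution over actions.
        row_mix = np.bincount(row_pop, weights=row_sigma, minlength=3)
        col_mix = np.bincount(col_pop, weights=col_sigma, minlength=3)
        # Exact best responses (an assumption replacing the learned oracle).
        row_pop.append(int(np.argmax(PAYOFF @ col_mix)))
        col_pop.append(int(np.argmin(row_mix @ PAYOFF)))
    return row_pop, col_pop


if __name__ == "__main__":
    print(best_response_loop(last_only_meta, iterations=6)[0])
    print(best_response_loop(uniform_meta, iterations=50)[0])
```

With `last_only_meta` the policies chase each other around the rock, paper, scissors cycle, which is one simple picture of a learner fitting itself to whatever the other agent did last. Swapping in a different meta-solver changes the algorithm without touching the rest of the loop.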







A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn...

Introspection Learning

Traditional reinforcement learning agents learn from experience, past or...

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based metrics)...

Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning

Detection of malicious behavior is a fundamental problem in security. On...

A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

Although well-established in general reinforcement learning (RL), value-...

Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning

Rather than learning new control policies for each new task, it is possi...

A Game-Theoretic Approach for Hierarchical Policy-Making

We present the design and analysis of a multi-level game-theoretic model...