Human-Level Performance in No-Press Diplomacy via Equilibrium Search

10/06/2020
by Jonathan Gray, et al.

Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings. In contrast, Diplomacy is a game of shifting alliances that involves both cooperation and competition. For this reason, Diplomacy has proven to be a formidable research challenge. In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via external regret minimization. External regret minimization techniques have been behind previous AI successes in adversarial games, most notably poker, but have not previously been shown to be successful in large-scale games involving cooperation. We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and achieves a rank of 23 out of 1,128 human players when playing anonymous games on a popular Diplomacy website.
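The core algorithmic idea named in the abstract is one-step lookahead search in which each player's policy is computed by external regret minimization. As a rough illustration only (not the authors' implementation), the sketch below runs vanilla regret matching, a standard external regret minimization procedure, on a toy two-player zero-sum matrix game; the payoff matrix, action counts, and iteration budget are illustrative assumptions, whereas the paper's agent evaluates candidate Diplomacy actions for all seven powers with a learned policy and value model.

```python
# Rough illustration only: vanilla regret matching (an external regret
# minimization procedure) on a toy two-player zero-sum matrix game.
# The payoff matrix, action set, and iteration count are assumptions for
# demonstration; they are not taken from the paper.
import numpy as np


def regret_matching(payoffs: np.ndarray, iterations: int = 10_000):
    """Return the players' average strategies after regret matching.

    payoffs[i, j] is the row player's utility when row plays i and the
    column player plays j; the column player receives -payoffs[i, j].
    In two-player zero-sum games the average strategies converge to a
    Nash equilibrium as the number of iterations grows.
    """
    n_rows, n_cols = payoffs.shape
    row_regrets, col_regrets = np.zeros(n_rows), np.zeros(n_cols)
    row_avg, col_avg = np.zeros(n_rows), np.zeros(n_cols)

    def strategy(regrets: np.ndarray) -> np.ndarray:
        # Play actions in proportion to their positive cumulative regret.
        positive = np.maximum(regrets, 0.0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(regrets.size, 1.0 / regrets.size)

    for _ in range(iterations):
        row_strategy, col_strategy = strategy(row_regrets), strategy(col_regrets)
        row_avg += row_strategy
        col_avg += col_strategy

        # Expected value of each pure action versus the opponent's current mix.
        row_values = payoffs @ col_strategy
        col_values = -(row_strategy @ payoffs)

        # External regret: gain of each pure action over the mixed strategy played.
        row_regrets += row_values - row_strategy @ row_values
        col_regrets += col_values - col_strategy @ col_values

    return row_avg / iterations, col_avg / iterations


if __name__ == "__main__":
    # Rock-paper-scissors stands in for the one-step action values a search
    # procedure would obtain from a value network.
    rps = np.array([[0.0, -1.0, 1.0],
                    [1.0, 0.0, -1.0],
                    [-1.0, 1.0, 0.0]])
    row, col = regret_matching(rps)
    print("row average strategy:", np.round(row, 3))  # approximately uniform
    print("col average strategy:", np.round(col, 3))
```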

Related research

11/25/2020
Supervised Learning Achieves Human-Level Performance in MOBA Games: A Case Study of Honor of Kings
We present JueWu-SL, the first supervised-learning-based artificial inte...

10/11/2022
Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
No-press Diplomacy is a complex strategy game involving both cooperation...

11/01/2022
Adversarial Policies Beat Professional-Level Go AIs
We attack the state-of-the-art Go-playing AI system, KataGo, by training...

06/05/2019
Finding Friend and Foe in Multi-Agent Games
Recent breakthroughs in AI for multi-agent games like Go, Poker, and Dot...

05/20/2021
Human-agent coordination in a group formation game
Coordination and cooperation between humans and autonomous agents in coo...

10/06/2021
No-Press Diplomacy from Scratch
Prior AI successes in complex games have largely focused on settings wit...

10/07/2019
Combining No-regret and Q-learning
Counterfactual Regret Minimization (CFR) has found success in settings l...