Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

by Julien Perolat, et al.

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10^535 nodes, i.e., 10^175 times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10^164 nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
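The 'cycling' behaviour that R-NaD is designed to avoid can be seen in a toy zero-sum game. Below is a minimal illustrative sketch, not the paper's algorithm: replicator-style policy-gradient dynamics on matching pennies, where a KL pull toward a fixed uniform reference policy stands in for R-NaD's reward regularisation. The step size `eta`, regularisation weight `tau`, and uniform reference are assumed for illustration; in R-NaD proper the regularisation policy is itself updated in stages so that the fixed point approaches a true Nash equilibrium.

```python
import numpy as np

# Matching pennies payoff for player 1 (zero-sum: player 2 receives the negative).
# The unique Nash equilibrium is the uniform mixed strategy [0.5, 0.5] for both.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def run(eta=0.05, tau=0.0, steps=4000):
    """Simultaneous policy-gradient ascent in logit space.

    tau > 0 adds the gradient of -tau * KL(policy || uniform), a stand-in
    for reward regularisation toward a fixed reference policy.
    """
    x_logits = np.array([2.0, 0.0])   # player 1, started off-equilibrium
    y_logits = np.array([0.0, 2.0])   # player 2, started off-equilibrium
    for _ in range(steps):
        x, y = softmax(x_logits), softmax(y_logits)
        # log(x) - log(0.5) = log(2x); clip to avoid log(0) underflow.
        gx = A @ y - tau * np.log(2.0 * np.clip(x, 1e-300, None))
        gy = -A.T @ x - tau * np.log(2.0 * np.clip(y, 1e-300, None))
        # Centred logit update; equivalent to replicator dynamics on policies.
        x_logits += eta * (gx - x @ gx)
        y_logits += eta * (gy - y @ gy)
    return softmax(x_logits), softmax(y_logits)

x_plain, _ = run(tau=0.0)   # discrete steps spiral around the equilibrium
x_reg, _ = run(tau=0.2)     # regularisation damps the rotation
print("no regularisation:  ", x_plain)
print("with regularisation:", x_reg)   # ≈ [0.5, 0.5], the Nash mixture
```

In this toy game the regularised fixed point coincides with the Nash equilibrium by symmetry, so a single regularisation stage suffices; the damping term is what turns the orbit of the unregularised dynamics into convergence.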


