Multiplayer AlphaZero

10/29/2019
by Nick Petosa, et al.

The AlphaZero algorithm has achieved superhuman performance in two-player, deterministic, zero-sum games with perfect information of the game state. This success has been demonstrated in Chess, Shogi, and Go, where learning occurs solely through self-play. However, many real-world applications (e.g., equity trading) require the consideration of a multiplayer environment. In this work, we propose novel modifications to the AlphaZero algorithm to support multiplayer environments, and evaluate the approach in two simple 3-player games. Our experiments show that multiplayer AlphaZero learns successfully and consistently outperforms a competing approach: Monte Carlo tree search. These results suggest that our modified AlphaZero can learn effective strategies in multiplayer game scenarios. Our work supports the use of AlphaZero in multiplayer games and suggests future research for more complex environments.
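The abstract does not spell out the modifications, but a standard way to generalize AlphaZero-style search beyond two zero-sum players is to replace the scalar value with a per-player value vector: the backup step accumulates the full vector at every node, and each node's selection rule maximizes the component belonging to the player to move there. The sketch below illustrates that idea; the names (`Node`, `select_child`, `backpropagate`) and the PUCT constant are illustrative assumptions, not the authors' implementation.

```python
import math


class Node:
    """MCTS node for an n-player game; values are per-player vectors."""

    def __init__(self, player, n_players, prior=1.0):
        self.player = player                    # index of the player to move here
        self.n_players = n_players
        self.prior = prior                      # policy prior from the network
        self.children = {}                      # action -> Node
        self.visits = 0
        self.value_sum = [0.0] * n_players      # accumulated value per player

    def q(self, player):
        """Mean value of this node from `player`'s perspective."""
        return self.value_sum[player] / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT selection: the player to move maximizes their own value component."""
    best, best_score = None, -math.inf
    for action, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        score = child.q(node.player) + u
        if score > best_score:
            best, best_score = (action, child), score
    return best


def backpropagate(path, value_vector):
    """Add the full per-player value vector to every node on the search path."""
    for node in path:
        node.visits += 1
        for p in range(node.n_players):
            node.value_sum[p] += value_vector[p]


# Toy 3-player example: player 0 moves at the root and has two actions.
root = Node(player=0, n_players=3)
root.children = {"a": Node(1, 3), "b": Node(1, 3)}
backpropagate([root, root.children["a"]], [0.9, 0.0, 0.1])
backpropagate([root, root.children["b"]], [0.2, 0.7, 0.1])
action, _ = select_child(root)
# Player 0 prefers "a", the branch with the higher value for them,
# even though "b" is better for player 1.
```

Note that with vector values the game need not be zero-sum between any pair of players; each node simply acts greedily (plus exploration) in its own component, which is what makes the extension to n players natural.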


