ExpIt-OOS: Towards Learning from Planning in Imperfect Information Games

08/30/2018
by Andy Kitchen, et al.

The current state of the art in playing many important perfect information games, including Chess and Go, combines planning and deep reinforcement learning with self-play. We extend this approach to imperfect information games and present ExIt-OOS, a novel approach to playing imperfect information games within the Expert Iteration framework, inspired by AlphaZero. We use Online Outcome Sampling (OOS), an online search algorithm for imperfect information games, in place of MCTS. During online training, our neural strategy is used to improve the accuracy of playouts in OOS, enabling a learning and planning feedback loop for imperfect information games.
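The abstract describes an Expert Iteration loop: a search procedure (here OOS in place of MCTS) acts as the "expert" producing improved action distributions, and a neural "apprentice" strategy is trained to imitate them, in turn guiding later searches. A minimal, illustrative sketch of that loop follows; every name in it (`TinyGame`, `PolicyTable`, `oos_search`) is a hypothetical placeholder, and the search stand-in merely samples playouts from the current policy rather than implementing real Online Outcome Sampling.

```python
import random

random.seed(0)

class TinyGame:
    """Toy single-decision game with one information set (illustrative only)."""
    def legal_actions(self, infoset):
        return ["fold", "call", "raise"]

class PolicyTable:
    """Tabular 'apprentice' policy: maps information sets to action weights.
    Stands in for the neural strategy described in the paper."""
    def __init__(self):
        self.weights = {}

    def probs(self, infoset, actions):
        w = self.weights.get(infoset, {a: 1.0 for a in actions})
        total = sum(w[a] for a in actions)
        return {a: w[a] / total for a in actions}

    def train(self, infoset, target):
        # Crude imitation update: move weights toward the expert distribution.
        w = self.weights.setdefault(infoset, {a: 1.0 for a in target})
        for a, p in target.items():
            w[a] += p

def oos_search(game, infoset, policy, n_playouts=100):
    """Hypothetical stand-in for OOS: sharpens the apprentice's current
    distribution by counting sampled playouts (not the real algorithm)."""
    actions = game.legal_actions(infoset)
    prior = policy.probs(infoset, actions)
    counts = {a: 0 for a in actions}
    for _ in range(n_playouts):
        a = random.choices(actions, weights=[prior[x] for x in actions])[0]
        counts[a] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

# Expert Iteration loop: search produces "expert" targets, the apprentice
# imitates them, and the improved apprentice guides the next search.
game = TinyGame()
policy = PolicyTable()
for _ in range(5):
    expert_target = oos_search(game, "root", policy)
    policy.train("root", expert_target)

final = policy.probs("root", game.legal_actions("root"))
```

The key structural point is the feedback loop: the same policy object appears both as the playout guide inside the search and as the learner trained on the search's output.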


