The Update Equivalence Framework for Decision-Time Planning

04/25/2023
by Samuel Sokota, et al.

The process of revising (or constructing) a policy immediately prior to execution, known as decision-time planning, is key to achieving superhuman performance in perfect-information settings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly with the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is based not on subgames but on the notion of update equivalence. In this framework, decision-time planning algorithms simulate updates of synchronous learning algorithms. This framework enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in settings with large amounts of non-public information. In experiments, members of this family produce results comparable or superior to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.
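
As a rough illustration of what "simulating an update" can mean, the sketch below compares a learning update computed from exact action values with the same update computed at decision time from rollout estimates at a single decision point. The choice of a multiplicative-weights update, the toy returns in TRUE_Q, and all function names are assumptions made for this example; the abstract does not specify them, and this is not presented as the authors' algorithm.

```python
import math
import random


def learner_update(policy, q_values, step_size):
    """One multiplicative-weights step at a single decision point."""
    weights = [p * math.exp(step_size * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def planner_update(policy, sample_return, step_size, num_rollouts, rng):
    """Decision-time version: estimate action values by rollouts,
    then apply the same update rule as the learner."""
    q_hat = []
    for action in range(len(policy)):
        returns = [sample_return(action, rng) for _ in range(num_rollouts)]
        q_hat.append(sum(returns) / num_rollouts)
    return learner_update(policy, q_hat, step_size)


# Hypothetical toy decision point: three actions with noisy returns.
TRUE_Q = [0.2, 0.5, -0.1]


def sample_return(action, rng):
    """Noisy return for an action (assumed unit-variance Gaussian noise)."""
    return TRUE_Q[action] + rng.gauss(0.0, 1.0)


rng = random.Random(0)
prior = [1 / 3, 1 / 3, 1 / 3]

# Update the learning algorithm would make with exact values.
target = learner_update(prior, TRUE_Q, step_size=1.0)

# Update the planner computes on the fly from rollouts.
planned = planner_update(prior, sample_return, step_size=1.0,
                         num_rollouts=20000, rng=rng)

# Largest per-action difference between the two policies.
gap = max(abs(a - b) for a, b in zip(target, planned))
```

With more rollouts, the planned policy should approach the learner's update in this toy setting. Note that the planner only touches the decision point it is currently at and makes no use of public information or subgames.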
