Selecting Computations: Theory and Applications

08/09/2014
by Nicholas Hay, et al.

Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.
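As a rough illustration of what a metalevel decision in a Bayesian selection problem can look like, the sketch below scores each candidate computation by the expected improvement in decision quality from one more simulation, and stops when no simulation is worth its cost. It is not the paper's algorithm: the Beta-Bernoulli model of simulation outcomes, the fixed per-sample cost, the one-step (myopic) lookahead, and the names `myopic_voi` and `choose_computation` are all assumptions made for this example.

```python
from fractions import Fraction


def posterior_mean(a, b):
    """Mean of a Beta(a, b) posterior over an action's success probability."""
    return Fraction(a, a + b)


def myopic_voi(arms, i):
    """Expected gain in the best posterior mean from one more sample of arm i.

    `arms` is a list of (a, b) Beta parameters, one pair per action
    (an assumed model, not taken from the paper).
    """
    means = [posterior_mean(a, b) for a, b in arms]
    current_best = max(means)
    # Best posterior mean among the arms that are NOT being sampled.
    best_other = max(
        (m for j, m in enumerate(means) if j != i), default=Fraction(0)
    )
    a, b = arms[i]
    p_success = means[i]  # predictive probability that the sample is a success
    mean_if_success = posterior_mean(a + 1, b)
    mean_if_failure = posterior_mean(a, b + 1)
    expected_best = (
        p_success * max(best_other, mean_if_success)
        + (1 - p_success) * max(best_other, mean_if_failure)
    )
    return expected_best - current_best


def choose_computation(arms, cost):
    """Return the index of the arm to simulate next, or None to stop and act."""
    vois = [myopic_voi(arms, i) for i in range(len(arms))]
    best = max(range(len(arms)), key=lambda i: vois[i])
    return best if vois[best] > cost else None
```

With two arms that both have a Beta(1, 1) posterior, one more sample of either arm has a myopic value of 1/12, so this rule samples when the cost is 1/100 and stops when it is 1/10. With posteriors Beta(3, 1) and Beta(1, 1), no single sample can change which arm looks best, so the one-step value is zero for both arms and the rule stops at any non-negative cost.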

research
11/30/2021

Asymptotically Optimal Sampling Policy for Selecting Top-m Alternatives

We consider selecting the top-m alternatives from a finite number of alt...
research
06/08/2021

Measurable Monte Carlo Search Error Bounds

Monte Carlo planners can often return sub-optimal actions, even if they ...
research
06/29/2021

Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces

This paper addresses the problem of optimal control using search trees. ...
research
10/04/2013

Sequential Monte Carlo Bandits

In this paper we propose a flexible and efficient framework for handling...
research
04/20/2017

Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Monte Carlo Tree Search (MCTS), most famously used in game-play artifici...
research
09/07/2018

Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks

Monte Carlo Tree Search (MCTS) is particularly adapted to domains where ...
research
02/07/2023

Towards Understanding the Effects of Evolving the MCTS UCT Selection Policy

Monte Carlo Tree Search (MCTS) is a sampling best-first method to search...
