
Selecting Computations: Theory and Applications

by Nicholas Hay et al.

Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.
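The core idea of the metalevel approach described above, choosing which simulation to run next by estimating the expected improvement in decision quality it would yield, can be illustrated with a minimal myopic value-of-information sketch. The Gaussian posterior model, the one-step (myopic) approximation, and all function names here are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_excess(m, s, t):
    """E[max(X - t, 0)] for X ~ N(m, s^2)."""
    if s <= 0.0:
        return max(m - t, 0.0)
    z = (m - t) / s
    return (m - t) * norm_cdf(z) + s * norm_pdf(z)

def myopic_voi(means, variances, noise_var, i):
    """Myopic value of one more noisy sample of alternative i.

    Assumed model: each alternative's true value has a Normal posterior
    N(means[j], variances[j]), and a sample carries Gaussian noise with
    variance noise_var.  One observation shifts alternative i's posterior
    mean by a Normal amount with standard deviation s_tilde below.
    """
    s_tilde = variances[i] / math.sqrt(variances[i] + noise_var)
    order = sorted(range(len(means)), key=lambda j: -means[j])
    best, second = order[0], order[1]
    if i == best:
        # Value of discovering the current leader is worse than the
        # runner-up (flip signs to reuse expected_excess on -X).
        return expected_excess(-means[i], s_tilde, -means[second])
    # Value of discovering alternative i actually beats the leader.
    return expected_excess(means[i], s_tilde, means[best])
```

A metalevel policy built on this sketch would simulate the alternative with the largest `myopic_voi` and stop when every value falls below the cost of one simulation. Note the contrast with bandit indices such as UCB1: the bandit objective rewards the *samples taken*, whereas the selection objective rewards only the quality of the *final* decision, which is why sampling the apparent best arm can have low value when its lead is already secure.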




Related papers:

- Asymptotically Optimal Sampling Policy for Selecting Top-m Alternatives
- Measurable Monte Carlo Search Error Bounds
- Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces
- Sequential Monte Carlo Bandits
- Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
- Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks
- Towards Understanding the Effects of Evolving the MCTS UCT Selection Policy