Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

08/18/2011
by David Tolpin, et al.

UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the cumulative regret. However, MCTS differs from MAB in that only the final choice, rather than every arm pull, brings a reward; consequently, it is the simple regret, as opposed to the cumulative regret, that must be minimized. This ongoing work aims to apply meta-reasoning techniques to MCTS, which is non-trivial. We begin by introducing policies for multi-armed bandits with lower simple regret than UCB, and an algorithm for MCTS that combines cumulative and simple regret minimization and outperforms UCT. We also develop a sampling scheme loosely based on a myopic version of the perfect value of information. Finite-time and asymptotic analysis of the policies is provided, and the algorithms are compared empirically.
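The distinction the abstract draws can be made concrete with a small sketch. The Python below (not the authors' implementation) contrasts UCB1, the cumulative-regret-minimizing index policy underlying UCT, with an illustrative 1/2-greedy policy of the simple-regret-oriented kind the abstract alludes to; the function names, the Bernoulli reward model, and the 1/2-greedy rule itself are assumptions made for illustration, not details taken from the paper.

```python
import math
import random

def ucb1_arm(counts, means, c=math.sqrt(2)):
    """UCB1: maximize empirical mean + c*sqrt(ln(total pulls)/pulls of arm).
    Designed to minimize cumulative regret, so every pull is treated as costly."""
    total = sum(counts)
    for i, n in enumerate(counts):
        if n == 0:
            return i  # pull every arm once before applying the index
    return max(range(len(counts)),
               key=lambda i: means[i] + c * math.sqrt(math.log(total) / counts[i]))

def half_greedy_arm(counts, means):
    """Illustrative simple-regret-oriented policy (an assumption, not the paper's):
    sample the empirically best arm with probability 1/2, otherwise another arm."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    best = max(range(len(means)), key=lambda i: means[i])
    if len(counts) == 1 or random.random() < 0.5:
        return best
    return random.choice([i for i in range(len(counts)) if i != best])

def simple_regret(policy, true_means, budget=1000, seed=0):
    """Run `policy` for `budget` Bernoulli pulls, then recommend the empirically
    best arm; simple regret is the value gap of that single final choice."""
    random.seed(seed)
    k = len(true_means)
    counts, means = [0] * k, [0.0] * k
    for _ in range(budget):
        a = policy(counts, means)
        r = 1.0 if random.random() < true_means[a] else 0.0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    recommended = max(range(k), key=lambda i: means[i])
    return max(true_means) - true_means[recommended]

if __name__ == "__main__":
    arms = [0.5, 0.45, 0.6, 0.3]
    print("UCB1 simple regret:      ", simple_regret(ucb1_arm, arms))
    print("1/2-greedy simple regret:", simple_regret(half_greedy_arm, arms))
```

In MCTS the final move recommendation plays the role of the single rewarded choice, which is why a policy tuned for simple regret at the root, combined with cumulative-regret sampling deeper in the tree, is the direction the abstract describes.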
