VOI-aware MCTS

07/24/2012
by   David Tolpin, et al.
0

UCT, a state-of-the art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB1, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". In this paper, an MCTS sampling policy based on Value of Information (VOI) estimates of rollouts is suggested. Empirical evaluation of the policy and comparison to UCB1 and UCT is performed on random MAB instances as well as on Computer Go.

READ FULL TEXT

page 1

page 2

research
07/23/2012

MCTS Based on Simple Regret

UCT, a state-of-the art algorithm for Monte Carlo tree search (MCTS) in ...
research
09/06/2019

Efficient Multivariate Bandit Algorithm with Path Planning

In this paper, we solve the arms exponential exploding issue in multivar...
research
11/07/2016

Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection

Many algorithms for data analysis exist, especially for classification p...
research
10/26/2020

Expert Selection in High-Dimensional Markov Decision Processes

In this work we present a multi-armed bandit framework for online expert...
research
09/19/2023

Monte-Carlo tree search with uncertainty propagation via optimal transport

This paper introduces a novel backup strategy for Monte-Carlo Tree Searc...
research
07/03/2023

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Multi armed bandit (MAB) algorithms have been increasingly used to compl...
research
04/22/2020

Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D

In evolutionary computation, different reproduction operators have vario...

Please sign up or login with your details

Forgot password? Click here to reset