Bandit Algorithms for Tree Search

08/09/2014
by Pierre-Arnaud Coquelin, et al.

Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables a good value to be returned rapidly, with precision improving as more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, we analyze a modification of UCT using a confidence sequence that scales exponentially with the horizon depth. We then consider Flat-UCB performed on the leaves and provide a finite regret bound that holds with high probability. Next, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST), which takes into account the actual smoothness of the rewards to perform efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion that applies when the full tree is too big (possibly infinite) to be entirely represented, and show that with high probability only the optimal branches are developed indefinitely. We illustrate these methods on a global optimization problem for a continuous function, given noisy values.
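The selection rule underlying UCT is the UCB1 index of [2], applied recursively at every internal node: each child is scored by its empirical mean reward plus an exploration bonus that shrinks with the number of visits, and the search repeatedly descends to the highest-scoring child. The sketch below is a minimal illustration of this generic UCB-driven descent on a toy tree with noisy leaf rewards; it is not the paper's implementation, and the names (Node, ucb_score, run_episode) and the exploration constant are illustrative assumptions.

import math
import random

class Node:
    """A tree node; leaves carry a noisy reward generator."""
    def __init__(self, children=None, reward_fn=None):
        self.children = children or []   # empty list means this node is a leaf
        self.reward_fn = reward_fn       # callable returning a noisy reward (leaves only)
        self.visits = 0
        self.total_reward = 0.0

def ucb_score(child, parent_visits, c=math.sqrt(2)):
    # UCB1 index [2]: empirical mean plus an exploration bonus that
    # decreases as the child accumulates visits.
    if child.visits == 0:
        return float("inf")              # force each child to be tried once
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def run_episode(root):
    # One UCT-style iteration: descend by maximizing the UCB index,
    # sample a noisy reward at the reached leaf, and back it up the path.
    path = [root]
    while path[-1].children:
        parent = path[-1]
        best = max(parent.children,
                   key=lambda ch: ucb_score(ch, max(parent.visits, 1)))
        path.append(best)
    reward = path[-1].reward_fn()
    for node in path:
        node.visits += 1
        node.total_reward += reward
    return reward

# Toy usage: a depth-1 tree with two noisy arms; over many episodes the
# descent concentrates its visits on the better leaf.
leaves = [Node(reward_fn=lambda: random.gauss(0.5, 0.1)),
          Node(reward_fn=lambda: random.gauss(0.7, 0.1))]
root = Node(children=leaves)
for _ in range(1000):
    run_episode(root)

Roughly speaking, the variants studied in the paper (confidence sequences scaling exponentially with depth, Flat-UCB on the leaves, BAST) differ in how this confidence bonus is constructed, which is what controls how optimistic the resulting tree search is.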

