Efficient Multivariate Bandit Algorithm with Path Planning

09/06/2019
by Keyu Nie, et al.

In this paper, we address the exponential explosion in the number of arms in the multivariate Multi-Armed Bandit (Multivariate-MAB) problem when the hierarchy of arm dimensions is considered. We propose a framework called path planning (TS-PP), which uses decision graphs/trees to model arm reward success rates with m-way dimension interactions and adopts Thompson sampling (TS) as a heuristic search for arm selection. A natural way to combat the curse of dimensionality is a serial process that operates sequentially, focusing on one dimension at each step. To the best of our knowledge, we are the first to solve the Multivariate-MAB problem using a graph path planning strategy and ideas similar to Monte-Carlo tree search. Our proposed method, which uses tree models, has advantages over traditional models such as general linear regression. Simulation studies support our claims, showing faster convergence, more efficient allocation to the optimal arm, and lower cumulative regret.
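To make the idea of picking one dimension at a time concrete, here is a minimal sketch in Python. It is not the authors' TS-PP implementation: it assumes Bernoulli rewards, a uniform Beta(1, 1) prior at every tree node, and a fixed dimension order, and the class name `SequentialThompsonSampler` and its methods are hypothetical names introduced only for illustration. Each tree node is a partial assignment of dimension values, and Thompson sampling chooses the next value at each level.

```python
import random
from collections import defaultdict


class SequentialThompsonSampler:
    """Illustrative sketch (not the paper's TS-PP): Beta-Bernoulli Thompson
    sampling applied one dimension at a time along a decision tree."""

    def __init__(self, dim_sizes, seed=None):
        # dim_sizes[i] is the number of possible values for dimension i.
        self.dim_sizes = list(dim_sizes)
        self.rng = random.Random(seed)
        # counts[prefix] = [successes, failures] for a partial assignment
        # (a tree node), keyed by a tuple of the values chosen so far.
        self.counts = defaultdict(lambda: [0, 0])

    def select_arm(self):
        """Walk the tree from the root, sampling one dimension per level."""
        prefix = ()
        for size in self.dim_sizes:
            best_value, best_sample = None, -1.0
            for value in range(size):
                successes, failures = self.counts[prefix + (value,)]
                # Assumed Beta(1, 1) prior on each node's success rate.
                sample = self.rng.betavariate(successes + 1, failures + 1)
                if sample > best_sample:
                    best_value, best_sample = value, sample
            prefix += (best_value,)
        return prefix

    def update(self, arm, reward):
        """Credit a binary reward to every node on the path to the arm."""
        index = 0 if reward else 1
        for depth in range(1, len(arm) + 1):
            self.counts[arm[:depth]][index] += 1
```

With dimension sizes such as `[2, 3, 2]` there are 12 full arms, but each call to `select_arm` draws only 2 + 3 + 2 = 7 posterior samples, which is the kind of saving a sequential, per-dimension search is meant to provide. A caller would alternate `select_arm()` and `update(arm, reward)` over many rounds.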


research
07/24/2012

VOI-aware MCTS

UCT, a state-of-the art algorithm for Monte Carlo tree search (MCTS) in ...
research
07/23/2012

MCTS Based on Simple Regret

UCT, a state-of-the art algorithm for Monte Carlo tree search (MCTS) in ...
research
08/18/2011

Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

UCT, a state-of-the art algorithm for Monte Carlo tree sampling (MCTS), ...
research
12/10/2018

Near-optimal Smooth Path Planning for Multisection Continuum Arms

We study the path planning problem for continuum-arm robots, in which we...
research
08/04/2022

Monte-Carlo Robot Path Planning

Path planning is a crucial algorithmic approach for designing robot beha...
research
05/16/2023

Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning

Balancing exploration and exploitation has been an important problem in ...
research
09/30/2021

Surveillance Evasion Through Bayesian Reinforcement Learning

We consider a 2D continuous path planning problem with a completely unkn...
