Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning

by Stephen Wissow et al.
University of New Hampshire

Balancing exploration and exploitation has long been an important problem in both game tree search and automated planning. However, while the problem has been extensively analyzed in the Multi-Armed Bandit (MAB) literature, the planning community has had limited success in applying those results. We show that a more detailed theoretical understanding of the MAB literature helps improve existing planning algorithms based on Monte Carlo Tree Search (MCTS) / Trial-Based Heuristic Tree Search (THTS). In particular, THTS uses the UCB1 MAB algorithm in an ad hoc manner: UCB1's theoretical requirement of reward distributions with fixed bounded support is not satisfied in heuristic search for classical planning. The core issue is that UCB1 does not adapt to the differing scales of the rewards. We propose GreedyUCT-Normal, an MCTS/THTS algorithm with the UCB1-Normal bandit for agile classical planning, which handles distributions of different scales by taking the reward variance into account. The result is improved algorithmic performance (more plans found with fewer node expansions) that outperforms Greedy Best-First Search and existing MCTS/THTS-based algorithms (GreedyUCT, GreedyUCT*).
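To make the variance-aware idea concrete, here is a minimal sketch of the UCB1-Normal arm-selection rule (Auer et al.'s bandit policy that the abstract names), which GreedyUCT-Normal plugs into MCTS/THTS node selection. This is an illustration of the bandit rule only, in the standard reward-maximization convention; the statistics dictionary layout and function name are our own, not from the paper.

```python
import math

def ucb1_normal_select(arms):
    """Pick an arm index per the UCB1-Normal policy (Auer et al., 2002).

    `arms` is a list of per-arm statistics (illustrative layout):
      n - number of pulls
      s - sum of observed rewards
      q - sum of squared observed rewards
    """
    total = sum(a["n"] for a in arms)
    # Forced exploration: play any arm pulled fewer than
    # ceil(8 * ln(total)) times.
    if total > 1:
        threshold = math.ceil(8 * math.log(total))
        for i, a in enumerate(arms):
            if a["n"] < threshold:
                return i
    # Otherwise maximize the variance-scaled upper confidence bound:
    #   mean + sqrt(16 * sample_variance * ln(total - 1) / n)
    best, best_score = 0, -math.inf
    for i, a in enumerate(arms):
        if a["n"] < 2:
            return i  # need at least two samples for the variance term
        mean = a["s"] / a["n"]
        var = (a["q"] - a["n"] * mean ** 2) / (a["n"] - 1)
        bound = mean + math.sqrt(
            16 * max(var, 0.0) * math.log(total - 1) / a["n"])
        if bound > best_score:
            best, best_score = i, bound
    return best
```

Because the exploration radius scales with the per-arm sample variance rather than a fixed support width, arms whose rewards vary on different scales (as heuristic values do in classical planning) get proportionally sized confidence bounds, which is the adaptation the abstract argues UCB1 lacks.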


