Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks

09/07/2018
by Seydou Ba, et al.

Monte Carlo Tree Search (MCTS) is particularly well suited to domains where the potential actions can be represented as a tree of sequential decisions. To select actions effectively, MCTS performs many simulations to build a reliable tree representation of the decision space. A bottleneck therefore arises when not enough simulations can be performed between action selections. This is especially pronounced in continuously running tasks, where the time available for simulations between actions tends to be limited because the environment's state is constantly changing. In this paper, we present an approach that takes advantage of the anytime characteristic of MCTS to increase the simulation time when possible. The idea is to balance the prospect of selecting an action now against the time that can be spared for additional MCTS simulations before the next action selection. To do so, we treat the simulation time as a decision variable to be selected alongside an action. We extend the Hierarchical Optimistic Optimization applied to Trees (HOOT) method to adapt our approach to environments with a continuous decision space. We evaluate the approach on continuous decision spaces using OpenAI Gym's Pendulum and Continuous Mountain Car environments, and on discrete action spaces using the Arcade Learning Environment (ALE) platform. The results show that, with variable simulation times, the proposed approach outperforms conventional MCTS on the evaluated continuous decision space tasks and improves MCTS performance on most of the ALE tasks.
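As a rough illustration of the core idea only (the abstract does not give the paper's actual algorithm), the sketch below treats the simulation budget for the next decision as part of the choice made at the root, using a flat UCB rule over (action, budget) pairs on a toy one-dimensional task. All names and parameters here (plan_with_budget, time_penalty, the toy rollout) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: a flat UCB bandit over (action, simulation budget)
# pairs, standing in for the paper's idea of selecting the simulation period
# alongside the action. Longer budgets are preferred only when the extra
# rollouts improve the value estimate enough to offset their time cost.
import math
import random


class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0
        self.children = {}  # maps (action, budget) -> Node


def rollout(env_state, action, depth=10):
    """Random rollout on a toy 1-D state: reward is higher near the origin."""
    state = env_state + action
    total = 0.0
    for _ in range(depth):
        state += random.uniform(-0.1, 0.1)
        total += -abs(state)
    return total


def ucb_select(node, c=1.4):
    """Pick the child (action, budget) pair maximizing the UCB1 score."""
    def score(item):
        _, child = item
        if child.visits == 0:
            return float("inf")
        return child.value / child.visits + c * math.sqrt(
            math.log(node.visits) / child.visits
        )
    return max(node.children.items(), key=score)


def plan_with_budget(env_state, actions, budgets, n_iterations=2000,
                     time_penalty=0.05):
    """Jointly choose an action and the simulation budget for the next step.

    A larger budget allows more rollouts but pays a cost time_penalty * budget,
    a crude stand-in for the time that must be spared before the next action.
    """
    root = Node()
    for a in actions:
        for b in budgets:
            root.children[(a, b)] = Node()

    for _ in range(n_iterations):
        (a, b), child = ucb_select(root)
        # More simulated rollouts when a larger budget is chosen, minus its cost.
        reward = max(rollout(env_state, a) for _ in range(b)) - time_penalty * b
        child.visits += 1
        child.value += reward
        root.visits += 1

    best = max(root.children.items(), key=lambda kv: kv[1].visits)
    return best[0]  # (action, simulation budget for the next decision)


if __name__ == "__main__":
    action, budget = plan_with_budget(env_state=0.5,
                                      actions=[-0.2, 0.0, 0.2],
                                      budgets=[1, 2, 4])
    print(f"chosen action: {action}, simulation budget for next step: {budget}")
```

In this toy version a longer budget is selected only when the improved estimates it yields outweigh a fixed per-unit cost; the paper's contribution is to make this trade-off part of the search itself, including a HOOT-based extension for continuous action spaces.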
