Generalized Mean Estimation in Monte-Carlo Tree Search

by Tuan Dam, et al.

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is built incrementally by expanding nodes, and node values are updated through a backup strategy based on the average value of the child nodes. However, it has been shown that, with enough samples, the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these two estimates, we go a step further and propose a novel backup strategy based on the power mean operator, which computes a value between the average and the maximum. We call our new approach Power-UCT and argue that using the power mean operator helps to speed up learning in MCTS. We analyze our method theoretically and provide guarantees of convergence to the optimum. Moreover, we discuss a heuristic approach that balances the greediness of backups by tuning the power mean operator according to the number of visits to each node. Finally, we empirically demonstrate the effectiveness of our method on well-known MDP and POMDP benchmarks, showing significant improvements in performance and convergence speed with respect to UCT.
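To make the idea of a backup "between the average and the maximum" concrete, the following is a minimal sketch of a weighted power mean over child value estimates. It is not the authors' implementation: the function name `power_mean`, the use of visit counts as weights, and the restriction to non-negative values with `p >= 1` are assumptions made for illustration.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values (illustrative sketch).

    Assumptions (not taken from the abstract): `weights` are child visit
    counts, `values` are non-negative child value estimates, and p >= 1.
    p = 1 gives the weighted average; p = math.inf gives the maximum.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if math.isinf(p):
        # Limit case: the power mean tends to the maximum value.
        return max(values)
    # (sum_i w_i * x_i^p / sum_i w_i)^(1/p)
    weighted = sum(w * (v ** p) for v, w in zip(values, weights))
    return (weighted / total) ** (1.0 / p)


# Two children with equal visit counts and value estimates 0.2 and 0.8.
child_values = [0.2, 0.8]
child_visits = [1, 1]
average = power_mean(child_values, child_visits, 1)
between = power_mean(child_values, child_visits, 4)
maximum = power_mean(child_values, child_visits, math.inf)
```

With equal weights, `average` is the plain mean of the two estimates, `maximum` is the larger estimate, and `between` falls strictly between them. Raising `p` moves the backup toward the greedier maximum, which is the quantity the abstract proposes to tune.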




Monte-Carlo tree search with uncertainty propagation via optimal transport

This paper introduces a novel backup strategy for Monte-Carlo Tree Searc...

Measurable Monte Carlo Search Error Bounds

Monte Carlo planners can often return sub-optimal actions, even if they ...

Monte Carlo Information-Oriented Planning

In this article, we discuss how to solve information-gathering problems ...

Fixed Points of the Set-Based Bellman Operator

Motivated by uncertain parameters encountered in Markov decision process...

Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Bayes-optimal behavior, while well-defined, is often difficult to achiev...

Monte Carlo Tree Search guided by Symbolic Advice for MDPs

In this paper, we consider the online computation of a strategy that aim...

On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis

Inspired by the success of AlphaGo Zero (AGZ) which utilizes Monte Carlo...
