Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search

by Li-Cheng Lan, et al.

Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). Running more simulations lets MCTS reach higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states need a long search to identify the best action the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we stop searching early whenever we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by a factor of 2.5 while maintaining a similar winning rate. Also, under the same average simulation count, our method can achieve a 61% winning rate against the original program.
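The early-stopping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `dynamic_simulation_search`, `visit_share_confidence`, `checkpoints`, and `threshold` are hypothetical, the search is reduced to flat visit counts over root actions rather than a full tree, and the confidence measure is a simple hand-written heuristic standing in for the learned uncertainty predictor the abstract describes.

```python
from typing import Callable, List, Sequence, Tuple


def visit_share_confidence(visits: Sequence[int]) -> float:
    """Hypothetical stand-in for a learned uncertainty predictor:
    the fraction of all visits that went to the most-visited action."""
    total = sum(visits)
    return max(visits) / total if total else 0.0


def dynamic_simulation_search(
    simulate: Callable[[List[int]], int],
    num_actions: int,
    checkpoints: Sequence[int],
    max_simulations: int,
    confidence: Callable[[Sequence[int]], float] = visit_share_confidence,
    threshold: float = 0.8,
) -> Tuple[int, int]:
    """Run simulations up to each checkpoint and stop early once the
    confidence estimate reaches the threshold.

    `simulate` receives the current root visit counts and returns the
    index of the root action that the next simulation passed through.
    Returns (best_action, simulations_used).
    """
    visits = [0] * num_actions
    used = 0
    for budget in list(checkpoints) + [max_simulations]:
        budget = min(budget, max_simulations)
        while used < budget:
            visits[simulate(visits)] += 1
            used += 1
        # Stop when the full budget is spent or the result looks settled.
        if used >= max_simulations or confidence(visits) >= threshold:
            break
    best = max(range(num_actions), key=visits.__getitem__)
    return best, used


# An "easy" state: every simulation agrees on action 2.
easy = dynamic_simulation_search(lambda v: 2, 3, [10, 50], 100)

# A "hard" state: simulations are spread evenly over all actions.
hard = dynamic_simulation_search(lambda v: sum(v) % 3, 3, [10, 50], 100)
```

In this toy setup the easy state stops at the first checkpoint, while the hard state uses the full budget, which mirrors the resource saving the abstract argues for. The checkpoint schedule and threshold are arbitrary illustrative values.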


AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

We present AlphaX, a fully automated agent that designs complex neural a...

Batch Monte Carlo Tree Search

Making inferences with a deep neural network on a batch of states is muc...

Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft

Deep reinforcement learning has been successfully applied to several vis...

NSGZero: Efficiently Learning Non-Exploitable Policy in Large-Scale Network Security Games with Neural Monte Carlo Tree Search

How resources are deployed to secure critical targets in networks can be...

Practical Large-Scale Distributed Parallel Monte-Carlo Tree Search Applied to Molecular Design

It is common practice to use large computational resources to train neur...

HyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under Uncertainty

Planning under uncertainty is critical for robust robot performance in u...

A Further Generalization of the Finite-Population Geiringer-like Theorem for POMDPs to Allow Recombination Over Arbitrary Set Covers

A popular current research trend deals with expanding the Monte-Carlo tr...