
Monte-Carlo Tree Search as Regularized Policy Optimization

by Jean-Bastien Grill, et al.

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
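To make "the exact solution to this policy optimization problem" more concrete, here is a minimal sketch of solving one regularized policy optimization problem exactly. The abstract does not spell out the objective, so the sketch assumes one common form: maximize q·y − λ·KL(prior ‖ y) over probability vectors y, where q holds action-value estimates and prior is a policy prior. The function name `regularized_policy`, its parameters, and the bisection routine are illustrative assumptions, not the authors' code.

```python
def regularized_policy(q, prior, lam, iters=200):
    """Sketch: maximize sum(q*y) - lam * KL(prior || y) over the simplex.

    Assumed objective (not given in the abstract). Requires lam > 0 and
    strictly positive prior entries.
    """
    # Setting the Lagrangian's gradient to zero gives
    #   y(a) = lam * prior(a) / (alpha - q(a)),
    # where alpha is the multiplier that makes y sum to 1.
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))  # sum(y) >= 1 here
    hi = max(q) + lam                                    # sum(y) <= 1 here
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = sum(lam * pa / (alpha - qa) for qa, pa in zip(q, prior))
        if total > 1.0:
            lo = alpha  # alpha too small: the policy mass exceeds 1
        else:
            hi = alpha  # alpha large enough: shrink the upper bound
    y = [lam * pa / (hi - qa) for qa, pa in zip(q, prior)]
    s = sum(y)
    return [v / s for v in y]  # renormalize away residual bisection error


if __name__ == "__main__":
    # One action looks better than the others; the solution shifts mass
    # toward it while staying anchored to the uniform prior.
    print(regularized_policy([1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3], lam=0.5))
```

Under this assumed objective, a large `lam` keeps the result close to the prior, while a small `lam` concentrates it on the highest-valued action. When all values in `q` are equal, the result is the prior itself.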




Solve Traveling Salesman Problem by Monte Carlo Tree Search and Deep Neural Network

We present a self-learning approach that combines deep reinforcement lea...

Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement...

A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

Search is a central problem in artificial intelligence, and BFS and DFS ...

Automated Machine Learning with Monte-Carlo Tree Search (Extended Version)

The AutoML task consists of selecting the proper algorithm in a machine ...

Efficient Object Manipulation Planning with Monte Carlo Tree Search

This paper presents an efficient approach to object manipulation plannin...

Monte Carlo Tree Search for high precision manufacturing

Monte Carlo Tree Search (MCTS) has shown its strength for a lot of deter...

Learning Robust Scheduling with Search and Attention

Allocating physical layer resources to users based on channel quality, b...

Code Repositories


AlphaZero on GPU thanks to CUDA.jl



A NNUE Othello engine



A Rust implementation of the AlphaZero algorithm
