Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

05/22/2020
by Arta Seify, et al.

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is the state of the art in two-player perfect-information games. In this paper, we describe a search algorithm built on a variant of MCTS that we enhance with 1) a novel action-value normalization mechanism for games with potentially unbounded rewards (as is the case in many optimization problems), 2) a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search. We gauge the effectiveness of our method in "SameGame", a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.
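The abstract does not give the formulas behind the normalization or the virtual loss, so the following is only a minimal sketch of how the two ideas can fit into a PUCT-style selection step, not the authors' method. It assumes min-max rescaling of action values to [0, 1] and treats each in-flight simulation as a visit that returned the worst normalized value. All names here (`MinMaxStats`, `Edge`, `select`, `c_puct`, `in_flight`) are hypothetical.

```python
import math
from dataclasses import dataclass


class MinMaxStats:
    """Tracks the smallest and largest values backed up so far (assumed scheme)."""

    def __init__(self) -> None:
        self.lo = math.inf
        self.hi = -math.inf

    def update(self, value: float) -> None:
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def normalize(self, value: float) -> float:
        # Rescale to [0, 1] once two distinct values have been observed.
        if self.hi > self.lo:
            return (value - self.lo) / (self.hi - self.lo)
        return 0.0


@dataclass
class Edge:
    prior: float            # policy-network probability for this action
    visits: int = 0         # completed simulations through this edge
    value_sum: float = 0.0  # sum of raw (unbounded) returns
    in_flight: int = 0      # simulations currently running (virtual losses)


def select(edges: dict, stats: MinMaxStats, c_puct: float = 1.5):
    """Pick the action maximizing normalized value plus a prior-weighted bonus."""
    total = sum(e.visits + e.in_flight for e in edges.values())
    best_action, best_score = None, -math.inf
    for action, e in edges.items():
        n = e.visits + e.in_flight
        q = stats.normalize(e.value_sum / e.visits) if e.visits else 0.0
        # Each in-flight simulation counts as a visit worth 0 (the worst
        # normalized value), which steers parallel workers elsewhere.
        q_eff = q * e.visits / n if n else 0.0
        u = c_puct * e.prior * math.sqrt(total + 1) / (1 + n)
        if q_eff + u > best_score:
            best_action, best_score = action, q_eff + u
    return best_action
```

In this sketch a worker would increment `in_flight` on each edge it descends through, then decrement it and update `visits`, `value_sum`, and `stats` during backup. Rescaling keeps the value term on the same scale as the exploration bonus even when raw scores run into the thousands.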


research
12/19/2020

Minimax Strikes Back

Deep Reinforcement Learning (DRL) reaches a superhuman level of play in ...
research
05/22/2023

Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman

In combination with Reinforcement Learning, Monte-Carlo Tree Search has ...
research
02/24/2021

Combining Off and On-Policy Training in Model-Based Reinforcement Learning

The combination of deep learning and Monte Carlo Tree Search (MCTS) has ...
research
07/24/2020

Monte-Carlo Tree Search as Regularized Policy Optimization

The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...
research
11/27/2018

Single-Agent Policy Tree Search With Guarantees

We introduce two novel tree search algorithms that use a policy to guide...
research
06/07/2023

Policy-Based Self-Competition for Planning Problems

AlphaZero-type algorithms may stop improving on single-player tasks in c...
research
05/27/2019

Policy Based Inference in Trick-Taking Card Games

Trick-taking card games feature a large amount of private information th...
