Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

04/07/2019
by Thomas Anthony, et al.

Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with Monte Carlo averages stored in a search tree, and this does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves performance comparable to MCTS, and an agent trained using Expert Iteration with PGS was able to defeat MoHex 2.0, the strongest open-source Hex agent, in 9x9 Hex.
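The core idea of PGS, adapting a parametric simulation policy online with policy gradient updates instead of storing Monte Carlo averages in a tree, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy game (`DEPTH`, `ACTIONS`, `terminal_reward`) is hypothetical, a small table of logits stands in for the neural network, and the update is plain REINFORCE with a running-mean baseline. It does not reproduce the paper's network, update rule, or Hex setup.

```python
import math
import random

# Hypothetical toy one-player game: DEPTH moves, each chosen from ACTIONS;
# the terminal reward is the sum of the chosen actions scaled into [0, 1].
DEPTH = 3
ACTIONS = (0, 1, 2)


def terminal_reward(actions):
    return sum(actions) / (max(ACTIONS) * DEPTH)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_search(num_simulations=2000, lr=0.5, seed=0):
    """Adapt a parametric simulation policy online with REINFORCE.

    theta[d][k] is the logit of action k at depth d. It is a tiny stand-in
    for a neural-network policy (an assumption of this sketch). No search
    tree is built: no per-node visit counts or value averages are stored.
    """
    rng = random.Random(seed)
    theta = [[0.0 for _ in ACTIONS] for _ in range(DEPTH)]
    baseline = 0.0  # running mean of returns, used to reduce variance
    for _ in range(num_simulations):
        # 1. Simulate one full trajectory from the root with the current policy.
        trajectory = []
        for d in range(DEPTH):
            probs = softmax(theta[d])
            k = rng.choices(range(len(ACTIONS)), weights=probs)[0]
            trajectory.append((d, k, probs))
        ret = terminal_reward([ACTIONS[k] for _, k, _ in trajectory])
        advantage = ret - baseline
        # 2. REINFORCE update: d log softmax_k / d logit_j = 1[j == k] - p_j.
        for d, k, probs in trajectory:
            for j in range(len(ACTIONS)):
                grad = (1.0 if j == k else 0.0) - probs[j]
                theta[d][j] += lr * advantage * grad
        baseline += 0.05 * (ret - baseline)
    # 3. Act at the root with the adapted policy.
    root_probs = softmax(theta[0])
    best = max(range(len(ACTIONS)), key=lambda j: root_probs[j])
    return ACTIONS[best], root_probs


if __name__ == "__main__":
    move, probs = policy_gradient_search()
    print("root move:", move, "root policy:", [round(p, 3) for p in probs])
```

The memory used by this search is fixed by the size of the policy parameters rather than by the number of distinct states visited, which is the property that makes a tree-free approach attractive when the branching factor is very high.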
