Warm-Start AlphaZero Self-Play Search Enhancements

04/26/2020
by Hui Wang, et al.

Recently, AlphaZero has achieved landmark results in deep reinforcement learning by providing a single self-play architecture that learned three different games at a superhuman level. AlphaZero is a large and complicated system with many parameters, and its success requires substantial compute power and fine-tuning. Reproducing its results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements in the initial phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE), dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with RAVE-based variants playing especially strongly.
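To make the idea of "dynamically weighted combinations" with the neural network more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The RAVE part uses the well-known hand-selected schedule from Gelly and Silver's MC-RAVE, beta = sqrt(k / (3n + k)), where n is the node's visit count and k is an equivalence parameter; the value k = 1000 is an arbitrary illustration. The warm-start part assumes a hypothetical linear decay: the leaf value is mostly taken from the enhancement (e.g. a rollout or RAVE estimate) in early self-play iterations and hands over fully to the network after `warm_iters` iterations. All function names and the linear schedule are hypothetical.

```python
import math


def rave_beta(n_visits: int, k: float = 1000.0) -> float:
    """MC-RAVE weight (Gelly & Silver schedule): sqrt(k / (3n + k)).

    Equals 1.0 at n = 0 and shrinks toward 0 as the node gathers real visits.
    """
    return math.sqrt(k / (3 * n_visits + k))


def blended_q(q_mcts: float, q_rave: float, n_visits: int, k: float = 1000.0) -> float:
    """Mix the node's own MCTS value with its RAVE (all-moves-as-first) value."""
    beta = rave_beta(n_visits, k)
    return (1.0 - beta) * q_mcts + beta * q_rave


def warm_start_weight(iteration: int, warm_iters: int) -> float:
    """Hypothetical linear schedule for the enhancement's weight.

    1.0 at iteration 0, falling linearly to 0.0 at `warm_iters`, then 0.0 forever.
    """
    if warm_iters <= 0:
        return 0.0
    return max(0.0, 1.0 - iteration / warm_iters)


def leaf_value(v_net: float, v_enh: float, iteration: int, warm_iters: int) -> float:
    """Combine the network's value with an enhancement value (e.g. rollout or RAVE).

    Early in self-play the (untrained) network is trusted little; later it takes over.
    """
    w = warm_start_weight(iteration, warm_iters)
    return w * v_enh + (1.0 - w) * v_net


if __name__ == "__main__":
    for it in range(0, 7, 2):
        print(it, leaf_value(v_net=-0.2, v_enh=0.8, iteration=it, warm_iters=5))
```

A linear decay is only one plausible choice; any monotone schedule that moves weight from the search enhancement to the network as training progresses expresses the same warm-start idea.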


research
05/13/2021

Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning

AlphaZero has achieved impressive performance in deep reinforcement lear...
research
07/18/2021

Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN

Playing board games is considered a major challenge for both humans and ...
research
07/04/2018

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Adversarial self-play in two-player games has delivered impressive resul...
research
04/28/2022

AlphaZero-Inspired General Board Game Learning and Playing

Recently, the seminal algorithms AlphaGo and AlphaZero have started a ne...
research
02/24/2021

Transfer of Fully Convolutional Policy-Value Networks Between Games and Game Variants

In this paper, we use fully convolutional architectures in AlphaZero-lik...
research
03/12/2020

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

The landmark achievements of AlphaGo Zero have created great research in...
research
06/08/2020

A Comparison of Self-Play Algorithms Under a Generalized Framework

Throughout scientific history, overarching theoretical frameworks have a...
