Leela Zero Score: a Study of a Score-based AlphaGo Zero

01/31/2022
by   Luca Pasqualini, et al.

AlphaGo, AlphaGo Zero, and all of their derivatives can play with superhuman strength because they are able to predict the win-loss outcome with great accuracy. However, Go as a game is decided by a final score difference, and in final positions AlphaGo plays suboptimal moves. This is not surprising: AlphaGo is completely unaware of the final score difference, since all winning final positions are equivalent from the winrate perspective. This can be an issue, for instance, when trying to learn the "best" move or to play with an initial handicap. Moreover, there is the theoretical quest for the "perfect game", that is, the minimax solution. Thus, a natural question arises: is it possible to train a successful Reinforcement Learning agent to predict score differences instead of winrates? No empirical or theoretical evidence can be found in the literature to support the folklore statement that "this does not work". In this paper we present Leela Zero Score, a program designed to support or disprove the "does not work" statement. Leela Zero Score is built on the open-source program Leela Zero and is trained on a 9x9 board to predict score differences instead of winrates. We find that the training produces a rational player, and we analyze its style against a strong amateur human player, finding that it is prone to some mistakes when the outcome is close. We compare its strength against SAI, an AlphaGo Zero-like program working on the 9x9 board, and find that the training of Leela Zero Score reached a premature convergence to a player weaker than SAI.
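The difference between the two training signals discussed in the abstract can be illustrated with a minimal sketch. It is not taken from Leela Zero Score itself: the function names, the example margins, and the convention that margins are signed from the current player's view with a non-integer komi (so no draws occur) are all assumptions made for illustration.

```python
def winrate_target(score_diff: float) -> int:
    """Win/loss label in the AlphaGo Zero style: +1 for a win, -1 for a loss.

    Assumes a non-integer komi, so score_diff is never exactly 0.
    """
    return 1 if score_diff > 0 else -1


def score_target(score_diff: float) -> float:
    """Score-based label: the signed final margin itself."""
    return score_diff


# Hypothetical final margins in points, komi included (illustrative values only).
margins = [0.5, 20.5, -3.5]

win_labels = [winrate_target(m) for m in margins]
score_labels = [score_target(m) for m in margins]

print(win_labels)
print(score_labels)
```

Under these assumptions, a win by 0.5 points and a win by 20.5 points receive the same win/loss label, while the score-based label keeps them apart. This is the sense in which all winning final positions are equivalent from the winrate perspective.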

