Leela Zero Score: a Study of a Score-based AlphaGo Zero

by Luca Pasqualini, et al.
DART, Faculty of Architecture of Pescara, Italy
Università di Siena
Consiglio Nazionale delle Ricerche

AlphaGo, AlphaGo Zero, and all of their derivatives can play with superhuman strength because they are able to predict the win-lose outcome with great accuracy. However, Go is decided by a final score difference, and in final positions AlphaGo plays suboptimal moves. This is not surprising: AlphaGo is completely unaware of the final score difference, since all winning final positions are equivalent from the winrate perspective. This can be an issue, for instance when trying to learn the "best" move or to play with an initial handicap. Moreover, there is the theoretical quest for the "perfect game", that is, the minimax solution. Thus, a natural question arises: is it possible to train a successful Reinforcement Learning agent to predict score differences instead of winrates? No empirical or theoretical evidence can be found in the literature to support the folklore statement that "this does not work". In this paper we present Leela Zero Score, software designed to support or disprove the "does not work" statement. Leela Zero Score is built on the open-source program Leela Zero, and is trained on a 9x9 board to predict score differences instead of winrates. We find that the training produces a rational player, and we analyze its style against a strong amateur human player, finding that it is prone to some mistakes when the outcome is close. We compare its strength against SAI, an AlphaGo Zero-like program working on the 9x9 board, and find that the training of Leela Zero Score converged prematurely to a player weaker than SAI.
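The difference between the two training targets can be made concrete with a minimal sketch. It is not taken from Leela Zero Score or Leela Zero: the function names `winrate_target` and `score_target` and the example margins are hypothetical, chosen only to show why a win-lose label cannot distinguish between winning final positions while a score-difference label can.

```python
# Illustrative sketch only: hypothetical names, not the authors' code.

def winrate_target(score_diff: float) -> int:
    """Win/lose label from one player's perspective: +1 for a win, -1 otherwise."""
    return 1 if score_diff > 0 else -1


def score_target(score_diff: float) -> float:
    """Score-based label: the final score difference itself."""
    return score_diff


# Hypothetical final margins (half-point values assume a fractional komi).
final_margins = [0.5, 10.5, 40.5, -0.5]

win_labels = [winrate_target(m) for m in final_margins]
score_labels = [score_target(m) for m in final_margins]

# The three wins collapse to the same win/lose label,
# but remain distinct under the score-based label.
print(win_labels)
print(score_labels)
```

Under the win-lose label, a win by half a point and a win by forty points give the same training signal, which is consistent with the suboptimal endgame moves described above.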

