Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

11/26/2022
by Charles Lovering, et al.

AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We investigate AlphaZero's internal representations in the game of Hex using two evaluation techniques from natural language processing (NLP): model probing and behavioral tests. In doing so, we introduce new evaluation tools to the RL community and illustrate how evaluations other than task performance can be used to provide a more complete picture of a model's strengths and weaknesses. Our analyses in the game of Hex reveal interesting patterns and generate some testable hypotheses about how such models learn in general. For example, we find that MCTS discovers concepts before the neural network learns to encode them. We also find that concepts related to short-term end-game planning are best encoded in the final layers of the model, whereas concepts related to long-term planning are encoded in the middle layers of the model.
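To make "model probing" concrete: a probe is a small classifier trained on a network's frozen activations to predict whether a concept is present in the input, and its held-out accuracy is read as evidence of how well that layer encodes the concept. The sketch below is not the authors' code. It assumes synthetic activations in place of real AlphaZero layers and a plain logistic-regression probe, and every name in it (`train_linear_probe`, `make_synthetic_layer`, the `signal` parameter) is hypothetical.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe on frozen activations via gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log loss w.r.t. the logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples where the probe predicts the concept label correctly."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

def make_synthetic_layer(rng, n, d, signal):
    """Stand-in for one layer's activations (assumption: purely synthetic data).

    `signal` controls how strongly the binary concept label is written along
    one direction of the activation space; 0.0 means the layer does not
    encode the concept at all.
    """
    labels = rng.integers(0, 2, size=n).astype(float)
    direction = np.zeros(d)
    direction[0] = 1.0
    acts = rng.normal(size=(n, d)) + signal * np.outer(2 * labels - 1, direction)
    return acts, labels

def evaluate_layer(rng, signal, n=2000, d=16, n_train=1500):
    """Train a probe on one split of a layer and report held-out accuracy."""
    acts, labels = make_synthetic_layer(rng, n, d, signal)
    w, b = train_linear_probe(acts[:n_train], labels[:n_train])
    return probe_accuracy(acts[n_train:], labels[n_train:], w, b)

rng = np.random.default_rng(0)
strong_acc = evaluate_layer(rng, signal=2.0)   # layer that encodes the concept
weak_acc = evaluate_layer(rng, signal=0.0)     # layer that does not
print(f"probe accuracy, concept encoded:     {strong_acc:.2f}")
print(f"probe accuracy, concept not encoded: {weak_acc:.2f}")
```

Running the same probe on each layer of a real network, and on checkpoints from different points in training, is one way to obtain the kind of layer-wise and time-wise comparisons the abstract describes. Probes in practice are usually compared against a control or baseline, since high accuracy alone does not show that the network uses the concept.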
