Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

06/14/2020
by Hui Wang, et al.

Morpion Solitaire is a popular single-player game, played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. To the best of our knowledge, no further progress has been reported in the roughly ten years since that record. In this paper we take the recent impressive performance of deep self-learning reinforcement learning approaches from AlphaGo/AlphaZero as inspiration to design a searcher for Morpion Solitaire. A challenge of Morpion Solitaire is that the state space is sparse and there are few win/loss signals. Instead of relying on such signals, we use an approach known as ranked reward to create a reinforcement learning self-play framework for Morpion Solitaire. This enables us to find medium-quality solutions with reasonable computational effort. Our record is a 67-step solution, which is very close to the human best (68), without any adaptation to the problem other than using ranked reward. We list many further avenues for potential improvement.
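The ranked reward idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described formulation: recent episode scores are kept in a buffer, a threshold is taken at a percentile of that buffer, and each new episode receives a binary +1/-1 signal depending on whether it beats the threshold. The names `RankedRewardBuffer` and `ranked_reward`, the buffer capacity, the percentile `alpha`, and the random tie-breaking are all illustrative assumptions, not details confirmed by this abstract.

```python
import random
from collections import deque


class RankedRewardBuffer:
    """Hypothetical helper: stores the most recent episode scores."""

    def __init__(self, capacity=250, alpha=0.75):
        # capacity and alpha are assumed values, not taken from the paper.
        self.scores = deque(maxlen=capacity)
        self.alpha = alpha

    def add(self, score):
        """Record the final score (e.g. number of steps) of one episode."""
        self.scores.append(score)

    def threshold(self):
        """Return the score at the alpha percentile of the buffer.

        Requires at least one recorded score.
        """
        ordered = sorted(self.scores)
        index = min(int(self.alpha * len(ordered)), len(ordered) - 1)
        return ordered[index]


def ranked_reward(score, threshold, rng=random):
    """Map a raw score to a win/loss-style signal relative to the threshold."""
    if score > threshold:
        return 1
    if score < threshold:
        return -1
    # Assumption: ties are broken at random so the agent is pushed to improve.
    return rng.choice((1, -1))


# Usage: rank a new 20-step game against ten earlier games of 10..19 steps.
buffer = RankedRewardBuffer(capacity=10, alpha=0.75)
for steps in range(10, 20):
    buffer.add(steps)
reward = ranked_reward(20, buffer.threshold())
```

Because the threshold moves with the agent's own recent results, a single-player score is turned into the kind of win/loss signal that AlphaZero-style self-play training expects.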


research
07/04/2018

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Adversarial self-play in two-player games has delivered impressive resul...
research
11/12/2020

Hierarchical reinforcement learning for efficient exploration and transfer

Sparse-reward domains are challenging for reinforcement learning algorit...
research
05/06/2023

A Novel Reward Shaping Function for Single-Player Mahjong

Mahjong is a complex game with an intractably large state space with ext...
research
04/10/2018

Evaluating Actuators in a Purely Information-Theory Based Reward Model

AGINAO builds its cognitive engine by applying self-programming techniqu...
research
05/17/2021

Learning to Win, Lose and Cooperate through Reward Signal Evolution

Solving a reinforcement learning problem typically involves correctly pr...
research
02/17/2018

A Deep Q-Learning Agent for the L-Game with Variable Batch Training

We employ the Deep Q-Learning algorithm with Experience Replay to train ...
