AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process

11/17/2022
by Kevin Du, et al.

Reinforcement learning has recently been used to approach well-known NP-hard combinatorial problems in graph theory. Among these problems, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play Snake, a game centered on properties of Hamiltonian cycles on grid graphs. Snake can be formulated as a single-player discounted Markov Decision Process (MDP) in which the agent must behave optimally in a stochastic environment. We define the optimal policy for Snake lexicographically: it first maximizes the probability of winning (the win rate) and then, among such policies, minimizes the expected number of time steps to win. Determining this policy is conjectured to be NP-hard. Compared to prior work on Snake, our algorithm is the first to achieve a win rate over 0.5 (a uniform random policy achieves a win rate < 2.57 × 10^-15), demonstrating the versatility of AlphaZero in approaching NP-hard environments.
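To make the link between Snake, stochastic transitions, and Hamiltonian cycles concrete, the following is a minimal sketch rather than the authors' method. It assumes a simplified rule set (a snake of length 1 starting at the top-left cell, food respawning uniformly at random on a free cell, and a win when the snake fills the grid), and it replaces MCTS with a fixed baseline that follows a Hamiltonian cycle of the grid. All names (`SnakeMDP`, `hamiltonian_cycle`, `follow_cycle`, `policy_key`) are hypothetical.

```python
import random
from collections import deque


def hamiltonian_cycle(rows, cols):
    """Cells of a rows x cols grid in Hamiltonian-cycle order.

    Assumption: rows is even and cols >= 2, so column 0 can serve as the
    return path while the remaining columns are swept row by row.
    """
    assert rows % 2 == 0 and cols >= 2
    cycle = [(0, c) for c in range(cols)]            # row 0, left to right
    for r in range(1, rows):                         # sweep columns 1..cols-1
        cs = range(cols - 1, 0, -1) if r % 2 == 1 else range(1, cols)
        cycle += [(r, c) for c in cs]
    cycle += [(r, 0) for r in range(rows - 1, 0, -1)]  # back up column 0
    return cycle


class SnakeMDP:
    """Simplified Snake: the only randomness is where the food respawns."""

    def __init__(self, rows, cols, rng):
        self.rows, self.cols, self.rng = rows, cols, rng
        self.body = deque([(0, 0)])                  # head is body[0]
        self.food = self._spawn_food()

    def _spawn_food(self):
        free = [(r, c) for r in range(self.rows) for c in range(self.cols)
                if (r, c) not in self.body]
        return self.rng.choice(free)                 # uniform over free cells

    def step(self, new_head):
        """Move the head to an adjacent cell; return 'win', 'lose' or 'play'."""
        (hr, hc), (nr, nc) = self.body[0], new_head
        if abs(hr - nr) + abs(hc - nc) != 1:
            raise ValueError("new_head must be adjacent to the current head")
        eating = new_head == self.food
        if not eating:
            self.body.pop()                          # tail vacates its cell first
        in_bounds = 0 <= nr < self.rows and 0 <= nc < self.cols
        if not in_bounds or new_head in self.body:
            return "lose"
        self.body.appendleft(new_head)
        if eating:
            if len(self.body) == self.rows * self.cols:
                return "win"
            self.food = self._spawn_food()
        return "play"


def follow_cycle(rows, cols, seed, max_steps=10_000):
    """Play one episode by always stepping to the next cell on the cycle."""
    cycle = hamiltonian_cycle(rows, cols)
    nxt = {cell: cycle[(i + 1) % len(cycle)] for i, cell in enumerate(cycle)}
    env = SnakeMDP(rows, cols, random.Random(seed))
    for t in range(1, max_steps + 1):
        outcome = env.step(nxt[env.body[0]])
        if outcome != "play":
            return outcome, t
    return "timeout", max_steps


def policy_key(win_rate, mean_steps_to_win):
    """Sort key for the lexicographic objective: win rate first, then speed."""
    return (win_rate, -mean_steps_to_win)


if __name__ == "__main__":
    results = [follow_cycle(4, 4, seed) for seed in range(100)]
    wins = [steps for outcome, steps in results if outcome == "win"]
    print(len(wins) / len(results), sum(wins) / len(wins))
```

Under these assumed rules the cycle-following baseline never collides with itself, because the body always occupies a contiguous arc of the cycle behind the head. It therefore wins every episode but is slow, which is why the second, lower-priority criterion (expected time steps to win) matters: `policy_key` ranks two policies with equal win rates by how quickly they win.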


research
02/28/2023

Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algori...
research
07/05/2017

Learning to Design Games: Strategic Environments in Deep Reinforcement Learning

In typical reinforcement learning (RL), the environment is assumed given...
research
05/28/2019

Solving NP-Hard Problems on Graphs by Reinforcement Learning without Domain Knowledge

We propose an algorithm based on reinforcement learning for solving NP-h...
research
09/30/2021

Learning the Markov Decision Process in the Sparse Gaussian Elimination

We propose a learning-based approach for the sparse Gaussian Elimination...
research
07/09/2021

Most Classic Problems Remain NP-hard on Relative Neighborhood Graphs and their Relatives

Proximity graphs have been studied for several decades, motivated by app...
research
04/17/2021

3-Coloring on Regular, Planar, and Ordered Hamiltonian Graphs

We prove that 3-Coloring remains NP-hard on 4- and 5-regular planar Hami...
research
04/13/2020

K-spin Hamiltonian for quantum-resolvable Markov decision processes

The Markov decision process is the mathematical formalization underlying...
