
UCT-ADP Progressive Bias Algorithm for Solving Gomoku

by Xu Cao, et al.

We combine Adaptive Dynamic Programming (ADP), a reinforcement learning method, with the UCB applied to trees (UCT) algorithm, adding a more powerful heuristic function based on the Progressive Bias method and two pruning strategies, for the traditional board game Gomoku. For the ADP part, we train a shallow feedforward neural network to give a quick evaluation of Gomoku board situations. UCT is a general tree policy in Monte Carlo Tree Search (MCTS). Our framework uses UCT to balance exploration and exploitation in Gomoku game trees; it also applies the pruning strategies and the heuristic function to re-select among the available 2-adjacent grids of a state, and it uses ADP instead of simulation to estimate the values of expanded nodes. Experimental results show that this method can eliminate the search-depth defect of the simulation process and converges to the correct value faster than UCT alone. This approach can be applied to design new Gomoku AIs and to solve other Gomoku-like board games.
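As a rough illustration of how these pieces could fit together, the sketch below runs UCT over a Gomoku game tree, restricts expansion to empty cells within two cells of an existing stone (our reading of "2-adjacent grids"), adds a progressive-bias term to the UCB1 score, giving Q + c·sqrt(ln N / n) + H / (n + 1) for a child visited n times under a parent visited N times, and scores each newly expanded leaf with a shallow feedforward value network instead of a random rollout. This is not the authors' implementation; every name and parameter here is hypothetical: the 9x9 board, the neighbour-count prior `heuristic`, the untrained `ShallowValueNet` (the ADP training loop is omitted), the exploration constant `c=1.4`, and the functions `candidates`, `wins` and `search`. Only the 2-adjacent restriction is shown; the second pruning strategy is not described in the abstract and is left out.

```python
import math
import random

EMPTY, BLACK, WHITE = 0, 1, 2
N = 9  # hypothetical small board so the sketch runs quickly; Gomoku is usually 15x15


def candidates(board, radius=2):
    """Pruned move list: empty cells within `radius` (Chebyshev distance) of any stone."""
    stones = [(r, c) for r in range(N) for c in range(N) if board[r][c] != EMPTY]
    if not stones:
        return [(N // 2, N // 2)]  # empty board: only the centre point
    return sorted({(r + dr, c + dc) for r, c in stones
                   for dr in range(-radius, radius + 1) for dc in range(-radius, radius + 1)
                   if 0 <= r + dr < N and 0 <= c + dc < N and board[r + dr][c + dc] == EMPTY})


def heuristic(board, move):
    """Hypothetical prior H in [0, 1]: fraction of the 8 neighbours that hold a stone."""
    r, c = move
    near = sum(1 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr or dc) and 0 <= r + dr < N and 0 <= c + dc < N
               and board[r + dr][c + dc] != EMPTY)
    return near / 8.0


def wins(board, move):
    """True if the stone at `move` is part of five or more in a row."""
    r, c = move
    p = board[r][c]
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk outwards in both directions along the line
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < N and 0 <= cc < N and board[rr][cc] == p:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= 5:
            return True
    return False


class ShallowValueNet:
    """Stand-in for the ADP evaluator: one tanh hidden layer, random (untrained) weights."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(N * N)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]

    def evaluate(self, board, player):
        """Estimated probability that `player` (the side to move) wins."""
        x = [0 if v == EMPTY else (1 if v == player else -1) for row in board for v in row]
        h = [math.tanh(sum(w * xi for w, xi in zip(ws, x))) for ws in self.w1]
        z = sum(w * hi for w, hi in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-z))


class Node:
    def __init__(self, move=None, parent=None, prior=0.0):
        self.move, self.parent, self.prior = move, parent, prior
        self.children = []
        self.visits = 0
        self.value_sum = 0.0  # from the view of the player who played `move`

    def score(self, c):
        if self.visits == 0:
            return 1e9 + self.prior  # try unvisited children first, highest prior first
        q = self.value_sum / self.visits
        ucb = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return q + ucb + self.prior / (self.visits + 1)  # UCT plus progressive bias


def search(board, player, iterations, net, c=1.4):
    """Return the most-visited root move after `iterations` UCT iterations."""
    root = Node()
    for _ in range(iterations):
        b = [row[:] for row in board]
        node, to_move = root, player
        while node.children:  # selection
            node = max(node.children, key=lambda ch: ch.score(c))
            b[node.move[0]][node.move[1]] = to_move
            to_move = 3 - to_move
        if node.move is not None and wins(b, node.move):
            value = 1.0  # the player who just moved has won
        else:
            moves = candidates(b)
            if not moves:
                value = 0.5  # no candidate left (e.g. full board): score as a draw
            else:  # expansion, then network evaluation instead of a rollout
                node.children = [Node(m, node, heuristic(b, m)) for m in moves]
                value = 1.0 - net.evaluate(b, to_move)
        while node is not None:  # backpropagation, flipping perspective each ply
            node.visits += 1
            node.value_sum += value
            value = 1.0 - value
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, with a white stone at (4, 1) and black stones at (4, 2) through (4, 5), `search(board, BLACK, iterations=1500, net=ShallowValueNet())` should settle on the completing move (4, 6): that child returns a terminal win on every visit, while the other candidates are scored near 0.5 by the untrained network. Because each leaf costs one forward pass rather than a full playout, search depth is no longer tied to the length of random simulations, which is the trade-off the abstract points to when it replaces simulation with ADP.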


Massively Parallel Dynamic Programming on Trees

Dynamic programming is a powerful technique that is, unfortunately, ofte...

AlphaZero Gomoku

In the past few years, AlphaZero's exceptional capability in mastering i...

Towards solving the 7-in-a-row game

Our paper explores the game theoretic value of the 7-in-a-row game. We r...

Heuristic Dynamic Programming for Adaptive Virtual Synchronous Generators

In this paper a neural network heuristic dynamic programing (HDP) is use...

Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven

In this paper time-driven learning refers to the machine learning method...

Solving the Steiner Tree Problem with few Terminals

The Steiner tree problem is a well-known problem in network design, rout...

Single item stochastic lot sizing problem considering capital flow and business overdraft

This paper introduces capital flow to the single item stochastic lot siz...