Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning

06/28/2022
by Dieqiao Feng, et al.

Despite the success of practical solvers in NP-complete domains such as SAT and CSP, and of deep reinforcement learning in two-player games such as Go, certain classes of PSPACE-hard planning problems have remained out of reach. Even carefully designed domain-specialized solvers can fail quickly on hard instances because of the exponential search space. Recent work that combines traditional search methods, such as best-first search and Monte Carlo tree search, with deep neural network (DNN) heuristics has shown promising progress and can solve a significant number of hard planning instances beyond the reach of specialized solvers. To better understand why these approaches work, we study the interplay of the policy and value networks in DNN-based best-first search on Sokoban and show the surprising effectiveness of the policy network, further enhanced by the value network, as a guiding heuristic for the search. To further understand this phenomenon, we study the cost distribution of the search algorithms and find that Sokoban instances can have heavy-tailed runtime distributions, with tails on both the left-hand and right-hand sides. In particular, we show for the first time the existence of left heavy tails and propose an abstract tree model that can empirically explain the appearance of these tails. The experiments show the critical role of the policy network as a powerful heuristic guiding the search, which can lead to left heavy tails with polynomial scaling by avoiding the exploration of exponentially sized subtrees. Our results also demonstrate the importance of random restarts, which are widely used in traditional combinatorial solvers, for DNN-based search methods to avoid left and right heavy tails.
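To make the restart argument concrete, here is a minimal Python sketch of why a cutoff-and-restart strategy can help when run costs are spread over orders of magnitude. It is a toy illustration, not the abstract tree model or the search procedure from the paper: the two-point cost distribution, the probability `p_lucky`, the costs `short_cost` and `long_cost`, and all function names are assumptions chosen only for this example.

```python
import random


def sample_run_cost(rng, p_lucky=0.2, short_cost=100, long_cost=1_000_000):
    """Cost of one randomized search run in a toy model (hypothetical numbers).

    With probability p_lucky the run finishes cheaply; otherwise it gets
    stuck in a very expensive part of the search space.
    """
    return short_cost if rng.random() < p_lucky else long_cost


def cost_with_restarts(rng, cutoff, max_restarts=1000):
    """Total cost when every run is abandoned after `cutoff` units of work."""
    total = 0
    for _ in range(max_restarts):
        cost = sample_run_cost(rng)
        if cost <= cutoff:
            return total + cost  # this run finished within the cutoff
        total += cutoff          # pay for the abandoned run, then restart
    return total                 # gave up after max_restarts attempts


def mean_cost(cutoff, trials=2000, seed=0):
    """Average total cost over many independent trials (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(cost_with_restarts(rng, cutoff) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("mean cost, cutoff = 100      :", mean_cost(100))
    print("mean cost, cutoff = 1,000,000:", mean_cost(1_000_000))
```

In this toy setting a small cutoff turns each attempt into a cheap, independent chance of hitting a short run, so the average cost stays near `short_cost / p_lucky`. With a cutoff as large as `long_cost`, no run is ever abandoned and the average is dominated by the expensive runs.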

research
06/04/2020

Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Despite significant progress in general AI planning, certain domains rem...
research
09/20/2022

Graph Value Iteration

In recent years, deep Reinforcement Learning (RL) has been successful in...
research
06/17/2022

Efficient Object Manipulation Planning with Monte Carlo Tree Search

This paper presents an efficient approach to object manipulation plannin...
research
10/03/2021

A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances

In recent years, we have witnessed tremendous progress in deep reinforce...
research
05/18/2018

AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

We present AlphaX, a fully automated agent that designs complex neural a...
research
09/26/2011

Where 'Ignoring Delete Lists' Works: Local Search Topology in Planning Benchmarks

Between 1998 and 2004, the planning community has seen vast progress in ...