Levin Tree Search with Context Models

05/26/2023
by Laurent Orseau, et al.

Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories – guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
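The abstract does not spell out the search rule itself, so the following is only a minimal sketch of how a policy-guided search with an expansion guarantee can look. It assumes the usual LTS formulation, in which nodes are expanded in increasing order of a cost equal to the path length divided by the path probability under the policy. Here the path length is counted in nodes, so the root counts as 1; that convention is an assumption of this sketch. The toy domain (bit strings of length 4), the fixed policy, and all function names are hypothetical and chosen only for illustration; they are not the paper's context models or benchmarks.

```python
import heapq
from itertools import count


def levin_tree_search(root, children, policy, is_goal, max_expansions=100_000):
    """Best-first search ordered by (depth + 1) / pi(node).

    pi(node) is the product of the policy probabilities of the actions
    on the path from the root to the node (pi(root) = 1).
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entries: (cost, tie, state, depth, path_probability)
    frontier = [(1.0, next(tie), root, 0, 1.0)]
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, state, depth, prob = heapq.heappop(frontier)
        expansions += 1
        if is_goal(state):
            return state, expansions, cost
        action_probs = policy(state)
        for action, child in children(state):
            p = prob * action_probs[action]
            if p > 0.0:  # zero-probability paths are never reachable
                child_cost = (depth + 2) / p  # child has depth + 1, plus 1 for the root
                heapq.heappush(frontier, (child_cost, next(tie), child, depth + 1, p))
    return None, expansions, float("inf")


# Hypothetical toy domain: build a 4-bit string one bit at a time.
GOAL = (1, 0, 1, 1)


def children(state):
    if len(state) >= len(GOAL):
        return []
    return [(bit, state + (bit,)) for bit in (0, 1)]


def policy(state):
    # Fixed, state-independent policy (an assumption): prefers bit 1.
    return {0: 0.3, 1: 0.7}


found, expansions, bound = levin_tree_search((), children, policy, lambda s: s == GOAL)
print(f"found={found} expansions={expansions} bound={bound:.1f}")
```

In this sketch the cost of the goal node acts as the upper bound on the number of expansions: the better the policy (the higher the probability it assigns to the solution path), the smaller the bound. Summing that same quantity over a set of solution trajectories gives a loss of the kind the abstract calls the LTS loss.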

research
03/21/2021

Policy-Guided Heuristic Search with Guarantees

The use of a policy and a heuristic function for guiding search can be q...
research
11/27/2018

Single-Agent Policy Tree Search With Guarantees

We introduce two novel tree search algorithms that use a policy to guide...
research
03/31/2022

A unified theory of learning

Recently machine learning using neural networks (NN) has been developed,...
research
08/16/2023

Safety Filter Design for Neural Network Systems via Convex Optimization

With the increase in data availability, it has been widely demonstrated ...
research
04/21/2021

Exploiting Learned Policies in Focal Search

Recent machine-learning approaches to deterministic search and domain-in...
research
02/08/2015

Learning to Search Better Than Your Teacher

Methods for learning to search for structured prediction typically imita...
research
09/12/2022

A Differentiable Loss Function for Learning Heuristics in A*

Optimization of heuristic functions for the A* algorithm, realized by de...
