Optimized Look-Ahead Tree Policies: A Bridge Between Look-Ahead Tree Policies and Direct Policy Search

08/23/2012
by Tobias Jung, et al.

Direct policy search (DPS) and look-ahead tree (LT) policies are two widely used classes of techniques for producing high-performance policies for sequential decision-making problems. For DPS approaches to work well, one crucial issue is selecting a space of parameterized policies that suits the targeted problem. A fundamental issue in LT approaches is that, to make good decisions, such policies must develop very large look-ahead trees, which may require excessive online computational resources. In this paper, we propose a new hybrid policy learning scheme that lies at the intersection of DPS and LT, in which the policy is an algorithm that develops a small look-ahead tree in a directed way, guided by a node scoring function that is learned through DPS. The LT-based representation is shown to be a versatile way of representing policies in a DPS scheme, while at the same time, DPS makes it possible to significantly reduce the size of the look-ahead trees required to make high-quality decisions. We experimentally compare our method with two other state-of-the-art DPS techniques and four common LT policies on four benchmark domains and show that it combines the advantages of the two techniques from which it originates. In particular, we show that our method: (1) produces overall better-performing policies than both pure DPS and pure LT policies, (2) requires substantially fewer policy evaluations than other DPS techniques, (3) is easy to tune, and (4) results in policies that are quite robust to perturbations of the initial conditions.
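The hybrid scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy chain problem `step`, the three node features, the linear scoring form, the expansion budget, and the use of plain random search as a stand-in for the DPS optimizer. The sketch shows only the general idea: a policy that grows a small tree by repeatedly expanding the highest-scoring open node, with the scoring parameters `theta` tuned by evaluating the resulting policy's return.

```python
import heapq
import random

GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (-1, +1)   # toy action set (assumed)

def step(x, u):
    """Hypothetical deterministic model: walk on the integers -5..5, reward 1 on landing at +5."""
    nx = max(-5, min(5, x + u))
    return nx, (1.0 if nx == 5 else 0.0)

def features(x, depth, ret):
    """Illustrative node features: state, depth, discounted return accumulated so far."""
    return (float(x), float(depth), ret)

def lt_policy(x0, theta, budget=6):
    """Grow a tree with `budget` node expansions, always expanding the open node
    with the highest linear score theta . features. Return the first action on
    the path to the node with the best discounted return found."""
    counter = 0  # unique tie-breaker so heap entries never compare beyond it
    # heap entries: (-score, counter, state, depth, return_so_far, first_action)
    open_nodes = [(0.0, counter, x0, 0, 0.0, None)]
    best_ret, best_action = float("-inf"), ACTIONS[0]
    for _ in range(budget):
        if not open_nodes:
            break
        _, _, x, d, ret, first = heapq.heappop(open_nodes)
        for u in ACTIONS:
            nx, r = step(x, u)
            nret = ret + (GAMMA ** d) * r
            nfirst = u if first is None else first
            if nret > best_ret:
                best_ret, best_action = nret, nfirst
            counter += 1
            score = sum(t * f for t, f in zip(theta, features(nx, d + 1, nret)))
            heapq.heappush(open_nodes, (-score, counter, nx, d + 1, nret, nfirst))
    return best_action

def rollout_return(theta, x0=0, horizon=20, budget=6):
    """Discounted return obtained by running the tree policy from x0."""
    x, total = x0, 0.0
    for t in range(horizon):
        u = lt_policy(x, theta, budget)
        x, r = step(x, u)
        total += (GAMMA ** t) * r
    return total

def direct_policy_search(iterations=200, seed=0):
    """Random search over theta (a simple stand-in for a DPS optimizer)."""
    rng = random.Random(seed)
    best_theta = (0.0, 0.0, 0.0)      # all-zero scores: expansion in insertion order
    best_val = rollout_return(best_theta)
    for _ in range(iterations):
        cand = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        val = rollout_return(cand)
        if val > best_val:
            best_theta, best_val = cand, val
    return best_theta, best_val

if __name__ == "__main__":
    theta, value = direct_policy_search()
    print("uniform-score return:", rollout_return((0.0, 0.0, 0.0)))
    print("searched theta:", theta, "return:", value)
```

In this toy setting, an all-zero `theta` expands nodes in insertion order (breadth-first), and six expansions are too few to reach the reward from the start state, whereas a scoring function that favors promising nodes reaches it within the same budget. This mirrors, in miniature, the abstract's claim that a learned scoring function reduces the tree size needed for good decisions.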


