ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search

02/03/2022
by Dixant Mittal, et al.

Tree-based online search algorithms iteratively simulate trajectories and update Q-value information on a set of states represented by a tree structure. Alternatively, policy-gradient-based online search algorithms update the information obtained from simulated trajectories directly in the parameters of the policy, and have been found to be effective. While tree-based methods limit the updates from simulations to the states that exist in the tree and do not interpolate the information to nearby states, policy gradient search methods do not perform explicit exploration. In this paper, we show that it is possible to combine and leverage the strengths of these two methods for improved search performance. We examine the key reasons behind the improvement and propose a simple yet effective online search method, named Exploratory Policy Gradient Search (ExPoSe), that updates both the parameters of the policy and the search information on the states in the trajectory. We conduct experiments on complex planning problems, including Sokoban and Hamiltonian cycle search in sparse graphs, and show that combining exploration with policy gradient updates improves online search performance.
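The sketch below illustrates, in a simplified form, the idea the abstract describes: each simulated trajectory updates both per-state search statistics (visit counts and Q-value estimates), which drive a count-based exploration bonus, and the policy parameters via a policy-gradient step. The environment interface (reset, step, state_key, features), the linear softmax policy, the PUCT-style exploration rule, and the hyperparameters are illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import defaultdict

class ExploratoryPolicyGradientSearchSketch:
    """Minimal sketch: policy-gradient updates combined with per-state
    search statistics used for exploration (assumed details, not the paper's code)."""

    def __init__(self, n_features, n_actions, lr=0.1, c_explore=1.0):
        self.theta = np.zeros((n_actions, n_features))      # linear softmax policy parameters
        self.lr = lr                                         # policy-gradient step size
        self.c = c_explore                                   # exploration coefficient
        self.N = defaultdict(lambda: np.zeros(n_actions))    # per-state visit counts
        self.Q = defaultdict(lambda: np.zeros(n_actions))    # per-state action-value estimates

    def policy(self, features):
        logits = self.theta @ features
        logits -= logits.max()                               # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    def select_action(self, state_key, features):
        # PUCT-style action selection: stored Q-values plus a count-based
        # exploration bonus weighted by the current policy (an assumption).
        probs = self.policy(features)
        n = self.N[state_key]
        bonus = self.c * probs * np.sqrt(n.sum() + 1.0) / (1.0 + n)
        return int(np.argmax(self.Q[state_key] + bonus)), probs

    def simulate(self, env, max_steps=100):
        # One simulated trajectory from the current root state.
        trajectory = []
        state = env.reset()
        for _ in range(max_steps):
            key, feats = env.state_key(state), env.features(state)
            action, probs = self.select_action(key, feats)
            state, reward, done = env.step(action)
            trajectory.append((key, feats, action, probs, reward))
            if done:
                break
        # Backward pass: update both the search statistics on visited states
        # and the policy parameters from the same trajectory.
        ret = 0.0
        for key, feats, action, probs, reward in reversed(trajectory):
            ret = reward + ret                               # undiscounted return
            self.N[key][action] += 1
            self.Q[key][action] += (ret - self.Q[key][action]) / self.N[key][action]
            grad = -np.outer(probs, feats)                   # d log pi(a|s) / d theta
            grad[action] += feats
            self.theta += self.lr * ret * grad               # REINFORCE-style update
        return ret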


Related research

04/07/2019 · Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Monte Carlo Tree Search (MCTS) algorithms perform simulation-based searc...

11/28/2001 · Gradient-based Reinforcement Planning in Policy-Search Methods
We introduce a learning method called "gradient-based reinforcement plan...

02/10/2019 · Diverse Exploration via Conjugate Policies for Policy Gradient Methods
We address the challenge of effective exploration while maintaining good...

09/28/2022 · SoftTreeMax: Policy Gradient with Tree Search
Policy-gradient methods are widely used for learning control policies. T...

10/29/2020 · Low-Variance Policy Gradient Estimation with World Models
In this paper, we propose World Model Policy Gradient (WMPG), an approac...

06/29/2021 · Curious Explorer: a provable exploration strategy in Policy Learning
Having access to an exploring restart distribution (the so-called wide c...

02/24/2021 · Combining Off and On-Policy Training in Model-Based Reinforcement Learning
The combination of deep learning and Monte Carlo Tree Search (MCTS) has ...
