Gradient-based Reinforcement Planning in Policy-Search Methods

11/28/2001
by Ivo Kwee, et al.

We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional dynamic programming (DP) methods, which improve their policy backward in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.
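The core idea of planning with the policy gradient before acting can be sketched in a toy setting. This is an illustrative sketch, not the paper's exact GREP algorithm: it assumes a small MDP whose model (transitions and rewards) is fully known, evaluates the expected discounted return exactly inside the model, and ascends a finite-difference approximation of the policy gradient; all names and the finite-difference estimator are my assumptions.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions.
# P[a, s, s'] is the known transition model; R[s, a] is the reward.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])

def policy(theta):
    """Softmax policy pi(a|s) parameterized by theta[s, a]."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def expected_return(theta, start=0):
    """Exact expected discounted return from `start`, evaluated in the
    model (i.e. by planning, not by acting): V = (I - gamma P_pi)^-1 r_pi."""
    pi = policy(theta)
    P_pi = np.einsum('sa,ast->st', pi, P)   # state-to-state under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return V[start]

def plan(theta, lr=0.5, iters=200, eps=1e-5):
    """Improve the policy before acting, via finite-difference
    gradient ascent on the expected return (illustrative stand-in
    for the paper's exact policy-gradient formulas)."""
    for _ in range(iters):
        base = expected_return(theta)
        grad = np.zeros_like(theta)
        for idx in np.ndindex(theta.shape):
            t = theta.copy()
            t[idx] += eps
            grad[idx] = (expected_return(t) - base) / eps
        theta = theta + lr * grad
    return theta

theta = plan(np.zeros((n_states, n_actions)))
```

After planning, the policy's expected return from the start state exceeds that of the initial uniform policy, even though no step was ever taken in the environment itself.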

Related research

ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search (02/03/2022)
A tree-based online search algorithm iteratively simulates trajectories ...

Policy Gradient Methods for Off-policy Control (12/13/2015)
Off-policy learning refers to the problem of learning the value function...

Monte-Carlo Tree Search for Policy Optimization (12/23/2019)
Gradient-based methods are often used for policy optimization in deep re...

Recurrent Predictive State Policy Networks (03/05/2018)
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurr...

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search (01/30/2023)
Despite the popularity of policy gradient methods, they are known to suf...

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning (06/10/2022)
We study policy optimization for Markov decision processes (MDPs) with m...

Approximate information state for approximate planning and reinforcement learning in partially observed systems (10/17/2020)
We propose a theoretical framework for approximate planning and learning...
