A Study of Policy Gradient on a Class of Exactly Solvable Models

11/03/2020
by Gavin McCracken, et al.

Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.
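
To make the "policy parameters as a continuous-state Markov chain" view concrete, here is a minimal sketch. It is not one of the paper's environments: it assumes a hypothetical two-armed bandit with fixed rewards and a one-parameter sigmoid policy, updated by single-sample REINFORCE. The names `reinforce_step` and `run_chain` are illustrative. Each update depends only on the current parameter and one fresh random draw, so the sequence of parameter values forms a Markov chain on the real line.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_step(theta, rewards, lr, rng):
    """One single-sample REINFORCE update for a two-action sigmoid policy.

    Assumption (not from the paper): action 1 is chosen with probability
    sigmoid(theta), and rewards[action] is a fixed, deterministic reward.
    """
    p1 = sigmoid(theta)
    action = 1 if rng.random() < p1 else 0
    reward = rewards[action]
    # d/dtheta log pi(action | theta): (1 - p1) for action 1, -p1 for action 0
    grad_log_pi = (1.0 - p1) if action == 1 else -p1
    return theta + lr * reward * grad_log_pi


def run_chain(theta0, rewards, lr, steps, seed):
    """Simulate the parameter trajectory theta_0, theta_1, ..., theta_steps."""
    rng = random.Random(seed)
    theta = theta0
    trajectory = [theta]
    for _ in range(steps):
        # The next state depends only on the current theta and fresh randomness.
        theta = reinforce_step(theta, rewards, lr, rng)
        trajectory.append(theta)
    return trajectory


if __name__ == "__main__":
    traj = run_chain(theta0=0.0, rewards=(0.0, 1.0), lr=0.1, steps=200, seed=0)
    print(f"final theta: {traj[-1]:.3f}, P(action 1): {sigmoid(traj[-1]):.3f}")
```

Running `run_chain` with different seeds gives different parameter trajectories from the same starting point. In this toy setting, that spread is the analogue of the distribution over outcomes of policy gradient that the abstract describes analyzing.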


research
10/18/2018

Trust Region Policy Optimization of POMDPs

We propose Generalized Trust Region Policy Optimization (GTRPO), a Reinf...
research
12/13/2015

Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function...
research
11/12/2019

On Policy Gradients

The goal of policy gradient approaches is to find a policy in a given cl...
research
06/14/2022

How are policy gradient methods affected by the limits of control?

We study stochastic policy gradient methods from the perspective of cont...
research
04/09/2020

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcem...
research
07/03/2023

Monte Carlo Policy Gradient Method for Binary Optimization

Binary optimization has a wide range of applications in combinatorial op...
research
05/27/2018

Contextual Policy Optimisation

Policy gradient methods have been successfully applied to a variety of r...
