Policy Gradient Algorithms Implicitly Optimize by Continuation

05/11/2023
by Adrien Bolland, et al.

Direct policy optimization in reinforcement learning is usually performed with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification of these algorithms. First, we formulate direct policy optimization in the optimization by continuation framework, a framework for optimizing nonconvex functions in which a sequence of surrogate objective functions, called continuations, is locally optimized. Second, we show that optimizing affine Gaussian policies and performing entropy regularization can be interpreted as implicitly optimizing deterministic policies by continuation. Based on these theoretical results, we argue that exploration in policy-gradient algorithms consists in computing a continuation of the return of the policy at hand, and that the variance of a policy should be a history-dependent function adapted to avoid local extrema rather than to maximize the return of the policy.
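To make the idea of optimization by continuation concrete, here is a minimal sketch on a toy one-dimensional problem. It is not the paper's algorithm and involves no reinforcement learning: the objective `f`, the hand-picked `sigmas` schedule, the step size, and all function names are assumptions chosen for illustration. Each continuation is taken to be the Gaussian-smoothed objective E[f(x + sigma * eps)], approximated with a fixed set of standard-normal quantile points so that the run is deterministic.

```python
import math
from statistics import NormalDist


def f(x):
    """Hypothetical nonconvex objective: global maximum at x = 0, local maxima elsewhere."""
    return math.cos(3.0 * x) - 0.1 * x * x


def grad_f(x):
    """Exact derivative of f."""
    return -3.0 * math.sin(3.0 * x) - 0.2 * x


# Deterministic stand-in for Gaussian noise: midpoints of N equal-probability bins
# of the standard normal distribution (an assumption made for reproducibility).
N = 201
EPS = [NormalDist().inv_cdf((i + 0.5) / N) for i in range(N)]


def grad_smoothed(x, sigma):
    """Approximate gradient of the continuation E[f(x + sigma * eps)], eps ~ N(0, 1)."""
    return sum(grad_f(x + sigma * e) for e in EPS) / N


def ascend(x, sigma, steps=500, lr=0.05):
    """Local gradient ascent on one continuation (sigma = 0 recovers f itself)."""
    for _ in range(steps):
        x += lr * grad_smoothed(x, sigma)
    return x


def optimize_by_continuation(x, sigmas=(1.0, 0.5, 0.25, 0.0)):
    """Locally optimize a sequence of continuations, warm-starting each from the last."""
    for sigma in sigmas:
        x = ascend(x, sigma)
    return x


x_plain = ascend(4.0, sigma=0.0)        # plain gradient ascent on f
x_cont = optimize_by_continuation(4.0)  # ascent through progressively less smoothed objectives
print(x_plain, f(x_plain))
print(x_cont, f(x_cont))
```

Started from x = 4, plain ascent should settle in the nearby local maximum, while the continuation run should end close to the global maximum at x = 0, because heavy smoothing flattens the oscillations that create the local maxima. In this sketch the noise scale plays the role the abstract attributes to the policy variance: it is set to avoid local extrema, not to maximize the objective.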


