Sample Efficient Reinforcement Learning with REINFORCE

10/22/2020
by Junzi Zhang et al.

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works have either required exact gradients or mini-batch stochastic gradients based on the state-action visitation measure with a diverging batch size, which limits their applicability in practical scenarios. In this paper, we consider classical policy gradient methods that compute an approximate gradient with a single trajectory or a fixed-size mini-batch of trajectories, along with the widely used REINFORCE gradient estimation procedure. By controlling the number of "bad" episodes and resorting to the classical doubling trick, we establish an anytime sub-linear high-probability regret bound as well as almost sure global convergence of the average regret with an asymptotically sub-linear rate. These provide the first set of global convergence and sample efficiency results for the well-known REINFORCE algorithm and contribute to a better understanding of its performance in practice.
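To make the setting concrete, the REINFORCE estimator updates the policy parameters using the return of a single sampled trajectory times the gradient of the log-policy, rather than an exact gradient or a diverging mini-batch. The sketch below is a minimal, hypothetical illustration on a two-armed bandit with a softmax policy (the bandit rewards, learning rate, and episode count are illustrative choices, not from the paper):

```python
import math
import random

def softmax(theta):
    """Softmax policy over action parameters theta."""
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(rewards=(0.2, 0.8), episodes=2000, lr=0.1, seed=0):
    """Minimal REINFORCE sketch on a toy two-armed bandit.

    Each 'trajectory' here is a single action, so the single-trajectory
    gradient estimate is  r * grad_theta log pi(a | theta).  For a
    softmax parameterization, grad log pi(a)_i = 1{i == a} - pi_i.
    All constants (rewards, lr, episodes) are illustrative assumptions.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        pi = softmax(theta)
        # Sample one action (one "episode") from the current policy.
        a = 0 if rng.random() < pi[0] else 1
        r = rewards[a]  # deterministic reward, for clarity
        # REINFORCE update from this single trajectory.
        for i in range(len(theta)):
            grad_log = (1.0 if i == a else 0.0) - pi[i]
            theta[i] += lr * r * grad_log
    return softmax(theta)
```

Running `reinforce_bandit()` drives the policy toward the higher-reward arm; the point of the sketch is that each update uses exactly one sampled trajectory, which is the regime the paper's regret analysis covers.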


Related research

- On the Global Convergence of Momentum-based Policy Gradient (10/19/2021)
- An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient (05/29/2019)
- Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization (10/19/2021)
- AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods (11/06/2017)
- A Note on the Linear Convergence of Policy Gradient Methods (07/21/2020)
- Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator (04/13/2023)
- Convergence under Lipschitz smoothness of ease-controlled Random Reshuffling gradient Algorithms (12/04/2022)
