Stochastic Convergence Results for Regularized Actor-Critic Methods

07/13/2019

by Wesley Suttle, et al.

In this paper, we present a stochastic convergence proof, under suitable conditions, for a class of actor-critic algorithms that compute approximate solutions to entropy-regularized MDPs, using the machinery of stochastic approximation. To obtain this overall result, we establish three fundamental results, each of both practical and theoretical interest: we prove the convergence of policy evaluation with general regularizers under linear approximation architectures, we derive an entropy-regularized policy gradient theorem, and we show the convergence of entropy-regularized policy improvement. We also provide a simple, illustrative empirical study corroborating our theoretical results. To the best of our knowledge, this is the first time such results have been provided for approximate solution methods for regularized MDPs.
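To make the setting concrete, here is a minimal sketch of an entropy-regularized actor-critic of the kind the abstract describes: a linear (here one-hot, i.e. tabular) critic trained by TD(0) on the entropy-regularized reward, and a softmax actor updated by a policy-gradient step, on a hypothetical two-state, two-action MDP. All names, step sizes, and the toy MDP itself are illustrative assumptions, not the paper's actual algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (illustration only): P[s, a] -> next state, R[s, a] -> reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
n_states, n_actions = 2, 2
gamma, tau = 0.9, 0.1              # discount factor, entropy temperature

theta = np.zeros((n_states, n_actions))  # actor: softmax logits
w = np.zeros(n_states)                   # critic: weights for one-hot (linear) features

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

alpha_w, alpha_theta = 0.1, 0.01   # two-timescale step sizes (critic faster than actor)
s = 0
for t in range(20000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s2 = P[s, a]
    # Entropy-regularized reward: r + tau * H(pi(.|s)).
    entropy = -np.sum(pi * np.log(pi + 1e-12))
    r = R[s, a] + tau * entropy
    # Critic: TD(0) update with linear (one-hot) features.
    delta = r + gamma * w[s2] - w[s]
    w[s] += alpha_w * delta
    # Actor: policy-gradient step, using the TD error as an advantage estimate.
    grad_log = -pi                 # grad of log softmax: e_a - pi
    grad_log[a] += 1.0
    theta[s] += alpha_theta * delta * grad_log
    s = s2
```

With a small temperature, the learned policy should come to prefer the rewarding action in each state while the entropy bonus keeps it stochastic, which is the soft-optimality behavior that entropy-regularized MDPs formalize.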


