Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

12/02/2020
by Markus Holzleitner, et al.

We prove, under commonly used assumptions, the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic. Both functions can be deep neural networks of arbitrary complexity. Our framework allows us to show convergence of the well-known Proximal Policy Optimization (PPO) algorithm and of the recently introduced RUDDER. For the convergence proof, we employ recently introduced techniques from two-time-scale stochastic approximation theory. Our results are valid for actor-critic methods that use episodic samples and whose policies become greedier during learning. Previous convergence proofs assume linear function approximation, cannot treat episodic samples, or do not consider that policies become greedy. The latter is relevant since optimal policies are typically deterministic.
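To make the setting concrete, below is a minimal, generic actor-critic update with two time scales (the critic's learning rate set larger than the actor's, so the actor effectively sees a nearly converged value estimate, mirroring the two-time-scale assumption). This is an illustrative sketch in PyTorch, not the paper's algorithm or proof construction; the network sizes, toy dimensions, and learning rates are assumptions chosen for the example.

```python
# Minimal two-time-scale actor-critic sketch (illustrative; not the paper's method).
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2  # assumed toy dimensions

# Actor: policy network. Critic: value network.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                      nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))

# Two time scales: the critic updates on a faster schedule than the actor.
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-2)

def update(obs, action, reward, next_obs, done, gamma=0.99):
    """One actor-critic step on a batch of transitions."""
    value = critic(obs).squeeze(-1)
    with torch.no_grad():
        # Bootstrapped TD target; no gradient flows through it.
        target = reward + gamma * (1.0 - done) * critic(next_obs).squeeze(-1)
    advantage = (target - value).detach()  # TD error as advantage estimate

    # Critic step: regress the value toward the TD target.
    critic_loss = (target - value).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: policy gradient weighted by the advantage.
    log_probs = torch.log_softmax(actor(obs), dim=-1)
    chosen = log_probs.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    actor_loss = -(chosen * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Example call with dummy data (batch of 8 transitions):
obs = torch.randn(8, obs_dim)
action = torch.randint(n_actions, (8,))
update(obs, action, reward=torch.randn(8),
       next_obs=torch.randn(8, obs_dim), done=torch.zeros(8))
```

In episodic settings of the kind the paper treats, the transitions would come from complete episodes rather than single steps, and the policy would be annealed toward greediness over the course of learning.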


