Provably Convergent Off-Policy Actor-Critic with Function Approximation

11/11/2019
by Shangtong Zhang, et al.

We present the first provably convergent off-policy actor-critic algorithm (COF-PAC) with function approximation in a two-timescale form. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of the key ideas of Gradient Temporal Difference Learning and Emphatic Temporal Difference Learning. With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
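To make the two-timescale structure described in the abstract concrete, below is a minimal Python sketch of an off-policy actor-critic loop with a linear value critic, a second linear critic tracking an emphasis-like quantity, and a softmax actor updated on a slower timescale. The toy environment, feature matrix, and all update rules (in particular the emphasis-critic update) are illustrative placeholders chosen for readability; they are not the exact GEM or COF-PAC updates from the paper.

```python
import numpy as np

# Illustrative two-timescale off-policy actor-critic sketch (NOT the paper's
# exact COF-PAC/GEM updates): a linear value critic, a linear emphasis-like
# critic, and a softmax actor, all learned from a fixed behavior policy.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 3, 4
gamma = 0.95

phi = rng.normal(size=(n_states, n_features))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)   # normalized state features
theta = np.zeros(n_features)                        # value-critic weights (linear)
w = np.zeros(n_features)                            # emphasis-critic weights (linear)
policy_params = np.zeros((n_states, n_actions))     # softmax actor (could be a neural net)

alpha_critic, alpha_actor = 0.05, 0.005             # critics on the faster timescale

def pi(s):
    """Target-policy action probabilities from the softmax actor."""
    p = np.exp(policy_params[s] - policy_params[s].max())
    return p / p.sum()

behavior = np.full(n_actions, 1.0 / n_actions)      # fixed behavior policy (off-policy data)

s = 0
for step in range(10_000):
    a = rng.choice(n_actions, p=behavior)
    rho = pi(s)[a] / behavior[a]                    # importance-sampling ratio
    s_next = int(rng.integers(n_states))            # toy random transition dynamics
    r = float(s_next == n_states - 1)               # reward for reaching the last state

    # Fast timescale: off-policy semi-gradient TD update for the value critic
    # (placeholder for the paper's gradient-TD-style critic).
    td_error = r + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha_critic * rho * td_error * phi[s]

    # Fast timescale: second linear critic tracking an emphasis-like quantity,
    # using a backward (predecessor-based) target (placeholder for GEM).
    emph_target = 1.0 + gamma * rho * (phi[s] @ w)
    w += alpha_critic * (emph_target - phi[s_next] @ w) * phi[s_next]

    # Slow timescale: policy-gradient-style actor update, weighted by the
    # estimated emphasis of the current state.
    emphasis = float(np.clip(phi[s] @ w, 0.0, 10.0))
    grad_log = -pi(s)
    grad_log[a] += 1.0
    policy_params[s] += alpha_actor * emphasis * rho * td_error * grad_log

    s = s_next
```

The step sizes are chosen so the critics adapt faster than the actor, mirroring the two-timescale form; in the paper the convergence argument relies on this separation together with the linearity of both critics, while the actor may be nonlinear.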

Related research

03/11/2020 · Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a va...

10/10/2022 · Actor-Critic or Critic-Actor? A Tale of Two Time Scales
We revisit the standard formulation of tabular actor-critic algorithm as...

01/30/2023 · PAC-Bayesian Soft Actor-Critic Learning
Actor-critic algorithms address the dual goals of reinforcement learning...

02/28/2022 · Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation
We study the convergence of the actor-critic algorithm with nonlinear fu...

07/14/2017 · Freeway Merging in Congested Traffic based on Multipolicy Decision Making with Passive Actor Critic
Freeway merging in congested traffic is a significant challenge toward f...

02/23/2021 · Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Designing off-policy reinforcement learning algorithms is typically a ve...

04/29/2019 · DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Unde...
