Generalized Off-Policy Actor-Critic

03/27/2019
by   Shangtong Zhang, et al.

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which can be misleading about the performance of the target policy when deployed, our new objective better predicts such performance. We prove the Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient of the counterfactual objective and use an emphatic approach to get an unbiased sample from this policy gradient, yielding the Generalized Off-Policy Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over existing algorithms in MuJoCo robot simulation tasks, the first empirical success of emphatic algorithms in prevailing deep RL benchmarks.
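
The contrast between the two objectives can be illustrated with a small numerical sketch. The abstract does not state the formulas, so the code below assumes a particular form: the excursion objective weights target-policy state values by the behavior policy's state distribution `d_mu`, while the counterfactual objective weights them by an interpolated distribution `d = (1 - gamma_hat) * d_mu + gamma_hat * P_pi^T d`, which equals `d_mu` at `gamma_hat = 0` and moves toward the target policy's own state distribution as `gamma_hat` approaches 1. The two-state chain, the values `v_pi`, and all function names are hypothetical and chosen only for illustration.

```python
def interpolated_distribution(P_pi, d_mu, gamma_hat, iters=2000):
    """Fixed-point iteration for d = (1 - g) * d_mu + g * P_pi^T d (assumed form, g < 1).

    P_pi[s][t] is the probability of moving from state s to state t under the
    target policy. The iteration contracts at rate gamma_hat.
    """
    n = len(d_mu)
    d = list(d_mu)
    for _ in range(iters):
        d = [
            (1 - gamma_hat) * d_mu[s]
            + gamma_hat * sum(P_pi[prev][s] * d[prev] for prev in range(n))
            for s in range(n)
        ]
    return d


def objective(dist, values):
    """State-value average under a given state distribution (interest fixed to 1)."""
    return sum(p * v for p, v in zip(dist, values))


# Hypothetical two-state example: the behavior policy mostly visits state 0,
# while the target policy, once deployed, spends most of its time in state 1.
P_pi = [[0.1, 0.9], [0.1, 0.9]]
d_mu = [0.9, 0.1]
v_pi = [1.0, 0.0]  # made-up target-policy state values

excursion = objective(d_mu, v_pi)
counterfactual = objective(interpolated_distribution(P_pi, d_mu, 0.9), v_pi)
print(f"excursion-style objective:      {excursion:.3f}")
print(f"counterfactual-style objective: {counterfactual:.3f}")
```

In this toy chain the excursion-style number is dominated by state 0, which the target policy rarely visits when deployed, whereas the interpolated weighting shifts mass toward state 1. This is only meant to convey the intuition behind "misleading about the performance of the target policy when deployed"; it is not the Geoff-PAC algorithm itself.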
