Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

12/11/2019
by Riashat Islam, et al.

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new approach to off-policy policy evaluation in actor-critic, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that include a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, when the reward function is stochastic, which can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
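
For orientation, the following is a minimal sketch of the standard recursive doubly robust estimator from the OPE literature, which the abstract names as its starting point. It is not the paper's actor-critic extension, which this page does not spell out. The function name, argument names, and the assumption that the importance ratios and model estimates are precomputed per time step are all illustrative choices, not taken from the source.

```python
def doubly_robust_return(rewards, rhos, q_hats, v_hats, gamma=0.99):
    """Doubly robust value estimate for a single logged trajectory.

    All names here are hypothetical, chosen for illustration:
      rewards[t]: observed reward r_t
      rhos[t]:    importance ratio pi(a_t | s_t) / mu(a_t | s_t)
      q_hats[t]:  model estimate Q_hat(s_t, a_t)
      v_hats[t]:  model estimate V_hat(s_t), i.e. E_{a ~ pi} Q_hat(s_t, a)
    """
    v_dr = 0.0  # value beyond the end of the trajectory is taken to be zero
    # Work backwards: V_DR(t) = V_hat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - Q_hat(s_t, a_t))
    for r, rho, q, v in zip(
        reversed(rewards), reversed(rhos), reversed(q_hats), reversed(v_hats)
    ):
        v_dr = v + rho * (r + gamma * v_dr - q)
    return v_dr


if __name__ == "__main__":
    # With a zero model and unit ratios the estimate is the plain discounted return.
    print(doubly_robust_return([1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], gamma=0.5))
```

The model term V_hat acts as a baseline, and the importance-weighted term corrects only the model's residual error. When the model is accurate, the correction is close to zero and the importance ratios contribute little variance; when the ratios are correct, the estimate stays unbiased even if the model is poor.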

research
02/21/2018

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with ...
research
06/12/2020

Potential Field Guided Actor-Critic Reinforcement Learning

In this paper, we consider the problem of actor-critic reinforcement lea...
research
05/09/2018

Reward Estimation for Variance Reduction in Deep Reinforcement Learning

In reinforcement learning (RL), stochastic environments can make learnin...
research
09/04/2020

Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization

Continuous control is a widely applicable area of reinforcement learning...
research
06/02/2023

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

In this paper, we introduce a novel method for enhancing the effectivene...
research
06/20/2022

DNA: Proximal Policy Optimization with a Dual Network Architecture

This paper explores the problem of simultaneously learning a value funct...
research
04/05/2020

Reinforcement Learning Architectures: SAC, TAC, and ESAC

The trend is to implement intelligent agents capable of analyzing availa...
