Actor Prioritized Experience Replay

09/01/2022
by   Baturay Saglam, et al.

A widely studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) lets agents learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although PER has been shown to be one of the most crucial components for the overall performance of deep RL methods in discrete-action domains, many empirical studies indicate that it considerably underperforms when combined with actor-critic algorithms in continuous control. We theoretically show that actor networks cannot be effectively trained on transitions with large TD errors: the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also addresses stability issues and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both the actor and critic networks. An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms competing approaches, obtaining state-of-the-art results over standard off-policy actor-critic algorithms.
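For context, the PER mechanism the abstract builds on can be sketched as follows. This is a minimal illustration of the proportional-priority variant, where each transition is assigned priority p_i = (|δ_i| + ε)^α and sampled with probability p_i / Σ_j p_j, with importance-sampling weights correcting the induced bias. The class name, hyperparameter defaults, and buffer layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal sketch of proportional PER: transitions are sampled with
    probability proportional to (|TD error| + eps)^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # controls how strongly priority skews sampling
        self.eps = eps          # ensures every transition has nonzero priority
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0            # next write position (circular buffer)

    def add(self, transition, td_error):
        # New transitions enter with priority derived from their TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias;
        # beta anneals toward 1 in practice for full correction.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        self.priorities[idx] = (np.abs(td_errors) + self.eps) ** self.alpha
```

A production implementation would use a sum-tree for O(log n) sampling rather than recomputing the full distribution; the paper's contribution is precisely to change *which* transitions such a sampler favors when training the actor, since, as argued above, large-TD-error transitions are harmful to policy-gradient estimates.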


