Attention Loss Adjusted Prioritized Experience Replay

09/13/2023
by   Zhuoying Chen, et al.

Prioritized Experience Replay (PER) is a deep reinforcement learning technique that speeds up neural network training by preferentially sampling the more informative experience samples. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and introduces estimation error into the Q-value function. This paper proposes an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm, which integrates an improved Self-Attention network with a Double-Sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, with the aim of eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
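
For readers unfamiliar with the bias the abstract refers to, the following is a minimal sketch of standard PER sampling with importance sampling weights, not the ALAP algorithm itself. It assumes the usual PER formulation, in which transition i is drawn with probability proportional to its priority raised to a power alpha, and the resulting bias is corrected by a weight (N * P(i))^(-beta). The function names, the default values of alpha and beta, and the example priorities are all illustrative assumptions. In standard PER the exponent beta is typically set by a fixed schedule; according to the abstract, ALAP instead fits the hyperparameter regulating these weights with a Self-Attention network, which this sketch does not attempt to reproduce.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, indices, beta):
    """Weights w_i = (N * P(i))^(-beta), divided by the largest weight
    in the batch so that every weight lies in (0, 1]."""
    n = len(probs)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    top = max(raw)
    return [w / top for w in raw]


def sample_batch(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw indices non-uniformly and return them with their weights."""
    rng = rng or random.Random(0)  # fixed seed only for repeatability
    probs = per_probabilities(priorities, alpha)
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return indices, importance_weights(probs, indices, beta)


# Illustrative priorities (for example, absolute TD errors); not from the paper.
priorities = [0.1, 0.5, 2.0, 1.0]
indices, weights = sample_batch(priorities, batch_size=3)
```

With beta = 0 every weight equals 1 and the sampling bias is left uncorrected; with beta = 1 frequently sampled transitions are down-weighted in full proportion to their sampling probability. Each weight would multiply the corresponding sample's loss term during the network update.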


research
05/30/2017

Experience Replay Using Transition Sequences

Experience replay is one of the most commonly used approaches to improve...
research
10/18/2018

Fast deep reinforcement learning using online adjustments from the past

We propose Ephemeral Value Adjustments (EVA): a means of allowing deep re...
research
07/12/2020

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

Prioritized Experience Replay (PER) is a deep reinforcement learning tec...
research
08/16/2020

An adaptive synchronization approach for weights of deep reinforcement learning

Deep Q-Networks (DQN) is one of the most well-known methods of deep rein...
research
03/24/2022

Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning

We present the extension of the Remember and Forget for Experience Repla...
research
08/22/2022

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Most reinforcement learning algorithms take advantage of an experience r...
research
12/07/2021

PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

On-policy deep reinforcement learning algorithms have low data utilizati...
