PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

12/07/2021
by Xingxing Liang, et al.

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights under multistep experience, and design a policy improvement loss function for PPO under off-policy conditions. We evaluate the performance of PTR-PPO in a set of Atari discrete control tasks, achieving state-of-the-art performance. In addition, by analyzing the heatmap of priority changes at various locations in the priority memory during training, we find that memory size and rollout length can have a significant impact on the distribution of trajectory priorities and, hence, on the performance of the algorithm.
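The abstract names three trajectory priorities and a truncated importance weight but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the max and mean priorities are taken over the absolute per-step GAE values of a trajectory, that the reward priority is a min-max normalization of undiscounted returns across the stored trajectories, and that truncation caps each per-step probability ratio at a constant. All function names and the parameters `gamma`, `lam`, `clip`, and `eps` are hypothetical.

```python
import numpy as np


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Per-step GAE values for one trajectory (episode ends ignored for brevity)."""
    rewards = np.asarray(rewards, dtype=float)
    values_ext = np.append(np.asarray(values, dtype=float), last_value)
    # One-step TD errors: r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv


def max_priority(adv):
    """Assumed max priority: largest absolute GAE value in the trajectory."""
    return float(np.max(np.abs(adv)))


def mean_priority(adv):
    """Assumed mean priority: average absolute GAE value in the trajectory."""
    return float(np.mean(np.abs(adv)))


def reward_priorities(returns, eps=1e-8):
    """Assumed reward priority: min-max normalized undiscounted returns."""
    returns = np.asarray(returns, dtype=float)
    span = returns.max() - returns.min()
    return (returns - returns.min()) / (span + eps)


def truncated_importance_weights(logp_new, logp_old, clip=1.0):
    """Per-step ratios pi_new / pi_old, capped at `clip` to limit variance."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    return np.minimum(ratio, clip)
```

In a replay loop, one of these priorities would be stored alongside each trajectory and used as the sampling weight, while the capped ratios would scale the policy loss for trajectories collected under older policies. How the paper combines these pieces is not stated in the abstract.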


research
06/19/2019

Experience Replay Optimization

Experience replay enables reinforcement learning agents to memorize and ...
research
08/25/2022

Variance Reduction based Experience Replay for Policy Optimization

For reinforcement learning on complex stochastic systems where many fact...
research
07/16/2022

Associative Memory Based Experience Replay for Deep Reinforcement Learning

Experience replay is an essential component in deep reinforcement learni...
research
06/27/2023

Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning

In recent years, data-driven reinforcement learning (RL), also known as ...
research
09/13/2023

Attention Loss Adjusted Prioritized Experience Replay

Prioritized Experience Replay (PER) is a technical means of deep reinfor...
research
05/11/2023

Towards Understanding and Improving GFlowNet Training

Generative flow networks (GFlowNets) are a family of algorithms that lea...
research
08/10/2023

Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length

The Reinforcement Learning from Human Feedback (RLHF) plays a pivotal ro...
