Revisiting Prioritized Experience Replay: A Value Perspective

02/05/2021
by Ang A. Li, et al.

Experience replay enables off-policy reinforcement learning (RL) agents to reuse past experiences to maximize the cumulative reward. Prioritized experience replay, which weights experiences by the magnitude of their temporal-difference error (|TD|), significantly improves learning efficiency. However, how |TD| relates to the importance of an experience is not well understood. We address this problem from an economic perspective by linking |TD| to the value of experience, defined as the value added to the cumulative reward by accessing the experience. We show theoretically that the value metrics of experience are upper-bounded by |TD| for Q-learning. Furthermore, we extend our theoretical framework to maximum-entropy RL by deriving lower and upper bounds on these value metrics for soft Q-learning; these bounds turn out to be the product of |TD| and the "on-policyness" of the experiences. Our framework thus links two important quantities in RL: |TD| and the value of experience. We show empirically that the bounds hold in practice and that experience replay using the upper bound as the priority improves maximum-entropy RL in Atari games.
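
As a rough illustration of the prioritization idea summarized above, the following Python sketch computes a replay priority from |TD| and, in the soft Q-learning case, scales it by a proxy for on-policyness. The abstract does not define on-policyness or state the exact bounds, so using the current Boltzmann policy's probability of the stored action is an assumption made here. All function names, the dictionary layout of a transition, and the parameters `gamma`, `temperature`, and `eps` are likewise hypothetical.

```python
import math
import random


def soft_value(q_values, temperature=1.0):
    """Soft state value: temperature * log-sum-exp(Q / temperature)."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy over actions (assumed stand-in for the soft Q policy)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(q_sa, reward, q_next, done=False, gamma=0.99,
             soft=False, temperature=1.0):
    """One-step TD error; uses max over next actions, or the soft value if soft=True."""
    if done:
        target = reward
    else:
        bootstrap = soft_value(q_next, temperature) if soft else max(q_next)
        target = reward + gamma * bootstrap
    return target - q_sa


def priorities(transitions, soft=False, temperature=1.0, eps=1e-6):
    """Priority per transition: |TD|, times the on-policyness proxy when soft=True."""
    result = []
    for t in transitions:
        delta = abs(td_error(t["q_s"][t["a"]], t["r"], t["q_next"],
                             done=t["done"], soft=soft, temperature=temperature))
        if soft:
            # Assumption: on-policyness ~ probability of the stored action
            # under the current Boltzmann policy.
            delta *= softmax_policy(t["q_s"], temperature)[t["a"]]
        result.append(delta + eps)  # eps keeps every transition sampleable
    return result


def sample_indices(prios, k, rng):
    """Sample k buffer indices with probability proportional to priority."""
    return rng.choices(range(len(prios)), weights=prios, k=k)
```

A buffer would call `priorities(batch)` for Q-learning or `priorities(batch, soft=True)` for soft Q-learning, then pass the result to `sample_indices` with a `random.Random` instance. Because the on-policyness proxy is a probability, the soft priority never exceeds the soft |TD| itself.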


