Replay For Safety

12/08/2021
by Liran Szlak, et al.

Experience replay <cit.> is a widely used technique for improving data efficiency and performance in reinforcement learning (RL) algorithms. In experience replay, past transitions are stored in a memory buffer and reused during learning. Previous works have proposed various schemes for sampling from the replay buffer, attempting to choose the experiences that contribute most to convergence to an optimal policy. Here, we give conditions on the replay sampling scheme that ensure convergence, focusing on the well-known Q-learning algorithm in the tabular setting. After establishing sufficient conditions for convergence, we suggest a slightly different use for experience replay: replaying memories in a biased manner as a means to change the properties of the resulting policy. We initiate a rigorous study of experience replay as a tool to control and modify those properties. In particular, we show that an appropriate biased sampling scheme can yield a safe policy. We believe that using experience replay as a biasing mechanism to steer the resulting policy in desirable ways has promising potential for many applications.
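To make the mechanism concrete, the following is a minimal sketch of tabular Q-learning that learns only from replayed transitions, with an optional bias in how the buffer is sampled. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the function name `q_learning_with_replay`, the `weight_fn` hook, the 1/n step size, and the toy buffer are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def q_learning_with_replay(transitions, n_actions, weight_fn=None,
                           n_updates=20000, gamma=0.9, seed=0):
    """Tabular Q-learning driven entirely by samples from a replay buffer.

    transitions: list of (state, action, reward, next_state, done) tuples.
    weight_fn:   hypothetical hook returning a positive sampling weight for
                 a transition; None means uniform replay.
    """
    rng = random.Random(seed)
    weights = [1.0 if weight_fn is None else weight_fn(t) for t in transitions]
    q = defaultdict(lambda: [0.0] * n_actions)
    counts = defaultdict(int)
    for _ in range(n_updates):
        # Draw one stored transition according to the replay weights.
        s, a, r, s_next, done = rng.choices(transitions, weights=weights, k=1)[0]
        # Standard Q-learning target; terminal transitions do not bootstrap.
        target = r if done else r + gamma * max(q[s_next])
        # Assumed step size: 1 / (number of updates of this state-action pair).
        counts[(s, a)] += 1
        q[s][a] += (target - q[s][a]) / counts[(s, a)]
    return q


# Toy one-step problem (assumed for illustration): action 0 is "safe" with
# reward 0; action 1 is "risky" with reward +2 in nine stored transitions
# and -10 in one.
buffer = [("s", 0, 0.0, "end", True)] * 10
buffer += [("s", 1, 2.0, "end", True)] * 9 + [("s", 1, -10.0, "end", True)]

uniform_q = q_learning_with_replay(buffer, n_actions=2)
# Biased replay: oversample transitions with negative reward tenfold.
biased_q = q_learning_with_replay(
    buffer, n_actions=2, weight_fn=lambda t: 10.0 if t[2] < 0 else 1.0)

print("uniform replay Q:", uniform_q["s"])
print("biased replay Q: ", biased_q["s"])
```

In this toy buffer, uniform replay estimates the risky action near its buffer average of 0.8, so the greedy policy takes the risk. Oversampling the negative-reward transition pushes that estimate below zero, so the greedy policy switches to the safe action. The point is only that the sampling weights change which policy the learned values favor.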

research
10/04/2021

Large Batch Experience Replay

Several algorithms have been proposed to sample non-uniformly the replay...
research
12/08/2021

Convergence Results For Q-Learning With Experience Replay

A commonly used heuristic in RL is experience replay (e.g. <cit.>), in w...
research
07/14/2020

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Experience replay, which enables the agents to remember and reuse experi...
research
02/21/2023

MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

Experience replay is crucial for off-policy reinforcement learning (RL) ...
research
07/27/2022

Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

Learning in high dimensional continuous tasks is challenging, mainly whe...
research
07/15/2023

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

Reinforcement Learning (RL) algorithms aim to learn an optimal policy by...
research
02/09/2021

Reverb: A Framework For Experience Replay

A central component of training in Reinforcement Learning (RL) is Experi...
