Off-Policy Actor-Critic with Shared Experience Replay

09/25/2019
by Simon Schmitt, et al.

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay and (b) stability of highly off-policy learning. We use these insights to accelerate hyper-parameter sweeps in which all participating agents run concurrently and share their experience via a common replay module. To this end, we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on this analysis, we argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solution. We further show the benefits of this setup by demonstrating state-of-the-art data efficiency on Atari among agents trained for up to 200M environment frames.
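For readers unfamiliar with V-trace, the following is a minimal sketch of how its truncated importance weights enter the value targets. It follows the standard V-trace definition from the IMPALA paper (Espeholt et al., 2018), not anything specific to this abstract. The function name `vtrace_targets`, its argument names, and the default thresholds are assumptions chosen for illustration. The trust region scheme and replay mixing proposed here are not shown.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Illustrative V-trace targets for one trajectory without episode ends.

    log_rhos[t] is log(pi(a_t|x_t) / mu(a_t|x_t)), where pi is the learner
    policy and mu is the behaviour policy that generated the data.
    All names here are hypothetical, chosen for this sketch.
    """
    n = len(rewards)
    # Truncated importance weights: rho_t = min(rho_bar, pi/mu), c_t = min(c_bar, pi/mu).
    rhos = [min(rho_bar, math.exp(lr)) for lr in log_rhos]
    cs = [min(c_bar, math.exp(lr)) for lr in log_rhos]
    # V(x_{t+1}) for each step; the final step bootstraps from bootstrap_value.
    next_values = list(values[1:]) + [bootstrap_value]
    # Importance-weighted temporal-difference errors.
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
              for t in range(n)]
    targets = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}) while iterating backwards
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets


if __name__ == "__main__":
    # On-policy data (log ratio 0 everywhere) with three unit rewards.
    print(vtrace_targets([0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                         [0.0, 0.0, 0.0], 0.0, gamma=0.5))
```

When the data is on-policy and both thresholds are at least 1, the targets reduce to ordinary n-step bootstrapped returns. As the behaviour policy drifts away from the learner policy, the truncated weights shrink the corrections toward the current value estimates, which is the source of the bias-variance tradeoff the abstract refers to.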

research
09/01/2022

Actor Prioritized Experience Replay

A widely-studied deep reinforcement learning (RL) technique known as Pri...
research
07/01/2017

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Deep reinforcement learning (RL) methods have significant potential for ...
research
10/06/2021

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Off-policy Actor-Critic algorithms have demonstrated phenomenal experime...
research
03/07/2019

MinAtar: An Atari-inspired Testbed for More Efficient Reinforcement Learning Experiments

The Arcade Learning Environment (ALE) is a popular platform for evaluati...
research
07/05/2019

Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets

This paper targets the problem of image set-based face verification and ...
research
07/16/2018

Remember and Forget for Experience Replay

Experience replay (ER) is crucial for attaining high data-efficiency in ...
research
01/15/2020

Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO

In this paper, a novel racing environment for OpenAI Gym is introduced. ...
