Adaptive Experience Selection for Policy Gradient

02/17/2020
by Saad Mohamad, et al.

Policy gradient reinforcement learning (RL) algorithms have achieved impressive performance in challenging learning tasks such as continuous control, but suffer from high sample complexity. Experience replay is a commonly used approach to improve sample efficiency, but gradient estimators using past trajectories typically have high variance. Existing sampling strategies for experience replay, such as uniform sampling or prioritised experience replay, do not explicitly try to control the variance of the gradient estimates. In this paper, we propose an online learning algorithm, adaptive experience selection (AES), to adaptively learn an experience sampling distribution that explicitly minimises this variance. Using a regret minimisation approach, AES iteratively updates the experience sampling distribution to match the performance of a competitor distribution assumed to have optimal variance. Sample non-stationarity is addressed by introducing a dynamic (i.e. time-changing) competitor distribution, for which a closed-form solution is provided. We demonstrate that AES is a low-regret algorithm with reasonable sample complexity. Empirically, AES has been implemented for deep deterministic policy gradient and soft actor critic algorithms, and tested on 8 continuous control tasks from the OpenAI Gym library. Our results show that AES leads to significantly improved performance compared to currently available experience sampling strategies for policy gradient.
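To make concrete why the choice of sampling distribution affects gradient variance, here is a minimal sketch in plain Python. It is not the AES algorithm from the paper. It only compares uniform replay sampling with a fixed norm-proportional distribution, a standard importance-sampling result assumed here for illustration. The per-transition "gradients" are synthetic, and the names `estimator_variance` and `norm_proportional` are hypothetical.

```python
import math
import random


def estimator_variance(grads, probs):
    """Exact total variance of the one-sample estimator g_i / (N * p_i), i ~ probs.

    The importance weight 1 / (N * p_i) keeps the estimator unbiased for the
    mean gradient, whatever sampling distribution is used.
    """
    n = len(grads)
    dim = len(grads[0])
    mean = [sum(g[d] for g in grads) / n for d in range(dim)]
    # E[||g_i / (N p_i)||^2] = sum_i ||g_i||^2 / (N^2 p_i)
    second_moment = sum(
        sum(x * x for x in g) / (n * n * p) for g, p in zip(grads, probs)
    )
    return second_moment - sum(m * m for m in mean)


def norm_proportional(grads):
    """Sampling probabilities proportional to each gradient's Euclidean norm."""
    norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
    total = sum(norms)
    return [v / total for v in norms]


# Synthetic buffer (an assumption for illustration): a few transitions carry
# large gradients, most carry small ones.
rng = random.Random(0)
grads = [
    [rng.gauss(0.0, 10.0 if i < 5 else 0.1) for _ in range(3)]
    for i in range(100)
]

uniform = [1.0 / len(grads)] * len(grads)
adaptive = norm_proportional(grads)

var_uniform = estimator_variance(grads, uniform)
var_adaptive = estimator_variance(grads, adaptive)
print("uniform sampling variance:         ", var_uniform)
print("norm-proportional sampling variance:", var_adaptive)
```

In this toy setting the norm-proportional distribution never has higher variance than uniform sampling, which follows from the Cauchy-Schwarz inequality. In a real replay buffer the gradients change as the policy is updated, which is the non-stationarity the abstract says AES handles with a time-changing competitor distribution.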


