Trajectory-Based Off-Policy Deep Reinforcement Learning

05/14/2019
by   Andreas Doerr, et al.
5

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/17/2013

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement le...
research
08/08/2019

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

Policy gradient methods have demonstrated success in reinforcement learn...
research
05/20/2022

Sigmoidally Preconditioned Off-policy Learning:a new exploration method for reinforcement learning

One of the major difficulties of reinforcement learning is learning from...
research
03/19/2020

Exchangeable Input Representations for Reinforcement Learning

Poor sample efficiency is a major limitation of deep reinforcement learn...
research
05/17/2022

Adaptive Momentum-Based Policy Gradient with Second-Order Information

The variance reduced gradient estimators for policy gradient methods has...
research
05/11/2023

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved w...
research
09/17/2018

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to s...

Please sign up or login with your details

Forgot password? Click here to reset