Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

09/10/2015
by   David Balduzzi, et al.
0

This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, its gradient, and determine the actor's policy respectively. We evaluate GProp on two challenging tasks: a contextual bandit problem constructed from nonparametric regression datasets that is designed to probe the ability of reinforcement learning algorithms to accurately estimate gradients; and the octopus arm, a challenging reinforcement learning benchmark. GProp is competitive with fully supervised methods on the bandit task and achieves the best performance to date on the octopus arm.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/20/2021

Decoupling Value and Policy for Generalization in Reinforcement Learning

Standard deep reinforcement learning algorithms use a shared representat...
research
02/07/2021

Deep Reinforcement Learning with Dynamic Optimism

In recent years, deep off-policy actor-critic algorithms have become a d...
research
10/18/2019

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Prob...
research
05/17/2019

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

We assume that we are given a time series of data from a dynamical syste...
research
08/23/2022

An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm

This scientific paper propose a novel portfolio optimization model using...
research
10/01/2019

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

In this paper, we aim to develop a simple and scalable reinforcement lea...
research
05/25/2020

Gradient Monitored Reinforcement Learning

This paper presents a novel neural network training approach for faster ...

Please sign up or login with your details

Forgot password? Click here to reset