Investigation on the generalization of the Sampled Policy Gradient algorithm

10/09/2019
by Nil Stolt Ansó, et al.

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient, using the critic to evaluate the sampled actions. SPG offers theoretical promise over similar algorithms such as DPG because it searches the action-Q-value space independently of the local gradient, which enables it to avoid local optima. This paper compares SPG to two similar actor-critic algorithms, CACLA and DPG. The comparison is made across two environments and two network architectures, and contrasts training on on-policy transitions with training from an experience buffer. The results suggest that although SPG often does not perform the worst, it does not always match the best-performing algorithm on a given task. Further experiments are required to better estimate the qualities of SPG.
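
The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional action in [-1, 1], Gaussian perturbations around the actor's current action, and a toy quadratic critic. The names `toy_critic`, `sampled_action_target`, and `actor_update`, and the parameters `n_samples`, `sigma`, and `lr`, are hypothetical and chosen only for illustration.

```python
import random


def toy_critic(state, action):
    """Hypothetical stand-in for a learned critic Q(s, a); it peaks at action == state."""
    return -(action - state) ** 2


def sampled_action_target(state, actor_action, critic, n_samples=16, sigma=0.3, rng=None):
    """Return the highest-valued action among the actor's action and sampled candidates.

    Candidates are drawn around the actor's action (an assumption of this sketch)
    and scored by the critic, so no gradient of Q with respect to the action is used.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    best_action = actor_action
    best_q = critic(state, actor_action)
    for _ in range(n_samples):
        # Perturb the actor's action and clip it to the assumed action range [-1, 1].
        candidate = max(-1.0, min(1.0, actor_action + rng.gauss(0.0, sigma)))
        q = critic(state, candidate)
        if q > best_q:  # keep a candidate only if the critic rates it strictly higher
            best_action, best_q = candidate, q
    return best_action


def actor_update(actor_action, target_action, lr=0.5):
    """Move the actor's output toward the target; stands in for a supervised network step."""
    return actor_action + lr * (target_action - actor_action)


state, action = 0.5, -0.2
target = sampled_action_target(state, action, toy_critic)
new_action = actor_update(action, target)
```

If no sampled candidate scores higher than the actor's own action, the target equals the current action and the update leaves the actor unchanged. In this sketch that is what makes the search depend on the critic's values rather than on its local gradient.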

research
09/15/2018

Sampled Policy Gradient for Learning to Play the Game Agar.io

In this paper, a new offline actor-critic learning algorithm is introduc...
research
11/05/2016

Combining policy gradient and Q-learning

Policy gradient is an efficient technique for improving a policy in a re...
research
01/15/2020

Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO

In this paper, a novel racing environment for OpenAI Gym is introduced. ...
research
02/26/2020

When Do Drivers Concentrate? Attention-based Driver Behavior Modeling With Deep Reinforcement Learning

Driver distraction is a significant risk to driving safety. Apart from spat...
research
05/18/2018

Learning Permutations with Sinkhorn Policy Gradient

Many problems at the intersection of combinatorics and computer science ...
research
03/02/2021

Offline Reinforcement Learning with Pseudometric Learning

Offline Reinforcement Learning methods seek to learn a policy from logge...
research
04/21/2021

Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching

Image-text matching is an important multi-modal task with massive applic...
