Sample-based Distributional Policy Gradient

01/08/2020
by Rahul Singh, et al.

Distributional reinforcement learning (DRL) is a recent reinforcement learning framework whose success has been supported by various empirical studies. It relies on the key idea of replacing the expected return with the return distribution, which captures the intrinsic randomness of long-term rewards. Most of the existing literature on DRL focuses on value-based methods for problems with discrete action spaces. In this work, motivated by robotics applications with continuous-action control settings, we propose the sample-based distributional policy gradient (SDPG) algorithm. It models the return distribution using samples via a reparameterization technique widely used in generative modeling and inference. We compare SDPG with distributed distributional deterministic policy gradients (D4PG), the state-of-the-art policy gradient method in DRL. We apply SDPG and D4PG to multiple OpenAI Gym environments and observe that our algorithm shows better sample efficiency as well as higher reward for most tasks.
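
To make the idea of modeling a return distribution with samples more concrete, the following is a minimal NumPy sketch of a reparameterized sampler: a deterministic function maps a state, an action, and a noise draw to one return sample, so the randomness comes only from the noise. The function names (`init_params`, `sample_returns`), the two-layer network, its sizes, and the Gaussian noise are all assumptions made for illustration; the abstract does not specify the architecture or the training loss, and none is implemented here.

```python
import numpy as np

def init_params(state_dim, action_dim, noise_dim, hidden, rng):
    """Randomly initialize a hypothetical two-layer generator network."""
    in_dim = state_dim + action_dim + noise_dim
    w1 = rng.normal(scale=0.5, size=(in_dim, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    return w1, b1, w2, b2

def sample_returns(state, action, noise, params):
    """Map (state, action, noise) to return samples.

    Each row of `noise` yields one sample, so the output is an
    empirical approximation of a return distribution for (state, action).
    """
    w1, b1, w2, b2 = params
    n = noise.shape[0]
    # Repeat the state-action features once per noise draw.
    sa = np.tile(np.concatenate([state, action]), (n, 1))
    x = np.concatenate([sa, noise], axis=1)
    h = np.tanh(x @ w1 + b1)
    return (h @ w2 + b2).ravel()

rng = np.random.default_rng(0)
params = init_params(state_dim=3, action_dim=1, noise_dim=2, hidden=16, rng=rng)

state = np.array([0.1, -0.2, 0.3])   # illustrative state
action = np.array([0.5])             # illustrative continuous action
noise = rng.standard_normal((64, 2)) # assumed noise distribution: N(0, I)

z = sample_returns(state, action, noise, params)  # 64 return samples
q_estimate = z.mean()  # sample mean, an estimate of the expected return
```

Because the sampler is a deterministic function of the noise, the same noise always produces the same samples, and gradients could flow through the samples to the network parameters in a differentiable framework. Averaging the samples recovers an estimate of the expected return that a non-distributional critic would predict directly.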

research
03/23/2023

Policy Evaluation in Distributional LQR

Distributional reinforcement learning (DRL) enhances the understanding o...
research
01/12/2022

Evolutionary Action Selection for Gradient-based Policy Learning

Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have...
research
04/23/2018

Distributed Distributional Deterministic Policy Gradients

This work adopts the very successful distributional perspective on reinf...
research
08/19/2022

A Risk-Sensitive Approach to Policy Optimization

Standard deep reinforcement learning (DRL) aims to maximize expected rew...
research
08/28/2022

Normality-Guided Distributional Reinforcement Learning for Continuous Control

Learning a predictive model of the mean return, or value function, plays...
research
07/13/2020

Implicit Distributional Reinforcement Learning

To improve the sample efficiency of policy-gradient based reinforcement ...
research
06/25/2020

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

In recent years, Deep Reinforcement Learning (DRL) algorithms have achie...
