Implicit Distributional Reinforcement Learning

07/13/2020
by   Yuguang Yue, et al.
23

To improve the sample efficiency of policy-gradient based reinforcement learning algorithms, we propose implicit distributional actor critic (IDAC) that consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution. We adopt a distributional perspective on the discounted cumulative return and model it with a state-action-dependent implicit distribution, which is approximated by the DGNs that take state-action pairs and random noises as their input. Moreover, we use the SIA to provide a semi-implicit policy distribution, which mixes the policy parameters with a reparameterizable distribution that is not constrained by an analytic density function. In this way, the policy's marginal distribution is implicit, providing the potential to model complex properties such as covariance structure and skewness, but its parameter and entropy can still be estimated. We incorporate these features with an off-policy algorithm framework to solve problems with continuous action space, and compare IDAC with the state-of-art algorithms on representative OpenAI Gym environments. We observe that IDAC outperforms these baselines for most tasks.

READ FULL TEXT
research
01/09/2020

Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function

In current reinforcement learning (RL) methods, function approximation e...
research
05/23/2019

Distributional Policy Optimization: An Alternative Approach for Continuous Control

We identify a fundamental problem in policy gradient-based methods in co...
research
01/08/2020

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement le...
research
06/11/2023

PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

In this paper, we propose the first fully push-forward-based Distributio...
research
05/24/2021

GMAC: A Distributional Perspective on Actor-Critic Framework

In this paper, we devise a distributional framework on actor-critic as a...
research
04/15/2019

A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms

Consistently checking the statistical significance of experimental resul...
research
01/30/2023

Planning Multiple Epidemic Interventions with Reinforcement Learning

Combating an epidemic entails finding a plan that describes when and how...

Please sign up or login with your details

Forgot password? Click here to reset