Distributional Advantage Actor-Critic

06/10/2018
by Shangda Li, et al.

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and an action a, the corresponding value is the expected discounted sum of future rewards. The optimal action is then the action a with the largest value estimated by the value function. However, recent work has provided both theoretical and experimental evidence of superior performance when the value function is replaced with a value distribution in the context of deep Q-learning [1]. In this paper, we develop a new algorithm that combines advantage actor-critic with a value distribution estimated by quantile regression. We evaluated this new algorithm, termed Distributional Advantage Actor-Critic (DA2C or QR-A2C), on a variety of tasks and observed that it performs at least as well as the baseline algorithms, outperforming them on some tasks with smaller variance and increased stability.
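The abstract does not spell out the update rule, so the following is only a minimal sketch of how a quantile-regression critic might plug into advantage actor-critic. It assumes the critic outputs N quantiles of the state-value distribution instead of a scalar V(s), trains them with the quantile Huber loss popularized by QR-DQN, and forms a one-step advantage from the mean of those quantiles. All function names and parameters (`kappa`, `gamma`, `done`) are hypothetical illustrations, not the authors' code, and the paper's actual formulation may differ.

```python
from typing import List


def quantile_midpoints(n: int) -> List[float]:
    """Quantile fractions tau_i = (2i + 1) / (2n) for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def distributional_target(reward: float, next_quantiles: List[float],
                          gamma: float, done: bool) -> List[float]:
    """Bootstrapped target samples r + gamma * theta_j(s'); just r at episode end."""
    if done:
        return [reward for _ in next_quantiles]
    return [reward + gamma * q for q in next_quantiles]


def quantile_huber_loss(pred: List[float], target: List[float],
                        kappa: float = 1.0) -> float:
    """Critic loss: sum over predicted quantiles, mean over target samples.

    The target is treated as a constant (no gradient would flow through it).
    """
    taus = quantile_midpoints(len(pred))
    total = 0.0
    for tau, theta in zip(taus, pred):
        for t in target:
            u = t - theta  # pairwise TD error
            # asymmetric weight |tau - 1{u < 0}| penalizes over- and
            # under-estimation differently depending on the quantile level
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    return total / len(target)


def advantage(reward: float, quantiles: List[float], next_quantiles: List[float],
              gamma: float, done: bool) -> float:
    """One-step advantage, using the mean of the quantiles as V(s)."""
    target = distributional_target(reward, next_quantiles, gamma, done)
    return sum(target) / len(target) - sum(quantiles) / len(quantiles)


def actor_loss(log_prob: float, adv: float) -> float:
    """Policy-gradient loss -log pi(a|s) * A, with A treated as a constant."""
    return -log_prob * adv
```

For example, with reward 1.0, gamma 0.5, and next-state quantiles [1.0, 1.0], the target samples are [1.5, 1.5]; if the current quantiles are [0.0, 0.0], the advantage is 1.5. Using the quantile mean as the baseline keeps the actor update identical in form to standard A2C, while the critic learns the full return distribution.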


