Direct Advantage Estimation

09/13/2021
by Hsiao-Ru Pan, et al.

Credit assignment is one of the central problems in reinforcement learning. The predominant approach is to assign credit based on the expected return. However, we show that the expected return may depend on the policy in an undesirable way, which could slow down learning. Instead, we borrow ideas from the causality literature and show that the advantage function can be interpreted as a causal effect, sharing similar properties with causal representations. Based on this insight, we propose Direct Advantage Estimation (DAE), a novel method that models the advantage function and estimates it directly from data without requiring the (action-)value function. If desired, value functions can also be seamlessly integrated into DAE and updated in a way similar to Temporal Difference Learning. The proposed method is easy to implement and can be readily adopted by modern actor-critic methods. We test DAE empirically on the Atari domain and show that it achieves results competitive with the state-of-the-art method for advantage estimation.
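
For readers unfamiliar with the quantity the abstract refers to, the following is a minimal sketch of the standard textbook definition of the advantage function, A(s, a) = Q(s, a) - V(s), computed here from a known action-value table. It is not the DAE estimator, which the abstract says avoids the (action-)value function altogether. The function name advantage_from_q and the toy numbers are assumptions made purely for illustration.

```python
import numpy as np

def advantage_from_q(q, pi):
    """Standard (non-DAE) advantage from tabular Q-values and a policy.

    q  : array [num_states, num_actions], action values Q(s, a)
    pi : array [num_states, num_actions], policy pi(a | s); rows sum to 1
    """
    v = (pi * q).sum(axis=1)   # V(s) = sum_a pi(a|s) * Q(s, a)
    adv = q - v[:, None]       # A(s, a) = Q(s, a) - V(s)
    return v, adv

# Hypothetical two-state, two-action example.
q = np.array([[1.0, 3.0],
              [0.5, 0.5]])
pi = np.array([[0.25, 0.75],
               [0.50, 0.50]])

v, adv = advantage_from_q(q, pi)

# Under the policy that defines it, the advantage averages to zero in every state.
centered = (pi * adv).sum(axis=1)
```

The final line checks a well-known property of this definition: the policy-weighted average of the advantage over actions is zero in each state, so the advantage measures only how much better or worse an action is than the policy's average behavior.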


research
06/10/2018

Distributional Advantage Actor-Critic

In traditional reinforcement learning, an agent maximizes the reward col...
research
06/08/2021

Towards Practical Credit Assignment for Deep Reinforcement Learning

Credit assignment is a fundamental problem in reinforcement learning, th...
research
04/14/2021

Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

Deep reinforcement learning methods have shown great performance on many...
research
01/25/2018

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods

This paper investigates estimating the variance of a temporal-difference...
research
05/29/2023

VA-learning as a more efficient alternative to Q-learning

In reinforcement learning, the advantage function is critical for policy...
research
04/15/2019

Self-critical n-step Training for Image Captioning

Existing methods for image captioning are usually trained by cross entro...
research
07/09/2018

Partial Policy-based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images

Deploying the idea of long-term cumulative return, reinforcement learnin...
