Learning Guidance Rewards with Trajectory-space Smoothing

10/23/2020
by   Tanmay Gangwani, et al.
7

Long-term temporal credit assignment is an important challenge in deep reinforcement learning (RL). It refers to the ability of the agent to attribute actions to consequences that may occur after a long time interval. Existing policy-gradient and Q-learning algorithms typically rely on dense environmental rewards that provide rich short-term supervision and help with credit assignment. However, they struggle to solve tasks with delays between an action and the corresponding rewarding feedback. To make credit assignment easier, recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards. This paper is in the same vein – starting with a surrogate RL objective that involves smoothing in the trajectory-space, we arrive at a new algorithm for learning guidance rewards. We show that the guidance rewards have an intuitive interpretation, and can be obtained without training any additional neural networks. Due to the ease of integration, we use the guidance rewards in a few popular algorithms (Q-learning, Actor-Critic, Distributional-RL) and present results in single-agent and multi-agent tasks that elucidate the benefit of our approach when the environmental rewards are sparse or delayed.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

06/08/2021

Towards Practical Credit Assignment for Deep Reinforcement Learning

Credit assignment is a fundamental problem in reinforcement learning, th...
05/02/2021

InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem

The temporal Credit Assignment Problem (CAP) is a well-known and challen...
10/15/2018

Optimizing Agent Behavior over Long Time Scales by Transporting Value

Humans spend a remarkable fraction of waking life engaged in acts of "me...
11/20/2019

Hierarchical Average Reward Policy Gradient Algorithms

Option-critic learning is a general-purpose reinforcement learning (RL) ...
11/18/2020

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Credit assignment in reinforcement learning is the problem of measuring ...
02/24/2021

Synthetic Returns for Long-Term Credit Assignment

Since the earliest days of reinforcement learning, the workhorse method ...
10/06/2019

Probabilistic Successor Representations with Kalman Temporal Differences

The effectiveness of Reinforcement Learning (RL) depends on an animal's ...

Code Repositories

GuidanceRewards

Pytorch code for "Learning Guidance Rewards with Trajectory-space Smoothing" (NeurIPS 2020)


view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.