Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning

07/21/2023
by   Akash Velu, et al.

Oftentimes, environments for sequential decision-making problems can be quite sparse in the provision of evaluative feedback to guide reinforcement-learning agents. In the extreme case, long trajectories of behavior are merely punctuated with a single terminal feedback signal, engendering a significant temporal delay between the observation of non-trivial reward and the individual steps of behavior culpable for eliciting such feedback. Coping with such a credit assignment challenge is one of the hallmark characteristics of reinforcement learning and, in this work, we capitalize on existing importance-sampling ratio estimation techniques for off-policy evaluation to drastically improve the handling of credit assignment with policy-gradient methods. While the use of so-called hindsight policies offers a principled mechanism for reweighting on-policy data by saliency to the observed trajectory return, naively applying importance sampling results in unstable or excessively lagged learning. In contrast, our hindsight distribution correction facilitates stable, efficient learning across a broad range of environments where credit assignment plagues baseline methods.
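The instability attributed above to naive importance sampling can be illustrated with a small numerical sketch. It assumes the return-conditioned hindsight weighting from the earlier hindsight credit assignment literature, in which each action's return is scaled by 1 - pi(a|s) / h(a|s, G); the abstract does not state this form, so treat it as an assumption. The function name `hindsight_weight` and all probabilities below are hypothetical, chosen only to show how the weight grows as the hindsight probability shrinks. This is not the authors' method or their correction.

```python
import numpy as np

def hindsight_weight(pi_prob, hindsight_prob):
    """Return 1 - pi(a|s) / h(a|s, G) for each pair of probabilities.

    Assumed form of the hindsight credit weight (not taken from the abstract).
    pi_prob: probability of the action under the current policy.
    hindsight_prob: probability of the same action under a hindsight
        policy conditioned on the observed return G.
    """
    pi_prob = np.asarray(pi_prob, dtype=float)
    hindsight_prob = np.asarray(hindsight_prob, dtype=float)
    return 1.0 - pi_prob / hindsight_prob

# Hypothetical numbers: a fixed policy probability and a hindsight
# probability that becomes progressively smaller.
pi_prob = 0.5
hindsight_probs = np.array([0.5, 0.1, 0.01, 0.001])

weights = hindsight_weight(pi_prob, hindsight_probs)
for h, w in zip(hindsight_probs, weights):
    print(f"h = {h:<6} weight = {w:.1f}")
```

When the hindsight policy agrees with the current policy the weight is zero, but its magnitude grows without bound as the hindsight probability approaches zero, so a few samples can dominate a gradient estimate built from such weights.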


research
06/12/2021

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Marginalized importance sampling (MIS), which measures the density ratio...
research
05/31/2019

Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning

Recent advances in deep reinforcement learning algorithms have shown gre...
research
06/28/2021

Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Many transfer problems require re-using previously optimal decisions for...
research
12/23/2021

Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions

Off-policy learning from multistep returns is crucial for sample-efficie...
research
01/26/2023

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

Off-policy learning from multistep returns is crucial for sample-efficie...
research
07/16/2023

Credit Assignment: Challenges and Opportunities in Developing Human-like AI Agents

Temporal credit assignment is crucial for learning and skill development...
research
03/07/2022

On Credit Assignment in Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) has held longstanding promise ...
