Pairwise Weights for Temporal Credit Assignment

02/09/2021
by Zeyu Zheng, et al.

How much credit (or blame) should an action taken in a state get for a future reward? This is the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One of the earliest and still most widely used heuristics is to assign this credit based on a scalar coefficient λ (treated as a hyperparameter) raised to the power of the time interval between the state-action and the reward. In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, and the time interval between the two. It is not clear in advance what these pairwise weight functions should be, and because they are too complex to be treated as hyperparameters, we develop a metagradient procedure for learning them during the usual RL training of a policy. Our empirical work shows that it is often possible to learn these pairwise weight functions while the policy is being learned, and thereby achieve better performance than competing approaches.
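To make the contrast concrete, here is a minimal sketch of the two weighting schemes described above. It is an illustration only, not the paper's implementation: the function names, the reward indexing (the reward at step k is paired with the state at step k), and the hand-written pairwise weight are all assumptions. In the paper the pairwise weight function is learned by metagradient, which this sketch does not attempt.

```python
from typing import Callable, Hashable, List, Sequence

# A weight function maps (state where the action was taken,
# state at the time of the reward, time interval) to a scalar credit.
WeightFn = Callable[[Hashable, Hashable, int], float]


def weighted_returns(
    states: Sequence[Hashable],
    rewards: Sequence[float],
    weight: WeightFn,
) -> List[float]:
    """For each step t, sum the rewards at steps k >= t, each scaled by
    weight(states[t], states[k], k - t)."""
    n = len(rewards)
    return [
        sum(weight(states[t], states[k], k - t) * rewards[k] for k in range(t, n))
        for t in range(n)
    ]


def scalar_weight(lam: float) -> WeightFn:
    """The classic heuristic: credit depends only on the time interval."""
    def w(s_action: Hashable, s_reward: Hashable, interval: int) -> float:
        return lam ** interval
    return w


def example_pairwise_weight(
    s_action: Hashable, s_reward: Hashable, interval: int
) -> float:
    """Hypothetical hand-written pairwise weight: give full credit only to
    the action taken in state "key" for a reward received in state "door",
    regardless of the interval (which this toy weight ignores)."""
    return 1.0 if (s_action == "key" and s_reward == "door") else 0.0


# Toy episode (assumed for illustration): a single reward at the last step.
states = ["start", "key", "hall", "door"]
rewards = [0.0, 0.0, 0.0, 1.0]

scalar = weighted_returns(states, rewards, scalar_weight(0.5))
pairwise = weighted_returns(states, rewards, example_pairwise_weight)
print(scalar)    # [0.125, 0.25, 0.5, 1.0]
print(pairwise)  # [0.0, 1.0, 0.0, 0.0]
```

With the scalar λ, credit decays with distance from the reward no matter which states are involved, so the step nearest the reward always gets the most. A pairwise weight can instead concentrate credit on the one earlier state that mattered, which is the extra flexibility the abstract refers to.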


