Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning

05/31/2019
by Yang Liu, et al.

Recent advances in deep reinforcement learning have shown great potential for solving challenging real-world problems, including the game of Go and robotic applications. These algorithms usually need a carefully designed reward function to guide training at each time step. In the real world, however, designing such a reward function is non-trivial, and the only signal available is often obtained at the end of a trajectory, known as the episodic reward or return. In this work, we introduce a new algorithm for temporal credit assignment that uses deep neural networks to learn to decompose the episodic return back to each time step in the trajectory. With this learned reward signal, learning efficiency can be substantially improved for episodic reinforcement learning. In particular, we find that expressive language models such as the Transformer can be adopted to learn the importance and the dependency of states in the trajectory, thereby providing high-quality and interpretable learned reward signals. We performed extensive experiments on a set of MuJoCo continuous locomotion control tasks with only episodic returns, and the results demonstrate the effectiveness of our algorithm.
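To make the idea of decomposing an episodic return concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one plausible training objective, which the abstract does not spell out: the per-step reward estimates of a trajectory should sum to that trajectory's episodic return. A linear model stands in for the Transformer, and the function names, the synthetic data, and the hyperparameters are all hypothetical.

```python
import random


def predict_rewards(weights, trajectory):
    """Per-step reward estimates: a linear function of each state's features."""
    return [sum(w * x for w, x in zip(weights, state)) for state in trajectory]


def decomposition_loss(weights, trajectory, episodic_return):
    """Squared gap between the summed per-step estimates and the episodic return."""
    return (sum(predict_rewards(weights, trajectory)) - episodic_return) ** 2


def fit_reward_model(episodes, dim, lr=0.01, epochs=200):
    """Stochastic gradient descent on the decomposition loss (assumed objective)."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for trajectory, episodic_return in episodes:
            error = sum(predict_rewards(weights, trajectory)) - episodic_return
            # d(error^2)/dw_i = 2 * error * sum over time steps of feature i
            feature_sum = [sum(state[i] for state in trajectory) for i in range(dim)]
            weights = [w - lr * 2.0 * error * f for w, f in zip(weights, feature_sum)]
    return weights


def make_episodes(true_weights, n_episodes=20, horizon=5, seed=0):
    """Synthetic episodes: only the total return is revealed, not per-step rewards."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_episodes):
        trajectory = [[rng.uniform(-1.0, 1.0) for _ in true_weights]
                      for _ in range(horizon)]
        episodic_return = sum(predict_rewards(true_weights, trajectory))
        episodes.append((trajectory, episodic_return))
    return episodes


# Hidden per-step reward used only to generate the synthetic returns.
TRUE_WEIGHTS = [1.0, -0.5]
episodes = make_episodes(TRUE_WEIGHTS)
learned_weights = fit_reward_model(episodes, dim=len(TRUE_WEIGHTS))

# Dense per-step reward estimates for the first trajectory.
dense_rewards = predict_rewards(learned_weights, episodes[0][0])
```

In this toy setting the learned per-step estimates can then replace the single end-of-episode signal when training a policy. A sequence model such as the Transformer would additionally let each step's estimate depend on the other states in the trajectory, which the linear stand-in above cannot do.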

research
07/21/2023

Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning

Oftentimes, environments for sequential decision-making problems can be ...
research
09/13/2023

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

Although Deep Reinforcement Learning (DRL) has achieved notable success ...
research
12/27/2021

Multiagent Model-based Credit Assignment for Continuous Control

Deep reinforcement learning (RL) has recently shown great promise in rob...
research
02/03/2023

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov ...
research
09/25/2020

Deep Reinforcement Learning with Stage Incentive Mechanism for Robotic Trajectory Planning

To improve the efficiency of deep reinforcement learning (DRL) based met...
research
03/05/2020

Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning

Similar trajectory search is a fundamental problem and has been well stu...
research
10/08/2020

Maximum Reward Formulation In Reinforcement Learning

Reinforcement learning (RL) algorithms typically deal with maximizing th...
