Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction

05/27/2019
by   Hongyao Tang, et al.
0

Value functions are crucial for model-free Reinforcement Learning (RL) to obtain a policy implicitly or guide the policy updates. Value estimation heavily depends on the stochasticity of environmental dynamics and the quality of reward signals. In this paper, we propose a two-step understanding of value estimation from the perspective of future prediction, through decomposing the value function into a reward-independent future dynamics part and a policy-independent trajectory return part. We then derive a practical deep RL algorithm from the above decomposition, consisting of a convolutional trajectory representation model, a conditional variational dynamics model to predict the expected representation of future trajectory and a convex trajectory return model that maps a trajectory representation to its return. Our algorithm is evaluated in MuJoCo continuous control tasks and shows superior results under both common settings and delayed reward settings.

READ FULL TEXT
research
02/19/2020

Value-driven Hindsight Modelling

Value estimation is a critical component of the reinforcement learning (...
research
04/07/2021

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

In recent years, there are great interests as well as challenges in appl...
research
06/13/2022

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

The remarkable success of reinforcement learning (RL) heavily relies on ...
research
05/29/2019

On the Generalization Gap in Reparameterizable Reinforcement Learning

Understanding generalization in reinforcement learning (RL) is a signifi...
research
10/27/2020

γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

We introduce the γ-model, a predictive model of environment dynamics wit...
research
01/04/2020

Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

How do people decide how long to continue in a task, when to switch, and...
research
06/09/2023

Value function estimation using conditional diffusion models for control

A fairly reliable trend in deep reinforcement learning is that the perfo...

Please sign up or login with your details

Forgot password? Click here to reset