Dynamic Horizon Value Estimation for Model-based Reinforcement Learning

09/21/2020
by Junjie Wang, et al.

Existing model-based value expansion methods typically use a world model with a fixed rollout horizon to estimate values and assist policy learning. However, rolling out an inaccurate model for a fixed number of steps can harm the learning process. In this paper, we investigate using the model's knowledge adaptively for value expansion. We propose a novel method called Dynamic-horizon Model-based Value Expansion (DMVE), which adjusts how much the world model is used by varying the rollout horizon. Inspired by reconstruction-based techniques used for novelty detection in visual data, we use a world model with a reconstruction module for image feature extraction in order to obtain more precise value estimates. Both the raw and the reconstructed images are used to determine the appropriate horizon for adaptive value expansion. On several benchmark visual control tasks, experimental results show that DMVE outperforms all baselines in sample efficiency and final performance, indicating that DMVE achieves more effective and accurate value estimation than state-of-the-art model-based methods.
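
To make the idea of value expansion over different rollout horizons concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard h-step target, the discounted sum of the first h imagined rewards plus the discounted value of the state reached after h steps. The function names, the inputs, and the selection rule in `pick_horizon` (choose the horizon with the lowest score) are hypothetical placeholders; the abstract does not specify how the raw and reconstructed images are compared.

```python
from typing import List, Sequence


def value_expansion_targets(rewards: Sequence[float],
                            bootstrap_values: Sequence[float],
                            gamma: float) -> List[float]:
    """Return the h-step value target for every horizon h = 1..H.

    rewards[t] is the imagined reward at rollout step t (t = 0..H-1).
    bootstrap_values[t] is the value estimate of the state reached
    after t + 1 imagined steps. Both names are illustrative assumptions.
    """
    if len(rewards) != len(bootstrap_values):
        raise ValueError("rewards and bootstrap_values must have equal length")

    targets = []
    discounted_return = 0.0  # running sum of gamma^t * r_t
    discount = 1.0           # gamma^t for the current step
    for reward, value in zip(rewards, bootstrap_values):
        discounted_return += discount * reward
        discount *= gamma
        # h-step target: rewards up to step h, then bootstrap from the value.
        targets.append(discounted_return + discount * value)
    return targets


def pick_horizon(scores: Sequence[float]) -> int:
    """Hypothetical selection rule: return the 1-based horizon whose
    score is lowest. How such a score would be derived from raw and
    reconstructed images is not stated in the abstract."""
    best_index = min(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1
```

For example, with rewards `[1.0, 1.0, 1.0]`, bootstrap values `[10.0, 10.0, 10.0]`, and `gamma = 0.5`, the targets for horizons 1, 2, and 3 are 6.0, 4.0, and 3.0. A fixed-horizon method would always use one of these, whereas an adaptive method would choose among them per state.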

research
08/28/2020

On the model-based stochastic value gradient for continuous reinforcement learning

Model-based reinforcement learning approaches add explicit domain knowle...
research
12/24/2019

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

Despite its potential to improve sample complexity versus model-free app...
research
03/07/2023

Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning

Model-based reinforcement learning is one approach to increase sample ef...
research
07/04/2020

Bidirectional Model-based Policy Optimization

Model-based reinforcement learning approaches leverage a forward dynamic...
research
03/28/2022

Revisiting Model-based Value Expansion

Model-based value expansion methods promise to improve the quality of va...
research
11/16/2022

Model Based Residual Policy Learning with Applications to Antenna Control

Non-differentiable controllers and rule-based policies are widely used f...
research
02/01/2023

Adaptive hedging horizon and hedging performance estimation

In this study, we constitute an adaptive hedging method based on empiric...