MBVI: Model-Based Value Initialization for Reinforcement Learning

by   Xubo Lyu, et al.

Model-free reinforcement learning (RL) is capable of learning control policies for high-dimensional, complex robotic tasks, but tends to be data inefficient. Model-based RL and optimal control have been proven to be much more data-efficient if an accurate model of the system and environment is known, but can be difficult to scale to expressive models for high-dimensional problems. In this paper, we propose a novel approach to alleviate data inefficiency of model-free RL by warm-starting the learning process using model-based solutions. We do so by initializing a high-dimensional value function via supervision from a low-dimensional value function obtained by applying model-based techniques on a low-dimensional problem featuring an approximate system model. Therefore, our approach exploits the model priors from a simplified problem space implicitly and avoids the direct use of high-dimensional, expressive models. We demonstrate our approach on two representative robotic learning tasks and observe significant improvements in performance and efficiency, and analyze our method empirically with a third task.



There are no comments yet.


page 1

page 2

page 3

page 4


TTR-Based Rewards for Reinforcement Learning with Implicit Model Priors

Model-free reinforcement learning (RL) provides an attractive approach f...

Blending MPC Value Function Approximation for Efficient Reinforcement Learning

Model-Predictive Control (MPC) is a powerful tool for controlling comple...

Self-supervised Learning of Image Embedding for Continuous Control

Operating directly from raw high dimensional sensory inputs like images ...

Residual Pathway Priors for Soft Equivariance Constraints

There is often a trade-off between building deep learning systems that a...

Frequency-based Search-control in Dyna

Model-based reinforcement learning has been empirically demonstrated as ...

Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning

Value-function (VF) approximation is a central problem in Reinforcement ...

Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

Mobile robots are increasingly being employed for performing complex tas...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.