Temporal Difference Models: Model-Free Deep RL for Model-Based Control

02/25/2018
by Vitchyr Pong, et al.

Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically high for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods. Our experimental results show that, on a range of continuous control tasks, TDMs provide a substantial improvement in efficiency compared to state-of-the-art model-based and model-free methods.
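
To make the idea of a goal-conditioned value function trained with model-free learning more concrete, the following is a minimal sketch of how a bootstrapped training target for such a function might be computed. It is not taken from the abstract: the function name `tdm_target`, the argument `next_q_max`, and the choice of Euclidean distance are illustrative assumptions. The sketch assumes a value function conditioned on a goal state and a remaining horizon `tau`, with the negative distance to the goal used as the reward once the horizon reaches zero.

```python
import numpy as np


def tdm_target(next_state, goal, tau, next_q_max):
    """Bootstrapped target for a goal- and horizon-conditioned value function.

    Hypothetical helper for illustration only.

    next_state, goal: arrays of shape (batch, state_dim)
    tau:              integer array of shape (batch,), remaining steps
    next_q_max:       array of shape (batch,), an estimate of
                      max_a Q(next_state, a, goal, tau - 1), assumed to be
                      supplied by a separate (target) network
    """
    next_state = np.asarray(next_state, dtype=float)
    goal = np.asarray(goal, dtype=float)
    tau = np.asarray(tau)
    next_q_max = np.asarray(next_q_max, dtype=float)

    # Reward when the horizon runs out: negative distance to the goal.
    # Euclidean distance is an assumption; other distances are possible.
    terminal_reward = -np.linalg.norm(next_state - goal, axis=-1)

    # tau == 0: use the distance-based reward directly.
    # tau > 0:  bootstrap from the value estimate with one fewer step.
    return np.where(tau == 0, terminal_reward, next_q_max)


# Two transitions relabeled with the same goal but different horizons.
targets = tdm_target(
    next_state=[[0.0, 0.0], [3.0, 4.0]],
    goal=[[3.0, 4.0], [3.0, 4.0]],
    tau=[0, 2],
    next_q_max=[-1.0, -0.5],
)
```

Because the target depends only on the observed next state and a chosen goal, any stored transition can be reused with many different goals and horizons, which is one way the information in state transitions can be exploited by an otherwise model-free update.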

