Value Prediction Network

07/11/2017
by Junhyuk Oh, et al.

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
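As a rough illustration of planning over abstract states with value predictions instead of observation predictions, the sketch below runs a short lookahead on a toy model. Everything in it is an assumption made for illustration: the class `ToyValueModel`, its hand-written `encode`, `outcome`, `transition`, and `value` functions (stand-ins for learned network modules), the two-option set, and the rule that mixes the direct value estimate with the backed-up estimate by depth. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple

OPTIONS = (0, 1)  # hypothetical option set for this toy example


@dataclass
class ToyValueModel:
    """Hand-written stand-ins for learned modules (all assumptions)."""

    discount: float = 0.9

    def encode(self, observation: float) -> float:
        # Map an observation to an abstract state (here: just a float).
        return float(observation)

    def outcome(self, state: float, option: int) -> Tuple[float, float]:
        # Predict the reward and discount for taking `option` in `state`.
        reward = 1.0 if option == 1 else 0.0
        return reward, self.discount

    def transition(self, state: float, option: int) -> float:
        # Predict the next abstract state; no observation is reconstructed.
        return state + 1.0 if option == 1 else state - 1.0

    def value(self, state: float) -> float:
        # Predict the value (discounted sum of rewards) of an abstract state.
        return 0.5 * state


def q_plan(model: ToyValueModel, state: float, option: int, depth: int) -> float:
    """Option-conditional value estimate from a depth-limited lookahead."""
    reward, discount = model.outcome(state, option)
    next_state = model.transition(state, option)
    return reward + discount * v_plan(model, next_state, depth)


def v_plan(model: ToyValueModel, state: float, depth: int) -> float:
    """State value: direct prediction mixed with the best backed-up estimate."""
    if depth == 1:
        return model.value(state)
    best = max(q_plan(model, state, o, depth - 1) for o in OPTIONS)
    # Assumed mixing rule: weight the direct estimate by 1/depth.
    return model.value(state) / depth + (depth - 1) / depth * best


def choose_option(model: ToyValueModel, observation: float, depth: int) -> int:
    """Encode the observation once, then pick the option with the highest plan value."""
    state = model.encode(observation)
    return max(OPTIONS, key=lambda o: q_plan(model, state, o, depth))
```

Calling `choose_option(ToyValueModel(), 0, 2)` encodes the observation once and then searches entirely in abstract-state space. This is the property the abstract emphasizes: the model only has to predict rewards and values, not future observations.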
