Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

05/09/2017
by Steven Stenberg Hansen, et al.

We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. The model is trained end-to-end via back-propagation. Despite being trained with the model-free Q-learning objective, we show that DEVI's model-based internal structure provides "one-shot" transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
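The core idea, a non-parametric value iteration over stored episodic transitions weighted by a learned similarity metric, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the embedding function here is a fixed random projection standing in for the trained deep network, the kernel is a softmax over negative embedding distances, and the toy memory is randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(states, W):
    """Stand-in for DEVI's learned embedding network (hypothetical)."""
    return np.tanh(states @ W)

# Toy episodic memory of transitions (s, a, r, s').
n, d_state, d_embed, n_actions = 50, 4, 8, 2
S = rng.normal(size=(n, d_state))
A = rng.integers(0, n_actions, size=n)
R = rng.normal(size=n)
S_next = rng.normal(size=(n, d_state))

W = rng.normal(size=(d_state, d_embed))  # fixed projection in this sketch
gamma = 0.9

# Similarity kernel between successor states and stored states:
# softmax over negative squared embedding distances.
Z, Z_next = embed(S, W), embed(S_next, W)
dist = ((Z_next[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
K = np.exp(-dist)
K /= K.sum(axis=1, keepdims=True)

# Value iteration over the episodic memory: each transition's value is its
# reward plus the discounted, kernel-weighted value of similar successors,
# maximized over actions.
Q = np.zeros(n)
for _ in range(200):
    V_next = np.zeros(n)
    for a in range(n_actions):
        mask = (A == a)
        if mask.any():
            Ka = K[:, mask]
            Qa = (Ka @ Q[mask]) / np.maximum(Ka.sum(axis=1), 1e-8)
            V_next = np.maximum(V_next, Qa)
    Q_new = R + gamma * V_next
    if np.max(np.abs(Q_new - Q)) < 1e-6:
        Q = Q_new
        break
    Q = Q_new
```

Because the backup averages stored values with weights that sum to one and discounts by gamma, the update is a contraction and the iteration converges. In the full method, gradients of the Q-learning loss would flow back through the kernel into the embedding network, which is what makes the similarity metric learned rather than fixed.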


Related research

11/18/2018 - Policy Optimization with Model-based Explorations
Model-free reinforcement learning methods such as the Proximal Policy Op...

01/31/2019 - Successor Features Support Model-based and Model-free Reinforcement Learning
One key challenge in reinforcement learning is the ability to generalize...

08/08/2017 - Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be ...

11/21/2016 - A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games
Reinforcement learning is concerned with identifying reward-maximizing b...

03/13/2017 - Reinforcement Learning for Transition-Based Mention Detection
This paper describes an application of reinforcement learning to the men...

10/11/2021 - Neural Algorithmic Reasoners are Implicit Planners
Implicit planning has emerged as an elegant technique for combining lear...

06/01/2018 - Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning
Learning a generative model is a key component of model-based reinforcem...
