A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

11/21/2016
by   Felix Leibfried, et al.

Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment. In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games so that it also predicts rewards. To this end, we formulate a joint optimization problem that minimizes both the video frame and the reward reconstruction loss, and adapt the network parameters accordingly. Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction over horizons of up to 200 frames. We see these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.
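The joint objective mentioned in the abstract can be pictured as a sum of a frame reconstruction term and a reward reconstruction term. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes mean squared error for frames, a softmax cross-entropy over discrete reward classes, and a hypothetical trade-off parameter `weight`. The function names `frame_loss`, `reward_loss`, and `joint_loss` are likewise made up for this example.

```python
import math


def frame_loss(pred_frame, true_frame):
    """Mean squared error between flattened predicted and true frames (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred_frame, true_frame)) / len(true_frame)


def reward_loss(reward_logits, reward_class):
    """Cross-entropy of a softmax over reward classes (assumed loss).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(reward_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in reward_logits))
    return log_z - reward_logits[reward_class]


def joint_loss(pred_frame, true_frame, reward_logits, reward_class, weight=1.0):
    """Frame loss plus weighted reward loss; `weight` is a hypothetical trade-off."""
    return frame_loss(pred_frame, true_frame) + weight * reward_loss(
        reward_logits, reward_class
    )
```

In a real network both terms would be computed from a shared model's outputs and minimized together by gradient descent, so that a single set of parameters serves both frame and reward prediction.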
