γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

10/27/2020 ∙ by Michael Janner, et al. ∙ 15

We introduce the γ-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with γ-models leads to generalizations of the procedures that form the foundation of model-based control, including the model rollout and model-based value estimation. The γ-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the γ-model as both a generative adversarial network and normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 3

page 4

page 6

page 7

page 10

page 11

page 16

page 17

Code Repositories

gamma-models

Code for the paper "Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction"


view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.