γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

10/27/2020
by   Michael Janner, et al.
15

We introduce the γ-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with γ-models leads to generalizations of the procedures that form the foundation of model-based control, including the model rollout and model-based value estimation. The γ-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the γ-model as both a generative adversarial network and normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.

READ FULL TEXT

page 3

page 4

page 6

page 7

page 10

page 11

page 16

page 17

research
12/24/2019

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

Despite its potential to improve sample complexity versus model-free app...
research
09/20/2018

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcem...
research
06/29/2020

Learning and Planning in Average-Reward Markov Decision Processes

We introduce improved learning and planning algorithms for average-rewar...
research
05/27/2019

Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction

Value functions are crucial for model-free Reinforcement Learning (RL) t...
research
01/30/2023

On the Statistical Benefits of Temporal Difference Learning

Given a dataset on actions and resulting long-term rewards, a direct est...
research
01/11/2020

Prediction with eventual almost sure guarantees

We study the problem of predicting the properties of a probabilistic mod...
research
04/14/2023

Model Predictive Control with Self-supervised Representation Learning

Over the last few years, we have not seen any major developments in mode...

Please sign up or login with your details

Forgot password? Click here to reset