DeepMDP: Learning Continuous Latent Space Models for Representation Learning

06/06/2019
by Carles Gelada, et al.

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
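The two training losses named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear maps for the embedding, latent transition, and reward predictor, omits actions, and assumes a deterministic latent transition so that the Euclidean distance between the predicted and the embedded next latent state stands in for a distance between next-latent-state distributions. The names `linear`, `deepmdp_losses`, `embed_w`, `trans_w`, and `reward_w` are hypothetical.

```python
def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def deepmdp_losses(batch, embed_w, trans_w, reward_w):
    """Return (reward_loss, transition_loss) averaged over a batch.

    batch: list of (observation, reward, next_observation) tuples.
    All three weight matrices are illustrative stand-ins for learned networks.
    """
    reward_loss = 0.0
    transition_loss = 0.0
    for obs, reward, next_obs in batch:
        z = linear(embed_w, obs)            # latent state of the current observation
        z_next = linear(embed_w, next_obs)  # latent state of the observed next observation
        z_pred = linear(trans_w, z)         # next latent state predicted by the latent model
        r_pred = linear(reward_w, z)[0]     # reward predicted from the latent state

        # Reward loss: absolute error between predicted and observed reward.
        reward_loss += abs(r_pred - reward)
        # Transition loss: Euclidean distance between predicted and embedded
        # next latent states (assumed deterministic-transition simplification).
        transition_loss += sum((a - b) ** 2 for a, b in zip(z_pred, z_next)) ** 0.5

    n = len(batch)
    return reward_loss / n, transition_loss / n


# Toy usage: 3-dimensional observations embedded into a 2-dimensional latent space.
embed_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keep the first two coordinates
trans_w = [[0.0, 1.0], [1.0, 0.0]]            # swap the two latent coordinates
reward_w = [[1.0, 0.0]]                       # reward equals the first latent coordinate

batch = [
    ([1.0, 2.0, 9.0], 1.0, [2.0, 1.0, 5.0]),  # model is exact on this transition
    ([1.0, 2.0, 9.0], 3.0, [2.0, 4.0, 0.0]),  # model is off on this one
]
print(deepmdp_losses(batch, embed_w, trans_w, reward_w))
```

In this toy setup the third observation coordinate is ignored by the embedding, which mirrors the idea of compressing a high-dimensional observation into a lower-dimensional latent state; both losses are zero exactly when the latent model predicts the observed reward and the embedded next state.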


