
Deep Reinforcement and InfoMax Learning

06/12/2020
by Bogdan Mazoure, et al.
McGill University

Our work is based on the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal representations at successive timesteps. We provide an intuitive analysis of the convergence properties of our approach from the perspective of Markov chain mixing times, and argue that convergence of the lower bound on mutual information is related to the inverse absolute spectral gap of the transition model. We test our approach in several synthetic settings, where it successfully learns representations that are predictive of the future. Finally, we augment C51, a strong RL baseline, with our temporal DIM objective and demonstrate improved performance on a continual learning task and on the recently introduced Procgen environment.
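To make the idea of maximizing mutual information between successive representations concrete, here is a minimal sketch of one common lower bound, InfoNCE. The abstract does not say which bound the objective uses, so the choice of InfoNCE is an assumption, and the names `infonce_lower_bound` and `dot_scores` as well as the dot-product critic are hypothetical and chosen only for illustration.

```python
import math


def infonce_lower_bound(scores):
    """InfoNCE estimate from an N x N score matrix (assumed bound, see lead-in).

    scores[i][j] is a critic score for pairing the representation of sample i
    at time t with the representation of sample j at time t+1. Diagonal
    entries are the true (positive) pairs; off-diagonal entries are negatives.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_norm  # log-softmax of the positive pair
    # Mean log-softmax of the positives, shifted by log N; never exceeds log N.
    return total / n + math.log(n)


def dot_scores(z_t, z_next):
    """Hypothetical critic: plain dot products between two lists of vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in z_next] for u in z_t]


# Toy batch: each next-step representation equals its predecessor,
# so the positive pairs score far above the negatives.
z_t = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
z_next = [row[:] for row in z_t]
bound = infonce_lower_bound(dot_scores(z_t, z_next))

# A critic that cannot tell positives from negatives yields a bound of zero.
uninformative = infonce_lower_bound([[0.0] * 3 for _ in range(3)])
```

In this toy batch the estimate approaches its ceiling of log N (here log 3), while the uninformative critic gives zero. In an agent, the score matrix would come from learned encoders applied to observed transitions, and the negative of this quantity would be added to the RL loss as an auxiliary term.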
