
Deep Reinforcement and InfoMax Learning

by Bogdan Mazoure, et al.
McGill University

Our work is based on the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal representation of successive timesteps. We provide an intuitive analysis of the convergence properties of our approach from the perspective of Markov chain mixing times and argue that convergence of the lower bound on mutual information is related to the inverse absolute spectral gap of the transition model. We test our approach in several synthetic settings, where it successfully learns representations that are predictive of the future. Finally, we augment C51, a strong RL baseline, with our temporal DIM objective and demonstrate improved performance on a continual learning task and on the recently introduced Procgen environment.
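The two quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `infonce_lower_bound` and `absolute_spectral_gap` are hypothetical, and the plain dot-product critic is an assumption made for brevity, whereas the paper builds on Deep InfoMax with learned encoders. The first function computes an InfoNCE-style lower bound on the mutual information between representations of successive timesteps. The second computes the absolute spectral gap of a row-stochastic transition matrix, whose inverse is the quantity the abstract relates to convergence.

```python
import numpy as np


def infonce_lower_bound(z_t, z_next):
    """InfoNCE-style lower bound on I(z_t; z_{t+1}), in nats.

    z_t, z_next: arrays of shape (N, d). Row i of z_next is assumed to be
    the representation of the state that followed row i of z_t, so the
    other N - 1 rows act as negative samples.
    """
    n = z_t.shape[0]
    # Assumption: a plain dot-product critic instead of a learned one.
    scores = z_t @ z_next.T                                # (N, N)
    # Numerically stable row-wise log-softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; the bound never exceeds log(N).
    return np.log(n) + np.mean(np.diag(log_probs))


def absolute_spectral_gap(P):
    """Return 1 minus the second-largest eigenvalue modulus of P.

    P is assumed to be a row-stochastic transition matrix with at least
    two states.
    """
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - moduli[1]


# Toy two-state chain that stays in place with probability 0.9.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
gap = absolute_spectral_gap(P)

# Perfectly predictable toy representations: the bound approaches log(N).
z = 10.0 * np.eye(4)
bound = infonce_lower_bound(z, z)
```

For the toy chain the eigenvalues are 1 and 0.8, so the gap is about 0.2 and its inverse is about 5: the chain mixes slowly because it rarely switches state. In practice the representations would come from the agent's encoder and the bound would be maximized by gradient ascent alongside the RL loss.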


MIME: Mutual Information Minimisation Exploration

We show that reinforcement learning agents that learn by surprise (surpr...

Predictive Information Accelerates Learning in RL

The Predictive Information is the mutual information between the past an...

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

While deep reinforcement learning excels at solving tasks where large am...

Continual Learning In Environments With Polynomial Mixing Times

The mixing time of the Markov chain induced by a policy limits performan...

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Learning a good representation is an essential component for deep reinfo...

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Training objectives based on predictive coding have recently been shown ...

Optimized Bacteria are Environmental Prediction Engines

Experimentalists have observed phenotypic variability in isogenic bacter...