Information Maximizing Exploration with a Latent Dynamics Model

04/04/2018
by Trevor Barron, et al.

All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods inject noise into action selection, such as Gaussian noise in policy-gradient methods or ϵ-greedy action selection in Q-learning. While these methods are appealing for their simplicity, they do not explore the state space in a methodical manner. We present an approach that uses a learned dynamics model to derive reward bonuses as a means of intrinsic motivation to improve model-free reinforcement learning. A key insight of our approach is that this dynamics model can be learned in the latent feature space of a value function, where it represents the dynamics of the agent and the environment. The method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. We evaluate our method on several continuous control tasks, focusing on improving exploration.
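To make the idea concrete, here is a minimal sketch of a model-derived exploration bonus. It is not the paper's implementation: the class and function names (`LatentDynamicsModel`, `shaped_reward`), the linear model, and the coefficient `beta` are all illustrative assumptions. The sketch shows the general pattern only: a dynamics model is trained on latent transitions, and its prediction error is added to the extrinsic reward as an intrinsic bonus.

```python
import numpy as np

class LatentDynamicsModel:
    """Illustrative linear model z' ~ A @ [z; a], trained by SGD on squared error.

    Stands in for a learned dynamics model over a value function's
    latent features; the paper's actual model and training differ.
    """

    def __init__(self, latent_dim, action_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.1, size=(latent_dim, latent_dim + action_dim))
        self.lr = lr

    def predict(self, z, a):
        # Predict the next latent state from the current latent state and action.
        return self.A @ np.concatenate([z, a])

    def update(self, z, a, z_next):
        """One SGD step on 0.5 * ||prediction - z_next||^2.

        Returns the squared prediction error before the step, which
        serves as the agent's "surprise" in this state.
        """
        x = np.concatenate([z, a])
        err = self.predict(z, a) - z_next
        self.A -= self.lr * np.outer(err, x)  # gradient of the squared error
        return float(err @ err)

def shaped_reward(extrinsic, prediction_error, beta=0.1):
    # Intrinsic bonus: higher model surprise yields a larger exploration bonus.
    return extrinsic + beta * prediction_error
```

As the model learns the dynamics of a region of the state space, the prediction error there shrinks and so does the bonus, nudging the agent toward transitions it cannot yet predict.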

Related research

- Bayesian Curiosity for Efficient Exploration in Reinforcement Learning (11/20/2019)
- Learning-Driven Exploration for Reinforcement Learning (06/17/2019)
- Latent Exploration for Reinforcement Learning (05/31/2023)
- VIME: Variational Information Maximizing Exploration (05/31/2016)
- Towards Robust Bisimulation Metric Learning (10/27/2021)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration (10/21/2021)
- ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning (07/02/2020)
