VIME: Variational Information Maximizing Exploration

05/31/2016
by   Rein Houthooft, et al.
0

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/31/2019

VASE: Variational Assorted Surprise Exploration for Reinforcement Learning

Exploration in environments with continuous control and sparse rewards r...
research
04/04/2018

Information Maximizing Exploration with a Latent Dynamics Model

All reinforcement learning algorithms must handle the trade-off between ...
research
01/06/2021

Geometric Entropic Exploration

Exploration is essential for solving complex Reinforcement Learning (RL)...
research
05/11/2020

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

Information gathering in a partially observable environment can be formu...
research
11/15/2016

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Count-based exploration algorithms are known to perform near-optimally w...
research
07/03/2015

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Achieving efficient and scalable exploration in complex domains poses a ...

Please sign up or login with your details

Forgot password? Click here to reset