Implicit Generative Modeling for Efficient Exploration

11/19/2019
by Neale Ratzlaff et al.

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where rewards from the environment are sparse. A common approach to exploring such environments is to introduce an "intrinsic" reward. In this work, we focus on model uncertainty estimation as an intrinsic reward for efficient exploration. In particular, we introduce an implicit generative modeling approach to estimate the Bayesian uncertainty of the agent's belief about the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamics function, so multiple draws approximate the posterior, and the variance of future predictions under this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on amortized Stein Variational Gradient Descent. In experiments, we compare our implementation with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamics models. On challenging exploration tasks, our implicit generative model consistently outperforms competing approaches in terms of data efficiency.
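To make the idea concrete, below is a minimal sketch (not the authors' released code) of how uncertainty from an implicit generative model could be turned into an intrinsic reward: a generator maps noise vectors to the parameters of small dynamics networks, and the variance of their next-state predictions for a given state-action pair serves as the exploration bonus. All names, dimensions, and hyperparameters are illustrative assumptions, and the SVGD-based training of the generator is omitted.

```python
# Hypothetical sketch: sample dynamics networks from an implicit generator and
# use the variance of their next-state predictions as an intrinsic reward.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN, NOISE_DIM = 4, 2, 32, 16  # assumed toy sizes


class WeightGenerator(nn.Module):
    """Maps a noise vector z to a flat parameter vector for a one-hidden-layer
    dynamics network f(s, a) -> s', i.e. an implicit distribution over networks."""

    def __init__(self):
        super().__init__()
        in_dim = STATE_DIM + ACTION_DIM
        self.n_params = in_dim * HIDDEN + HIDDEN + HIDDEN * STATE_DIM + STATE_DIM
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, 128), nn.ReLU(), nn.Linear(128, self.n_params)
        )

    def forward(self, z):
        return self.net(z)


def predict_next_state(theta, state, action):
    """Run one sampled dynamics network, given its flat parameter vector theta."""
    in_dim = STATE_DIM + ACTION_DIM
    i = 0
    W1 = theta[i:i + in_dim * HIDDEN].view(HIDDEN, in_dim); i += in_dim * HIDDEN
    b1 = theta[i:i + HIDDEN]; i += HIDDEN
    W2 = theta[i:i + HIDDEN * STATE_DIM].view(STATE_DIM, HIDDEN); i += HIDDEN * STATE_DIM
    b2 = theta[i:i + STATE_DIM]
    x = torch.cat([state, action])
    h = torch.relu(W1 @ x + b1)
    return W2 @ h + b2


def intrinsic_reward(generator, state, action, n_samples=8):
    """Variance of next-state predictions across sampled dynamics networks,
    interpreted as a model-uncertainty bonus for exploration."""
    z = torch.randn(n_samples, NOISE_DIM)          # one noise vector per sample
    thetas = generator(z)                          # (n_samples, n_params)
    preds = torch.stack([predict_next_state(t, state, action) for t in thetas])
    return preds.var(dim=0).mean().item()


gen = WeightGenerator()
s, a = torch.zeros(STATE_DIM), torch.zeros(ACTION_DIM)
print("intrinsic reward:", intrinsic_reward(gen, s, a))
```

In this sketch, states and actions visited where the sampled networks disagree most receive the largest bonus, which is the role the posterior predictive variance plays in the method described above.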

Related research

VASE: Variational Assorted Surprise Exploration for Reinforcement Learning (10/31/2019)
Exploration in environments with continuous control and sparse rewards r...

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning (10/17/2020)
Efficient exploration remains a challenging problem in reinforcement lea...

Fixed β-VAE Encoding for Curious Exploration in Complex 3D Environments (05/18/2021)
Curiosity is a general method for augmenting an environment reward with ...

Counterfactual Control for Free from Generative Models (02/22/2017)
We introduce a method by which a generative model learning the joint dis...

Intrinsic Reward Driven Imitation Learning via Generative Model (06/26/2020)
Imitation learning in a high-dimensional environment is challenging. Mos...

EMI: Exploration with Mutual Information Maximizing State and Action Embeddings (10/02/2018)
Policy optimization struggles when the reward feedback signal is very sp...

Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction (12/21/2021)
Nowadays, data-driven deep neural models have already shown remarkable p...