Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

10/17/2020
by   Chenjia Bai, et al.
9

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even totally disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on the conditional variational inference to model the multimodality and stochasticity. We consider the environmental state-action transition as a conditional generative process by generating the next-state prediction under the condition of the current state, action, and latent variable. We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration, which allows the agent to learn skills by self-supervised exploration without observing extrinsic rewards. We evaluate the proposed method on several image-based simulation tasks and a real robotic manipulating task. Our method outperforms several state-of-the-art environment model-based exploration approaches.

READ FULL TEXT

page 1

page 5

page 9

page 10

research
11/19/2019

Implicit Generative Modeling for Efficient Exploration

Efficient exploration remains a challenging problem in reinforcement lea...
research
09/12/2022

Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Effective exploration is critical for reinforcement learning agents in e...
research
04/15/2021

Self-Supervised Exploration via Latent Bayesian Surprise

Training with Reinforcement Learning requires a reward function that is ...
research
10/20/2021

Dynamic Bottleneck for Robust Self-Supervised Exploration

Exploration methods based on pseudo-count of transitions or curiosity of...
research
08/24/2022

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

In real-world scenarios, reinforcement learning under sparse-reward syne...
research
10/05/2020

Latent World Models For Intrinsically Motivated Exploration

In this work we consider partially observable environments with sparse r...
research
03/05/2019

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

In model-based reinforcement learning, the agent interleaves between mod...

Please sign up or login with your details

Forgot password? Click here to reset