Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning

09/19/2022
by Mingqi Yuan, et al.

Exploration is critical for deep reinforcement learning in complex environments with high-dimensional observations and sparse rewards. To address this problem, recent approaches have leveraged intrinsic rewards to improve exploration, as in novelty-based and prediction-based exploration. However, many intrinsic reward modules require sophisticated architectures and representation learning, resulting in prohibitive computational complexity and unstable performance. In this paper, we propose Rewarding Episodic Visitation Discrepancy (REVD), a computationally efficient and quantified exploration method. More specifically, REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes. To estimate this divergence efficiently, a k-nearest neighbor estimator is used together with a randomly initialized state encoder. Finally, REVD is tested on Atari games and PyBullet Robotics Environments. Extensive experiments demonstrate that REVD can significantly improve the sample efficiency of reinforcement learning algorithms and outperform the benchmark methods.
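To make the estimator described above more concrete, here is a minimal NumPy sketch under stated assumptions. It encodes observations with a fixed, never-trained random projection (a hypothetical stand-in for the paper's randomly initialized state encoder) and estimates the Rényi divergence between the encoded states of two episodes using the k-nearest neighbor estimator of Póczos and Schneider. The function names, the tanh projection, and the defaults alpha=0.5 and k=3 are illustrative assumptions. How REVD turns this episode-level discrepancy into per-step intrinsic rewards is defined in the paper and is not reproduced here.

```python
import math
import numpy as np


def random_encoder(obs_dim: int, emb_dim: int, seed: int = 0):
    """Hypothetical stand-in for a randomly initialized, frozen encoder:
    a fixed random linear projection followed by tanh (never trained)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0 / math.sqrt(obs_dim), size=(obs_dim, emb_dim))
    return lambda obs: np.tanh(obs @ w)


def kth_nn_dist(queries: np.ndarray, refs: np.ndarray, k: int,
                exclude_self: bool = False) -> np.ndarray:
    """Euclidean distance from each query row to its k-th nearest row in refs."""
    d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    if exclude_self:
        # queries and refs are the same set: ignore each point's zero self-distance
        np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, k - 1]


def renyi_divergence_knn(x: np.ndarray, y: np.ndarray,
                         alpha: float = 0.5, k: int = 3) -> float:
    """k-NN estimate of D_alpha(p || q) from samples x ~ p and y ~ q
    (Poczos & Schneider style estimator; requires alpha != 1, k < len(x))."""
    n, dim = x.shape
    m = y.shape[0]
    rho = kth_nn_dist(x, x, k, exclude_self=True)  # within-episode k-NN distance
    nu = kth_nn_dist(x, y, k)                      # cross-episode k-NN distance
    eps = 1e-12
    # log of the per-sample density-ratio estimate q(x_i) / p(x_i)
    log_ratio = math.log((n - 1) / m) + dim * (np.log(rho + eps) - np.log(nu + eps))
    terms = (1.0 - alpha) * log_ratio
    # log-mean-exp, computed stably, estimates log of integral p^alpha q^(1-alpha)
    tmax = terms.max()
    log_mean = tmax + math.log(np.mean(np.exp(terms - tmax)))
    # bias-correction constant B_{k,alpha}, in log space
    log_b = 2 * math.lgamma(k) - math.lgamma(k - alpha + 1) - math.lgamma(k + alpha - 1)
    return (log_mean + log_b) / (alpha - 1.0)


if __name__ == "__main__":
    enc = random_encoder(16, 16, seed=0)
    rng = np.random.default_rng(1)
    prev_episode = enc(rng.normal(size=(200, 16)))
    similar_episode = enc(rng.normal(size=(200, 16)))
    novel_episode = enc(rng.normal(loc=3.0, size=(200, 16)))
    print("similar:", renyi_divergence_knn(similar_episode, prev_episode))
    print("novel:  ", renyi_divergence_knn(novel_episode, prev_episode))
```

An episode that visits states unlike those of the previous episode should yield a larger estimated divergence than one drawn from the same state distribution. Because the encoder is frozen, the main per-episode cost in this sketch is the brute-force pairwise distance computation; a nearest-neighbor index such as a KD-tree could replace it for longer episodes.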

05/24/2023

Successor-Predecessor Intrinsic Exploration

Exploration is essential in reinforcement learning, particularly in envi...
03/08/2022

Rényi State Entropy for Exploration Acceleration in Reinforcement Learning

One of the most critical challenges in deep reinforcement learning is to...
02/18/2021

State Entropy Maximization with Random Encoders for Efficient Exploration

Recent exploration methods have proven to be a recipe for improving samp...
07/19/2021

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Maintaining long-term exploration ability remains one of the challenges ...
11/18/2022

Exploring through Random Curiosity with General Value Functions

Efficient exploration in reinforcement learning is a challenging problem...
05/24/2019

Exploration via Flow-Based Intrinsic Rewards

Exploration bonuses derived from the novelty of observations in an envir...
12/08/2020

Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) methods have shown strong samp...
