MMRS
Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning
Maintaining long-term exploration ability remains one of the challenges of deep reinforcement learning (DRL). In practice, reward-shaping-based approaches are leveraged to provide intrinsic rewards that incentivize the agent's motivation. However, most existing intrinsic reward shaping (IRS) modules rely on attendant models or additional memory to record and analyze learning procedures, which leads to high computational complexity and low robustness. Moreover, they overemphasize the influence of a single state on exploration and cannot evaluate exploration performance from a global perspective. To tackle this problem, state-entropy-based methods have been proposed to encourage the agent to visit the state space more equitably. However, their estimation error and sample complexity are prohibitive in environments with high-dimensional observations. In this paper, we introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer; it requires no additional models or memory. In particular, JFI overcomes the vanishing-intrinsic-rewards problem and can be generalized to arbitrary tasks. Furthermore, we use a variational autoencoder (VAE) model to capture the lifelong novelty of states. Finally, the global JFI score and local state novelty are combined to form a multimodal intrinsic reward, controlling the exploration extent more precisely. Extensive simulation results demonstrate that our multimodal reward shaping (MMRS) method achieves higher performance than other benchmark schemes.
Balancing the trade-off between exploration and exploitation is a crucial problem in reinforcement learning (RL) sutton2018reinforcement . In general, learning the optimal policy requires the agent to visit all possible state-action pairs infinitely often. However, most existing RL algorithms have sophisticated exploitation mechanisms but poor exploration strategies. As a result, the policy may prematurely fall into a local optimum after finitely many steps and never improve again stadie2015incentivizing . Therefore, the critical problem is to maintain exploration throughout the whole learning procedure. To address the problem, a simple approach is to employ stochastic policies such as the ε-greedy policy and Boltzmann exploration, which select every possible action with a nonzero probability in each state. Such techniques can eventually learn the optimal policy in the tabular setting, but they are likely to be futile when handling complex environments with high-dimensional observations.
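As a minimal illustration of such a stochastic policy, an ε-greedy action selector can be sketched as follows (a generic sketch over tabular action values, not tied to any particular RL library):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Select an action index from estimated action values.

    With probability epsilon, pick a uniformly random action (exploration);
    otherwise pick the action with the highest estimated value (exploitation).
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because every action keeps a nonzero selection probability, an ε-greedy agent eventually visits all state-action pairs in the tabular setting, though, as noted above, this guarantee weakens in high-dimensional environments.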
To cope with the exploration problem in complex tasks, reward shaping is leveraged to form multimodal rewards that improve exploration. More specifically, recent approaches provide intrinsic rewards that reward the agent's exploration performance dayan2002reward . In sharp contrast to the extrinsic rewards given explicitly by the environment, intrinsic rewards represent the inherent learning motivation or curiosity of the agent, which is difficult to characterize and evaluate. Many prior works have been devoted to realizing computable intrinsic reward modules, and they can be broadly categorized into novelty-based and prediction-error-based approaches. For instance, strehl2008analysis ; ostrovski2017count employed a state visitation function to measure the novelty of states. Such a visitation function assigns a higher bonus to infrequently seen states, incentivizing the agent to revisit novel states and increasing the probability of learning a better policy. Methods in pathak2017curiosity ; yu2020intrinsic ; yuan2021hybrid ; stadie2015incentivizing followed the second idea, in which the prediction error of a dynamics model is utilized as the intrinsic reward. Given an observed transition, an attendant model is designed to predict the next state based on the current state-action pair. The intrinsic reward is then computed as the Euclidean distance between the predicted next state and the true next state.
All the methods above suffer from vanishing intrinsic rewards, i.e., the intrinsic rewards decrease with repeated visits. Once a state yields a minimal intrinsic reward, the agent has no motivation to explore it further. To maintain long-term exploration ability, badia2020never proposed a never-give-up (NGU) framework that learns a combined intrinsic reward consisting of episodic and lifelong state novelty. NGU evaluates the episodic state novelty through a slot-based memory and a pseudo-count method bellemare2016unifying , encouraging the agent to visit more distinct states in each episode. Since the memory is wiped at the beginning of each episode, the intrinsic rewards do not decay over training. Meanwhile, NGU further introduced a random network distillation (RND) module to capture the lifelong novelty of states burda2018exploration . By controlling the learning rate of RND, the lifelong intrinsic reward more gently discourages the agent from revisiting familiar states. However, NGU has a complex architecture and high computational complexity, making it difficult to generalize to arbitrary environments. A simpler framework entitled rewarding impact-driven exploration (RIDE) is proposed in raileanu2020ride . RIDE first trains an embedding network to encode the state space, following the inverse-forward pattern of pathak2017curiosity . The Euclidean distance between two consecutive encoded states is then utilized as the intrinsic reward. It encourages the agent to take actions that result in larger state changes, maintaining exploration and avoiding the noisy-TV dilemma reported in savinov2018episodic .
However, both NGU and RIDE overemphasize the influence of a single state on exploration and cannot reflect the global exploration extent. Moreover, the aforementioned methods have poor mathematical interpretability and rely heavily on attendant models. To circumvent this problem, islam2019entropy proposed to maximize the entropy of the state distribution, forcing the agent to visit all states more equitably. In particular, islam2019entropy estimates the state entropy using a variational autoencoder (VAE) kingma2013auto . The VAE accepts the parameters of the policy as input before predicting the state distribution; therefore, minimizing the reconstruction error of the VAE is equivalent to maximizing the estimate of the state entropy. Finally, the policy and the VAE model can be updated together through the policy gradient method. In zhang2018dissection , the Shannon entropy regularizer is further extended to Rényi entropy to adapt to arbitrary tasks. To realize an efficient and stable entropy estimate, seo2021state
proposed a random encoder for efficient exploration (RE3) framework that requires no representation learning. In each episode, the observations are collected and encoded using a fixed deep neural network (DNN). Then a k-nearest-neighbor estimator singh2003nearest is leveraged to estimate the state entropy. Simulation results demonstrated that RE3 improves the sample efficiency of both model-free and model-based RL algorithms. However, it is difficult to choose an appropriate value of k, and the estimation error and sample complexity grow exponentially with the size of the state space.

Inspired by the discussions above, we consider replacing the entropy regularizer with a simpler, computable metric. Moreover, we aim to combine this metric with state novelty to realize a comprehensive exploration evaluation. In this paper, we propose multimodal reward shaping (MMRS), a model-free, memory-free, and generative-model-empowered method for providing high-quality intrinsic rewards. The main contributions of this paper are summarized as follows:
We first analyze the sample complexity of entropy-based intrinsic rewards. After that, a novel metric entitled Jain's fairness index (JFI) is introduced to replace the entropy regularizer, and we prove the utility equivalence between the two metrics. Furthermore, we detail the use of JFI both in tabular settings and in environments with high-dimensional observations.
Since JFI evaluates the global exploration performance, the lifelong state novelty is leveraged to form multimodal intrinsic rewards. In particular, we use a VAE to perform state embedding and capture the lifelong novelty of states. Such a method requires no additional memory and avoids overfitting, making it more efficient and robust than RND.
Finally, extensive simulations are performed to compare MMRS with other state-of-the-art (SOTA) methods. Numerical results demonstrate that MMRS outperforms SOTA exploration methods with a simpler architecture and higher robustness. Furthermore, we conduct a qualitative analysis to show that MMRS eliminates vanishing intrinsic rewards and maintains long-term exploration ability.
We study the RL problem modeled as a Markov decision process (MDP) sutton2018reinforcement . The MDP can be defined as a tuple M = (S, A, P, r, ρ₀, γ), where:
S is the state space;
A is the action space;
P(s′ | s, a) is the transition probability;
r(s, a, s′) is the reward function;
ρ₀ is the initial state distribution;
γ ∈ [0, 1) is a discount factor.
Note that the reward function here conditions on a full transition (s, a, s′). Furthermore, we denote by π the policy of the agent, which observes the state of the environment before choosing an action from the action space. Given the MDP M, the objective of RL is to find the optimal policy that maximizes the expected discounted return:
(1) π* = argmax_{π ∈ Π} E_{τ∼π} [ Σ_{t=0}^{∞} γ^t r(s_t, a_t, s_{t+1}) ],
where Π is the set of all stationary policies, and τ = (s₀, a₀, s₁, a₁, …) is the trajectory generated by the policy.
In this paper, we aim to improve the exploration of the state space, in which the agent is expected to visit as many distinct states as possible within a limited learning procedure. Mathematically, the policy should assign equitable visitation probability to all the states. To evaluate the exploration of single states, we define the following state visitation distribution (SVD):
(2) d_π(s) = (1 − γ) Σ_{t=0}^{∞} γ^t Pr(S_t = s),
where Pr(·) denotes the probability and S_t is the random variable of the state at step t. Therefore, improving exploration is equivalent to maximizing the following entropy:
(3) H(d_π) = − Σ_{s∈S} d_π(s) log d_π(s).
Finally, the RL objective is reformulated as:
(4) π* = argmax_{π ∈ Π} E_{τ∼π} [ Σ_{t=0}^{∞} γ^t r(s_t, a_t, s_{t+1}) ] + H(d_π).
It is difficult to solve this optimization problem directly due to the complex entropy term. The following section proposes a reward-shaping-based method that replaces the entropy with a novel and simple metric.
In this section, we first analyze the sample complexity of the original entropybased intrinsic rewards. Then an alternative method is proposed for replacing the entropy regularizer.
To compute the state entropy, we first need to estimate the SVD defined in Eq. (2). Given a trajectory τ = {s₁, …, s_T} generated by policy π, a reasonable estimate of d_π(s) can be formulated as:
(5) d̂_π(s) = (1/T) Σ_{t=1}^{T} 1[s_t = s],
where 1[·] is the indicator function. The following lemma gives the sample complexity of this estimate:
Assume we sample for T steps and estimate the SVD following Eq. (5); then, with probability at least 1 − δ, it holds that:
(6) 
where |S| is the cardinality of the state space.
Furthermore, a reasonable Monte-Carlo estimate of Eq. (3) is:
(7) Ĥ(d_π) = − (1/T) Σ_{t=1}^{T} log d̂_π(s_t).
See proof in Appendix B. ∎
Similar to the estimation of SVD, we can prove that:
Assume we sample for T steps and estimate the entropy following Eq. (7); then, with probability at least 1 − δ, it holds that:
(8) 
See proof in Appendix A. ∎
Equipped with Lemma 1 and Lemma 2, the sample complexity of estimating the state entropy is:
(9) 
Eq. (9) demonstrates that the estimation error converges sublinearly as T → ∞. In practice, we can only sample finitely many steps in each episode, so the estimate may exhibit high variance and negatively influence policy learning.
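As a concrete sketch, the plug-in estimators of Eqs. (5) and (7) can be written as follows (a minimal illustration for finite state spaces; states are assumed hashable):

```python
import math
from collections import Counter

def empirical_svd(trajectory):
    """Estimate the SVD (Eq. (5)) by relative visit frequencies."""
    counts = Counter(trajectory)
    T = len(trajectory)
    return {s: c / T for s, c in counts.items()}

def entropy_estimate(trajectory):
    """Plug-in Monte-Carlo estimate of the state visitation entropy (Eq. (7))."""
    d_hat = empirical_svd(trajectory)
    # Averaging -log d_hat(s_t) over the trajectory equals the plug-in entropy.
    return -sum(math.log(d_hat[s]) for s in trajectory) / len(trajectory)
```

For a large state space, the table of frequencies itself becomes the bottleneck, which mirrors the sample-complexity issue discussed above.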
To replace the complicated entropy regularizer, we introduce JFI to evaluate the global exploration performance. JFI is a count-based metric that was first used to measure allocation fairness in wireless resource scheduling jain1999throughput . The following definition formulates the JFI for state visitation in the tabular setting, in which the state space is finite.
Given a trajectory τ, denote by N(s) the episodic visitation count of state s ∈ S. The visitation fairness based on JFI can be computed as:
(10) JFI(τ) = ( Σ_{s∈S} N(s) )² / ( |S| Σ_{s∈S} N(s)² ).
The JFI ranges from 1/|S| (worst case) to 1 (best case), and it is maximal when N(s) takes the same value for all the states.
Based on Definition 2, we define the shaped JFI that conditions on a specific state as:
(11) JFI(τ_{0:t}) = ( Σ_{s∈S} N_t(s) )² / ( |S| Σ_{s∈S} N_t(s)² ),
where N_t(s) only performs counting in the sub-episode τ_{0:t} = (s₀, …, s_t). Immediately, a shaping function can be formulated as:
(12) F(s_t) = JFI(τ_{0:t}) − JFI(τ_{0:t−1}).
Eq. (12) indicates the gain in JFI when transiting from s_{t−1} to s_t, and the following theorem proves the utility equivalence between JFI and the state visitation entropy.
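These two quantities can be sketched for the tabular case as follows (counting over the full state space, per our reading of Eq. (10); the helper names are ours):

```python
def jfi(counts):
    """Jain's fairness index over visitation counts: (sum x)^2 / (n * sum x^2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total ** 2 / (len(counts) * sum(c * c for c in counts))

def shaped_jfi_gain(trajectory, t, state_space):
    """Gain in JFI when transiting into the state at step t (cf. Eq. (12))."""
    def counts_up_to(end):
        # Visitation counts over the sub-episode trajectory[0:end].
        return [trajectory[:end].count(s) for s in state_space]
    return jfi(counts_up_to(t + 1)) - jfi(counts_up_to(t))
```

Visiting a previously unseen state yields a positive gain, while revisiting an already frequent state yields zero or negative gain, matching the intuition behind the shaping function.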
Given a trajectory τ, maximizing the state visitation entropy is equivalent to maximizing the JFI using Eq. (12) as T → ∞.
See proof in Appendix C. ∎
Furthermore, we employ a representative example to demonstrate the advantage of JFI. Fig. 1 illustrates two trajectories generated by the agent when interacting with the GridWorld game. The trajectory in Fig. 1(a) explores better than that in Fig. 1(b) because it visits more distinct states within the same number of steps. Equipped with Eq. (11), the shaped JFI from the fifth step to the eighth step is computed as:
(13)  
where ↑ represents an increase and ↓ represents a decrease. Evidently, continued exploration increases the JFI, while repeatedly visiting known regions is punished. In sharp contrast to the entropy regularizer, the JFI is highly sensitive to changes in the exploration situation, and it is easier to compute and more robust.
JFI for infinite state space. The JFI is easy to compute in tabular settings. However, many environments have an infinite state space, and episodic exploration can only visit a small part of it. Moreover, the observed states in an episode rarely repeat exactly, making them hard to count. To address this problem, k-means clustering is leveraged to refine the observed states and make them countable macqueen1967some .
Given a set of observed states, k-means clustering aims to shatter the states into k clusters C = {C₁, …, C_k} that minimize the sum of within-cluster distances. Formally, the algorithm optimizes the following objective:
(14) argmin_C Σ_{i=1}^{k} Σ_{x∈C_i} ‖x − μ_i‖²,
where μ_i is the mean of the samples in C_i.
With the clustered states, Eq. (11) is rewritten as:
(15) 
where the counter now performs counting over cluster labels in the sub-episode. In practice, we perform clustering on encodings of the raw states to reduce variance and computational complexity. To realize efficient and generalized encoding, a VAE model is built in the following section.
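A minimal Lloyd's-iteration sketch of Eq. (14) over encoded states (plain NumPy; the initialization scheme and iteration budget are illustrative choices, not the paper's settings):

```python
import numpy as np

def kmeans_labels(latents, k, iters=20, seed=0):
    """Assign each encoded state to one of k clusters (cf. Eq. (14)).

    latents: (N, d) array of state encodings. Returns an integer cluster
    label per state, which makes visits countable for the shaped JFI.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct encoded states.
    centers = latents[rng.choice(len(latents), size=k, replace=False)]
    for _ in range(iters):
        # Assign each encoding to its nearest center.
        dists = np.linalg.norm(latents[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned encodings.
        for j in range(k):
            if (labels == j).any():
                centers[j] = latents[labels == j].mean(axis=0)
    return labels
```

The cluster labels then replace raw states in the visitation counter, so the JFI of the previous section applies unchanged.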
In this section, we propose an intrinsic reward module entitled MMRS that is model-free, memory-free, and generative-model-empowered. As illustrated in Fig. 2, MMRS is composed of two major modules, namely the global intrinsic reward block (GIRB) and the local intrinsic reward block (LIRB). GIRB evaluates the global exploration performance using the shaped JFI, while LIRB traces the lifelong novelty of states across episodes. Finally, the two kinds of exploration bonuses form multimodal intrinsic rewards for policy updates.
To capture the long-term state novelty, a general method is to employ an attendant model to record the visited states, such as RND and ICM pathak2017curiosity . However, such discriminative models suffer from overfitting and have poor generalization ability. To address this problem, we propose to evaluate the state novelty using a variational autoencoder (VAE), which is a powerful generative model based on Bayesian inference kingma2013auto . A vanilla VAE has a recognition model and a generative model, which can be represented as a probabilistic encoder and a probabilistic decoder. We leverage the VAE to encode and reconstruct the state samples to capture their lifelong novelty. In particular, the output of the encoder can be leveraged to perform the clustering operation defined in Section 3.

Denote by q_φ(z|s) the recognition model, represented by a deep neural network (DNN) with parameters φ, which accepts a state and encodes it into latent variables z. Similarly, we define the generative model p_θ(s|z) using a DNN with parameters θ, which accepts the latent variables and reconstructs the state. Given a trajectory τ, the VAE is trained by minimizing the following loss function:
(16) L(θ, φ) = E_{q_φ(z|s)}[ −log p_θ(s|z) ] + D_KL( q_φ(z|s) ‖ p(z) ),
where p(z) is the prior over the latent variables and D_KL(· ‖ ·) is the Kullback–Leibler (KL) divergence. For an observed state s_t at step t, its lifelong novelty is computed as:
(17) n(s_t) = Norm( ‖s_t − ŝ_t‖ ),
where ŝ_t is the reconstructed state and Norm(·) is a normalization operator. This definition indicates that infrequently seen states produce high reconstruction errors, which motivates the agent to explore them further. Note that the VAE model will inevitably produce diminishing intrinsic rewards. To control the decay rate, a low and diminishing learning rate is indispensable when updating the VAE model.
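Given reconstructions from a trained VAE, the novelty score can be sketched as follows; the z-score normalization used here is one plausible instantiation of the normalization operator, not necessarily the paper's exact choice:

```python
import numpy as np

def lifelong_novelty(states, reconstructions):
    """Lifelong novelty as normalized reconstruction error (cf. Eq. (17)).

    states, reconstructions: (N, ...) arrays of true and VAE-reconstructed
    states. Rarely seen states reconstruct poorly and therefore score high.
    """
    errors = np.array([np.linalg.norm(s - r)
                       for s, r in zip(states, reconstructions)])
    spread = errors.std() + 1e-8  # epsilon avoids division by zero
    # Hypothetical normalization choice: z-score across the batch.
    return (errors - errors.mean()) / spread
```

Training the VAE with a low, decaying learning rate then slows the shrinkage of these errors, which is exactly the decay-rate control mentioned above.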
We refer to the JFI as the global intrinsic reward and the state novelty as the local intrinsic reward, respectively. Equipped with the two kinds of exploration bonuses, we are ready to propose the following shaping function:
(18) r_t = r_t^e + λ₁ F(s_t) + λ₂ n(s_t),
where r_t^e is the extrinsic reward, n(s_t) is the lifelong state novelty, and λ₁, λ₂ are two weighting coefficients. Finally, the workflow of MMRS is summarized in Algorithm 1.
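Putting the pieces together, the per-step shaped reward can be sketched directly (the default coefficient values below are placeholders, not the paper's settings):

```python
def multimodal_reward(extrinsic, jfi_gain, novelty, lam_global=0.5, lam_local=0.5):
    """Combine the extrinsic reward with the global (shaped-JFI) and
    local (lifelong-novelty) bonuses via two weighting coefficients."""
    return extrinsic + lam_global * jfi_gain + lam_local * novelty
```

Because the global term reflects the whole sub-episode while the local term reflects a single state, the two coefficients trade off episode-level fairness against per-state curiosity.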
In this section, we evaluate our MMRS framework on both discrete and continuous control tasks from the OpenAI Gym library brockman2016openai coumans2016pybullet . We carefully select several representative algorithms as benchmarks, namely RE3, RIDE, and RND; a brief introduction to these benchmark schemes can be found in Appendix D. With RND, we can verify that MMRS overcomes the problem of diminishing intrinsic rewards. With RE3 and RIDE, we can validate that MMRS achieves higher performance when handling environments with high-dimensional observations. In particular, the agent is also trained without any intrinsic reward module as an ablation baseline.
We first test MMRS on Atari games with discrete action spaces; the details of the selected games are listed in Table 1. Since the Atari games output frames continuously, we stack four consecutive frames as one observation. Moreover, the frames are resized to 84×84 to reduce computation. In particular, some games have complex action spaces, which can be used to evaluate the robustness and generalization ability of MMRS.
Game  Observation shape  Action space size 

Assault  (84, 84, 4)  7 
Breakout  (84, 84, 4)  4 
Beam Rider  (84, 84, 4)  9 
Kung Fu Master  (84, 84, 4)  14 
Space Invaders  (84, 84, 4)  6 
Seaquest  (84, 84, 4)  18 
We leveraged convolutional neural networks (CNNs) to build MMRS and the benchmark algorithms. The LIRB of MMRS needs to learn an encoder and a decoder. The encoder was composed of four convolutional layers and one dense layer, in which each convolutional layer is followed by a batch normalization (BN) layer. The decoder utilized four deconvolutional layers to perform upsampling; moreover, a dense layer and a convolutional layer were employed at the top and the bottom of the decoder, respectively. Note that no BN layer is included in the decoder. Finally, we used the LeakyReLU activation function for both the encoder and the decoder; more detailed network architectures can be found in Appendix E.

We trained MMRS for five million environment steps. In each episode, the agent interacted with eight parallel environments with different random seeds. Moreover, one episode had a length of 128 steps, producing 1024 transitions. An observed state was first processed by the encoder of the VAE to generate a latent vector, which was then sent to the decoder to reconstruct the state. The pixel values of the true state and the reconstructed state were normalized into [0, 1], and the reconstruction error was employed as the state novelty. After that, we performed k-means clustering on the latent vectors with a fixed number of clusters; note that the number of clusters can be larger if a longer episode length is employed. Equipped with the multimodal intrinsic rewards of Eq. (18), we used a proximal policy optimization (PPO) schulman2017proximal
method to update the policy network. More specifically, we used a PyTorch implementation of the PPO method, which can be found in kostrikov2018github . To make a fair comparison, we employed identical policy and value networks for all the algorithms; their architectures can be found in Table 5. The PPO agent was trained with fixed values of the learning rate, entropy coefficient, value-function coefficient, and GAE parameter schulman2015high . In particular, a gradient clipping operation was performed to stabilize the learning procedure.

After the policy was updated, the transitions were utilized to update the VAE model of the LIRB. For the hyperparameter setting, the batch size was set to 64, and the Adam optimizer was leveraged to perform gradient descent. In particular, a linearly decaying learning rate was employed to prevent the lifelong state novelty from diminishing too rapidly. Finally, we trained the benchmark schemes following the default settings reported in the literature.
Next, we compare the performance of MMRS against the benchmark methods, using the episode return as the key performance indicator (KPI). Table 2 illustrates the performance comparison of MMRS and the benchmarks, in which the highest performance is shown in bold. In particular, "T" indicates that the method outperforms the vanilla PPO agent (VPA), while "F" indicates that it does not.
Game  PPO  PPO+RE3  PPO+RIDE  PPO+RND  PPO+MMRS 

Assault  2589.95  T/2870.87  T/2751.55  T/2642.08  T/3303.64 
Breakout  52.74  T/58.65  T/62.62  T/55.37  T/70.77 
Beam Rider  597.16  T/1225.72  T/748.51  F/541.27  T/1383.00 
Kung Fu Master  14450.0  F/13534.62  T/15830.43  F/12500.0  T/17295.65 
Space Invaders  558.90  T/631.75  T/749.32  T/577.60  T/751.71 
Seaquest  879.29  T/886.43  T/884.14  F/853.10  T/892.14 
As shown in Table 2, MMRS achieved the highest performance in all the selected games. RE3 outperforms the VPA in five games while failing in one. RIDE outperforms the VPA in all games and achieves suboptimal performance in one game. In contrast, RND outperforms the VPA in three games and fails in the other three. Furthermore, Fig. 3 illustrates the moving average of the episode return during training. The growth rate of MMRS is faster than that of the other benchmarks, but it shows more oscillation over the whole learning procedure.
Since we learn policies in environments with high-dimensional observations, it is non-trivial to evaluate the computational complexity of these intrinsic reward modules. In practice, we use the frames per second (FPS) during training as the KPI: if the agent takes a certain number of seconds to accomplish sampling and updating in one episode, the FPS is the ratio between the episode length and that time cost. As shown in Fig. 4, RE3 achieves the highest FPS during training because it requires no additional models or memories. RIDE employs three DNNs to reconstruct the transition process and introduces a pseudo-count method to compute the state visitation frequency; therefore, it achieves the lowest FPS. Our MMRS achieves the highest performance at the cost of lower computational efficiency, which can be further improved by simplifying the network architecture of the VAE model.
In this section, we further test MMRS on Bullet games with continuous action spaces; three classical games are listed in Table 3. Unlike the Atari games, which have image observations, Bullet games use fixed-length vectors to describe the environment. For instance, the "Ant" game uses 28 features to describe the state of the agent, and its action is a vector consisting of 8 values within (−1.0, 1.0).
Game  Observation shape  Action extent  Action shape 

Ant  (28, )  (−1.0, 1.0)  (8, ) 
Hopper  (15, )  (−1.0, 1.0)  (3, ) 
Humanoid  (44, )  (−1.0, 1.0)  (17, ) 
We leveraged multilayer perceptrons (MLPs) to implement MMRS and the benchmarks; the detailed network architectures can be found in Appendix E. Note that no BN layers were introduced in this experiment. We trained MMRS for one million environment steps. The agent again interacted with eight parallel environments with different random seeds in each episode, and a diagonal Gaussian distribution was used to sample actions. The rest of the updating procedure was consistent with the Atari experiments, but no normalization was applied to the states, and fixed weighting coefficients were used for computing the multimodal intrinsic rewards.

Table 4 illustrates the performance comparison between MMRS and the benchmarks. MMRS outperforms the VPA in all three games while achieving the best performance in two of them. RE3 beats all the other algorithms in one game and demonstrates suboptimal performance in another. RIDE and RND outperform the VPA in two games but fail in one. Furthermore, Fig. 5 demonstrates the moving average of the episode return during training. It is obvious that MMRS realizes stable and efficient growth compared with the benchmarks. In summary, MMRS shows great potential for obtaining considerable performance in both discrete and continuous control tasks.
Game  PPO  PPO+RE3  PPO+RIDE  PPO+RND  PPO+MMRS 

Ant  648.14  T/679.88  T/661.81  T/675.85  T/692.85 
Hopper  1071.58  F/971.55  F/889.53  F/765.63  T/1127.04 
Humanoid  18.19  T/27.85  T/23.03  T/20.88  T/25.31 
In this paper, we have investigated the problem of improving exploration in RL. We first dived into the sample complexity of the entropy-based approaches and obtained an exact lower bound. To eliminate the prohibitive sample complexity, a novel metric entitled JFI was introduced to replace the entropy regularizer. Moreover, we proved the utility consistency between the JFI and the entropy regularizer, and demonstrated the practical usage of JFI both in the tabular setting and in infinite state spaces. Equipped with the JFI metric, the state novelty was integrated to build multimodal intrinsic rewards, which evaluate the exploration extent more precisely. In particular, we used a VAE model to capture the lifelong state novelty across episodes; it avoids overfitting and learns excellent state representations compared with discriminative models. Finally, extensive simulations were performed on both discrete and continuous tasks of the OpenAI Gym library. The numerical results demonstrated that our algorithm outperforms the benchmarks, showing great effectiveness for realizing efficient exploration.
Considering sampling for T steps and collecting a dataset, a reasonable estimate of the SVD is:
According to McDiarmid's inequality mcdiarmid1989method , we have that:
Taking the logarithm on both sides yields:
Assume a failure probability δ, so that with probability 1 − δ it holds:
Let , such that:
(19) 
This concludes the proof.
Considering sampling for T steps and collecting a dataset, a reasonable estimate of the state entropy is:
For each state, Hoeffding's inequality hoeffding1994probability indicates that:
Therefore,
Taking the logarithm on both sides, it holds that:
Finally, it holds:
This concludes the proof.
Given a trajectory , the undiscounted return based on Eq. (12) is:
Recalling the optimal condition of JFI, it holds:
As T → ∞, the state visitation probability satisfies:
Therefore, the state visitation entropy obtains its optima. This concludes the proof.
Given a trajectory τ, RE3 first uses a randomly initialized DNN to encode the observed states. Denoting the encoding vectors by {y_i}, RE3 estimates their entropy using a k-nearest-neighbor (k-NN) entropy estimator singh2003nearest :
(20)  
where y_i^k is the k-NN of y_i within the set, d is the dimension of the encoding vector, Γ(·) is the Gamma function, and Ψ(·) is the digamma function.
Equipped with Eq. (20), the intrinsic reward of each transition is computed as:
(21) 
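A compact sketch of such a k-NN bonus (the log(1 + distance) squashing is a common choice we assume here; it is not necessarily RE3's exact formula):

```python
import numpy as np

def re3_intrinsic(encodings, k=3):
    """k-nearest-neighbor intrinsic reward in the spirit of RE3.

    encodings: (N, d) fixed-random-encoder outputs for the episode's states.
    Each state is rewarded by its distance to its k-th nearest neighbor:
    isolated (rarely visited) regions of the encoding space score high.
    """
    # Pairwise distances; after sorting each row, column 0 is the
    # self-distance (zero) and column k is the k-th nearest neighbor.
    dists = np.linalg.norm(encodings[:, None, :] - encodings[None, :, :], axis=-1)
    dists.sort(axis=1)
    knn = dists[:, k]
    return np.log(1.0 + knn)
```

Because the encoder is fixed and random, no representation learning is needed, which is why RE3 attains the highest FPS in our comparison.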
RIDE inherits the architecture of the intrinsic curiosity module (ICM) in pathak2017curiosity , which is composed of an embedding module, an inverse dynamics model, and a forward dynamics model. Given a transition, the inverse dynamics model predicts the action using the encodings of the state and the next state. Meanwhile, the forward dynamics model accepts the encoded state and the true action to predict the representation of the next state. Given a trajectory, the three models are trained to minimize the following loss function:
(22) 
where the action loss measures the distance between true and predicted actions, e.g., the cross entropy for discrete action spaces.
Finally, the intrinsic reward of each transition is computed as:
(23) 
where the episodic count is the number of times the state has been visited during the current episode; it can be obtained using the pseudo-count method ostrovski2017count .
RND leverages DNNs to record the visited states and compute their novelty; it consists of a predictor network and a target network. The target network serves as the reference: it is fixed and randomly initialized to set the prediction problem. The predictor network is trained on the data collected by the agent across episodes. The target and predictor networks are two DNNs that map states into an embedding space of fixed dimension. The RND is trained to minimize the following loss function:
(24) 
Finally, the intrinsic reward of each transition is computed as:
(25) 
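A linear toy version of the predictor/target mechanics can be sketched as follows (real RND uses deep networks; the linear maps below only illustrate how the prediction error yields the bonus of Eq. (25) and shrinks as states become familiar):

```python
import numpy as np

def make_rnd(obs_dim, embed_dim, seed=0):
    """Minimal linear RND sketch: a frozen random target map and a
    trainable predictor map (stand-ins for the two networks)."""
    rng = np.random.default_rng(seed)
    target = rng.normal(size=(obs_dim, embed_dim))  # fixed reference
    predictor = np.zeros((obs_dim, embed_dim))      # trained on visited states
    return target, predictor

def rnd_intrinsic(obs, target, predictor):
    """Per-state intrinsic reward: prediction error against the fixed target."""
    return np.linalg.norm(obs @ target - obs @ predictor, axis=-1)

def rnd_update(obs, target, predictor, lr=0.05):
    """One gradient step on the predictor's mean squared error (cf. Eq. (24))."""
    err = obs @ predictor - obs @ target
    return predictor - lr * (obs.T @ err) / len(obs)
```

Repeated updates on the same batch of states shrink their intrinsic reward, which is exactly the vanishing-bonus behavior that MMRS's JFI term is designed to counteract.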
Module  Policy network  Encoder  Decoder  
Input  State  State  Latent Variables  
Arch. 




Output  Action  Latent variables  Reconstructed state 
For instance, "8×8 Conv. 32" represents a convolutional layer that has 32 filters of size 8×8. A categorical distribution was used to sample actions based on the action probabilities of the stochastic policy. Note that "Dense 512 & Dense 512" in Table 5 means that there are two branches outputting the mean and variance of the latent variables, respectively.
Module  Policy network  Encoder  Decoder  
Input  State  State  Latent Variables  
Arch. 




Output  Action  Latent variables  Reconstructed state 