DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

04/21/2023
by Shanchuan Wan, et al.

Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness critically determines the performance of RL algorithms, especially when extrinsic rewards are sparse. Recent studies have shown that exploration can be effectively encouraged with intrinsic rewards estimated from the novelty of observations. However, there is a gap between the novelty of an observation and exploration in general, because both the stochasticity of the environment and the agent's behavior can affect what is observed. To estimate exploratory behaviors accurately, we propose DEIR, a novel method in which we theoretically derive an intrinsic reward from a conditional mutual information term that scales principally with the novelty contributed by the agent's own exploration, and we materialize this reward with a discriminative forward model. Extensive experiments on standard and hard-exploration games in MiniGrid show that DEIR learns a better policy more quickly than the baselines. Our evaluations on ProcGen demonstrate both the generalization capability and the broad applicability of our intrinsic reward.
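
The abstract names two ingredients: an intrinsic reward derived from a conditional mutual information term, and a discriminative forward model that materializes it. The sketch below illustrates only the second ingredient under simplifying assumptions (flat observation vectors, discrete actions, within-batch negative sampling); the class and function names are ours for illustration, not the paper's, and the exact DEIR reward computation is not reproduced here.

```python
# A minimal sketch (not the authors' code) of a discriminative forward model:
# given the current observation and action, the model scores a candidate next
# observation as "real" (the transition actually observed) or "fake"
# (a negative sampled from elsewhere in the batch/episode).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeForwardModel(nn.Module):
    def __init__(self, obs_dim, n_actions, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, emb_dim), nn.ReLU())
        self.action_emb = nn.Embedding(n_actions, emb_dim)
        # Classifier over (current obs, action, candidate next obs) triples.
        self.classifier = nn.Sequential(
            nn.Linear(3 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, obs, action, next_obs_candidate):
        z_t = self.encoder(obs)
        z_a = self.action_emb(action)
        z_n = self.encoder(next_obs_candidate)
        return self.classifier(torch.cat([z_t, z_a, z_n], dim=-1)).squeeze(-1)

def discriminator_loss(model, obs, action, next_obs):
    """Noise-contrastive training: the true next observations are positives,
    next observations shuffled within the batch serve as negatives."""
    pos_logits = model(obs, action, next_obs)
    neg_next = next_obs[torch.randperm(next_obs.size(0))]
    neg_logits = model(obs, action, neg_next)
    return (F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
            + F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits)))

# Example usage with random data (obs_dim=16, 4 discrete actions, batch of 32).
model = DiscriminativeForwardModel(obs_dim=16, n_actions=4)
obs = torch.randn(32, 16)
action = torch.randint(0, 4, (32,))
next_obs = torch.randn(32, 16)
loss = discriminator_loss(model, obs, action, next_obs)
loss.backward()
```

How the trained discriminator is combined with episodic novelty to form the actual intrinsic reward is specified in the full paper; the snippet above only shows one plausible way to train such a model.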

Related research

Successor-Predecessor Intrinsic Exploration (05/24/2023)
Exploration is essential in reinforcement learning, particularly in envi...

Intrinsic Motivation via Surprise Memory (08/09/2023)
We present a new computing model for intrinsic rewards in reinforcement ...

Exploring through Random Curiosity with General Value Functions (11/18/2022)
Efficient exploration in reinforcement learning is a challenging problem...

Novelty Search in representational space for sample efficient exploration (09/28/2020)
We present a new approach for efficient exploration which leverages a lo...

Never Give Up: Learning Directed Exploration Strategies (02/14/2020)
We propose a reinforcement learning agent to solve hard exploration game...

Exploration via Flow-Based Intrinsic Rewards (05/24/2019)
Exploration bonuses derived from the novelty of observations in an envir...

MIMEx: Intrinsic Rewards from Masked Input Modeling (05/15/2023)
Exploring in environments with high-dimensional observations is hard. On...
