A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

06/05/2023
by Mikael Henaff, et al.

Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma's Revenge.
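
To make the distinction between the two bonus types concrete, here is a minimal sketch, not the paper's algorithm. It assumes a tabular setting with hashable states, a count-based bonus of the form 1/sqrt(N), and a multiplicative combination of the two bonuses. The abstract specifies none of these choices; the class and method names are hypothetical. The only property taken from the abstract is that the global count persists across all training experience while the episodic count is reset at the start of each episode.

```python
from collections import Counter
from math import sqrt


class NoveltyBonuses:
    """Illustrative count-based global and episodic novelty bonuses.

    Assumptions (not from the source): tabular states, 1/sqrt(N) bonuses,
    and a multiplicative combination.
    """

    def __init__(self):
        # Global counts persist over the agent's entire training experience.
        self.global_counts = Counter()
        # Episodic counts cover only the current episode.
        self.episodic_counts = Counter()

    def start_episode(self):
        # Only the episodic statistics are reset between episodes.
        self.episodic_counts.clear()

    def observe(self, state):
        """Record a visit and return (global, episodic, combined) bonuses."""
        self.global_counts[state] += 1
        self.episodic_counts[state] += 1
        global_bonus = 1.0 / sqrt(self.global_counts[state])
        episodic_bonus = 1.0 / sqrt(self.episodic_counts[state])
        # Assumed combination rule: product of the two bonuses.
        return global_bonus, episodic_bonus, global_bonus * episodic_bonus


bonuses = NoveltyBonuses()
bonuses.start_episode()
first = bonuses.observe("room_A")   # first visit ever
second = bonuses.observe("room_A")  # repeat visit within the same episode
bonuses.start_episode()
third = bonuses.observe("room_A")   # first visit of a new episode
```

In this sketch, revisiting a state in a new episode restores the episodic bonus to 1 while the global bonus keeps shrinking. This illustrates why the two signals can behave differently depending on how much structure is shared across episodes.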
