Approximate Exploration through State Abstraction

08/29/2018
by   Adrien Ali Taïga, et al.
2

Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. We first provide results when the approximation is explicit, quantifying the performance of an exploration algorithm, MBIE-EB strehl2008analysis, when combined with state aggregation. In particular, we show that this allows the agent to trade off between learning speed and quality of the policy learned. We then turn to a successful exploration scheme in practical, pseudo-count based exploration bonuses bellemare2016unifying. We show that choosing a density model implicitly defines an abstraction and that the pseudo-count bonus incentivizes the agent to explore using this abstraction. We find, however, that implicit exploration may result in a mismatch between the approximated value function and exploration bonus, leading to either under- or over-exploration.

READ FULL TEXT
research
07/31/2018

Count-Based Exploration with the Successor Representation

The problem of exploration in reinforcement learning is well-understood ...
research
06/06/2016

Unifying Count-Based Exploration and Intrinsic Motivation

We consider an agent's uncertainty about its environment and the problem...
research
11/13/2019

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

We present an algorithm, HOMER, for exploration and reinforcement learni...
research
02/26/2020

Optimistic Exploration even with a Pessimistic Initialisation

Optimistic initialisation is an effective strategy for efficient explora...
research
06/19/2022

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian...
research
04/21/2023

Approximate Shielding of Atari Agents for Safe Exploration

Balancing exploration and conservatism in the constrained setting is an ...
research
10/20/2021

Dynamic Bottleneck for Robust Self-Supervised Exploration

Exploration methods based on pseudo-count of transitions or curiosity of...

Please sign up or login with your details

Forgot password? Click here to reset