DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

04/11/2018
by Leshem Choshen, et al.

Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose E-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state-of-the-art performance in the Freeway Atari 2600 game.
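The abstract does not spell out how E-values are computed, so the following is only a minimal tabular sketch under stated assumptions: E-values start at 1, are updated with a SARSA-style rule with zero reward and an exploration discount `gamma_e`, and are turned into a "generalized counter" by taking a logarithm in base `1 - alpha`. The function names, the dictionary representation, and the default parameter values are hypothetical and chosen for illustration.

```python
import math

def update_e_value(e, state, action, next_state, next_action,
                   alpha=0.1, gamma_e=0.9):
    """One SARSA-style E-value update with zero reward (assumed rule).

    `e` maps (state, action) pairs to E-values; unseen pairs default to 1.0.
    """
    old = e.get((state, action), 1.0)
    nxt = e.get((next_state, next_action), 1.0)
    e[(state, action)] = (1 - alpha) * old + alpha * gamma_e * nxt
    return e[(state, action)]

def generalized_counter(e_value, alpha=0.1):
    """Convert an E-value to a counter-like quantity: log base (1 - alpha)."""
    return math.log(e_value) / math.log(1 - alpha)

# With gamma_e = 0 nothing propagates from the successor pair, so the
# generalized counter reduces to an ordinary visit count.
e_local = {}
for _ in range(3):
    update_e_value(e_local, "s", "a", "s2", "a2", alpha=0.1, gamma_e=0.0)
local_count = generalized_counter(e_local[("s", "a")], alpha=0.1)

# With gamma_e > 0 an unexplored successor keeps the E-value high, so the
# pair still looks under-explored after a visit.
e_prop = {}
update_e_value(e_prop, "s", "a", "s2", "a2", alpha=0.1, gamma_e=0.9)
propagated_count = generalized_counter(e_prop[("s", "a")], alpha=0.1)
```

In this sketch, setting `gamma_e` to zero recovers the local behavior of a plain counter, while a positive `gamma_e` lets the exploratory value of later state-action pairs flow back along the trajectory, which is the non-local property the abstract attributes to E-values.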

research
02/26/2020

Optimistic Exploration even with a Pessimistic Initialisation

Optimistic initialisation is an effective strategy for efficient explora...
research
08/31/2018

Directed Exploration in PAC Model-Free Reinforcement Learning

We study an exploration method for model-free RL that generalizes the co...
research
11/22/2017

Depth Control of Model-Free AUVs via Reinforcement Learning

In this paper, we consider depth control problems of an autonomous under...
research
11/15/2018

Context-Dependent Upper-Confidence Bounds for Directed Exploration

Directed exploration strategies for reinforcement learning are critical ...
research
01/31/2023

Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic Environments

Model-based next state prediction and state value prediction are slow to...
research
06/01/2018

Strategic Object Oriented Reinforcement Learning

Humans learn to play video games significantly faster than state-of-the-...
research
07/05/2018

Goal-oriented Trajectories for Efficient Exploration

Exploration is a difficult challenge in reinforcement learning and even ...
