An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

07/10/2019
by Mirco Mutti, et al.

What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to obtain a policy that drives towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), so that any goal-conditioned policy can be learned efficiently later on. Unfortunately, directly learning an optimal policy of this nature is remarkably hard. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound on the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds, which lead to as many optimization problems that trade off theoretical guarantees against computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE^3AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks.
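To make the target quantity concrete, the following is a minimal sketch of the entropy of the steady-state distribution induced by a policy, computed on a toy three-state chain with a known model. It is not IDE^3AL and does not implement any of the paper's lower bounds; the toy environment, the function names, and the `slip` parameter are all assumptions made for illustration. The number of power-iteration steps is reported only as a crude proxy for mixing speed.

```python
import math

def build_chain_mdp(n=3, slip=0.1):
    """Hypothetical toy chain: action 0 moves left, action 1 moves right,
    clamped at both ends. With probability `slip` the move is reversed."""
    P = [[[0.0] * n for _ in range(2)] for _ in range(n)]
    for s in range(n):
        left, right = max(s - 1, 0), min(s + 1, n - 1)
        P[s][0][left] += 1.0 - slip
        P[s][0][right] += slip
        P[s][1][right] += 1.0 - slip
        P[s][1][left] += slip
    return P

def induced_chain(P, pi):
    """State-to-state matrix under policy pi: P_pi[s][t] = sum_a pi[s][a] * P[s][a][t]."""
    n = len(P)
    return [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s])))
             for t in range(n)] for s in range(n)]

def stationary_distribution(P_pi, start, tol=1e-12, max_iters=100_000):
    """Power iteration d <- d P_pi from `start`; returns (distribution, steps used)."""
    n = len(P_pi)
    d = list(start)
    for step in range(1, max_iters + 1):
        nxt = [sum(d[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
        if sum(abs(x - y) for x, y in zip(nxt, d)) < tol:
            return nxt, step
        d = nxt
    raise RuntimeError("power iteration did not converge")

def entropy(d):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in d if p > 0.0)

P = build_chain_mdp()
start = [1.0, 0.0, 0.0]                     # agent begins in the leftmost state
uniform_policy = [[0.5, 0.5] for _ in range(3)]
right_policy = [[0.0, 1.0] for _ in range(3)]

d_uniform, steps_uniform = stationary_distribution(induced_chain(P, uniform_policy), start)
d_right, steps_right = stationary_distribution(induced_chain(P, right_policy), start)
h_uniform, h_right = entropy(d_uniform), entropy(d_right)

print("uniform policy:", d_uniform, "entropy", h_uniform, "steps", steps_uniform)
print("always-right policy:", d_right, "entropy", h_right, "steps", steps_right)
```

On this toy chain the uniformly random policy induces a uniform steady-state distribution and therefore attains the maximum entropy, log 3, while the always-right policy concentrates its visitation on the rightmost state and scores lower. The sketch covers entropy over states only; the state-action visitation discussed above would additionally weight each state by the policy's action probabilities.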


research
12/13/2018

Revisiting Exploration-Conscious Reinforcement Learning

The objective of Reinforcement Learning is to learn an optimal policy by...
research
08/27/2019

Exploration-Enhanced POLITEX

We study algorithms for average-cost reinforcement learning problems wit...
research
01/26/2023

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performan...
research
10/07/2019

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Modern deep learning methods provide an effective means to learn good re...
research
12/29/2020

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

We investigate the exploration of an unknown environment when no reward ...
research
02/24/2020

Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

It has been a trend in the Reinforcement Learning literature to derive s...
research
06/27/2019

ExTra: Transfer-guided Exploration

In this work we present a novel approach for transfer-guided exploration...
