Adaptive Reward-Free Exploration

06/11/2020
by Emilie Kaufmann, et al.

Reward-free exploration is a reinforcement learning setting recently studied by Jin et al., who address it by running several algorithms with regret guarantees in parallel. We instead propose a more adaptive approach to reward-free exploration that directly reduces upper bounds on the maximum MDP estimation error. Interestingly, our reward-free UCRL algorithm (RF-UCRL) can be seen as a variant of an algorithm proposed by Fiechter in 1994 for a different objective, which we call best-policy identification. We prove that RF-UCRL needs O((SAH^4/ε^2)ln(1/δ)) episodes to output, with probability 1-δ, an ε-approximation of the optimal policy for any reward function. We empirically compare it to oracle strategies using a generative model.

research
10/12/2021

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

We study the model-based reward-free reinforcement learning with linear ...
research
06/15/2023

Reward-Free Curricula for Training Robust World Models

There has been a recent surge of interest in developing generally-capabl...
research
10/03/2022

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL)...
research
06/14/2021

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Designing provably efficient algorithms with general function approximat...
research
08/18/2020

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

There has been growing progress on theoretical analyses for provably eff...
research
02/07/2022

The Importance of Non-Markovianity in Maximum State Entropy Exploration

In the maximum state entropy exploration framework, an agent interacts w...
research
07/27/2020

Fast active learning for pure exploration in reinforcement learning

Realistic environments often provide agents with very limited feedback. ...
