Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

04/14/2023
by   Gen Li, et al.

This paper studies reward-agnostic exploration in reinforcement learning (RL) – a scenario where the learner is unaware of the reward functions during the exploration stage – and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with S states, A actions, and horizon length H, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting on the order of SAH^3/ε^2 sample episodes (up to log factors) without guidance from the reward information, our algorithm is able to find ε-optimal policies for all these reward functions, provided that ε is sufficiently small. This is the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds S^2AH^3/ε^2 episodes (up to log factors), our algorithm is able to yield ε accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed "reward-free exploration." The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning stage leverages ideas from sample-optimal offline RL paradigms.
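The two sample-complexity regimes in the abstract differ only by a factor of S. A minimal sketch below illustrates this gap at the order level; the function names and the concrete values of S, A, H, and ε are hypothetical, and log factors and absolute constants are omitted.

```python
def episodes_reward_agnostic(S, A, H, eps):
    """Order-level episode count S*A*H^3/eps^2 for finitely many
    (polynomially many) reward functions; constants and log factors omitted."""
    return S * A * H**3 / eps**2

def episodes_reward_free(S, A, H, eps):
    """Order-level episode count S^2*A*H^3/eps^2 for reward-free
    exploration (arbitrarily many, possibly adversarial rewards)."""
    return S**2 * A * H**3 / eps**2

# Illustrative (hypothetical) problem size: 100 states, 10 actions, horizon 20.
S, A, H, eps = 100, 10, 20, 0.1
print(episodes_reward_agnostic(S, A, H, eps))  # linear in S
print(episodes_reward_free(S, A, H, eps))      # one extra factor of S
```

The reward-free bound is exactly S times the reward-agnostic bound, which is the price paid for handling arbitrarily many, even adversarially chosen, reward functions.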


Related research

06/19/2020 · On Reward-Free Reinforcement Learning with Linear Function Approximation
Reward-free reinforcement learning (RL) is a framework which is suitable...

04/11/2022 · Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
This paper is concerned with offline reinforcement learning (RL), which ...

08/11/2021 · Gap-Dependent Unsupervised Exploration for Reinforcement Learning
For the problem of task-agnostic reinforcement learning (RL), an agent f...

06/16/2020 · Task-agnostic Exploration in Reinforcement Learning
Efficient exploration is one of the main challenges in reinforcement lea...

10/09/2021 · Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Achieving sample efficiency in online episodic reinforcement learning (R...

02/27/2023 · The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
Self-supervised methods have become crucial for advancing deep learning ...

12/05/2022 · L2SR: Learning to Sample and Reconstruct for Accelerated MRI
Accelerated MRI aims to find a pair of samplers and reconstructors to re...
