Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

01/21/2022
by   Koulik Khamaru, et al.
3

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain ex post the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/06/2022

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

While much progress has been made in understanding the minimax sample co...
research
06/28/2021

Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variabilit...
research
01/09/2020

Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms

In this paper, we propose an adaptive stopping rule for kernel-based gra...
research
08/05/2021

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental prob...
research
06/24/2023

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

Developing theoretical guarantees on the sample complexity of offline RL...
research
11/19/2020

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often ...
research
10/29/2021

Adaptive Discretization in Online Reinforcement Learning

Discretization based approaches to solving online reinforcement learning...

Please sign up or login with your details

Forgot password? Click here to reset