Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

by   Koulik Khamaru, et al.

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain ex post the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.



page 1

page 2

page 3

page 4


Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Various algorithms in reinforcement learning exhibit dramatic variabilit...

Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms

In this paper, we propose an adaptive stopping rule for kernel-based gra...

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental prob...

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decisio...

Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering

Optimal stopping is the problem of deciding the right time at which to t...

A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Reinforcement Learning (RL) is a computational approach to reward-driven...

Deep Conservative Policy Iteration

Conservative Policy Iteration (CPI) is a founding algorithm of Approxima...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.