Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

08/05/2021
by   Andrew Wagenmaker, et al.

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying ϵ-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an ϵ-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show that this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an ϵ-optimal policy at the instance-optimal rate. Motivated by this negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the “reachability” of a state. We show that our algorithm is nearly minimax optimal and, on several examples, that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
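
The "simple reduction" from low regret to an ϵ-optimal policy is not spelled out in the abstract. One standard version can be sketched as follows. This is an illustration, not necessarily the reduction the authors use. The symbols K (number of episodes), π_1, ..., π_K (policies played), V^π (value of policy π), V^* (optimal value), R_K (regret), C (a constant in an assumed regret bound) and δ (failure probability) are notation introduced here.

```latex
% Regret of the policies \pi_1, \dots, \pi_K played over K episodes:
R_K = \sum_{k=1}^{K} \left( V^{*} - V^{\pi_k} \right).

% Output \hat{\pi} drawn uniformly at random from \{\pi_1, \dots, \pi_K\}.
% Its expected suboptimality is the average regret:
\mathbb{E}\left[ V^{*} - V^{\hat{\pi}} \right] = \frac{R_K}{K}.

% Since V^{*} - V^{\hat{\pi}} \ge 0, Markov's inequality gives
\Pr\left[ V^{*} - V^{\hat{\pi}} > \epsilon \right] \le \frac{R_K}{K \epsilon}.

% Assuming a regret bound of the form R_K \le C \sqrt{K}:
\frac{R_K}{K \epsilon} \le \frac{C}{\epsilon \sqrt{K}} \le \delta
\quad \text{whenever} \quad
K \ge \frac{C^{2}}{\epsilon^{2} \delta^{2}}.
```

Under these assumptions, the number of episodes needed scales as 1/ϵ², which matches the worst-case dependence on ϵ. The polynomial dependence on δ is an artifact of using Markov's inequality in this sketch. The bound depends on the instance only through C, so it does not by itself yield an instance-dependent rate.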

research
07/06/2022

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

While much progress has been made in understanding the minimax sample co...
research
12/07/2021

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

Obtaining first-order regret bounds – regret bounds scaling not as the w...
research
07/12/2022

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

Optimistic algorithms have been extensively studied for regret minimizat...
research
06/01/2022

On Gap-dependent Bounds for Offline Reinforcement Learning

This paper presents a systematic study on gap-dependent sample complexit...
research
01/21/2022

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic vari...
research
01/11/2023

Adversarial Online Multi-Task Reinforcement Learning

We consider the adversarial online multi-task reinforcement learning set...
research
10/09/2019

Robust Monopoly Regulation

We study the regulation of a monopolistic firm using a robust-design app...
