Optimistic PAC Reinforcement Learning: the Instance-Dependent View

07/12/2022
by Andrea Tirinzoni, et al.

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, from both a minimax and an instance-dependent perspective. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were previously available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near-optimal. On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.
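
For reference, the PAC RL objective and the value gaps mentioned in the abstract are commonly formalized as below. This is a sketch in standard episodic-MDP notation, which is an assumption here and is not taken from the abstract: horizon H, initial state s_1, optimal value functions V^\star_h and Q^\star_h, returned policy \hat{\pi}, accuracy \varepsilon, and confidence \delta. The paper's refined gap is not reproduced, since the abstract does not define it.

```latex
% (epsilon, delta)-PAC criterion: the returned policy \hat{\pi} is
% epsilon-optimal from the initial state with probability at least 1 - delta.
\[
  \mathbb{P}\left( V_1^{\star}(s_1) - V_1^{\hat{\pi}}(s_1) \le \varepsilon \right) \ge 1 - \delta
\]

% Standard value gap of action a in state s at stage h (the kind of gap
% used in prior instance-dependent bounds).
\[
  \Delta_h(s, a) = V_h^{\star}(s) - Q_h^{\star}(s, a), \qquad h \in \{1, \dots, H\}
\]
```

Under this formalization, the sample complexity is the number of episodes the algorithm collects before it stops and returns \hat{\pi}.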


