First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

12/07/2021
by Andrew Wagenmaker, et al.

Obtaining first-order regret bounds – regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance – is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as 𝒪(√(V_1^⋆ K)) in reinforcement learning with large state spaces, namely the linear MDP setting. Here V_1^⋆ is the value of the optimal policy and K is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.
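The abstract builds on the robust Catoni mean estimator. The Python sketch below shows only the classical scalar version of that estimator. Instead of averaging the samples, it returns the root θ of Σ_i ψ(α(X_i − θ)) = 0, where the influence function ψ grows only logarithmically, so a single large sample has limited pull on the estimate. It is not the paper's robust self-normalized concentration bound or its regression estimator. The function names (`catoni_psi`, `catoni_mean`, `catoni_alpha`), the bisection solver, and the simplified scale choice α = √(2 log(1/δ) / (n v)), which assumes a known variance bound v, are illustrative assumptions.

```python
import math
from typing import Sequence


def catoni_psi(x: float) -> float:
    """Catoni's influence function psi(x) = sign(x) * log(1 + |x| + x^2 / 2).

    It is odd, increasing, and behaves like x near 0, but grows only
    logarithmically, so outliers contribute a bounded-ish amount.
    """
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def catoni_alpha(n: int, variance_bound: float, delta: float) -> float:
    """Simplified scale parameter (assumption: variance of samples <= variance_bound)."""
    return math.sqrt(2.0 * math.log(1.0 / delta) / (n * variance_bound))


def catoni_mean(samples: Sequence[float], alpha: float,
                tol: float = 1e-10, max_iter: int = 200) -> float:
    """Return theta solving sum_i psi(alpha * (x_i - theta)) = 0 by bisection.

    The score is non-increasing in theta: it is >= 0 at min(samples) and
    <= 0 at max(samples), so the root lies in that interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def score(theta: float) -> float:
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    lo, hi = min(samples), max(samples)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [0.0] * 9 + [100.0]  # nine typical samples and one outlier
    print("empirical mean:", sum(data) / len(data))
    print("Catoni estimate (alpha=1):", catoni_mean(data, alpha=1.0))
```

With nine zeros and one outlier at 100, the empirical mean is 10, while `catoni_mean(data, alpha=1.0)` stays close to 1. As α shrinks toward 0, ψ(x) ≈ x and the estimate approaches the empirical mean, which is why α trades off robustness against bias.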

research
08/05/2021

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental prob...
research
06/18/2011

Robust Bayesian reinforcement learning through tight lower bounds

In the Bayesian approach to sequential decision making, exact calculatio...
research
06/13/2023

Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Reinforcement learning (RL) has shown empirical success in various real ...
research
06/12/2023

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

While numerous works have focused on devising efficient algorithms for r...
research
04/15/2021

Scale Invariant Solutions for Overdetermined Linear Systems with Applications to Reinforcement Learning

Overdetermined linear systems are common in reinforcement learning, e.g....
research
05/25/2022

Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions

Due to the drastic gap in complexity between sequential and batch statis...
research
02/02/2022

Transfer in Reinforcement Learning via Regret Bounds for Learning Agents

We present an approach for the quantification of the usefulness of trans...