Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

10/23/2020
by Priyank Agrawal, et al.

This paper studies regret minimization with randomized value functions in reinforcement learning. For tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of a classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI), and analyze it via a novel intertwined regret decomposition. The resulting Õ(H^2 S√(AT)) high-probability worst-case regret bound improves the sharpest previously known worst-case regret bound for RLSVI and matches the existing state-of-the-art worst-case regret bounds for TS-based algorithms.
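To make the setting concrete, below is a minimal sketch of how a clipped, TS-like RLSVI planning pass might look in a tabular finite-horizon MDP: at each stage, empirical rewards and transitions are combined with value estimates from the next stage, a Gaussian perturbation shrinking with visit counts injects exploration, and values are clipped to the feasible range [0, H]. The noise scale sigma, the count-based scaling, and the clipping range are illustrative assumptions, not the paper's exact construction or constants.

import numpy as np

def clipped_rlsvi_episode(counts, sum_r, sum_next, S, A, H, sigma=1.0, rng=None):
    """Build one randomized Q-function from logged data (hypothetical sketch).

    counts[h, s, a]   -- visit counts for stage h, state s, action a
    sum_r[h, s, a]    -- summed observed rewards
    sum_next[h, s, a] -- length-S vector of next-state visit counts
    Returns Q of shape (H, S, A); the agent acts greedily with respect to it.
    """
    rng = rng or np.random.default_rng()
    Q = np.zeros((H + 1, S, A))                          # Q at stage H is zero (terminal)
    for h in range(H - 1, -1, -1):
        V_next = np.clip(Q[h + 1].max(axis=1), 0.0, H)   # clipping keeps values in [0, H]
        n = np.maximum(counts[h], 1)                      # avoid division by zero
        r_hat = sum_r[h] / n                              # empirical mean rewards
        p_hat = sum_next[h] / n[..., None]                # empirical transition frequencies
        noise = rng.normal(0.0, sigma / np.sqrt(n))       # TS-like randomized perturbation
        Q[h] = np.clip(r_hat + p_hat @ V_next + noise, 0.0, H)
    return Q[:H]

In an episodic loop one would recompute this randomized Q-function before each episode from the data gathered so far, then follow its greedy policy; the clipping step is what distinguishes this variant from plain RLSVI in the sketch above.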

