Randomized Exploration for Reinforcement Learning with General Value Function Approximation

06/15/2021
by Haque Ishfaq, et al.

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noise. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class ℱ, our algorithm achieves a worst-case regret bound of O(poly(d_E H)√T), where T is the time elapsed, H is the planning horizon, and d_E is the eluder dimension of ℱ. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI that enjoys an 𝒪(√(d^3 H^3 T)) regret. We complement the theory with an empirical evaluation across known difficult exploration tasks.
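As a rough illustration of the idea in the linear setting, the sketch below fits several ridge regressions on noise-perturbed targets and takes the maximum prediction across them as an optimistic value estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`perturbed_ridge`, `optimistic_q`), the Gaussian noise model, the way the regularizer is perturbed, and all default parameters are hypothetical choices made for illustration, and the full algorithm additionally runs backward over the horizon H.

```python
import numpy as np


def perturbed_ridge(features, targets, noise_std, reg, rng):
    """Solve one ridge regression on noise-perturbed data (illustrative only)."""
    n, d = features.shape
    # Perturb each regression target with i.i.d. scalar Gaussian noise.
    noisy_targets = targets + rng.normal(0.0, noise_std, size=n)
    # Assumption: the regularizer is perturbed as well, acting like a noisy prior.
    prior_noise = rng.normal(0.0, noise_std, size=d)
    gram = features.T @ features + reg * np.eye(d)
    rhs = features.T @ noisy_targets + np.sqrt(reg) * prior_noise
    return np.linalg.solve(gram, rhs)


def optimistic_q(features, targets, query, num_samples=10,
                 noise_std=1.0, reg=1.0, seed=0):
    """Return the max over several perturbed linear value estimates.

    features: (n, d) array of state-action features from past data.
    targets:  (n,) array of regression targets (reward plus next-step value).
    query:    (k, d) array of features at which to evaluate the estimate.
    """
    rng = np.random.default_rng(seed)
    weights = np.stack([
        perturbed_ridge(features, targets, noise_std, reg, rng)
        for _ in range(num_samples)
    ])
    # Taking the maximum over samples plays the role of a UCB-style bonus.
    return (weights @ query.T).max(axis=0)
```

Drawing more samples can only raise the maximum, so `num_samples` controls how aggressively the estimate leans toward optimism, while `noise_std` sets the scale of the perturbation; both would need to be chosen according to the analysis in the full text.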


