Online Sub-Sampling for Reinforcement Learning with General Function Approximation

06/14/2021
by Dingwen Kong, et al.

Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al. [2020c] established a value-based algorithm with general function approximation that enjoys an O(poly(dH)√K) regret bound (throughout the paper, O(·) suppresses logarithmic factors), where d depends on the complexity of the function class, H is the planning horizon, and K is the total number of episodes. However, their algorithm requires Ω(K) computation time per round, rendering it inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes only O(poly(dH)) computation time per round on average and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost: it changes its policy only O(poly(dH)) times during execution, making it appealing for implementation in real-life scenarios. Moreover, by using an upper-confidence-based, exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after O(poly(dH)/ϵ^2) rounds of exploration, the algorithm outputs an ϵ-optimal policy for any given reward function.
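To make the online sub-sampling idea concrete, below is a minimal, hypothetical sketch for the linear special case (features φ ∈ R^d): each incoming feature is kept with probability proportional to its sensitivity φᵀΛ⁻¹φ with respect to the sub-sampled covariance Λ, and the policy is only recomputed when the sub-sample actually changes, which is what keeps both the per-round computation and the switching cost small. The function names, the oversampling constant, and the reweighting scheme here are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of online sensitivity-based sub-sampling (linear case).
# Not the paper's exact algorithm; thresholds and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, lam = 4, 1.0                      # feature dimension, ridge regularizer
Lambda = lam * np.eye(d)             # covariance of the sub-sampled data only
subsample = []                       # retained (reweighted) feature vectors
switches = 0                         # number of policy recomputations

def maybe_add(phi, oversample=4.0):
    """Keep phi with probability proportional to its sensitivity
    phi^T Lambda^{-1} phi; return True if the sub-sample changed."""
    global Lambda, switches
    sens = float(phi @ np.linalg.solve(Lambda, phi))
    p = min(1.0, oversample * sens)
    if rng.random() < p:
        subsample.append(phi / np.sqrt(p))          # reweight to keep the
        Lambda = Lambda + np.outer(phi, phi) / p    # sub-sample unbiased
        return True
    return False

# Stream of K feature vectors; the policy is recomputed only when the
# sub-sample grows, so the number of switches stays small relative to K.
K = 5000
for _ in range(K):
    phi = rng.normal(size=d)
    phi /= np.linalg.norm(phi)
    if maybe_add(phi):
        switches += 1                # placeholder for re-solving the
                                     # regression / value-iteration step

print(f"kept {len(subsample)} of {K} samples; policy switched {switches} times")
```

Running the sketch, the retained sub-sample and the number of switches grow only logarithmically in K for well-conditioned feature streams, which mirrors (under these illustrative assumptions) the O(poly(dH)) per-round computation and switching-cost claims in the abstract.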


