Provably Efficient Reinforcement Learning via Surprise Bound

02/22/2023
by Hanlin Zhu, et al.

Value function approximation is important in modern reinforcement learning (RL) problems, especially when the state space is large or even infinite. Despite the importance and wide applicability of value function approximation, its theoretical understanding still lags behind its empirical success, especially in the context of general function approximation. In this paper, we propose an RL algorithm with general value function approximation that is provably efficient both computationally and statistically. We show that if the value functions can be approximated by a function class that satisfies the Bellman-completeness assumption, our algorithm achieves an O(poly(ιH)√T) regret bound, where ι is the product of the surprise bound and log-covering numbers, H is the planning horizon, K is the number of episodes, and T = HK is the total number of steps the agent interacts with the environment. Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting. Moreover, our algorithm only needs to solve O(H log K) empirical risk minimization (ERM) problems, which is far more efficient than previous algorithms that need to solve Ω(HK) ERM problems.
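To make the O(H log K) versus Ω(HK) comparison concrete, here is a minimal Python sketch that only counts ERM calls. It assumes a hypothetical doubling schedule that re-solves the H per-step ERM problems at episodes 1, 2, 4, 8, ...; the abstract does not say how the algorithm decides when to re-solve, so this schedule and the function names (`refit_episodes`, `erm_calls_every_episode`, `erm_calls_doubling`) are illustrative stand-ins, not the paper's method.

```python
import math


def refit_episodes(K: int) -> list[int]:
    """Episodes at which a HYPOTHETICAL doubling schedule re-solves its ERM problems.

    Refits happen at episodes 1, 2, 4, 8, ... (powers of two not exceeding K).
    This schedule is an illustrative assumption, not the paper's switching rule.
    """
    # For k >= 1, k & (k - 1) == 0 holds exactly when k is a power of two.
    return [k for k in range(1, K + 1) if k & (k - 1) == 0]


def erm_calls_every_episode(H: int, K: int) -> int:
    # Baseline: re-solve one ERM problem per step h in every episode, i.e. H * K calls.
    return H * K


def erm_calls_doubling(H: int, K: int) -> int:
    # Each refit solves H ERM problems (one per step h); there are
    # floor(log2 K) + 1 refits, so the total grows as O(H log K).
    return H * len(refit_episodes(K))


if __name__ == "__main__":
    H, K = 10, 1000
    T = H * K  # total number of steps the agent interacts with the environment, T = HK
    print("T =", T)
    print("ERM calls, re-solve every episode:", erm_calls_every_episode(H, K))
    print("ERM calls, doubling schedule:", erm_calls_doubling(H, K))
    print("H * log2(K) for reference:", round(H * math.log2(K), 1))
```

With H = 10 and K = 1000 (so T = 10,000), the per-episode baseline solves 10,000 ERM problems while the doubling schedule solves 100, which illustrates the H log K scaling the abstract refers to.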

Related research

Provably Efficient Reinforcement Learning with General Value Function Approximation (05/21/2020)
Value function approximation has demonstrated phenomenal empirical succe...

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation (05/23/2022)
We study human-in-the-loop reinforcement learning (RL) with trajectory p...

Online Sub-Sampling for Reinforcement Learning with General Function Approximation (06/14/2021)
Designing provably efficient algorithms with general function approximat...

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation (07/06/2023)
Risk-sensitive reinforcement learning (RL) aims to optimize policies tha...

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning (03/25/2021)
This paper considers batch Reinforcement Learning (RL) with general valu...

Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards (02/24/2023)
We propose a novel offline reinforcement learning (RL) algorithm, namely...

Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability (02/07/2023)
Goal-conditioned reinforcement learning (GCRL) refers to learning genera...
