Provable benefits of general coverage conditions in efficient online RL with function approximation

04/25/2023
by   Fanghui Liu, et al.
0

In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (original from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we focus on this new direction by digging more possible and general coverage conditions, and study the potential and the utility of them in efficient online RL. We identify more concepts, including the L^p variant of concentrability, the density ratio realizability, and trade-off on the partial/rest coverage condition, that can be also beneficial to sample-efficient online RL, achieving improved regret bound. Furthermore, if exploratory offline data are used, under our coverage conditions, both statistically and computationally efficient guarantees can be achieved for online RL. Besides, even though the MDP structure is given, e.g., linear MDP, we elucidate that, good coverage conditions are still beneficial to obtain faster regret bound beyond O(√(T)) and even a logarithmic order regret. These results provide a good justification for the usage of general coverage conditions in efficient online RL.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/09/2022

The Role of Coverage in Online Reinforcement Learning

Coverage conditions – which assert that the data logging distribution ad...
research
12/15/2020

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation ...
research
06/15/2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Reinforcement learning (RL) is empirically successful in complex nonline...
research
02/09/2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

In this work, we consider the regret minimization problem for reinforcem...
research
02/05/2023

Refined Value-Based Offline RL under Realizability and Partial Coverage

In offline reinforcement learning (RL) we have no opportunity to explore...
research
11/01/2022

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Offline reinforcement learning (RL), which refers to decision-making fro...
research
12/28/2022

Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation

Offline reinforcement learning (RL) concerns pursuing an optimal policy ...

Please sign up or login with your details

Forgot password? Click here to reset