Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

02/28/2022
by Laixi Shi, et al.

Offline or batch reinforcement learning seeks to learn a near-optimal policy from historical data, without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has recently been introduced to mitigate the high bias of estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts, which do not require explicit model estimation, have not been adequately studied, especially in terms of sample efficiency. To address this gap, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes and characterize its sample complexity under a single-policy concentrability assumption, which does not require full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.
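To make the lower-confidence-bound idea concrete, below is a minimal sketch of what a pessimistic Q-learning pass over a tabular, finite-horizon offline dataset might look like. This is an illustration only, not the paper's algorithm: the function name lcb_q_learning, the dataset layout, the learning-rate schedule, and the bonus constant and log term are assumptions, and the variance-reduction component mentioned in the abstract is omitted.

```python
import numpy as np

def lcb_q_learning(dataset, num_states, num_actions, horizon,
                   c_bonus=1.0, delta=0.01):
    """Sketch of pessimistic (lower-confidence-bound) Q-learning from offline data.

    dataset: list of episodes; each episode is a list of (h, s, a, r, s_next)
             tuples with step index 0 <= h < horizon.
    Returns a greedy policy pi[h, s] derived from the pessimistic Q estimates.
    """
    S, A, H = num_states, num_actions, horizon
    Q = np.zeros((H + 1, S, A))          # pessimistic Q estimates (row H stays 0)
    V = np.zeros((H + 1, S))             # pessimistic value estimates
    N = np.zeros((H, S, A), dtype=int)   # visit counts per (step, state, action)
    # illustrative log factor for the confidence width; the paper's exact form may differ
    iota = np.log(S * A * H * max(len(dataset), 1) / delta)

    for episode in dataset:
        for (h, s, a, r, s_next) in episode:
            N[h, s, a] += 1
            n = N[h, s, a]
            eta = (H + 1) / (H + n)                       # rescaled linear learning rate
            bonus = c_bonus * np.sqrt(H ** 3 * iota / n)  # pessimism penalty, shrinks as 1/sqrt(n)
            target = r + V[h + 1, s_next] - bonus         # penalized temporal-difference target
            Q[h, s, a] = (1 - eta) * Q[h, s, a] + eta * target
            # keep the value estimate inside the feasible range [0, H - h]
            V[h, s] = np.clip(Q[h, s].max(), 0.0, H - h)

    return Q[:H].argmax(axis=2)  # greedy policy w.r.t. the pessimistic Q estimates
```

Because the bonus is subtracted from the target rather than added (as it would be in optimistic online Q-learning), rarely visited state-action pairs receive deliberately low value estimates, so the extracted greedy policy only relies on pairs the behavior policy covers well. This is the mechanism that lets the analysis go through under single-policy concentrability instead of full coverage.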


Related research

08/11/2022: Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
This paper concerns the central issues of model robustness and sample ef...

03/14/2022: The Efficacy of Pessimism in Asynchronous Q-Learning
This paper is concerned with the asynchronous form of Q-learning, which ...

03/24/2021: Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Policy optimization methods are popular reinforcement learning algorithm...

07/13/2021: Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage
We study model-based offline Reinforcement Learning with general functio...

03/11/2022: Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Offline reinforcement learning, which seeks to utilize offline/historica...

10/09/2021: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Achieving sample efficiency in online episodic reinforcement learning (R...

05/22/2023: Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning
Offline reinforcement learning aims to find the optimal policy from a pr...
