Sample Complexity of Variance-reduced Distributionally Robust Q-learning

05/28/2023
by   Shengbo Wang, et al.
Dynamic decision making under distributional shifts is of fundamental interest in the theory and applications of reinforcement learning: the distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed. This paper presents two novel model-free algorithms, distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the Q-function of an infinite-horizon γ-discounted robust Markov decision process with a Kullback-Leibler uncertainty set to an entry-wise ϵ-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minimax sample complexity upper bound of Õ(|S||A|(1-γ)^(-4) ϵ^(-2)), where S and A denote the state and action spaces. This is the first complexity result that is independent of the uncertainty size δ, thereby providing new complexity-theoretic insights. Additionally, a series of numerical experiments confirms the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
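To make the object being approximated concrete, the sketch below evaluates one robust Bellman backup under a Kullback-Leibler uncertainty set of size δ. It relies on the standard dual form of the KL-constrained worst-case expectation, inf over P with KL(P || P0) ≤ δ of E_P[V] = sup over α > 0 of { -α log E_P0[exp(-V/α)] - αδ }. This is only an illustration and not the paper's algorithm: it assumes a known nominal transition kernel and known deterministic rewards, whereas the algorithms in the abstract are model-free and sample-based. The names `kl_robust_mean` and `robust_backup`, and the toy inputs in the usage note, are hypothetical.

```python
import math

def kl_robust_mean(values, probs, delta, iters=200):
    """Worst-case mean of `values` over all P with KL(P || probs) <= delta,
    computed by maximizing the concave dual in alpha with ternary search."""
    support = [(v, p) for v, p in zip(values, probs) if p > 0]
    v_min = min(v for v, _ in support)
    mean = sum(v * p for v, p in support)
    # No uncertainty, or a constant value function: the nominal mean is the answer.
    if delta <= 0 or mean - v_min < 1e-15:
        return mean

    def dual(alpha):
        # Shift by v_min so every exponent is <= 0 (numerically stable).
        s = sum(p * math.exp(-(v - v_min) / alpha) for v, p in support)
        return v_min - alpha * math.log(s) - alpha * delta

    # By Jensen, dual(alpha) <= mean - alpha * delta, and the supremum is at
    # least v_min, so the maximizer lies in (0, (mean - v_min) / delta].
    hi = (mean - v_min) / delta
    lo = hi * 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return max(dual(0.5 * (lo + hi)), v_min)

def robust_backup(Q, rewards, nominal, gamma, delta):
    """One synchronous robust Bellman backup over all (state, action) pairs.
    Assumes known rewards[s][a] and nominal[s][a] (a distribution over next states)."""
    V = [max(row) for row in Q]  # greedy value of each state
    return [
        [rewards[s][a] + gamma * kl_robust_mean(V, nominal[s][a], delta)
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]
```

A usage sketch on a hypothetical two-state, two-action MDP: start from `Q = [[0.0, 0.0], [0.0, 0.0]]`, take `rewards = [[1.0, 0.0], [0.0, 2.0]]` and `nominal = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]`, and apply `robust_backup` repeatedly with `gamma = 0.9`. With `delta = 0.0` this reduces to ordinary value iteration on the nominal model; with `delta > 0` the resulting entries are never larger, since each backup takes the worst case over the KL ball.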

research
09/05/2023

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Three major challenges in reinforcement learning are the complex dynamic...
research
10/30/2022

Robust Data Valuation via Variance Reduced Data Shapley

Data valuation, especially quantifying data value in algorithmic predict...
research
06/10/2020

Distributional Robust Batch Contextual Bandits

Policy learning using historical observational data is an important prob...
research
02/26/2023

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

We consider a reinforcement learning setting in which the deployment env...
research
05/26/2023

The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

This paper investigates model robustness in reinforcement learning (RL) ...
research
04/24/2023

Addressing distributional shifts in operations management: The case of order fulfillment in customized production

To meet order fulfillment targets, manufacturers seek to optimize produc...
research
05/05/2021

H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch

We present H-TD2: Hybrid Temporal Difference Learning for Taxi Dispatch,...
