Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

02/19/2022
by Nathan Kallus, et al.

Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where experimentation is necessarily limited. OPE/L is nonetheless sensitive to discrepancies between the data-generating environment and the environment where policies are deployed. Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but that proposal relies on inverse-propensity weighting, whose regret rates may deteriorate if propensities are estimated and whose variance is suboptimal even if they are known. For vanilla OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR²OPE) and prove its semiparametric efficiency under weak product-rate conditions. Notably, thanks to a localization technique, LDR²OPE requires fitting only a small number of regressions, just like DR methods for vanilla OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR²OPL) and show that, under a product-rate condition involving a continuum of regressions, it enjoys a fast regret rate of 𝒪(N^{-1/2}) even when unknown propensities are nonparametrically estimated. We further extend our results to general f-divergence uncertainty sets. We illustrate the advantage of our algorithms in simulations.
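The abstract turns on a worst-case expectation over a KL-divergence uncertainty set. As a hedged illustration only (this is not the paper's LDR²OPE or CDR²OPL), the sketch below evaluates that worst case for a target policy with the standard convex dual of the KL-constrained problem, inf over Q with KL(Q || P) <= δ of E_Q[Y] = sup over α > 0 of { -α log E_P[exp(-Y/α)] - αδ }, where E_P is approximated by self-normalized inverse-propensity weights. This mirrors the IPW-style approach the abstract says can degrade when propensities are estimated. It assumes known logging propensities, restricts Q to the observed rewards, and uses a golden-section search over log α on a bracket [1e-4, 1e4] chosen for illustration; the function names and the toy data are hypothetical.

```python
import math
from typing import Sequence


def weighted_log_mean_exp(values: Sequence[float], weights: Sequence[float]) -> float:
    """Stable log( sum_i w_i * exp(v_i) / sum_i w_i ) over entries with w_i > 0."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    m = max(v for v, _ in pairs)
    s = sum(w * math.exp(v - m) for v, w in pairs)
    return m + math.log(s) - math.log(sum(w for _, w in pairs))


def kl_robust_value(rewards: Sequence[float], weights: Sequence[float],
                    delta: float, lo: float = 1e-4, hi: float = 1e4,
                    iters: int = 100) -> float:
    """Worst-case mean reward over {Q : KL(Q || P_w) <= delta}, where P_w is the
    empirical reward distribution reweighted by self-normalized importance weights.

    Uses the dual: sup_{alpha > 0} -alpha * log E_{P_w}[exp(-Y / alpha)] - alpha * delta.
    """
    if delta <= 0:
        # No ambiguity: the self-normalized IPW estimate of the mean reward.
        return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)

    def dual(alpha: float) -> float:
        lme = weighted_log_mean_exp([-r / alpha for r in rewards], weights)
        return -alpha * lme - alpha * delta

    # The dual is concave in alpha, hence unimodal in log(alpha), so a
    # golden-section search over [log(lo), log(hi)] finds its maximum.
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = dual(math.exp(c)), dual(math.exp(d))
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = dual(math.exp(c))
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = dual(math.exp(d))
    return dual(math.exp((a + b) / 2.0))


if __name__ == "__main__":
    # Hypothetical logged data: observed rewards, target-policy probabilities
    # pi(a_i | x_i), and logging propensities mu(a_i | x_i), assumed known.
    rewards = [1.0, 0.0, 1.0, 0.5]
    target_probs = [0.9, 0.1, 0.8, 0.5]
    logging_probs = [0.5, 0.5, 0.5, 0.5]
    weights = [p / q for p, q in zip(target_probs, logging_probs)]
    for delta in (0.0, 0.05, 0.5):
        print(delta, kl_robust_value(rewards, weights, delta))
```

With δ = 0 the sketch reduces to the ordinary self-normalized IPW estimate; as δ grows, the robust value decreases toward the smallest observed reward. Its only input about the logging policy is the importance weights, which is why the accuracy of such an estimate hinges on the propensities, the dependence the DR methods above are designed to weaken.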
