Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

12/29/2022
by Yang Xu, et al.

Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision-making problems, in domains ranging from healthcare to the technology industry. Most existing work focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose using state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of the proposed estimator through both simulations and a real-world dataset from a short-video platform. Notably, we find that the proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
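To make "parameter-dependent nuisance function" concrete, here is a minimal sketch of a doubly robust quantile estimate in a simplified single-step (contextual bandit) setting. It is not the paper's sequential procedure. It estimates P(Y ≤ q) under the target policy with the standard augmented inverse-propensity form and returns the smallest candidate q at which that estimate reaches the level tau. The nuisance F(q | x, a) = P(Y ≤ q | X = x, A = a) changes with q. This is where a conditional generative model helps: one set of sampled rewards per (x, a) lets F be evaluated at any q without refitting. The function name `dr_quantile_ope` and the `gen_draws` input, which stands in for samples from an already-fitted generative model, are hypothetical.

```python
import numpy as np


def dr_quantile_ope(y, a, pi_e, pi_b, gen_draws, tau):
    """Hypothetical single-step doubly robust estimate of the tau-quantile
    of the reward under a target policy (a sketch, not the paper's method).

    y         : (n,) observed rewards
    a         : (n,) integer actions chosen by the behavior policy
    pi_e      : (n, K) target-policy probabilities pi(k | x_i)
    pi_b      : (n,) behavior propensities b(a_i | x_i)
    gen_draws : (n, K, M) draws of Y given (x_i, k), assumed to come from a
                conditional generative model fitted elsewhere
    tau       : quantile level in (0, 1)
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)
    pi_e = np.asarray(pi_e, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    gen_draws = np.asarray(gen_draws, dtype=float)
    rows = np.arange(y.shape[0])
    # Importance ratio pi(a_i | x_i) / b(a_i | x_i) for the logged actions.
    ratio = pi_e[rows, a] / pi_b

    def dr_cdf(q):
        # The nuisance F(q | x, k) depends on q, so it is re-evaluated from
        # the same generated draws at every candidate q (no refitting).
        f_hat = (gen_draws <= q).mean(axis=2)              # (n, K)
        direct = (pi_e * f_hat).sum(axis=1)                # sum_k pi(k|x) F(q|x,k)
        correction = ratio * ((y <= q) - f_hat[rows, a])   # IPW residual term
        return float(np.mean(direct + correction))

    # Scan the sorted observed rewards and return the smallest candidate whose
    # DR estimate of P(Y <= q) reaches tau (the estimate need not be monotone).
    for q in np.sort(y):
        if dr_cdf(q) >= tau:
            return float(q)
    return float(np.max(y))
```

The correction term is what makes the estimate doubly robust. If the behavior propensities are correct, the importance-weighted residual removes bias from a misspecified generator, and a correct generator covers for poor propensities. In the sequential setting studied in the paper, the same idea must be carried through multiple decision points.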


research
08/23/2021

Causal Analysis at Extreme Quantiles with Application to London Traffic Flow Data

Treatment effects on asymmetric and heavy tailed distributions are bette...
research
12/29/2022

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Off-policy evaluation (OPE) is a method for estimating the return of a t...
research
04/26/2021

Universal Off-Policy Evaluation

When faced with sequential decision-making problems, it is often useful ...
research
11/08/2020

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation (OPE) est...
research
05/17/2023

Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing

Many modern tech companies, such as Google, Uber, and Didi, utilize onli...
research
09/17/2021

Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service

Off-policy evaluation (OPE) is the method that attempts to estimate the ...
research
06/14/2022

Conformal Off-Policy Prediction

Off-policy evaluation is critical in a number of applications where new ...