Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

by Yang Xu, et al.

Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision-making problems, ranging from healthcare to the technology industry. Most of the existing literature focuses on evaluating the mean outcome of a given policy and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of the proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
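The abstract's motivation for quantile-based metrics can be illustrated with a minimal sketch (this is not the paper's estimator, just a simulation of the underlying statistical point): under a heavy-tailed reward distribution, the sample mean is far more volatile across datasets than a sample quantile such as the median. Here we use a Pareto (Lomax) reward with shape 1.5, which has a finite mean but infinite variance.

```python
import numpy as np

# Illustrative sketch (not the paper's method): compare the stability of the
# sample mean vs. the sample median under heavy-tailed rewards.
rng = np.random.default_rng(0)

def reward_samples(n, rng):
    # Pareto/Lomax rewards with shape 1.5: finite mean, infinite variance.
    return rng.pareto(1.5, size=n)

means, medians = [], []
for _ in range(200):  # 200 replicate offline datasets
    r = reward_samples(500, rng)
    means.append(r.mean())
    medians.append(np.median(r))

# The sample median concentrates tightly around the true median
# (2**(2/3) - 1 ≈ 0.587), while the sample mean fluctuates wildly.
print("std of means:  ", np.std(means))
print("std of medians:", np.std(medians))
```

Across replications, the spread of the median estimates is an order of magnitude smaller than that of the mean estimates, which is the robustness property motivating quantile OPE in the abstract.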


