Off-Policy Risk Assessment in Contextual Bandits

by Audrey Huang, et al.

To evaluate prospective contextual bandit policies when experimentation is not possible, practitioners often rely on off-policy evaluation, using data collected under a behavioral policy. While off-policy evaluation studies typically focus on the expected return, practitioners often care about other functionals of the reward distribution (e.g., to express aversion to risk). In this paper, we first introduce the class of Lipschitz risk functionals, which subsumes many common functionals, including variance, mean-variance, and conditional value-at-risk (CVaR). For Lipschitz risk functionals, the error in off-policy risk estimation is bounded by the error in off-policy estimation of the cumulative distribution function (CDF) of rewards. Second, we propose Off-Policy Risk Assessment (OPRA), an algorithm that (i) estimates the target policy's CDF of rewards and (ii) generates a plug-in estimate of the risk. Given a collection of Lipschitz risk functionals, OPRA provides estimates for each with corresponding error bounds that hold simultaneously. We analyze both importance sampling and variance-reduced doubly robust estimators of the CDF. Our primary theoretical contributions are (i) the first concentration inequalities for both types of CDF estimators and (ii) guarantees on our Lipschitz risk functional estimates, which converge at a rate of O(1/√n). For practitioners, OPRA offers a practical solution for providing high-confidence assessments of policies using a collection of relevant metrics.
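To make the estimate-then-plug-in recipe concrete, here is a minimal sketch of the simpler of the two estimators the abstract mentions: an importance-sampling estimate of the target policy's reward CDF, followed by a plug-in CVaR computed from that CDF. The function names (`is_cdf`, `plugin_cvar`), the quantile-grid approximation, and the specific CVaR convention (lower-tail average, CVaR_α = (1/α) ∫₀^α F⁻¹(u) du) are illustrative assumptions, not the paper's code; the variance-reduced doubly robust estimator is not shown.

```python
import numpy as np

def is_cdf(rewards, weights, grid):
    """Importance-sampling estimate of the target policy's reward CDF.

    rewards: rewards observed under the behavior policy
    weights: importance ratios pi(a|x) / beta(a|x) for each sample
    grid:    points t at which to evaluate F_hat(t)
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(weights, dtype=float)
    # F_hat(t) = (1/n) * sum_i w_i * 1{r_i <= t}; nondecreasing since w_i >= 0
    return np.array([np.mean(w * (r <= t)) for t in grid])

def plugin_cvar(grid, cdf, alpha=0.05):
    """Plug-in CVaR at level alpha: average of lower-tail quantiles,
    approximating (1/alpha) * integral_0^alpha F^{-1}(u) du on a
    uniform grid of quantile levels."""
    grid = np.asarray(grid, dtype=float)
    levels = np.linspace(alpha / 100, alpha, 100)
    # Empirical inverse CDF: smallest grid point whose CDF value reaches u
    idx = np.minimum(np.searchsorted(cdf, levels), len(grid) - 1)
    return float(np.mean(grid[idx]))
```

With all importance weights equal to 1 (behavior policy equals target policy), `is_cdf` reduces to the ordinary empirical CDF, and `plugin_cvar` with a small `alpha` returns the average reward in the worst tail. For a Lipschitz risk functional, the abstract's bound says the error of such a plug-in estimate is controlled by the (sup-norm) error of the CDF estimate itself.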






