Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

11/28/2021
by Ramtin Keramati, et al.

Off-policy policy evaluation methods for sequential decision making can help identify whether a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than the baseline for some individuals but not for others. This has motivated a push towards personalization and accurate per-state estimates of heterogeneous treatment effects (HTEs). Given the limited data available in many important applications, individual-level predictions can come at a cost to both accuracy and confidence. We develop a method that balances the need for personalization with confident predictions by identifying subgroups for which the expected difference in value between a new decision policy and a baseline can be estimated confidently. We propose a novel loss function that accounts for uncertainty during the subgroup partitioning phase. In experiments, we show that our method can be used to form accurate predictions of HTEs where other methods struggle.
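
To make the quantity in the abstract concrete, the following is a minimal sketch of a per-subgroup estimate of the difference in value between a new policy and a baseline, using ordinary trajectory-level importance sampling with a normal-approximation confidence interval. It is not the authors' method or loss function: here the subgroups are assumed to be given rather than learned, and the trajectory layout (`steps`, `ret`, `group`), the policy callables `pi_new` and `pi_base`, and the function names are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev

def is_weight(traj, pi_new, pi_base):
    """Cumulative importance ratio over one trajectory's (state, action) pairs."""
    w = 1.0
    for s, a in traj["steps"]:
        w *= pi_new(s, a) / pi_base(s, a)
    return w

def subgroup_effects(trajs, pi_new, pi_base, group_of, z=1.96):
    """Per-subgroup estimate of V(new) - V(baseline) with an approximate CI.

    Assumes every trajectory was collected under the baseline policy and that
    subgroup membership is supplied by the hypothetical callable `group_of`.
    """
    by_group = {}
    for t in trajs:
        w = is_weight(t, pi_new, pi_base)
        # The observed return is a sample of V(baseline); w * return is an
        # importance-sampling estimate of V(new). Their difference is (w - 1) * return.
        by_group.setdefault(group_of(t), []).append((w - 1.0) * t["ret"])
    out = {}
    for g, diffs in by_group.items():
        est = mean(diffs)
        # With a single trajectory the spread is unknown, so the interval is unbounded.
        half = z * stdev(diffs) / math.sqrt(len(diffs)) if len(diffs) > 1 else math.inf
        out[g] = (est, est - half, est + half)
    return out

# Toy data: one-step trajectories, baseline uniform over two actions.
pi_base = lambda s, a: 0.5
pi_new = lambda s, a: 0.8 if a == 1 else 0.2
trajs = [
    {"steps": [(0, 1)], "ret": 1.0, "group": "a"},
    {"steps": [(0, 0)], "ret": 0.0, "group": "a"},
    {"steps": [(0, 1)], "ret": 1.0, "group": "b"},
]
effects = subgroup_effects(trajs, pi_new, pi_base, lambda t: t["group"])
```

In this sketch, a subgroup whose interval is wide (or unbounded, as for the single-trajectory group `"b"`) is one where the data do not support a confident estimate, which is the tension between personalization and confidence that the abstract describes.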


