Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

05/14/2023
by Yuta Saito, et al.

We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces, where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. OffCEM applies importance weighting only to action clusters and addresses the residual causal effect through model-based reward estimation. We show that the proposed estimator is unbiased under a new condition, called local correctness, which only requires that the residual-effect model preserve the relative expected reward differences of the actions within each cluster. To best leverage the CEM and local correctness, we also propose a new two-step procedure for performing model-based estimation that minimizes bias in the first step and variance in the second step. We find that the resulting OffCEM estimator substantially improves on a range of conventional estimators in both bias and variance. Experiments demonstrate that OffCEM provides substantial improvements in OPE, especially in the presence of many actions.
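The estimator described above can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract: each logged sample contributes a cluster-level importance weight times the residual between the observed reward and the model prediction, plus the model's expected reward under the target policy. The function name `offcem_estimate`, its argument layout, and the context-independent clustering are assumptions made for illustration; they are not taken from the paper's code.

```python
from typing import Sequence


def offcem_estimate(
    rewards: Sequence[float],         # observed reward per logged sample
    actions: Sequence[int],           # logged action index per sample
    pi_e: Sequence[Sequence[float]],  # target policy probabilities, one row per sample
    pi_0: Sequence[Sequence[float]],  # logging policy probabilities, one row per sample
    clusters: Sequence[int],          # cluster id of each action (assumed context-independent)
    f_hat: Sequence[Sequence[float]], # model-based reward predictions, one row per sample
) -> float:
    """Illustrative cluster-weighted estimate (a sketch, not the authors' implementation)."""
    n = len(rewards)
    total = 0.0
    for i in range(n):
        # Cluster of the action that was actually logged.
        c_i = clusters[actions[i]]
        members = [a for a, c in enumerate(clusters) if c == c_i]

        # Importance weight over clusters rather than over individual actions.
        pe_c = sum(pi_e[i][a] for a in members)
        p0_c = sum(pi_0[i][a] for a in members)
        w = pe_c / p0_c

        # Residual between the observed reward and the model prediction.
        residual = rewards[i] - f_hat[i][actions[i]]

        # Model-based expected reward under the target policy.
        baseline = sum(p * f for p, f in zip(pi_e[i], f_hat[i]))

        total += w * residual + baseline
    return total / n


# Toy data: 4 actions in 2 clusters, uniform logging policy.
clusters = [0, 0, 1, 1]
pi_0 = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
pi_e = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
actions = [0, 2]
rewards = [1.0, 2.0]
zero_model = [[0.0] * 4, [0.0] * 4]
estimate = offcem_estimate(rewards, actions, pi_e, pi_0, clusters, zero_model)
```

With the all-zero reward model in this toy data, the sketch reduces to importance weighting over clusters only. The weights depend on cluster probabilities, which are sums over many actions, so they tend to be far less extreme than per-action weights when the action space is large.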


