Leveraging Factored Action Spaces for Off-Policy Evaluation

07/13/2023
by Aaman Rebello, et al.

Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces, i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have lower variance than their original non-decomposed versions, while remaining unbiased. Through simulations, we empirically verify our theoretical results and probe the validity of the various assumptions. Provided that a technique is available to derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
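To make the idea of a "decomposed" IS estimator concrete, here is a minimal sketch in a toy one-step (bandit) setting. It is not taken from the paper: it assumes that both policies factorise into independent per-sub-action distributions, that the reward is a sum of per-sub-action rewards, and that those sub-rewards are observed. The names `p_b`, `p_e`, `w` and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 3 binary sub-actions, no state.
p_b = np.array([0.5, 0.5, 0.5])   # behaviour policy: P(sub-action d = 1)
p_e = np.array([0.8, 0.7, 0.9])   # evaluation policy: P(sub-action d = 1)
w = np.array([1.0, 2.0, 3.0])     # mean reward earned when sub-action d = 1
n = 20_000                        # number of logged samples

# Log data under the behaviour policy (sub-actions drawn independently).
a = (rng.random((n, 3)) < p_b).astype(float)
# Assumption: the reward decomposes additively and each part is observed.
r_sub = a * w + rng.normal(0.0, 0.1, size=(n, 3))
r = r_sub.sum(axis=1)

# Per-sub-action importance ratios pi_e^d(a^d) / pi_b^d(a^d).
pi_e = np.where(a == 1, p_e, 1 - p_e)
pi_b = np.where(a == 1, p_b, 1 - p_b)
rho_sub = pi_e / pi_b

# Standard IS: one ratio over the full combinatorial action.
is_terms = rho_sub.prod(axis=1) * r
# Decomposed IS (sketch): weight each sub-reward by its own ratio only.
dec_terms = (rho_sub * r_sub).sum(axis=1)

true_value = float(p_e @ w)       # value of the evaluation policy
print("true value      :", true_value)
print("standard IS     :", is_terms.mean(), "per-sample var:", is_terms.var())
print("decomposed IS   :", dec_terms.mean(), "per-sample var:", dec_terms.var())
```

Under these assumptions both estimators target the same value, but the decomposed one avoids multiplying ratios across sub-actions, which is where the variance of standard IS grows with the size of the action space. When the independence or additivity assumptions fail, this simple decomposition is no longer guaranteed to be unbiased.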


research
08/07/2023

Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

We study Off-Policy Evaluation (OPE) in contextual bandit settings with ...
research
03/05/2022

Off-Policy Evaluation in Embedded Spaces

Off-policy evaluation methods are important in recommendation systems an...
research
11/06/2021

SOPE: Spectrum of Off-Policy Estimators

Many sequential decision making problems are high-stakes and require off...
research
05/06/2023

Learning Action Embeddings for Off-Policy Evaluation

Off-policy evaluation (OPE) methods allow us to compute the expected rew...
research
05/14/2023

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

We study off-policy evaluation (OPE) of contextual bandit policies for l...
research
05/02/2023

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Many reinforcement learning (RL) applications have combinatorial action ...
research
10/24/2022

Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

We consider local kernel metric learning for off-policy evaluation (OPE)...
