Adaptive Estimator Selection for Off-Policy Evaluation

02/18/2020
by   Yi Su, et al.

We develop a generic data-driven method for estimator selection in off-policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method, and comprehensive experiments confirm its empirical efficacy: in both case studies, our method compares favorably with existing approaches.
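The abstract does not spell out the selection procedure, but a common data-driven approach to choosing among a family of off-policy estimators (ordered from low-bias/high-variance to high-bias/low-variance) is Lepski-style confidence-interval intersection: walk down the sequence and stop just before the intervals cease to overlap. The sketch below is purely illustrative; the function name, the interval rule, and the inputs are assumptions for exposition, not the paper's exact specification.

```python
def select_estimator(estimates, widths):
    """Lepski-style interval-intersection selection (illustrative sketch).

    estimates: point estimates from candidate OPE estimators, ordered so
               that `widths` is non-increasing (i.e., from high-variance/
               low-bias estimators to low-variance/high-bias ones).
    widths:    confidence-interval half-widths for each estimate.
    Returns the index of the selected estimator.
    """
    selected = 0
    for i in range(1, len(estimates)):
        # Intersection of the intervals [estimate - width, estimate + width]
        # for all candidates up to and including index i.
        lo = max(estimates[j] - widths[j] for j in range(i + 1))
        hi = min(estimates[j] + widths[j] for j in range(i + 1))
        if lo > hi:
            # The intervals no longer share a common point: the bias of the
            # remaining estimators has likely overtaken their variance savings.
            break
        selected = i
    return selected
```

For example, with estimates `[1.0, 1.1, 2.0]` and shrinking widths `[0.5, 0.3, 0.1]`, the third interval `[1.9, 2.1]` is disjoint from the common intersection, so the second estimator is selected.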


Related research

11/25/2022 · Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance ...

06/13/2023 · Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
We consider policy optimization in contextual bandits, where one is give...

06/07/2019 · Empirical Likelihood for Contextual Bandits
We apply empirical likelihood techniques to contextual bandit policy val...

05/16/2016 · Off-policy evaluation for slate recommendation
This paper studies the evaluation of policies that recommend an ordered ...

01/19/2021 · Minimax Off-Policy Evaluation for Multi-Armed Bandits
We study the problem of off-policy evaluation in the multi-armed bandit ...

08/27/2023 · Distributional Off-Policy Evaluation for Slate Recommendations
Recommendation strategies are typically evaluated by using previously lo...

01/05/2021 · Off-Policy Evaluation of Slate Policies under Bayes Risk
We study the problem of off-policy evaluation for slate bandits, for the...
