Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

02/26/2020
by Masahiro Kato, et al.

We consider evaluating and training a new policy for evaluation data using historical data collected under a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and the goal of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although standard OPE and OPL assume that the historical and evaluation data share the same covariate distribution, a covariate shift often arises in practice, i.e., the covariate distribution of the historical data differs from that of the evaluation data. In this paper, we derive the efficiency bound of OPE under a covariate shift. We then propose doubly robust and efficient estimators for OPE and OPL under a covariate shift, which use an estimator of the density ratio between the covariate distributions of the historical and evaluation data. We also discuss other possible estimators and compare their theoretical properties. Finally, we confirm the effectiveness of the proposed estimators through experiments.
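
To make the idea of a doubly robust estimator with a density ratio concrete, the following is a minimal sketch in Python. It is not taken from the paper: the function name, argument names, and the specific form (an importance-weighted residual term on the historical data plus a model-based term on the evaluation covariates) are assumptions for illustration. It also assumes a finite action set and that the density ratio, the behavior policy, and the reward model have already been estimated elsewhere.

```python
import numpy as np


def dr_ope_covariate_shift(w_hist, pi_e_hist, pi_b_hist, y_hist, f_hist,
                           pi_e_eval, f_eval):
    """Hypothetical doubly robust OPE sketch under a covariate shift.

    Historical data (n samples):
      w_hist    : estimated density ratio q(x)/p(x) at each historical covariate
      pi_e_hist : evaluation-policy probability of the logged action
      pi_b_hist : behavior-policy probability of the logged action
      y_hist    : observed reward
      f_hist    : reward-model prediction for the logged (covariate, action)
    Evaluation data (m samples, K actions):
      pi_e_eval : (m, K) evaluation-policy probabilities
      f_eval    : (m, K) reward-model predictions for every action
    """
    w_hist = np.asarray(w_hist, dtype=float)
    pi_e_hist = np.asarray(pi_e_hist, dtype=float)
    pi_b_hist = np.asarray(pi_b_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    f_hist = np.asarray(f_hist, dtype=float)
    pi_e_eval = np.asarray(pi_e_eval, dtype=float)
    f_eval = np.asarray(f_eval, dtype=float)

    # Correction term: reweight the reward-model residuals on the historical
    # data by the density ratio (covariate shift) and the policy ratio
    # (action shift).
    correction = np.mean(w_hist * pi_e_hist / pi_b_hist * (y_hist - f_hist))

    # Direct term: average the model-predicted reward of the evaluation
    # policy over the evaluation covariates.
    direct = np.mean(np.sum(pi_e_eval * f_eval, axis=1))

    return correction + direct
```

In this sketch, the correction term vanishes when the reward model matches the observed rewards exactly, and the direct term then carries the whole estimate. The sketch omits any cross-fitting or sample splitting when fitting the nuisance estimates.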
