Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

07/03/2018
by   Aniruddh Raghu, et al.

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate that accurate OPE depends strongly on the calibration of estimated behaviour policy models, that is, on how precisely the behaviour policy is estimated from data. We show that powerful parametric models such as neural networks can yield poorly calibrated behaviour policy models on a real-world medical dataset, and illustrate that a simple, non-parametric k-nearest neighbours model produces better-calibrated behaviour policy estimates and can be used to obtain superior importance sampling-based OPE estimates.
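The two ingredients named in the abstract, a k-nearest neighbours behaviour policy estimate and an importance sampling OPE estimate built on it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`knn_behaviour_policy`, `importance_sampling_estimate`), the Euclidean distance, the smoothing constant `eps`, and the trajectory format (tuples of state, action, and reward arrays) are all assumptions made for illustration.

```python
import numpy as np


def knn_behaviour_policy(train_states, train_actions, query_states,
                         n_actions, k=5, eps=1e-3):
    """Estimate pi_b(a | s) for each query state (hypothetical helper).

    The estimate is the smoothed fraction of the k nearest training
    states (squared Euclidean distance) in which action a was taken.
    Assumes k <= number of training states and integer action labels.
    """
    # Pairwise squared distances, shape (n_query, n_train).
    diffs = query_states[:, None, :] - train_states[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Indices of the k nearest training states for each query state.
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_actions = train_actions[nn_idx]  # shape (n_query, k)
    # Count how often each action appears among the neighbours.
    counts = np.stack(
        [(nn_actions == a).sum(axis=1) for a in range(n_actions)], axis=1
    )
    # Additive smoothing (an assumption) keeps probabilities non-zero,
    # so importance ratios stay finite; each row still sums to 1.
    return (counts + eps) / (k + eps * n_actions)


def importance_sampling_estimate(trajectories, behaviour_policy,
                                 eval_policy, gamma=1.0, weighted=False):
    """Trajectory-wise importance sampling estimate of a policy's value.

    trajectories: list of (states, actions, rewards) array tuples.
    behaviour_policy / eval_policy: callables mapping a (T, d) array of
    states to a (T, n_actions) array of action probabilities.
    """
    weights, returns = [], []
    for states, actions, rewards in trajectories:
        t = np.arange(len(actions))
        pi_e = eval_policy(states)[t, actions]
        pi_b = behaviour_policy(states)[t, actions]
        # Product of per-step probability ratios along the trajectory.
        weights.append(np.prod(pi_e / pi_b))
        # Discounted return of the observed trajectory.
        returns.append(np.sum(gamma ** t * rewards))
    weights, returns = np.array(weights), np.array(returns)
    if weighted:
        # Weighted (self-normalised) importance sampling.
        return np.sum(weights * returns) / np.sum(weights)
    return np.mean(weights * returns)
```

Because the estimated behaviour probabilities appear in the denominator of every per-step ratio, a poorly calibrated estimate distorts the weights multiplicatively over the length of a trajectory, which is one way to see why calibration matters for importance sampling-based OPE.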


