Bayesian Counterfactual Risk Minimization

06/29/2018
by Ben London, et al.

We present a Bayesian view of counterfactual risk minimization (CRM), also known as offline policy optimization from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated IPS estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM.
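The truncated IPS (inverse propensity scoring) estimator mentioned in the abstract clips importance weights at a level M to trade a small bias for reduced variance. As a rough illustration only (the function and variable names below are not from the paper), a minimal sketch might look like:

```python
def truncated_ips(logs, target_prob, m=10.0):
    """Estimate a policy's risk from logged bandit feedback.

    logs: iterable of (context, action, loss, logging_prob) tuples,
          where logging_prob is the probability the logging policy
          assigned to the logged action.
    target_prob: function (context, action) -> probability of the
                 action under the new (target) policy.
    m: truncation level M; importance weights are clipped at M,
       reducing variance at the cost of some bias.
    """
    total = 0.0
    n = 0
    for x, a, loss, mu in logs:
        # Clipped importance weight: min(pi(a|x) / mu(a|x), M).
        w = min(target_prob(x, a) / mu, m)
        total += w * loss
        n += 1
    return total / n
```

With M set very large this recovers the plain IPS estimator; smaller M truncates more aggressively, which is the variance-control mechanism the paper's generalization bound analyzes.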


Related research:

- Sequential Counterfactual Risk Minimization (02/23/2023): Counterfactual Risk Minimization (CRM) is a framework for dealing with t...
- Counterfactual Risk Minimization: Learning from Logged Bandit Feedback (02/09/2015): We develop a learning principle and an efficient algorithm for batch lea...
- Semi-Counterfactual Risk Minimization Via Neural Networks (09/15/2022): Counterfactual risk minimization is a framework for offline policy optim...
- Distributionally Robust Counterfactual Risk Minimization (06/14/2019): This manuscript introduces the idea of using Distributionally Robust Opt...
- Exponential Smoothing for Off-Policy Learning (05/25/2023): Off-policy learning (OPL) aims at finding improved policies from logged ...
- Off-policy Learning for Multiple Loggers (07/23/2019): It is well known that the historical logs are used for evaluating and le...
- Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback (05/03/2018): Counterfactual learning from human bandit feedback describes a scenario ...
