Off-policy Learning for Multiple Loggers

07/23/2019
by Li He, et al.

Historical logs are widely used to evaluate and learn policies in interactive systems such as recommendation, search, and online advertising. Because learning a policy directly online usually harms the user experience, off-policy learning is crucial in real-world applications. Although prior work exists, most of it focuses on learning from a single historical policy. In practice, however, a number of parallel experiments, e.g. multiple A/B tests, are usually run simultaneously. To make full use of such historical data, it becomes necessary to learn policies from multiple loggers. Motivated by this, we investigate off-policy learning when the training data come from multiple historical policies. Specifically, policies, e.g. neural networks, can be learned directly from multi-logger data using counterfactual estimators. To better understand the generalization ability of such estimators, we conduct a generalization error analysis for the empirical risk minimization problem. We then introduce the generalization error bound as a new risk function, which can be reduced to a constrained optimization problem. Finally, we give the corresponding learning algorithm for the new constrained problem, which uses a minimax formulation to control the constraints. Extensive experiments on benchmark datasets demonstrate that the proposed methods achieve better performance than state-of-the-art methods.
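To make the idea of a counterfactual estimator on multi-logger data concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a plain inverse propensity scoring (IPS) estimate computed over logs pooled from several loggers, where each record stores the probability its own logging policy assigned to the logged action. The names `LogRecord`, `ips_risk`, and `uniform_policy`, and the toy data, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, NamedTuple


class LogRecord(NamedTuple):
    """One logged interaction (hypothetical schema for illustration)."""
    context: int
    action: int
    loss: float
    propensity: float  # probability the logging policy gave to `action`


def ips_risk(
    logs: List[List[LogRecord]],
    target_policy: Callable[[int, int], float],
) -> float:
    """Pooled IPS estimate of the target policy's expected loss.

    `logs` holds one list of records per logger. Each record is reweighted
    by target probability / logging propensity, then averaged over all records.
    """
    total = 0.0
    count = 0
    for logger_records in logs:
        for record in logger_records:
            weight = target_policy(record.context, record.action) / record.propensity
            total += weight * record.loss
            count += 1
    if count == 0:
        raise ValueError("no logged records")
    return total / count


def uniform_policy(context: int, action: int) -> float:
    """Toy target policy: picks each of two actions with probability 0.5."""
    return 0.5


# Two loggers with different logging policies (toy data, not from the paper).
logs = [
    [LogRecord(0, 0, 1.0, 0.8), LogRecord(0, 1, 0.0, 0.2)],
    [LogRecord(1, 1, 1.0, 0.5), LogRecord(1, 0, 0.0, 0.5)],
]
estimate = ips_risk(logs, uniform_policy)
print(estimate)
```

This pooled average treats every logger's records alike; the abstract indicates the paper goes further by analyzing the generalization error and optimizing a bound-based risk, which this sketch does not attempt.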


research
06/07/2021

Offline Policy Comparison under Limited Historical Agent-Environment Interactions

We address the challenge of policy evaluation in real-world applications...
research
12/21/2020

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently require...
research
06/29/2018

Bayesian Counterfactual Risk Minimization

We present a Bayesian view of counterfactual risk minimization (CRM), al...
research
06/10/2022

Adversarial Counterfactual Environment Model Learning

A good model for action-effect prediction, named environment model, is i...
research
01/22/2018

Offline A/B testing for Recommender Systems

Before A/B testing online a new version of a recommender system, it is u...
research
02/23/2023

Sequential Counterfactual Risk Minimization

Counterfactual Risk Minimization (CRM) is a framework for dealing with t...
research
04/22/2020

Optimization Approaches for Counterfactual Risk Minimization with Continuous Actions

Counterfactual reasoning from logged data has become increasingly import...
