A Large-scale Open Dataset for Bandit Algorithms

08/17/2020
by Yuta Saito, et al.

We build and publicize the Open Bandit Dataset and Pipeline to facilitate scalable and reproducible research on bandit algorithms. The dataset and pipeline are especially suitable for off-policy evaluation (OPE), which attempts to predict the performance of hypothetical algorithms using data generated by a different algorithm. We construct the dataset from experiments and implementations on ZOZOTOWN, a large-scale fashion e-commerce platform. The data contain ground-truth performance for several bandit policies and thus enable fair comparison of different OPE estimators. We also provide a pipeline that makes implementing and evaluating OPE estimators easy and consistent. As a proof of concept, we use the dataset and pipeline to implement and evaluate OPE estimators. First, we find that a well-established estimator fails in this real-world setting, suggesting that choosing an appropriate estimator is critical. We then select a well-performing estimator and use it to improve the platform's fashion item recommendation: our analysis finds a counterfactual policy that significantly outperforms the historical ones. Our open data and pipeline allow researchers and practitioners to easily evaluate and compare their bandit algorithms and OPE estimators against others in a large, real-world setting.
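To make the OPE workflow concrete, here is a minimal sketch in the style of the Open Bandit Pipeline (obp) quickstart: load the logged feedback collected by a behavior policy, compute an evaluation policy's action distribution, and estimate that policy's value with inverse probability weighting. The class and method names follow obp's public documentation, but exact signatures may differ across library versions, so treat this as an illustrative sketch rather than the paper's exact code.

```python
from obp.dataset import OpenBanditDataset
from obp.policy import BernoulliTS
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting

# (1) Load logged bandit feedback collected by the Random behavior policy.
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# (2) Define the counterfactual (evaluation) policy: Bernoulli Thompson
# Sampling with the priors used on the ZOZOTOWN platform.
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100000,
    n_rounds=bandit_feedback["n_rounds"],
)

# (3) Estimate the evaluation policy's value from the behavior policy's
# logs via inverse probability weighting (IPW).
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
estimated_policy_value = ope.estimate_policy_values(action_dist=action_dist)
print(estimated_policy_value)  # e.g., {"ipw": <estimated policy value>}
```

Because the dataset also records the on-policy (ground-truth) performance of the deployed policies, an estimate like the one above can be checked against that ground truth, which is what enables the estimator comparisons described in the abstract.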



Related research

09/10/2018 · Efficient Counterfactual Learning from Bandit Feedback
What is the most statistically efficient way to do off-policy evaluation...

09/09/2022 · Extending Open Bandit Pipeline to Simulate Industry Challenges
Bandit algorithms are often used in the e-commerce industry to train Mac...

09/17/2021 · Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on an Online Content Delivery Service
Off-policy evaluation (OPE) is the method that attempts to estimate the ...

08/31/2021 · Evaluating the Robustness of Off-Policy Evaluation
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates...

10/08/2020 · Theoretical and Experimental Comparison of Off-Policy Evaluation from Dependent Samples
We theoretically and experimentally compare estimators for off-policy ev...

06/12/2020 · Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales
This study addresses the problem of off-policy evaluation (OPE) from dep...

02/20/2020 · Safe Counterfactual Reinforcement Learning
We develop a method for predicting the performance of reinforcement lear...
