Odds-Ratio Thompson Sampling to Control for Time-Varying Effect

03/04/2020
by Sulgi Kim, et al.

Multi-armed bandit methods are used for dynamic experiments, particularly in online services. Among these methods, Thompson sampling is widely used because it is simple yet performs well. Many Thompson sampling methods for binary rewards use a logistic model written in a specific parameterization. In this study, we reparameterize the logistic model with odds-ratio parameters. This shows that Thompson sampling can be carried out with a subset of the parameters. Based on this finding, we propose a novel method, "Odds-ratio Thompson sampling", which is expected to be robust to time-varying effects. We describe the use of the proposed method in continuous experiments and discuss a desirable property of the method. In simulation studies, the novel method was robust to a temporal background effect, while the loss of performance was only marginal when no such effect was present. Finally, using a dataset from a real service, we show that the novel method would gain greater rewards in a practical environment.
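
The intuition for why an odds-ratio parameterization can be insensitive to a time-varying background effect can be illustrated numerically. The sketch below is not the authors' algorithm; it only assumes, for illustration, that the background effect enters additively on the logit scale, and all names and values (`arm_effect`, `background`) are hypothetical. Under that assumption, the success probabilities of the arms move over time, but the odds ratio between two arms does not.

```python
import math


def sigmoid(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p: float) -> float:
    """Odds corresponding to a success probability p."""
    return p / (1.0 - p)


def odds_ratio(p_arm: float, p_ref: float) -> float:
    """Odds ratio of an arm relative to a reference arm."""
    return odds(p_arm) / odds(p_ref)


# Hypothetical arm effects on the logit scale (arm "A" is the reference).
arm_effect = {"A": 0.0, "B": 0.4}

# Hypothetical background effect at three time points (assumed additive
# on the logit scale; this is an illustrative assumption).
background = [-2.0, -1.0, 0.5]

ratios = []
for alpha in background:
    p_a = sigmoid(alpha + arm_effect["A"])
    p_b = sigmoid(alpha + arm_effect["B"])
    ratios.append(odds_ratio(p_b, p_a))
    # The probabilities change with alpha; the odds ratio stays the same.
    print(f"alpha={alpha:+.1f}  p_A={p_a:.3f}  p_B={p_b:.3f}  OR={ratios[-1]:.4f}")
```

In this toy setting the odds ratio equals exp(0.4) at every time point, whatever the background level, which is the kind of quantity a method built on odds-ratio parameters would target instead of the raw success probabilities.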


research
06/18/2020

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

We study the effect of persistence of engagement on learning in a stocha...
research
10/01/2021

Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits

We study the asymptotic performance of the Thompson sampling algorithm i...
research
05/18/2012

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

The question of the optimality of Thompson Sampling for solving the stoc...
research
01/03/2023

Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards

Designing experiments often requires balancing between learning about th...
research
07/30/2021

Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce

E-commerce sites strive to provide users the most timely relevant inform...
research
06/28/2021

Dynamic Planning and Learning under Recovering Rewards

Motivated by emerging applications such as live-streaming e-commerce, pr...
research
11/17/2022

Mediation analysis with case-control sampling: Identification and estimation in the presence of a binary mediator

With reference to a stratified case-control procedure based on a binary ...
