Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce

07/30/2021
by Ding Xiang, et al.

E-commerce sites strive to provide users with the most timely and relevant information in order to reduce shopping friction and increase customer satisfaction. Multi-armed bandit (MAB) models, a class of adaptive optimization algorithms, provide a natural approach for this purpose. In this paper, we analyze three classic MAB algorithms, epsilon-greedy, Thompson sampling (TS), and upper confidence bound 1 (UCB1), for dynamic content recommendation, and walk through the process of developing these algorithms internally to solve a real-world e-commerce use case. First, we analyze the three MAB algorithms on simulated purchasing datasets with non-stationary reward distributions that mimic time-varying customer preferences, and study the traffic-allocation dynamics and cumulative rewards of the different algorithms. Second, we compare the cumulative rewards of the three MAB algorithms over more than 1,000 trials on actual historical A/B test datasets. We find that the larger the difference between the success rates of competing recommendations, the more cumulative reward the MAB algorithms can achieve. In addition, we find that TS yields the highest average cumulative reward across the different testing scenarios. Third, we develop a batch-updated MAB algorithm that overcomes the delayed-reward issue in e-commerce and enables online content optimization on our App homepage. For a state-of-the-art comparison, a live A/B test among our batch-updated MAB algorithm, a third-party MAB solution, and the default business logic is conducted. The result shows that our batch-updated MAB algorithm outperforms the counterparts, achieving a 6.13% click-through rate (CTR) increase and a 16.1% conversion rate increase compared to the default experience, and a 2.9% CTR increase and a 1.4% conversion rate increase compared to the third-party MAB solution.
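The paper's own implementation is not included on this page, so the sketch below is only illustrative: a minimal Beta-Bernoulli bandit in Python showing the three arm-selection rules named in the abstract (epsilon-greedy, TS, UCB1) together with a batch-deferred update step for delayed rewards. The class name BatchBernoulliBandit, the Beta(1, 1) priors, and all method names are assumptions for illustration, not taken from the paper.

```python
import math
import random

class BatchBernoulliBandit:
    """Hypothetical sketch: Beta-Bernoulli bandit whose statistics
    refresh only at batch boundaries, mimicking the delayed-reward
    setting described in the abstract."""

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon        # exploration rate for epsilon-greedy
        self.alpha = [1.0] * n_arms   # Beta(1, 1) prior: alpha - 1 = observed successes
        self.beta = [1.0] * n_arms    # beta - 1 = observed failures
        self.pulls = [0] * n_arms     # applied pulls per arm (used by UCB1)
        self.pending = []             # (arm, reward) pairs awaiting the next batch update

    def select_epsilon_greedy(self):
        """Explore uniformly with probability epsilon, else pick the best posterior mean."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        means = [a / (a + b) for a, b in zip(self.alpha, self.beta)]
        return max(range(self.n_arms), key=means.__getitem__)

    def select_thompson(self):
        """Sample a success rate from each arm's Beta posterior; play the argmax."""
        draws = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(self.n_arms), key=draws.__getitem__)

    def select_ucb1(self):
        """Classic UCB1 on the rewards applied so far; plays each arm once first."""
        for arm in range(self.n_arms):
            if self.pulls[arm] == 0:
                return arm
        total = sum(self.pulls)
        def ucb(arm):
            mean = (self.alpha[arm] - 1.0) / self.pulls[arm]
            return mean + math.sqrt(2.0 * math.log(total) / self.pulls[arm])
        return max(range(self.n_arms), key=ucb)

    def record(self, arm, reward):
        """Buffer a delayed binary reward (e.g. click=1 / no click=0); no state change yet."""
        self.pending.append((arm, reward))

    def update_batch(self):
        """Apply every buffered reward at once, then start the next batch window."""
        for arm, reward in self.pending:
            self.alpha[arm] += reward
            self.beta[arm] += 1 - reward
            self.pulls[arm] += 1
        self.pending.clear()
```

In a batched deployment of this kind, one of the select_* methods would serve traffic during a batch window, record() would buffer the delayed click or conversion feedback, and update_batch() would run once the window closes, so all three policies act on statistics that are frozen within a batch.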

Related research:

03/11/2018 · Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type
Design of incentives or recommendations to users is becoming more common...

10/01/2021 · Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits
We study the asymptotic performance of the Thompson sampling algorithm i...

11/08/2017 · A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem
The multi-armed bandit problem has been extensively studied under the st...

01/29/2021 · Learning User Preferences in Non-Stationary Environments
Recommendation systems often use online collaborative filtering (CF) alg...

05/24/2023 · An Evaluation on Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation
To speed up online testing, adaptive traffic experimentation through mul...

03/04/2020 · Odds-Ratio Thompson Sampling to Control for Time-Varying Effect
Multi-armed bandit methods have been used for dynamic experiments partic...

09/09/2022 · Extending Open Bandit Pipeline to Simulate Industry Challenges
Bandit algorithms are often used in the e-commerce industry to train Mac...
