Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

10/01/2022
by Mohsen Bayati, et al.

Multi-armed bandit (MAB) algorithms are efficient approaches to reducing the opportunity cost of online experimentation, and companies use them to find the best product in periodically refreshed product catalogs. However, these algorithms face the so-called cold-start problem at the onset of an experiment: because customer preferences for new products are unknown, an initial data-collection phase, known as the burning period, is required. During this period, MAB algorithms operate like randomized experiments, incurring large burning costs that scale with the number of products. We attempt to reduce the burning cost by observing that many products can be cast as two-sided products, so that their rewards are naturally modeled by a matrix whose rows and columns represent the two sides, respectively. We then design two-phase bandit algorithms that first use subsampling and low-rank matrix estimation to obtain a substantially smaller targeted set of products, and then apply a UCB procedure on the targeted set to find the best product. We show theoretically that the proposed algorithms lower costs and expedite the experiment when experimentation time is limited and the product set is large. Our analysis also reveals three regimes of long, short, and ultra-short horizon experiments, depending on the dimensions of the matrix. Empirical evidence from both synthetic data and a real-world dataset on music streaming services validates this superior performance.
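The sketch below illustrates the two-phase idea described in the abstract; it is not the authors' exact algorithm. Phase 1 pulls a random subsample of (row, column) products and fits a low-rank estimate of the reward matrix, with a truncated SVD standing in for the paper's matrix-estimation step; Phase 2 runs standard UCB1 on the top entries of that estimate. The synthetic rank-2 matrix and parameters such as `sample_frac`, `n_pulls_each`, and `k` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-sided catalog: a low-rank matrix of true mean rewards.
n_rows, n_cols, rank = 30, 30, 2
U = rng.normal(size=(n_rows, rank))
V = rng.normal(size=(n_cols, rank))
means = U @ V.T / rank

# --- Phase 1: subsample entries and fit a low-rank estimate ---
n_pulls_each, sample_frac = 5, 0.3        # illustrative choices
observed = np.zeros_like(means)
mask = rng.random(means.shape) < sample_frac
for i, j in zip(*np.nonzero(mask)):
    observed[i, j] = np.mean(means[i, j] + rng.normal(size=n_pulls_each))

# Truncated SVD of the (zero-filled, rescaled) observations as a simple
# stand-in for the low-rank matrix estimation step.
Uh, s, Vh = np.linalg.svd(observed / sample_frac, full_matrices=False)
estimate = Uh[:, :rank] * s[:rank] @ Vh[:rank]

k = 10                                    # size of the targeted set
flat = np.argsort(estimate, axis=None)[-k:]
targets = [np.unravel_index(f, means.shape) for f in flat]

# --- Phase 2: standard UCB1 over the k targeted products ---
counts = np.ones(k)                       # one initial pull per arm
sums = np.array([means[t] + rng.normal() for t in targets])
for t in range(1, 2000):
    ucb = sums / counts + np.sqrt(2 * np.log(t + k) / counts)
    a = int(np.argmax(ucb))
    sums[a] += means[targets[a]] + rng.normal()
    counts[a] += 1

best = targets[int(np.argmax(sums / counts))]
print("selected product (row, col):", best,
      "true best:", np.unravel_index(np.argmax(means), means.shape))
```

The point of the reduction is visible in the arm counts: UCB explores only the k targeted products rather than all n_rows × n_cols of them, which is where the savings in burning cost would come from when the catalog is large and the horizon is short.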


