Optimal Algorithms for Latent Bandits with Cluster Structure

01/17/2023
by Soumyabrata Pal, et al.

We consider the problem of latent bandits with cluster structure, where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late. If each user acts independently, then every user has to explore each arm on their own, and a regret of Ω(√(𝖬𝖭𝖳)) is unavoidable, where 𝖬 and 𝖭 are the number of arms and users, respectively, and 𝖳 is the number of rounds. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion), which exploits the latent cluster structure to achieve the minimax optimal regret of O(√((𝖬+𝖭)𝖳)) when the number of clusters is O(1). This is the first algorithm to guarantee such a strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only O(log𝖳) calls to an offline matrix completion oracle across all 𝖳 rounds.
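
To make the setting and the gap between the two regret rates concrete, here is a minimal Python sketch. It is not LATTICE itself: it only builds a toy instance of the problem described above and compares the orders √(𝖬𝖭𝖳) and √((𝖬+𝖭)𝖳) with constants and logarithmic factors ignored. The function names, the means drawn from [0, 1], and the Gaussian reward noise are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def independent_rate(num_arms, num_users, horizon):
    """Order of the regret when every user explores alone: sqrt(M * N * T)."""
    return math.sqrt(num_arms * num_users * horizon)


def clustered_rate(num_arms, num_users, horizon):
    """Order of the regret quoted for O(1) clusters: sqrt((M + N) * T)."""
    return math.sqrt((num_arms + num_users) * horizon)


def make_instance(num_arms, num_users, num_clusters, seed=0):
    """Toy instance: users in the same cluster share one mean reward vector.

    Means in [0, 1] and uniform cluster assignment are assumptions.
    """
    rng = random.Random(seed)
    cluster_means = [
        [rng.random() for _ in range(num_arms)] for _ in range(num_clusters)
    ]
    user_cluster = [rng.randrange(num_clusters) for _ in range(num_users)]
    return cluster_means, user_cluster


def play_round(cluster_means, user_cluster, choose_arm, rng, noise_std=0.1):
    """One round: a uniformly random user pulls an arm and sees a noisy reward.

    Gaussian noise with standard deviation noise_std is an assumption.
    """
    user = rng.randrange(len(user_cluster))
    arm = choose_arm(user)
    mean = cluster_means[user_cluster[user]][arm]
    return user, arm, mean + rng.gauss(0.0, noise_std)
```

The ratio of the two rates is √(𝖬𝖭/(𝖬+𝖭)) and does not depend on 𝖳, so under these simplifications the benefit of sharing information across users grows with both the number of arms and the number of users. A policy is passed to `play_round` as a function from a user index to an arm index, for example `lambda user: 0` for a policy that always pulls the first arm.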


