Batched Thompson Sampling for Multi-Armed Bandits

08/15/2021
by Nikolai Karpov, et al.

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which the goal is to minimize regret over a sequence of arm pulls while using only a small number of policy changes (or batches). We propose two algorithms and demonstrate their effectiveness through experiments on both synthetic and real datasets. We also analyze the proposed algorithms theoretically and obtain almost-tight regret-batch tradeoffs for the two-arm case.
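To make the batched setting concrete, here is a minimal sketch of Thompson Sampling for Bernoulli bandits in which Beta posteriors are updated only at batch boundaries, so the sampling policy changes just once per batch. This is an illustrative toy with a static, evenly sized batch schedule, not the paper's proposed algorithms; the function name and parameters are this sketch's own.

```python
import random

def batched_thompson_sampling(arm_means, horizon, num_batches, seed=0):
    """Thompson Sampling with Beta(1, 1) priors where posterior updates
    are deferred to batch boundaries (one policy change per batch).
    Returns the total reward collected over `horizon` pulls."""
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [1] * k  # Beta prior parameters (alpha)
    failures = [1] * k   # Beta prior parameters (beta)
    batch_size = horizon // num_batches
    total_reward = 0
    for _ in range(num_batches):
        # Rewards observed within the batch are buffered, not applied.
        pending_s = [0] * k
        pending_f = [0] * k
        for _ in range(batch_size):
            # Sample an index from each arm's (stale) posterior and pull
            # the arm with the largest sample.
            samples = [rng.betavariate(successes[i], failures[i])
                       for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
            reward = 1 if rng.random() < arm_means[arm] else 0
            total_reward += reward
            if reward:
                pending_s[arm] += 1
            else:
                pending_f[arm] += 1
        # Posterior update happens only here, at the end of the batch.
        for i in range(k):
            successes[i] += pending_s[i]
            failures[i] += pending_f[i]
    return total_reward
```

With two well-separated arms (e.g. means 0.9 and 0.1) and a modest number of batches, the posterior concentrates on the better arm after the first batch, so even this naive schedule collects most of the achievable reward; the interesting regime studied in the paper is how few batches suffice without sacrificing regret.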


