Parallelizing Thompson Sampling

06/02/2021
by   Amin Karbasi, et al.

How can we exploit parallelism in online decision-making problems while still balancing the exploration-exploitation trade-off efficiently? In this paper, we introduce a batch Thompson Sampling framework for two canonical online decision-making problems: stochastic multi-armed bandits and linear contextual bandits with finitely many arms. Over a time horizon T, our batch Thompson Sampling policy achieves the same (asymptotic) regret bound as a fully sequential policy while issuing only O(log T) batch queries. To achieve this exponential reduction, i.e., reducing the number of interactions from T to O(log T), our batch policy dynamically determines the duration of each batch in order to balance the exploration-exploitation trade-off. We also demonstrate experimentally that dynamic batch allocation dramatically outperforms natural baselines such as static batch allocation.
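The core idea can be illustrated with a short sketch for Bernoulli bandits: the Beta posterior is refreshed only at batch boundaries, and a batch ends once some arm's pull count has doubled since the last posterior update, which caps the number of batches at O(K log T). Note that the doubling rule and all function names below are illustrative assumptions, not the paper's exact policy.

```python
import random

def batch_thompson_sampling(true_means, horizon, seed=0):
    """Sketch of batch Thompson Sampling for Bernoulli bandits.

    Inside a batch, arms are chosen by sampling from a *stale* posterior;
    rewards are buffered and committed to the posterior only when the
    batch closes. The doubling rule below is an illustrative assumption.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k    # Beta posterior: successes + 1
    beta = [1] * k     # Beta posterior: failures + 1
    pulls = [0] * k    # pull counts already committed to the posterior
    pending = []       # (arm, reward) pairs observed in the current batch
    batches = 0
    total_reward = 0.0

    for _ in range(horizon):
        # Thompson step: sample one value per arm from the stale posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total_reward += reward
        pending.append((arm, reward))

        # Close the batch once some arm's total pull count has doubled
        # relative to the last committed counts.
        counts = pulls[:]
        for a, _ in pending:
            counts[a] += 1
        if any(counts[i] >= 2 * max(pulls[i], 1) for i in range(k)):
            for a, r in pending:
                alpha[a] += r
                beta[a] += 1.0 - r
                pulls[a] += 1
            pending = []
            batches += 1

    return total_reward, batches
```

Because a batch closes only when some committed pull count doubles, the number of posterior updates grows logarithmically in the horizon rather than linearly, e.g. roughly K·log2(T) batches for T rounds.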


Related research

02/25/2021 · Batched Neural Bandits
In many sequential decision-making problems, the individuals are split i...

04/14/2020 · Sequential Batch Learning in Finite-Action Linear Contextual Bandits
We study the sequential batch learning problem in linear contextual band...

02/21/2020 · Online Batch Decision-Making with High-Dimensional Covariates
We propose and investigate a class of new algorithms for sequential deci...

01/21/2019 · Parallel Contextual Bandits in Wireless Handover Optimization
As cellular networks become denser, a scalable and dynamic tuning of wir...

07/20/2020 · A Hierarchical Approach to Scaling Batch Active Search Over Structured Data
Active search is the process of identifying high-value data points in a ...

11/21/2018 · Efficient nonmyopic active search with applications in drug and materials discovery
Active search is a learning paradigm for actively identifying as many me...

10/14/2020 · Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting
Online decision-making problems require us to make a sequence of decisio...
