Thompson Sampling for (Combinatorial) Pure Exploration

06/18/2022
by Siwei Wang, et al.

Existing methods for combinatorial pure exploration mainly rely on the upper confidence bound (UCB) approach. To keep the algorithm efficient, they usually use the sum of the upper confidence bounds of the arms in an arm set S as the upper confidence bound of S. Because the empirical means of different arms in S are independent, this sum can be much larger than the tight upper confidence bound of S, which leads to a much higher complexity than necessary. To address this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of upper confidence bounds, and design the first TS-based algorithm, TS-Explore, for (combinatorial) pure exploration. In TS-Explore, with high probability, the sum of independent random samples within arm set S does not exceed the tight upper confidence bound of S. TS-Explore therefore resolves this challenge and achieves a lower complexity upper bound than existing efficient UCB-based algorithms in general combinatorial pure exploration. For pure exploration in the classic multi-armed bandit, we show that TS-Explore achieves an asymptotically optimal complexity upper bound.
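
The gap between the summed bound and the tight bound can be illustrated numerically. The sketch below is not the TS-Explore algorithm; it only compares the three quantities the abstract contrasts. It assumes 1-sub-Gaussian rewards, a per-arm confidence radius of the form sqrt(2 log(1/delta) / n_i), and zero-mean Gaussian samples with variance scale / n_i. The function names, `delta`, and `scale` are hypothetical choices made for this illustration and are not taken from the paper.

```python
import math
import random


def sum_of_ucb_radii(pulls, delta):
    """Sum of per-arm confidence radii sqrt(2*log(1/delta)/n_i).

    This is the set-level bonus obtained by adding the individual
    upper confidence bounds of the arms in S (assumed radius form).
    """
    return sum(math.sqrt(2.0 * math.log(1.0 / delta) / n) for n in pulls)


def tight_set_radius(pulls, delta):
    """Confidence radius for the SUM of independent empirical means.

    Independence means the variances add, so the radius depends on
    sum(1/n_i) under a single square root.
    """
    return math.sqrt(2.0 * math.log(1.0 / delta) * sum(1.0 / n for n in pulls))


def sampled_set_bonus(pulls, scale, rng):
    """Sum of independent zero-mean Gaussian samples, one per arm.

    Each sample has variance scale/n_i (an assumed, illustrative choice).
    """
    return sum(rng.gauss(0.0, math.sqrt(scale / n)) for n in pulls)


if __name__ == "__main__":
    pulls = [100] * 25          # 25 arms in S, each pulled 100 times
    delta = 0.05
    rng = random.Random(0)

    summed = sum_of_ucb_radii(pulls, delta)
    tight = tight_set_radius(pulls, delta)
    print("sum of per-arm radii:", summed)
    print("tight radius for S:  ", tight)
    print("ratio:               ", summed / tight)

    # Fraction of sampled set-level bonuses that exceed the tight radius.
    trials = 2000
    exceed = sum(sampled_set_bonus(pulls, 1.0, rng) > tight for _ in range(trials))
    print("fraction of samples above tight radius:", exceed / trials)
```

When all m arms in S have the same number of pulls, the summed radius is exactly sqrt(m) times the tight radius, which is the overestimate the abstract refers to. Under the Gaussian assumption above, the sum of the independent samples exceeds the tight radius with probability at most delta.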

research
08/20/2023

Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

We study the real-valued combinatorial pure exploration of the multi-arm...
research
06/15/2023

Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

The combinatorial pure exploration (CPE) in the stochastic multi-armed b...
research
06/25/2019

Non-Asymptotic Pure Exploration by Solving Games

Pure exploration (aka active testing) is the fundamental task of sequent...
research
06/04/2020

Differentiable Linear Bandit Algorithm

Upper Confidence Bound (UCB) is arguably the most commonly used method f...
research
02/27/2019

Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions

In this paper, we propose and evaluate different learning strategies bas...
research
12/02/2020

Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit

Motivated by real-world applications such as fast fashion retailing and ...
research
11/21/2017

Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization

We design new algorithms for the combinatorial pure exploration problem ...
