Thompson Sampling for Unimodal Bandits

06/15/2021
by Long Yang, et al.

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, at each step, instead of exploring the entire decision space, our algorithm makes decisions according to the posterior distribution only within the neighborhood of the arm with the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm matches the lower bound for unimodal bandits and is therefore asymptotically optimal. For Gaussian rewards, the regret of our algorithm is 𝒪(log T), which is far better than that of standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic datasets and real-world applications.
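To illustrate the idea described above, here is a minimal sketch of Thompson Sampling restricted to the neighborhood of the empirically best arm, for Bernoulli rewards on a line graph (arm k neighbors arms k-1 and k+1). The Beta(1, 1) priors, the initial round-robin pulls, and the tie-breaking are common conventions assumed for the sketch, not details taken from the paper.

```python
import numpy as np

def unimodal_ts(bandit_means, horizon, rng=None):
    """Sketch of neighborhood-restricted Thompson Sampling for Bernoulli
    unimodal bandits on a line graph. Illustrative only; the paper's exact
    algorithm may differ in initialization and forced leader plays."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(bandit_means)
    successes = np.zeros(K)
    pulls = np.zeros(K)
    total_reward = 0.0

    # Pull every arm once so every empirical mean is defined.
    for k in range(K):
        r = rng.random() < bandit_means[k]
        successes[k] += r
        pulls[k] += 1
        total_reward += r

    for _ in range(horizon - K):
        # Leader: the arm with the highest empirical mean estimate.
        leader = int(np.argmax(successes / pulls))
        nbhd = [k for k in (leader - 1, leader, leader + 1) if 0 <= k < K]
        # Sample from the Beta(1+s, 1+f) posteriors only inside the
        # leader's neighborhood, then play the arm with the largest sample.
        theta = {k: rng.beta(1 + successes[k], 1 + pulls[k] - successes[k])
                 for k in nbhd}
        arm = max(theta, key=theta.get)
        r = rng.random() < bandit_means[arm]
        successes[arm] += r
        pulls[arm] += 1
        total_reward += r

    return total_reward, pulls
```

Because the posterior is sampled over at most three arms rather than all K, exploration concentrates around the current empirical peak, which is what lets the unimodal structure drive the improved regret.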

Related research

Double Thompson Sampling for Dueling Bandits (04/25/2016)
In this paper, we propose a Double Thompson Sampling (D-TS) algorithm fo...

Thompson Sampling for CVaR Bandits (12/10/2020)
Risk awareness is an important feature to formulate a variety of real wo...

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms (05/20/2014)
We consider stochastic multi-armed bandits where the expected reward is ...

Randomized Exploration in Generalized Linear Bandits (06/21/2019)
We study two randomized algorithms for generalized linear bandits, GLM-T...

PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits (05/18/2018)
We address the problem of regret minimization in logistic contextual ban...

Deep Hierarchy in Bandits (02/03/2022)
Mean rewards of actions are often correlated. The form of these correlat...

Neural Bandit with Arm Group Graph (06/08/2022)
Contextual bandits aim to identify among a set of arms the optimal one w...
