Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

05/20/2014
by Richard Combes, et al.

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has recently been investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds on the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time. The analytical results are supported by numerical experiments showing that OSUB performs significantly better than state-of-the-art algorithms. For continuous sets of arms, we briefly discuss how combining an appropriate discretization of the set of arms with the UCB algorithm yields an order-optimal regret and, in practice, outperforms recently proposed algorithms designed to exploit the unimodal structure.
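To make the continuous-arm recipe from the abstract concrete, here is a minimal sketch of running a standard UCB1 index policy on a uniform discretization of the arm interval. It is illustrative only: the Bernoulli reward model, the grid size (chosen here roughly as (T / log T)^(1/3)), and the function names are assumptions, not the paper's exact construction or its KL-based indices.

```python
# Sketch: discretize a continuous unimodal bandit on [0, 1] and run UCB1 on the grid.
# Assumptions (not from the paper): Bernoulli rewards, uniform grid, UCB1 index,
# grid size ~ (T / log T)^(1/3).

import math
import random


def ucb_on_grid(reward_fn, horizon, num_arms=None, seed=0):
    """Run UCB1 on a uniform grid over [0, 1] and return the cumulative reward.

    reward_fn: maps x in [0, 1] to a Bernoulli success probability
               (assumed unimodal; this sketch does not check that).
    horizon:   number of rounds T.
    num_arms:  grid size; if None, use a horizon-dependent choice.
    """
    rng = random.Random(seed)
    if num_arms is None:
        num_arms = max(2, int(round((horizon / math.log(horizon)) ** (1.0 / 3.0))))
    grid = [k / (num_arms - 1) for k in range(num_arms)]

    counts = [0] * num_arms   # number of pulls per grid arm
    sums = [0.0] * num_arms   # cumulative reward per grid arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Initialization: play each grid arm once.
            arm = t - 1
        else:
            # Play the arm maximizing the UCB1 index: mean + sqrt(2 log t / n).
            arm = max(
                range(num_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < reward_fn(grid[arm]) else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    return total_reward


if __name__ == "__main__":
    # Example unimodal mean-reward function on [0, 1], peaked at x = 0.7.
    unimodal_mean = lambda x: 0.9 - 0.8 * abs(x - 0.7)
    print(ucb_on_grid(unimodal_mean, horizon=10_000))
```

The design choice being illustrated is simply that a fixed, horizon-dependent grid turns the continuous problem into a finite one; the abstract's claim is that this combination is already order-optimal, so the only tuning knob in the sketch is the grid size.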


