Regional Multi-Armed Bandits

02/22/2018
by Zhiyang Wang, et al.

We consider a variant of the classic multi-armed bandit problem in which the expected reward of each arm is a function of an unknown parameter. The arms are divided into groups, each of which shares a common parameter. Therefore, when the player selects an arm at a time slot, information about the other arms in the same group is also revealed. This regional bandit model naturally bridges the non-informative bandit setting, where the player learns only about the chosen arm, and the global bandit model, where sampling one arm reveals information about all arms. We propose an efficient algorithm, UCB-g, that solves the regional bandit problem by combining the Upper Confidence Bound (UCB) and greedy principles. Both parameter-dependent and parameter-free regret upper bounds are derived. We also establish a matching lower bound, which proves the order-optimality of UCB-g. Moreover, we propose SW-UCB-g, an extension of UCB-g for a non-stationary environment in which the parameters vary slowly over time.
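To make the group-coupled structure concrete, here is a minimal sketch of a UCB-plus-greedy strategy in the spirit of UCB-g. It is an illustration, not the paper's exact index: it assumes a hypothetical linear reward model where arm k in group g has mean a_k * theta_g + b_k with known coefficients, Gaussian noise, and theta_g in [0, 1], so every pull can be inverted into a noisy estimate of the group parameter. The confidence-bonus form sqrt(2 ln t / n_g) is the standard UCB1 choice, assumed here for simplicity.

```python
import math
import random


def ucb_g(groups, horizon, sigma=0.1, seed=0):
    """Sketch of a regional-bandit strategy: UCB across groups, greedy within.

    groups: list of dicts, each with
      'theta': the group's hidden parameter in [0, 1]
      'arms' : list of (a, b) pairs; arm mean reward = a * theta + b
    Pulling any arm yields a noisy observation that updates the estimate
    of its whole group's parameter (the regional-information effect).
    Returns the average reward over the horizon.
    """
    rng = random.Random(seed)
    n = [0] * len(groups)      # number of pulls per group
    s = [0.0] * len(groups)    # running sum of per-pull theta estimates
    total = 0.0
    for t in range(1, horizon + 1):
        # --- UCB step: pick the group with the highest optimistic value ---
        best_g, best_val = 0, -float("inf")
        for g, grp in enumerate(groups):
            if n[g] == 0:              # pull each group at least once
                best_g = g
                break
            theta_hat = s[g] / n[g]
            bonus = math.sqrt(2.0 * math.log(t) / n[g])
            # Optimism: evaluate each arm at the end of the confidence
            # interval that maximizes its mean (upper end if a >= 0,
            # lower end if a < 0), clipped to the parameter range [0, 1].
            val = max(
                a * min(1.0, theta_hat + bonus) + b if a >= 0
                else a * max(0.0, theta_hat - bonus) + b
                for a, b in grp["arms"]
            )
            if val > best_val:
                best_val, best_g = val, g
        grp = groups[best_g]
        theta_hat = s[best_g] / n[best_g] if n[best_g] else 0.5
        # --- Greedy step: best arm in the chosen group under the estimate ---
        a, b = max(grp["arms"], key=lambda ab: ab[0] * theta_hat + ab[1])
        reward = a * grp["theta"] + b + rng.gauss(0.0, sigma)
        if a != 0:
            # Invert the observation into a (clipped) estimate of theta_g.
            s[best_g] += max(0.0, min(1.0, (reward - b) / a))
        n[best_g] += 1
        total += reward
    return total / horizon
```

Because every reward feeds the shared group estimate, the confidence interval for a group shrinks no matter which of its arms is pulled; that is the key difference from per-arm UCB and the reason a greedy within-group choice suffices in this sketch.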


