Rotting infinitely many-armed bandits

01/31/2022
by Jung-Hun Kim, et al.
We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate ϱ = o(1). We show that this learning problem has an Ω(max{ϱ^{1/3} T, √T}) worst-case regret lower bound, where T is the time horizon. We show that a matching upper bound of Õ(max{ϱ^{1/3} T, √T}), up to a poly-logarithmic factor, can be achieved when the maximum rotting rate ϱ is known, by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove it from further consideration. We also show that an Õ(max{ϱ^{1/3} T, T^{3/4}}) regret upper bound can be achieved by an algorithm that does not know the value of ϱ, by using an adaptive UCB index along with an adaptive threshold value.
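
The threshold-plus-UCB idea for the known-ϱ case can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the function name `simulate_threshold_ucb`, the uniform initial means on [0, 1], the constant per-pull decay of exactly ϱ, the Gaussian noise, the full-history sample average, and the threshold 1 - max{ϱ^{1/3}, T^{-1/2}} are all illustrative choices that the abstract does not specify.

```python
import math
import random


def simulate_threshold_ucb(horizon, rho, seed=0, noise_sd=0.1):
    """Toy simulation of a UCB-with-threshold policy on rotting arms.

    Assumptions (not taken from the abstract): initial means are uniform
    on [0, 1], each pull lowers the pulled arm's mean by exactly `rho`,
    noise is Gaussian, and the threshold is 1 - max(rho^(1/3), T^(-1/2)).
    """
    rng = random.Random(seed)
    delta = max(rho ** (1.0 / 3.0), horizon ** -0.5)  # assumed threshold gap
    threshold = 1.0 - delta

    total_reward = 0.0
    arms_used = 0
    t = 0
    while t < horizon:
        # Draw a fresh arm from the (conceptually infinite) reservoir.
        mean = rng.random()
        arms_used += 1
        reward_sum = 0.0
        pulls = 0
        while t < horizon:
            reward = mean + rng.gauss(0.0, noise_sd)
            reward_sum += reward
            total_reward += reward
            pulls += 1
            t += 1
            mean -= rho  # rotting: the mean decreases after every pull

            # UCB index: sample average plus a confidence bonus.
            bonus = noise_sd * math.sqrt(2.0 * math.log(horizon) / pulls)
            ucb = reward_sum / pulls + bonus
            if ucb < threshold:
                break  # index fell below the threshold: discard this arm
    return total_reward, arms_used


if __name__ == "__main__":
    reward, arms = simulate_threshold_ucb(horizon=2000, rho=0.001, seed=1)
    print(f"total reward: {reward:.1f}, arms tried: {arms}")
```

Because the sample average here uses every pull of the current arm, it tends to overestimate a rotting arm's current mean, so the index stays optimistic; a windowed average would track the decay more closely at the cost of a wider confidence bonus.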

research
05/25/2023

Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback

We consider a combinatorial multi-armed bandit problem for maximum value...
research
03/29/2022

Best Arm Identification in Restless Markov Multi-Armed Bandits

We study the problem of identifying the best arm in a multi-armed bandit...
research
04/07/2012

UCB Algorithm for Exponential Distributions

We introduce in this paper a new algorithm for Multi-Armed Bandit (MAB) ...
research
02/12/2018

Multi-Armed Bandits on Unit Interval Graphs

An online learning problem with side information on the similarity and d...
research
11/25/2021

Bandit problems with fidelity rewards

The fidelity bandits problem is a variant of the K-armed bandit problem ...
research
04/09/2019

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

This note gives a short, self-contained, proof of a sharp connection bet...
research
03/31/2021

Robust Experimentation in the Continuous Time Bandit Problem

We study the experimentation dynamics of a decision maker (DM) in a two-...
