Rotting Bandits

02/23/2017
by Nir Levine, et al.

The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (exploration) and leveraging available knowledge (exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, specifically under the assumption that the arms' reward distributions are stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we term Rotting Bandits, where each arm's expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, accompanied by simulations, and derive theoretical guarantees.
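To make the setting concrete, below is a minimal Python sketch of a rotting-bandit environment in which each arm's expected reward decays with its own pull count. The decay function, arm parameters, noise model, horizon, and the simple sliding-window-mean heuristic used to choose arms are all illustrative assumptions; they are not the paper's algorithms or guarantees.

```python
# Minimal rotting-bandit sketch (assumed decay model and heuristic policy).
import numpy as np

rng = np.random.default_rng(0)

K = 3        # number of arms (assumed)
T = 5000     # time horizon (assumed)
WINDOW = 50  # sliding-window length for the heuristic (assumed)

def expected_reward(arm, n_pulls):
    """Expected reward of `arm` after it has been pulled `n_pulls` times.
    Here each arm decays polynomially in its own pull count (an assumption)."""
    base = [1.0, 0.9, 0.8][arm]
    decay = [0.1, 0.3, 0.5][arm]
    return base / (1.0 + n_pulls) ** decay

pulls = np.zeros(K, dtype=int)    # per-arm pull counts
history = [[] for _ in range(K)]  # observed rewards per arm

total_reward = 0.0
for t in range(T):
    if t < K:
        # Pull each arm once to initialize.
        a = t
    else:
        # Heuristic: pick the arm with the highest average over its
        # most recent WINDOW observations (tracks the decaying mean).
        means = [np.mean(history[a][-WINDOW:]) for a in range(K)]
        a = int(np.argmax(means))

    # Observed reward = decaying expected reward + Gaussian noise (assumed).
    r = expected_reward(a, pulls[a]) + rng.normal(0.0, 0.1)
    pulls[a] += 1
    history[a].append(r)
    total_reward += r

print(f"pull counts per arm: {pulls.tolist()}")
print(f"cumulative reward:   {total_reward:.1f}")
```

Because the best arm changes as rewards rot, a policy must keep re-estimating each arm's current (not lifetime-average) expected reward; the windowed average above is one simple way to do that in simulation.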


Related research

11/27/2018 · Rotting bandits are no harder than stochastic ones
In bandits, arms' distributions are stationary. This is often violated i...

05/13/2014 · Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards
In a multi-armed bandit (MAB) problem a gambler needs to choose at each ...

10/16/2012 · Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...

03/06/2020 · A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option
We consider a sequential decision-making problem where an agent can take...

12/08/2017 · On Adaptive Estimation for Dynamic Bernoulli Bandits
The multi-armed bandit (MAB) problem is a classic example of the explora...

02/02/2023 · Stochastic Contextual Bandits with Long Horizon Rewards
The growing interest in complex decision-making and language modeling pr...

08/19/2022 · Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
We study the Improving Multi-Armed Bandit (IMAB) problem, where the rewa...
