Online Multi-Armed Bandit

07/17/2017
by Uma Roy, et al.

We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point the player can either pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber(p_i) for some underlying mean p_i) with underlying means drawn i.i.d. from various distributions, including the uniform distribution and, in general, all distributions whose CDF satisfies certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we suggest are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time. We are again able to suggest algorithms that are optimal up to a constant factor in this case, given certain mild conditions on the universe of distributions.
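To make the setting concrete, here is a minimal simulation sketch (not the paper's algorithm) of the online bandit stream described above: bandits with Ber(p_i) payouts and p_i drawn i.i.d. from Uniform(0, 1) arrive one at a time, and the player keeps pulling the current bandit only while its empirical mean stays above a cutoff, otherwise discarding it irrevocably. The parameters `num_bandits`, `pulls_per_bandit`, and `cutoff` are illustrative assumptions, not quantities from the paper.

```python
# Illustrative sketch of the online multi-armed bandit stream (assumed setup,
# not the authors' strategy): Bernoulli bandits arrive one at a time, and a
# hypothetical threshold rule decides when to abandon the current bandit.
import random


def online_threshold_strategy(num_bandits=1000, pulls_per_bandit=20, cutoff=0.8):
    """Simulate streamed Bernoulli bandits with means drawn i.i.d. Uniform(0, 1).

    The player may pull the current bandit or move on; once a bandit is
    abandoned it can never be revisited. Returns the total observed reward.
    """
    total_reward = 0
    for _ in range(num_bandits):
        p = random.random()                 # underlying mean p_i ~ Uniform(0, 1)
        successes, pulls = 0, 0
        while pulls < pulls_per_bandit:
            reward = 1 if random.random() < p else 0   # Ber(p_i) payout
            successes += reward
            total_reward += reward
            pulls += 1
            # Move on (irrevocably) once the empirical mean drops below the cutoff.
            if successes / pulls < cutoff:
                break
    return total_reward


if __name__ == "__main__":
    print(online_threshold_strategy())
```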
