Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

04/25/2012
by Sébastien Bubeck and Nicolò Cesa-Bianchi

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
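As a concrete illustration of the i.i.d. (stochastic) setting described above, here is a minimal Python sketch of the classical UCB1 index policy, one of the strategies whose regret is analyzed in this line of work. This is not code from the survey itself: the Bernoulli arm means, the horizon, and the function name ucb1 are hypothetical choices made for the example. Regret is measured against always playing the single best arm.

    import math
    import random

    def ucb1(arm_means, horizon, seed=0):
        """Run UCB1 on Bernoulli arms with the given (hypothetical) means."""
        rng = random.Random(seed)
        n_arms = len(arm_means)
        counts = [0] * n_arms    # number of times each arm was pulled
        sums = [0.0] * n_arms    # cumulative payoff collected from each arm
        total_reward = 0.0
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1      # pull each arm once to initialize its estimate
            else:
                # choose the arm maximizing empirical mean + exploration bonus
                arm = max(
                    range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]),
                )
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli payoff
            counts[arm] += 1
            sums[arm] += reward
            total_reward += reward
        # regret relative to always playing the best arm (in expectation)
        return horizon * max(arm_means) - total_reward

    if __name__ == "__main__":
        print(ucb1([0.3, 0.5, 0.7], horizon=10000))

With these illustrative means, the regret after 10,000 rounds is typically a small fraction of the horizon, reflecting the logarithmic regret guarantees that hold in the stochastic setting.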


Related research

- Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms (09/20/2020): EXP-based algorithms are often used for exploration in multi-armed bandi...
- Thompson Sampling for Linear-Quadratic Control Problems (03/27/2017): We consider the exploration-exploitation tradeoff in linear quadratic (L...
- Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility (10/28/2020): The demand for seamless Internet access under extreme user mobility, suc...
- Bayesian Optimization – Multi-Armed Bandit Problem (12/14/2020): In this report, we survey Bayesian Optimization methods focussed on the ...
- Monte Carlo Elites: Quality-Diversity Selection as a Multi-Armed Bandit Problem (04/18/2021): A core challenge of evolutionary search is the need to balance between e...
- Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment (05/26/2022): Multi-armed bandit (MAB) is a classic model for understanding the explor...
- Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case (07/22/2012): The exploration/exploitation (E/E) dilemma arises naturally in many subf...
