Learning the distribution with largest mean: two bandit frameworks

01/31/2017
by Emilie Kaufmann, et al.

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications such as online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; both can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, under some constraints on the learning process. For both of them (regret minimization and best arm identification), we present recent, asymptotically optimal algorithms. We compare the behavior of the sampling rule of each algorithm as well as the complexity terms associated with each problem.
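To make the regret-minimization setting concrete, here is a minimal sketch of a bandit loop on Bernoulli arms using the standard UCB1 index; this is an illustration only, not the asymptotically optimal algorithms reviewed in the paper, and the arm means and horizon below are arbitrary choices for the example.

```python
import math
import random

def ucb1(means, horizon=10_000, seed=0):
    """Illustrative UCB1 run on Bernoulli arms.

    Sequentially pulls the arm with the highest upper confidence bound
    on its empirical mean, and returns the pull counts and the
    expected (pseudo-)regret against the best arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    regret = sum(counts[a] * (best - means[a]) for a in range(k))
    return counts, regret

# Hypothetical example: three Bernoulli arms, the last one is best.
counts, regret = ucb1([0.5, 0.45, 0.6])
print("pulls per arm:", counts, "expected regret:", round(regret, 1))
```

In the best arm identification task, by contrast, the learner is not penalized for pulling suboptimal arms during exploration; it only needs to output the arm with the highest mean, typically as quickly or as confidently as possible.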

