The Max K-Armed Bandit: A PAC Lower Bound and Tighter Algorithms

08/23/2015
by   Yahel David, et al.

We consider the Max K-Armed Bandit problem, in which a learning agent is faced with several sources (arms) of items (rewards) and is interested in finding the best item overall. At each time step the agent chooses an arm and obtains a random real-valued reward. The rewards of each arm are assumed to be i.i.d., with an unknown probability distribution that generally differs among the arms. Under the PAC framework, we provide lower bounds on the sample complexity of any (ϵ,δ)-correct algorithm, and propose algorithms that attain this bound up to logarithmic factors. We compare the performance of these multi-armed algorithms to a variant in which the arms are indistinguishable to the agent and one is chosen uniformly at random at each stage. Interestingly, when the maximal rewards of the arms happen to be similar, the latter approach may provide better performance.
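The setting described above can be illustrated with a minimal simulation sketch. This is not the paper's algorithm; it only shows the objective (tracking the maximal reward observed) and the random-arm baseline the abstract compares against. The arm distributions, horizon, and function names are illustrative assumptions.

```python
import random

def max_k_armed_sample(arm_dists, policy, horizon, rng):
    """Simulate the Max K-Armed Bandit objective: at each step the policy
    picks an arm, one i.i.d. reward is drawn from that arm's distribution,
    and we track the best (maximal) reward seen so far."""
    best = float("-inf")
    for t in range(horizon):
        arm = policy(t)
        best = max(best, arm_dists[arm](rng))
    return best

rng = random.Random(0)

# Two hypothetical arms with different reward distributions (illustrative only).
arms = [
    lambda r: r.gauss(0.0, 1.0),   # arm 0: heavier spread, occasional large rewards
    lambda r: r.gauss(0.5, 0.2),   # arm 1: higher mean, lighter tail
]

# Baseline from the abstract: arms are indistinguishable to the agent,
# so an arm is chosen uniformly at random at each stage.
random_policy = lambda t: rng.randrange(len(arms))
best_random = max_k_armed_sample(arms, random_policy, horizon=1000, rng=rng)
```

With roughly 500 draws from each arm, the maximum is driven by the heavier-spread arm's upper tail, which is why the random baseline can do well precisely when the arms' maximal rewards are comparable.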


