Streaming Algorithms for Stochastic Multi-armed Bandits

12/09/2020
by Arnab Maiti, et al.

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are currently in memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness result: we show an Ω(T^{2/3}) lower bound on the expected cumulative regret for an arm-memory of size n-1, where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an r-round adaptive streaming algorithm with O(r) arm-memory that finds an ϵ-best arm. In an r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the outcomes observed in earlier rounds, and the best arm is output at the end of the r rounds. The upper bound on the sample complexity of our algorithm matches the lower bound for any r-round adaptive streaming algorithm. Second, we present a heuristic that finds an ϵ-best arm with optimal sample complexity while storing only one extra arm in memory.
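To make the bounded arm-memory setting concrete, the following is a minimal Python sketch of a naive streaming baseline. It is not the r-round algorithm or the heuristic from the paper. It assumes rewards in [0, 1] and a known number of arms n, pulls each arriving arm a fixed number of times chosen from Hoeffding's inequality, and keeps only the index and empirical mean of the best arm seen so far. The names `bernoulli_arm`, `pulls_per_arm`, and `streaming_uniform_best_arm` are hypothetical, introduced only for this illustration.

```python
import math
import random
from typing import Callable, Iterable, Tuple

# Assumption: an arm is a zero-argument callable returning a reward in [0, 1].
Arm = Callable[[], float]


def bernoulli_arm(p: float, rng: random.Random) -> Arm:
    """Hypothetical helper: an arm that pays 1 with probability p, else 0."""
    return lambda: 1.0 if rng.random() < p else 0.0


def pulls_per_arm(eps: float, delta: float, n_arms: int) -> int:
    """Pulls needed so that, by Hoeffding's inequality and a union bound,
    every empirical mean is within eps/2 of its true mean with
    probability at least 1 - delta."""
    return math.ceil((2.0 / eps**2) * math.log(2.0 * n_arms / delta))


def streaming_uniform_best_arm(
    stream: Iterable[Arm], eps: float, delta: float, n_arms: int
) -> Tuple[int, float]:
    """Naive uniform-sampling baseline under bounded arm-memory.

    Only the arriving arm is resident at any time. The only other state is
    the index and empirical mean of the best arm seen so far, so a discarded
    arm is never pulled again.
    """
    m = pulls_per_arm(eps, delta, n_arms)
    best_index, best_mean = -1, -math.inf
    for index, arm in enumerate(stream):
        mean = sum(arm() for _ in range(m)) / m  # pull the resident arm m times
        if mean > best_mean:
            best_index, best_mean = index, mean
    if best_index < 0:
        raise ValueError("the stream contained no arms")
    return best_index, best_mean
```

This baseline spends O((n/ϵ²) log(n/δ)) pulls in total, so it illustrates only the memory constraint, not the sample complexity the abstract describes.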

09/13/2022

Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy

In this work I study the problem of adversarial perturbations to rewards...
11/19/2018

Best-arm identification with cascading bandits

We consider a variant of the problem of best arm identification in multi...
10/15/2020

Stochastic Bandits with Vector Losses: Minimizing ℓ^∞-Norm of Relative Losses

Multi-armed bandits are widely applied in scenarios like recommender sys...
06/06/2021

PAC Best Arm Identification Under a Deadline

We study (ϵ, δ)-PAC best arm identification, where a decision-maker must...
02/27/2015

Non-stochastic Best Arm Identification and Hyperparameter Optimization

Motivated by the task of hyperparameter optimization, we introduce the n...
02/02/2023

Learning with Exposure Constraints in Recommendation Systems

Recommendation systems are dynamic economic systems that balance the nee...
10/09/2018

Bridging the gap between regret minimization and best arm identification, with application to A/B tests

State of the art online learning procedures focus either on selecting th...