Stochastic Multi-armed Bandits in Constant Space

12/25/2017
by David Liau, et al.
We consider the stochastic bandit problem in the sublinear space setting, where one cannot store the win-loss record for all K arms. We give an algorithm using O(1) words of space with regret ∑_{i=1}^{K} (1/Δ_i) log(Δ_i/Δ) log T, where Δ_i is the gap between the best arm and arm i, and Δ is the gap between the best and the second-best arms. If the rewards are bounded away from 0 and 1, this is within an O(log(1/Δ)) factor of the optimum regret possible without space constraints.
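
To make the comparison in the abstract concrete, the sketch below evaluates the stated regret expression next to the classical ∑ (1/Δ_i) log T rate for a given set of arm means. It is only an illustration under assumptions: the constants hidden by the O-notation are dropped, the best arm is assumed unique and is skipped in the sums (its gap is 0), and the function names are hypothetical rather than taken from the paper. It does not implement the constant-space algorithm itself.

```python
import math


def gaps(means):
    """Return (gap of every arm to the best arm, smallest positive gap).

    Assumes a unique best arm, so the smallest positive gap is the gap
    between the best and the second-best arms (Delta in the abstract).
    """
    best = max(means)
    deltas = [best - m for m in means]
    positive = [d for d in deltas if d > 0]
    if not positive:
        raise ValueError("need at least one suboptimal arm")
    return deltas, min(positive)


def constant_space_expression(means, horizon):
    """Sum over suboptimal arms of (1/Delta_i) * log(Delta_i/Delta) * log T.

    Constants hidden by the O-notation are ignored (an assumption).
    """
    deltas, delta = gaps(means)
    total = sum(math.log(d / delta) / d for d in deltas if d > 0)
    return total * math.log(horizon)


def unconstrained_expression(means, horizon):
    """Sum over suboptimal arms of (1/Delta_i) * log T, constants ignored."""
    deltas, _ = gaps(means)
    return sum(1.0 / d for d in deltas if d > 0) * math.log(horizon)


if __name__ == "__main__":
    means = [0.9, 0.8, 0.6, 0.5]  # hypothetical arm means
    horizon = 10_000
    _, delta = gaps(means)
    cs = constant_space_expression(means, horizon)
    un = unconstrained_expression(means, horizon)
    print("constant-space expression:", cs)
    print("unconstrained expression: ", un)
    print("log(1/Delta) cap on ratio:", math.log(1 / delta))
```

Because every gap satisfies Δ_i ≤ 1 when rewards lie in [0, 1], each factor log(Δ_i/Δ) is at most log(1/Δ), which is why the first expression never exceeds the second by more than a log(1/Δ) factor.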

research
02/06/2013

Bounded regret in stochastic multi-armed bandits

We study the stochastic multi-armed bandit problem when one knows the va...
research
01/24/2020

Ballooning Multi-Armed Bandits

In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a n...
research
02/26/2020

Memory-Constrained No-Regret Learning in Adversarial Bandits

An adversarial bandit problem with memory constraints is studied where o...
research
05/17/2019

Pair Matching: When bandits meet stochastic block model

The pair-matching problem appears in many applications where one wants t...
research
02/12/2018

Multi-Armed Bandits on Unit Interval Graphs

An online learning problem with side information on the similarity and d...
research
06/17/2020

The Influence of Shape Constraints on the Thresholding Bandit Problem

We investigate the stochastic Thresholding Bandit problem (TBP) under se...
research
02/25/2021

Combinatorial Bandits under Strategic Manipulations

We study the problem of combinatorial multi-armed bandits (CMAB) under s...
