Tight Memory-Regret Lower Bounds for Streaming Bandits

06/13/2023
by   Shaoang Li, et al.

In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret while dealing with arms that arrive online and with sublinear arm memory. We establish a tight worst-case regret lower bound of Ω((TB)^α K^{1-α}), where α = 2^B / (2^{B+1} - 1), for any algorithm with time horizon T, number of arms K, and number of passes B. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known Ω(√(KT)) lower bound, an additional double-logarithmic factor is unavoidable for any streaming bandits algorithm restricted to sublinear memory. Furthermore, we establish the first instance-dependent lower bound of Ω(T^{1/(B+1)} ∑_{Δ_x>0} μ^*/Δ_x) for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of ε-optimal arm identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of Õ((TB)^α K^{1-α}) using constant arm memory.
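To make the exponent in the worst-case bound concrete, the following minimal sketch evaluates α = 2^B / (2^{B+1} - 1) for a few pass counts B and the corresponding rate (TB)^α K^{1-α}. The function names `alpha` and `bound_rate` are hypothetical, all constants hidden by the Ω notation are dropped, and the sample values of T, K, and B are chosen only for illustration; this is not code from the paper.

```python
from fractions import Fraction


def alpha(num_passes: int) -> Fraction:
    """Exponent alpha = 2^B / (2^(B+1) - 1), as stated in the abstract."""
    if num_passes < 1:
        raise ValueError("num_passes must be at least 1")
    return Fraction(2 ** num_passes, 2 ** (num_passes + 1) - 1)


def bound_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """Rate (T*B)^alpha * K^(1 - alpha) with all hidden constants dropped."""
    a = float(alpha(num_passes))
    return (horizon * num_passes) ** a * num_arms ** (1.0 - a)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    horizon, num_arms = 10_000, 100
    for b in (1, 2, 3, 10):
        print(b, alpha(b), round(bound_rate(horizon, num_arms, b), 1))
```

With a single pass the exponent is 2/3, and it decreases toward 1/2 as B grows while always staying strictly above 1/2, which is consistent with the gap from the classical √(KT) rate that the abstract describes.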


