Memory-Constrained No-Regret Learning in Adversarial Bandits

02/26/2020
by Xiao Xu, et al.

An adversarial bandit problem with memory constraints is studied, in which only the statistics of a subset of arms can be stored. A hierarchical learning policy is developed that requires memory only sublinear in the number of arms. Regret orders sublinear in the time horizon are established for both its weak regret and its shifting regret. This appears to be the first work on memory-constrained bandit problems in the adversarial setting.
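The abstract does not spell out the hierarchical policy itself. As a purely illustrative sketch of how sublinear memory can be achieved (this is an assumption, not the paper's construction), one can partition the K arms into roughly sqrt(K) groups, run an Exp3-style learner over groups, and run a second Exp3-style learner over arms inside the active group, discarding the arm-level statistics whenever the active group changes; only O(sqrt(K)) weights are then stored at any time. All class and parameter names below are hypothetical.

```python
import math
import random


class HierarchicalExp3:
    """Hypothetical sketch of a memory-limited adversarial bandit policy
    (NOT the paper's exact algorithm): the K arms are partitioned into
    groups of size ~sqrt(K); one Exp3-style learner picks a group and a
    second picks an arm inside it.  Arm-level statistics are discarded
    whenever the active group changes, so only O(sqrt(K)) log-weights
    are kept in memory at any time."""

    def __init__(self, n_arms, horizon, seed=0):
        self.rng = random.Random(seed)
        m = max(1, math.isqrt(n_arms))                 # group size ~ sqrt(K)
        self.groups = [list(range(i, min(i + m, n_arms)))
                       for i in range(0, n_arms, m)]
        self.eta = math.sqrt(2.0 * math.log(max(2, n_arms)) / horizon)
        self.gw = [0.0] * len(self.groups)             # group-level log-weights
        self.aw = []                                   # active group's arm log-weights
        self._cur = None                               # index of the active group

    def _probs(self, logw):
        mx = max(logw)
        w = [math.exp(x - mx) for x in logw]
        s = sum(w)
        return [x / s for x in w]

    def _sample(self, logw):
        p, r, acc = self._probs(logw), self.rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    def select(self):
        g = self._sample(self.gw)
        if g != self._cur:                             # memory constraint:
            self._cur = g                              # drop the old group's stats
            self.aw = [0.0] * len(self.groups[g])
        a = self._sample(self.aw)
        self._last = (g, a)
        return self.groups[g][a]

    def update(self, reward):
        g, a = self._last
        # importance-weighted (Exp3-style) gain update at both levels
        self.gw[g] += self.eta * reward / self._probs(self.gw)[g]
        self.aw[a] += self.eta * reward / self._probs(self.aw)[a]
```

Calling `select()` each round and feeding the observed reward to `update()` keeps the stored state at two weight vectors of length about sqrt(K), rather than one of length K.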

Related research

Stochastic Multi-armed Bandits in Constant Space (12/25/2017)
We consider the stochastic bandit problem in the sublinear space setting...

Tight Memory-Regret Lower Bounds for Streaming Bandits (06/13/2023)
In this paper, we investigate the streaming bandits problem, wherein the...

On learning Whittle index policy for restless bandits with scalable regret (02/07/2022)
Reinforcement learning is an attractive approach to learn good resource ...

Linear Bandit Algorithms with Sublinear Time Complexity (03/03/2021)
We propose to accelerate existing linear bandit algorithms to achieve pe...

Regret Analysis for Hierarchical Experts Bandit Problem (08/11/2022)
We study an extension of standard bandit problem in which there are R la...

Pair Matching: When bandits meet stochastic block model (05/17/2019)
The pair-matching problem appears in many applications where one wants t...

Periodic Bandits and Wireless Network Selection (04/28/2019)
Bandit-style algorithms have been studied extensively in stochastic and ...
