Bandits for BMO Functions

07/17/2020
by Tianyu Wang, et al.

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with infinities in the domain. We develop a toolset for BMO bandits and provide an algorithm that can achieve poly-log δ-regret, that is, regret measured against an arm that is optimal after removing a δ-sized portion of the arm space.
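
For reference, the standard notion of bounded mean oscillation can be written as below. This is the textbook formulation rather than text from the abstract: the symbols Q, ⟨f⟩_Q, and μ are notation assumed here, and the second display is only one plausible formalization of the abstract's verbal description of δ-regret, which may differ from the paper's exact definition.

```latex
% Textbook BMO seminorm: supremum over cubes Q in the domain of the
% average deviation of f from its mean on Q (notation assumed here).
\[
  \|f\|_{\mathrm{BMO}}
  = \sup_{Q} \frac{1}{|Q|} \int_{Q} \bigl| f(x) - \langle f \rangle_{Q} \bigr| \, dx,
  \qquad
  \langle f \rangle_{Q} = \frac{1}{|Q|} \int_{Q} f(x) \, dx .
\]
% Classic example: f(x) = -\log|x| is unbounded near 0 yet has a finite
% BMO seminorm, which is the kind of singularity the abstract alludes to.

% Assumed formalization of delta-regret: f^delta is the smallest level
% whose strict super-level set has measure at most delta, and regret is
% counted only when the played arm a_t falls below that level.
\[
  f^{\delta} = \inf\bigl\{ z : \mu(\{ a : f(a) > z \}) \le \delta \bigr\},
  \qquad
  R^{\delta}_{T} = \sum_{t=1}^{T} \max\bigl\{ 0,\; f^{\delta} - f(a_t) \bigr\} .
\]
```

Under this reading, arms in the top δ-measure portion of the arm space incur no regret, which is what keeps the quantity finite even when the reward is unbounded.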

research
10/19/2021

Batched Lipschitz Bandits

In this paper, we study the batched Lipschitz bandit problem, where the ...
research
08/03/2019

Nonparametric Contextual Bandits in an Unknown Metric Space

Consider a nonparametric contextual multi-arm bandit problem where each ...
research
12/10/2020

Thompson Sampling for CVaR Bandits

Risk awareness is an important feature to formulate a variety of real wo...
research
02/15/2021

Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

This paper studies bandit algorithms under data poisoning attacks in a b...
research
02/09/2016

Compliance-Aware Bandits

Motivated by clinical trials, we study bandits with observable non-compl...
research
10/31/2019

Recovering Bandits

We study the recovering bandits problem, a variant of the stochastic mul...
research
10/11/2022

The Typical Behavior of Bandit Algorithms

We establish strong laws of large numbers and central limit theorems for...
