Regret Minimization in Isotonic, Heavy-Tailed Contextual Bandits via Adaptive Confidence Bands

10/19/2021
by   Sabyasachi Chatterjee, et al.
In this paper we initiate a study of nonparametric contextual bandits under shape constraints on the mean reward function. Specifically, we study a setting where the context is one-dimensional and the mean reward function is isotonic with respect to this context. We propose a policy for this problem and show that it attains minimax rate optimal regret. Moreover, we show that the same policy enjoys automatic adaptation: on subclasses of the parameter space where the true mean reward function is also piecewise constant with k pieces, the policy remains minimax rate optimal simultaneously for all k ≥ 1. Automatic adaptation phenomena are well known for shape-constrained problems in the offline setting; we show that they carry over to the online setting. The main technical ingredient underlying our policy is a procedure for deriving confidence bands for an underlying isotonic function using the isotonic quantile estimator. The confidence band we propose is valid under heavy-tailed noise, and its average width goes to 0 at an adaptively optimal rate. We consider this an independent contribution to the isotonic regression literature.
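To make the confidence-band ingredient concrete, here is a minimal sketch of isotonic quantile estimation and a band built from it. This is not the paper's calibrated construction: the pool-adjacent-violators variant below (pooled blocks summarized by their empirical quantile, which minimizes the pinball loss on a block) and the quantile levels in `isotonic_band` are illustrative assumptions.

```python
import numpy as np

def isotonic_quantile(y, tau):
    """Fit a nondecreasing sequence to y minimizing the pinball (quantile)
    loss at level tau, via pool-adjacent-violators where each pooled block
    is summarized by its empirical tau-quantile."""
    blocks = [[v] for v in y]                      # raw responses per block
    fits = [float(np.quantile(b, tau)) for b in blocks]
    i = 0
    while i < len(blocks) - 1:
        if fits[i] > fits[i + 1]:                  # monotonicity violated: merge
            blocks[i] = blocks[i] + blocks.pop(i + 1)
            fits.pop(i + 1)
            fits[i] = float(np.quantile(blocks[i], tau))
            i = max(i - 1, 0)                      # merged block may violate on the left
        else:
            i += 1
    # Expand block fits back to per-observation fitted values.
    return np.concatenate([np.full(len(b), f) for b, f in zip(blocks, fits)])

def isotonic_band(y, alpha=0.1):
    """A crude band: lower and upper isotonic quantile fits bracketing the
    isotonic median. The level alpha/2 is a hypothetical choice, not the
    paper's calibration."""
    return isotonic_quantile(y, alpha / 2), isotonic_quantile(y, 1 - alpha / 2)
```

Because each fitted value is an order statistic of a pooled block rather than a mean, the estimator remains stable under heavy-tailed noise (e.g., Student-t with 2 degrees of freedom), where least-squares isotonic regression can be dragged far off by a single outlier.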


Related research

07/07/2021 · Neural Contextual Bandits without Regret
Contextual bandits are a rich model for sequential decision making given...

05/21/2022 · Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets
We present a family {π̂}_p≥1 of pessimistic learning rules for offline ...

06/29/2023 · Kernel ε-Greedy for Contextual Bandits
We consider a kernelized version of the ε-greedy strategy for contextual...

03/07/2022 · Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
In this paper, we study the stochastic bandits problem with k unknown he...

02/06/2023 · On Private and Robust Bandits
We study private and robust multi-armed bandits (MABs), where the agent ...

07/11/2023 · Tracking Most Significant Shifts in Nonparametric Contextual Bandits
We study nonparametric contextual bandits where Lipschitz mean reward fu...

02/16/2021 · Making the most of your day: online learning for optimal allocation of time
We study online learning for optimal allocation when the resource to be ...
