On Private and Robust Bandits

02/06/2023
by   Yulian Wu, et al.

We study private and robust multi-armed bandits (MABs), where the agent receives heavy-tailed rewards under Huber's contamination model and must also ensure differential privacy. We first present a minimax lower bound for this problem, characterizing the information-theoretic limit of regret with respect to the privacy budget, the contamination level, and the heavy-tailedness of the rewards. We then propose a meta-algorithm built on a private and robust mean estimation sub-routine that relies only on reward truncation and the Laplace mechanism. For two different heavy-tailed settings, we give specific schemes of this sub-routine, which enable us to achieve nearly optimal regret. As by-products of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based sub-routines achieve the optimal trade-off between estimation accuracy, privacy, and robustness. Finally, we support our theoretical results with experimental studies.
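To make the idea of a truncation-plus-Laplace mean estimator concrete, here is a minimal Python sketch. It is not the paper's sub-routine: the function name `truncated_laplace_mean`, the symmetric clipping range, and the parameters `threshold` and `epsilon` are assumptions chosen for illustration, and the abstract does not say how the paper sets the truncation threshold in either heavy-tailed setting. The sketch clips each reward to `[-threshold, threshold]`, which bounds the influence of any single (possibly contaminated) sample, and then adds Laplace noise calibrated to the sensitivity of the clipped mean.

```python
import numpy as np


def truncated_laplace_mean(rewards, threshold, epsilon, rng=None):
    """Illustrative private mean estimate: clip, average, add Laplace noise.

    Hypothetical helper, not the paper's estimator. `threshold` and
    `epsilon` are assumed to be supplied by the caller.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(rewards, dtype=float)
    if x.size == 0:
        raise ValueError("rewards must be non-empty")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")

    # Truncation: every sample now lies in [-threshold, threshold].
    clipped = np.clip(x, -threshold, threshold)

    # Replacing one sample changes the clipped mean by at most 2*threshold/n.
    sensitivity = 2.0 * threshold / x.size

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed samples (Student-t, 3 degrees of freedom) shifted to mean 1,
    # with a few gross outliers standing in for contaminated rewards.
    clean = rng.standard_t(df=3, size=1000) + 1.0
    contaminated = np.concatenate([clean, np.full(20, 1e4)])
    print(truncated_laplace_mean(contaminated, threshold=10.0, epsilon=1.0, rng=rng))
```

A smaller threshold reduces both the noise scale and the effect of outliers, but increases the bias from clipping the genuine heavy tail; balancing these effects is the kind of trade-off between accuracy, privacy, and robustness that the abstract refers to.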


