(Private) Kernelized Bandits with Distributed Biased Feedback

01/28/2023
by   Fengjiao Li, et al.

In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we also face two practical challenges: communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination algorithm, which samples users in phases when collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that the algorithm achieves a sublinear regret of Õ(T^{1-α/2} + √(γ_T T)), where α ∈ (0,1) is a tunable user-sampling parameter. Moreover, the algorithm can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize the algorithm to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
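The phase structure described above can be illustrated with a toy sketch. This is not the authors' algorithm or code: the doubling phase length, the user-sample size of 2^(α·phase), the fixed confidence multiplier `beta`, the RBF kernel, and the additive per-user bias model are all assumptions chosen for illustration, and the names `rbf_kernel`, `gp_posterior`, and `run_sketch` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(cand, xs, ys, lam=1.0):
    """Gaussian-process posterior mean and variance at `cand` given (xs, ys)."""
    if len(xs) == 0:
        return np.zeros(len(cand)), np.ones(len(cand))
    K = rbf_kernel(xs, xs) + lam * np.eye(len(xs))
    k = rbf_kernel(cand, xs)
    sol = np.linalg.solve(K, k.T)              # shape (n_obs, n_cand)
    mean = sol.T @ ys
    var = 1.0 - np.sum(k * sol.T, axis=1)
    return mean, np.maximum(var, 0.0)

def run_sketch(actions, reward_fn, num_phases=6, alpha=0.5, beta=2.0, seed=0):
    """Toy phased elimination; every numeric choice here is an assumption."""
    rng = np.random.default_rng(seed)
    active = np.arange(len(actions))           # indices of surviving actions
    for phase in range(1, num_phases + 1):
        horizon = 2 ** phase                   # assumed phase length
        n_users = int(np.ceil(2 ** (alpha * phase)))    # assumed sample size
        user_bias = rng.normal(0.0, 0.5, size=n_users)  # assumed heterogeneity
        cand = actions[active]
        xs, ys = np.empty(0), np.empty(0)
        for _ in range(horizon):
            # The posterior variance does not depend on observed rewards,
            # so the actions of a phase can be chosen before feedback arrives.
            _, var = gp_posterior(cand, xs, np.zeros(len(xs)))
            x = cand[np.argmax(var)]           # maximum variance reduction
            # Feedback averaged over the sampled users, plus observation noise.
            y = reward_fn(x) + user_bias.mean() + rng.normal(0.0, 0.1)
            xs, ys = np.append(xs, x), np.append(ys, y)
        mean, var = gp_posterior(cand, xs, ys)
        width = beta * np.sqrt(var)            # assumed confidence width
        # Keep actions whose upper bound reaches the best lower bound.
        active = active[mean + width >= np.max(mean - width)]
    return actions[active]

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 30)
    survivors = run_sketch(grid, lambda x: np.sin(3.0 * x))
    print("surviving actions:", survivors)
```

Averaging over more sampled users in later phases shrinks the bias term in this toy model, which is the intuition behind sampling users in phases; the paper's actual confidence width and batch rule are not reproduced here.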


