Communication-Constrained Bandits under Additive Gaussian Noise

04/25/2023
by Prathamesh Mayekar, et al.

We study a distributed stochastic multi-armed bandit in which a client supplies the learner with communication-constrained feedback on the rewards of the corresponding arm pulls. In our setup, the client must encode the rewards so that the second moment of the encoded rewards is at most P, and this encoded reward is further corrupted by additive Gaussian noise of variance σ^2; the learner only has access to the corrupted reward. For this setting, we derive an information-theoretic lower bound of Ω(√(KT/(SNR ∧ 1))) on the minimax regret of any scheme, where SNR := P/σ^2, and K and T are the number of arms and the time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, UE-UCB++, which matches this lower bound up to a minor additive term. UE-UCB++ performs uniform exploration in its initial phases and then runs the upper confidence bound (UCB) bandit algorithm in its final phase. An interesting feature of UE-UCB++ is that the coarse mean-reward estimates formed during one uniform exploration phase are used to refine the encoding protocol in the next, yielding more accurate mean estimates in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.
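To make the setting concrete, here is a minimal, runnable Python sketch of the power-constrained channel and of a UE-UCB++-style phase structure. The specifics are illustrative assumptions, not the paper's protocol: the clipping encoder, the phase-length split in `per_phase`, and the heuristically inflated confidence width are stand-ins chosen only to show the shape of the scheme.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(reward, center, P):
    """Client-side encoding: subtract the learner's current mean estimate
    and clip so the encoded value's second moment stays within the power
    budget P. (Clipping is an illustrative choice, not the paper's scheme.)"""
    return np.clip(reward - center, -np.sqrt(P), np.sqrt(P))


def channel(x, sigma):
    """Additive Gaussian noise channel with variance sigma^2."""
    return x + rng.normal(0.0, sigma)


def ue_ucb_sketch(means, T, P, sigma, n_explore_phases=2):
    """Hypothetical multi-phase scheme in the spirit of UE-UCB++:
    uniform-exploration phases refine the per-arm centers used by the
    encoder, then a UCB phase runs on the decoded noisy feedback."""
    K = len(means)
    best = max(means)
    centers = np.zeros(K)        # coarse mean estimates, refined per phase
    counts, sums = np.zeros(K), np.zeros(K)
    state = {"t": 0, "regret": 0.0}

    def pull(a):
        r = means[a] + rng.normal(0.0, 1.0)   # raw reward, unit variance
        y = centers[a] + channel(encode(r, centers[a], P), sigma)
        counts[a] += 1
        sums[a] += y
        state["t"] += 1
        state["regret"] += best - means[a]

    # Uniform exploration: after each phase, re-center the encoder so the
    # next phase spends less of the power budget P on known offsets.
    per_phase = max(1, T // (4 * n_explore_phases * K))  # arbitrary split
    for _ in range(n_explore_phases):
        for a in range(K):
            for _ in range(per_phase):
                pull(a)
        centers[:] = sums / counts

    # Final phase: standard UCB on the decoded rewards; the confidence
    # width is heuristically inflated to account for the channel noise.
    while state["t"] < T:
        width = np.sqrt(2.0 * (1.0 + sigma**2) * np.log(T) / counts)
        pull(int(np.argmax(sums / counts + width)))
    return state["regret"]


if __name__ == "__main__":
    print(ue_ucb_sketch(means=[0.2, 0.5, 0.9], T=20_000, P=4.0, sigma=1.0))
```

The sketch mirrors the reinforcement cycle the abstract describes: each exploration phase improves `centers`, so the next phase's encodings spend the power budget P on a smaller residual, which in turn tightens the decoded mean estimates.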


Related research

10/14/2022 · Federated Best Arm Identification with Heterogeneous Clients
We study best arm identification in a federated multi-armed bandit setti...

11/02/2020 · Multi-Armed Bandits with Censored Consumption of Resources
We consider a resource-aware variant of the classical multi-armed bandit...

11/16/2020 · DART: aDaptive Accept RejecT for non-linear top-K subset identification
We consider the bandit problem of selecting K out of N arms at each time...

08/16/2017 · Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in...

11/11/2021 · Solving Multi-Arm Bandit Using a Few Bits of Communication
The multi-armed bandit (MAB) problem is an active learning framework tha...

03/07/2021 · CORe: Capitalizing On Rewards in Bandit Exploration
We propose a bandit algorithm that explores purely by randomizing its pa...

05/24/2019 · Adaptive Symmetric Reward Noising for Reinforcement Learning
Recent reinforcement learning algorithms, though achieving impressive re...
