Federated Neural Bandit

05/28/2022
by   Zhongxiang Dai, et al.
Recent works on neural contextual bandits have achieved compelling performance thanks to their ability to leverage the strong representation power of neural networks (NNs) for reward prediction. Many applications of contextual bandits involve multiple agents who collaborate without sharing raw observations, giving rise to the setting of federated contextual bandits. Existing works on federated contextual bandits rely on linear or kernelized bandits, which may fall short when modeling complicated real-world reward functions. To address this limitation, we introduce the federated neural-upper confidence bound (FN-UCB) algorithm. To better exploit the federated setting, we adopt a weighted combination of two UCBs: UCB^a allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations); UCB^b uses an NN with aggregated parameters for reward prediction, in a similar way to federated averaging for supervised learning. Notably, the weight between the two UCBs required by our theoretical analysis admits an intuitive interpretation: it emphasizes UCB^a initially for accelerated exploration and relies more on UCB^b later, after enough observations have been collected to train the NNs for accurate reward prediction (i.e., reliable exploitation). We prove sub-linear upper bounds on both the cumulative regret and the number of communication rounds of FN-UCB, and use empirical experiments to demonstrate its competitive performance.
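The abstract describes the arm-selection rule only in words, so the following is a minimal Python sketch of the idea, not the paper's actual algorithm. It assumes the weighted combination is a convex combination, and it uses a hypothetical decay schedule (`mixing_weight`, with a made-up `scale` parameter) to shift emphasis from UCB^a to UCB^b as observations accumulate; the paper derives its own weight from the theoretical analysis. The function names and the element-wise parameter averaging in `federated_average` are likewise illustrative assumptions.

```python
from typing import List, Sequence


def federated_average(agent_params: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the agents' parameter vectors (FedAvg-style, unweighted)."""
    n_agents = len(agent_params)
    return [sum(vals) / n_agents for vals in zip(*agent_params)]


def mixing_weight(num_observations: int, scale: float = 50.0) -> float:
    """Hypothetical schedule (an assumption, not from the paper):
    equals 1.0 with no data and decays toward 0 as observations accumulate."""
    return scale / (scale + num_observations)


def combined_ucb(ucb_a: float, ucb_b: float, alpha: float) -> float:
    """Assumed convex combination: alpha weights UCB^a, (1 - alpha) weights UCB^b."""
    return alpha * ucb_a + (1.0 - alpha) * ucb_b


def select_arm(
    ucb_a_scores: Sequence[float],
    ucb_b_scores: Sequence[float],
    num_observations: int,
) -> int:
    """Return the index of the arm with the largest combined UCB score."""
    alpha = mixing_weight(num_observations)
    scores = [combined_ucb(a, b, alpha) for a, b in zip(ucb_a_scores, ucb_b_scores)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With no observations the sketch follows UCB^a alone, and with many observations it is dominated by UCB^b, which mirrors the exploration-then-exploitation interpretation given in the abstract.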
