Distributed Stochastic Bandit Learning with Context Distributions

by Jiabin Lin, et al.

We study the distributed stochastic multi-armed contextual bandit problem with unknown contexts, in which M agents collaborate under the coordination of a central server to choose optimal actions and minimize the total regret. In our model, an adversary chooses a distribution over the set of possible contexts; the agents observe only this context distribution, and the exact context is unknown to them. Such a situation arises, for instance, when the context itself is a noisy measurement or is based on a prediction mechanism, as in weather forecasting or stock market prediction. Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward. By performing a feature vector transformation and leveraging the UCB algorithm, we propose a UCB algorithm for stochastic bandits with context distributions and prove that it achieves regret and communication bounds of O(d√(MT) log^2 T) and O(M^1.5 d^3), respectively, for linearly parametrized reward functions. We also consider the case where the agents observe the actual context after choosing the action. For this setting, we present a modified algorithm that uses the additional information to achieve a tighter regret bound. Finally, we validate the performance of our algorithms and compare them with baseline approaches using extensive simulations on synthetic data and on the real-world MovieLens dataset.
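The feature vector transformation and UCB step mentioned in the abstract can be illustrated with a minimal single-agent sketch. It is not the paper's algorithm: the multi-agent communication with the central server is omitted, and the feature map `phi`, the parameter `theta_star`, the confidence width `beta`, the regularizer `lam`, and all function and class names are hypothetical choices made for illustration. The idea shown is that each arm's feature vector is replaced by its expectation under the observed context distribution, after which a standard linear UCB rule is applied to the transformed features.

```python
import math
import random


def expected_features(phi, arm, contexts, probs):
    """Return sum_c probs[c] * phi(arm, c): the mean feature vector of an arm."""
    dim = len(phi(arm, contexts[0]))
    psi = [0.0] * dim
    for context, prob in zip(contexts, probs):
        features = phi(arm, context)
        for i in range(dim):
            psi[i] += prob * features[i]
    return psi


class LinUCB:
    """Plain linear UCB on a fixed-dimension feature vector (illustrative only)."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.dim = dim
        self.beta = beta  # assumed constant confidence width
        # Inverse of the regularized Gram matrix, initialised to (lam * I)^-1.
        self.a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.a_inv]

    def theta(self):
        """Ridge-regression estimate of the reward parameter."""
        return self._a_inv_times(self.b)

    def ucb(self, x):
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.beta * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def run_demo(rounds=200, seed=0):
    rng = random.Random(seed)
    theta_star = [1.0, -0.5]  # hypothetical true parameter
    contexts = [0.0, 1.0]     # hypothetical finite context set
    arms = [0, 1, 2]

    def phi(arm, context):    # hypothetical feature map
        return [float(arm == 1) + 0.5 * context, float(arm == 2) * context]

    learner = LinUCB(dim=2)
    total_reward = 0.0
    for _ in range(rounds):
        p = rng.random()
        probs = [1.0 - p, p]  # context distribution revealed this round
        psis = [expected_features(phi, a, contexts, probs) for a in arms]
        choice = max(range(len(arms)), key=lambda k: learner.ucb(psis[k]))
        # The realised context drives the reward but is never shown to the learner.
        true_context = rng.choices(contexts, weights=probs)[0]
        x_true = phi(arms[choice], true_context)
        reward = sum(t * xi for t, xi in zip(theta_star, x_true)) + rng.gauss(0.0, 0.1)
        learner.update(psis[choice], reward)
        total_reward += reward
    return learner.theta(), total_reward
```

Calling `run_demo()` returns the learner's parameter estimate and the cumulative reward. In the variant where the true context is revealed after the action, one natural change to this sketch would be to update with `x_true` instead of the expected features; that is an assumption about how the extra information could be used, not a description of the paper's modified algorithm.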


Federated Stochastic Bandit Learning with Unobserved Context

We study the problem of federated stochastic multi-arm contextual bandit...

Stochastic Bandits with Context Distributions

We introduce a novel stochastic contextual bandit model, where at each s...

Stochastic Conservative Contextual Linear Bandits

Many physical systems have underlying safety considerations that require...

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

We study distributed contextual linear bandits with stochastic contexts,...

Learning in Distributed Contextual Linear Bandits Without Sharing the Context

Contextual linear bandits is a rich and theoretically important model th...

Stochastic Contextual Bandits with Graph-based Contexts

We naturally generalize the on-line graph prediction problem to a versio...

Autoregressive Bandits

Autoregressive processes naturally arise in a large variety of real-worl...
