Regret Bounds for Stochastic Combinatorial Multi-Armed Bandits with Linear Space Complexity

11/29/2018
by Mridul Agarwal, et al.

Many real-world problems face the dilemma of choosing the best K out of N options at a given time instant. This setup can be modelled as a combinatorial bandit that chooses K out of N arms at each time step, with the aim of achieving an efficient tradeoff between exploration and exploitation. This is the first work on combinatorial bandits in which the reward received can be a non-linear function of the chosen K arms. Direct use of a multi-armed bandit requires choosing among N-choose-K options, making the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in N. The proposed algorithm, which we call CMAB-SM, is a divide-and-conquer strategy. Further, the proposed algorithm achieves a regret bound of Õ(K^{1/2} N^{1/3} T^{2/3}) for a time horizon T, which is sub-linear in all parameters T, N, and K. Evaluation results on different reward functions and arm distributions show significantly improved performance compared to the standard multi-armed bandit approach with N-choose-K choices.
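To illustrate why the naive reduction to a standard multi-armed bandit is expensive, the following is a minimal Python sketch of the interaction loop: UCB1 run over all C(N, K) "super-arms". This is the baseline the abstract argues against, not the CMAB-SM algorithm itself; the reward function (`reward_fn`, here the AND of Bernoulli draws), `true_means`, and the values of N, K, and T are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import comb, log, sqrt
import random

# Combinatorial bandit loop: each round, pick K of N arms and observe one
# (possibly non-linear) reward for the chosen subset. Treating every
# K-subset as a single "super-arm" in UCB1 works, but requires storing
# statistics for all C(N, K) subsets -- the blow-up CMAB-SM avoids.

N, K, T = 10, 3, 5000
subsets = list(combinations(range(N), K))   # C(10, 3) = 120 super-arms
print(f"naive MAB must track {comb(N, K)} super-arms")

# Hypothetical environment: each arm is Bernoulli with an unknown mean;
# the subset reward is non-linear in the arms (the AND of the K draws).
true_means = [random.random() for _ in range(N)]
def reward_fn(subset):
    draws = [random.random() < true_means[i] for i in subset]
    return float(all(draws))

counts = [0] * len(subsets)
sums = [0.0] * len(subsets)
for t in range(1, T + 1):
    if t <= len(subsets):
        a = t - 1                           # play every super-arm once
    else:                                   # then use the UCB1 index
        a = max(range(len(subsets)),
                key=lambda i: sums[i] / counts[i]
                              + sqrt(2 * log(t) / counts[i]))
    r = reward_fn(subsets[a])
    counts[a] += 1
    sums[a] += r

best = max(range(len(subsets)), key=lambda i: sums[i] / counts[i])
print("empirical best subset:", subsets[best])
```

Even at this toy scale the naive learner maintains statistics for 120 super-arms; at N = 100 and K = 10 that number grows to roughly 1.7 × 10^13, which is why an algorithm with storage linear in N is needed.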


Related research

11/16/2020
DART: aDaptive Accept RejecT for non-linear top-K subset identification
We consider the bandit problem of selecting K out of N arms at each time...

04/21/2020
Algorithms for slate bandits with non-separable reward functions
In this paper, we study a slate bandit problem where the function that d...

12/01/2013
Stochastic continuum armed bandit problem of few linear parameters in high dimensions
We consider a stochastic continuum armed bandit problem where the arms a...

09/14/2020
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Conservation efforts in green security domains to protect wildlife and f...

09/20/2022
Multi-armed Bandit Learning on a Graph
The multi-armed bandit (MAB) problem is a simple yet powerful framework t...

11/13/2019
Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling
As the cornerstone of modern portfolio theory, Markowitz's mean-variance...

11/14/2019
Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems
We use a novel modification of Multi-Armed Bandits to create a new model...
