DART: aDaptive Accept RejecT for non-linear top-K subset identification

11/16/2020
by   Mridul Agarwal, et al.
0

We consider the bandit problem of selecting K out of N arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among NK options, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of 𝒪̃(K√(KNT)) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √(log2NT). When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/29/2018

Regret Bounds for Stochastic Combinatorial Multi-Armed Bandits with Linear Space Complexity

Many real-world problems face the dilemma of choosing best K out of N op...
research
05/28/2019

Combinatorial Bandits with Full-Bandit Feedback: Sample Complexity and Regret Minimization

Combinatorial Bandits generalize multi-armed bandits, where k out of n a...
research
01/05/2021

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

In this paper we study the adversarial combinatorial bandit with a known...
research
02/04/2014

Online Stochastic Optimization under Correlated Bandit Feedback

In this paper we consider the problem of online stochastic optimization ...
research
12/23/2015

The Max K-Armed Bandit: PAC Lower Bounds and Efficient Algorithms

We consider the Max K-Armed Bandit problem, where a learning agent is fa...
research
04/25/2023

Communication-Constrained Bandits under Additive Gaussian Noise

We study a distributed stochastic multi-armed bandit where a client supp...
research
05/22/2019

Thresholding Graph Bandits with GrAPL

In this paper, we introduce a new online decision making paradigm that w...

Please sign up or login with your details

Forgot password? Click here to reset