Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback

05/25/2023
by   Yiliu Wang, et al.
0

We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and index feedback. This is a new feedback structure that lies in between commonly studied semi-bandit and full-bandit feedback structures. We propose an algorithm and provide a regret bound for problem instances with stochastic arm outcomes according to arbitrary distributions with finite supports. The regret analysis rests on considering an extended set of arms, associated with values and probabilities of arm outcomes, and applying a smoothness condition. Our algorithm achieves a O((k/Δ)log(T)) distribution-dependent and a Õ(√(T)) distribution-independent regret where k is the number of arms selected in each round, Δ is a distribution-dependent reward gap and T is the horizon time. Perhaps surprisingly, the regret bound is comparable to previously-known bound under more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/07/2018

Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms

We analyze the regret of combinatorial Thompson sampling (CTS) for the c...
research
05/26/2017

Combinatorial Multi-Armed Bandits with Filtered Feedback

Motivated by problems in search and detection we present a solution to a...
research
01/31/2022

Rotting infinitely many-armed bandits

We consider the infinitely many-armed bandit problem with rotting reward...
research
06/03/2021

Sleeping Combinatorial Bandits

In this paper, we study an interesting combination of sleeping and combi...
research
10/20/2016

Combinatorial Multi-Armed Bandit with General Reward Functions

In this paper, we study the stochastic combinatorial multi-armed bandit ...
research
06/20/2019

Stochastic One-Sided Full-Information Bandit

In this paper, we study the stochastic version of the one-sided full inf...
research
09/18/2019

No-Regret Learning in Unknown Games with Correlated Payoffs

We consider the problem of learning to play a repeated multi-agent game ...

Please sign up or login with your details

Forgot password? Click here to reset