Combinatorial Multi-Armed Bandit with General Reward Functions

10/20/2016
by Wei Chen, et al.

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions, such as the max() function and nonlinear utility functions. Existing techniques that rely on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of the underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve O(log T) distribution-dependent regret and Õ(√T) distribution-independent regret, where T is the time horizon. We apply our results to the K-MAX problem and to expected utility maximization problems. In particular, for K-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first Õ(√T) bound on the (1-ϵ)-approximation regret of its online problem, for any ϵ > 0.
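To make the algorithmic idea concrete, the following is a minimal Python sketch of one SDCB round applied to K-MAX. It is a sketch under stated assumptions, not the paper's exact construction: outcomes are assumed to lie on a discrete grid over [0, 1], the confidence radius is a DKW-style choice, exhaustive subset search stands in for the PTAS offline oracle, and the helper names (sdcb_cdf, expected_max, sdcb_kmax_round) are hypothetical.

```python
import itertools
import math

import numpy as np


def sdcb_cdf(samples, t, grid):
    """Optimistic (stochastically dominant) confidence bound on an arm's CDF."""
    samples = np.asarray(samples)
    emp_cdf = np.array([(samples <= x).mean() for x in grid])
    # DKW-style confidence radius; the constant is an illustrative choice.
    radius = math.sqrt(3.0 * math.log(t) / (2.0 * len(samples)))
    # Shifting the CDF down is optimistic: the shifted distribution
    # first-order stochastically dominates the empirical one.
    opt_cdf = np.clip(emp_cdf - radius, 0.0, 1.0)
    opt_cdf[-1] = 1.0  # keep all probability mass inside the support [0, 1]
    return opt_cdf


def expected_max(cdfs, grid):
    """E[max] of independent grid-supported variables, from their CDFs."""
    # The CDF of the max of independent variables is the product of their
    # CDFs; then E[Y] is the integral of P(Y > x) dx over the support.
    tail = 1.0 - np.prod(cdfs, axis=0)
    return float(grid[0] + np.sum(tail[:-1] * np.diff(grid)))


def sdcb_kmax_round(history, t, k, grid):
    """One SDCB round for K-MAX: optimistic CDFs plus an offline oracle."""
    opt = {i: sdcb_cdf(obs, t, grid) for i, obs in history.items()}
    # Exhaustive search stands in for the paper's PTAS offline oracle.
    return max(itertools.combinations(sorted(history), k),
               key=lambda s: expected_max([opt[i] for i in s], grid))


# Hypothetical usage: 5 arms with Beta-distributed outcomes, choose k = 3.
rng = np.random.default_rng(0)
history = {i: rng.beta(2, 2 + i, size=20) for i in range(5)}
grid = np.linspace(0.0, 1.0, 101)
print(sdcb_kmax_round(history, t=100, k=3, grid=grid))
```

The downward shift of the empirical CDF is what replaces the scalar UCB: since the reward depends on the whole distribution rather than the mean, optimism is expressed through a distribution that stochastically dominates the empirical one.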

research · 05/25/2023
Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback
We consider a combinatorial multi-armed bandit problem for maximum value...

research · 12/30/2021
Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates
Most algorithms for the multi-armed bandit problem in reinforcement lear...

research · 01/30/2023
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
We investigate the problem of stochastic, combinatorial multi-armed band...

research · 06/08/2022
Uplifting Bandits
We introduce a multi-armed bandit model where the reward is a sum of mul...

research · 07/24/2017
Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret
In this paper, we study the combinatorial multi-armed bandit problem (CM...

research · 05/26/2017
Combinatorial Multi-Armed Bandits with Filtered Feedback
Motivated by problems in search and detection we present a solution to a...

research · 11/08/2021
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
Thompson sampling (TS) has attracted a lot of interest in the bandit are...
