A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

01/30/2023
by Guanyu Nie, et al.

We investigate the problem of stochastic, combinatorial multi-armed bandits (CMAB) where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear α-regret methods that only require bandit feedback, achieving 𝒪(T^{2/3} log(T)^{1/3}) expected cumulative α-regret dependence on the horizon T. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm; the offline algorithm can be used as a black-box subroutine. To demonstrate the utility of the proposed framework, we apply it to multiple problems in submodular maximization, adapting approximation algorithms for cardinality and for knapsack constraints. The new CMAB algorithms for knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.
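As an illustration of the black-box idea, the following minimal Python sketch wraps an offline algorithm so that each of its value queries is answered by averaging repeated bandit plays, then commits to the returned set for the remaining rounds. The abstract does not spell out the adaptation procedure, so this explore-then-commit structure is an assumption, as are all names here (`greedy_cardinality`, `adapt_offline`, `play`, `samples_per_query`); it is not the authors' implementation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Action = FrozenSet[int]


def greedy_cardinality(value: Callable[[Action], float],
                       ground_set: Iterable[int], k: int) -> Action:
    """Standard offline greedy under a cardinality constraint.

    It touches the objective only through `value`, so it can be
    handed a noisy estimate instead of the exact function.
    """
    items = list(ground_set)
    chosen: Action = frozenset()
    for _ in range(k):
        candidates = [e for e in items if e not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda e: value(chosen | {e}))
        chosen = chosen | {best}
    return chosen


def adapt_offline(offline_alg: Callable[[Callable[[Action], float]], Action],
                  play: Callable[[Action], float],
                  horizon: int,
                  samples_per_query: int) -> Tuple[Action, int]:
    """Hypothetical explore-then-commit wrapper (an assumption, see lead-in).

    Each distinct set the offline algorithm queries is played
    `samples_per_query` times and the observed rewards are averaged.
    The set the offline algorithm returns is then played until the
    horizon is reached. Assumes exploration fits within the horizon.
    """
    rounds_used = 0
    cache = {}

    def noisy_value(action: Action) -> float:
        nonlocal rounds_used
        key = frozenset(action)
        if key not in cache:
            total = sum(play(key) for _ in range(samples_per_query))
            rounds_used += samples_per_query
            cache[key] = total / samples_per_query
        return cache[key]

    committed = offline_alg(noisy_value)   # offline algorithm used as a black box
    for _ in range(max(0, horizon - rounds_used)):
        play(committed)                    # exploitation phase
    return committed, rounds_used
```

The wrapper never inspects the offline algorithm, which is why robustness to small evaluation errors is the only property it needs. In this sketch `samples_per_query` is the tuning knob that trades estimation accuracy against rounds spent exploring.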


research
02/11/2015

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-a...
research
10/20/2016

Combinatorial Multi-Armed Bandit with General Reward Functions

In this paper, we study the stochastic combinatorial multi-armed bandit ...
research
11/27/2022

Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints

This paper studies the problem of stochastic continuum-armed bandit with...
research
02/02/2023

Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

We investigate the problem of unconstrained combinatorial multi-armed ba...
research
03/29/2022

On Kernelized Multi-Armed Bandits with Constraints

We study a stochastic bandit problem with a general unknown reward funct...
research
08/24/2023

Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

We propose a novel master-slave architecture to solve the top-K combinat...
research
01/30/2019

Online Pandora's Boxes and Bandits

We consider online variations of the Pandora's box problem (Weitzman. 19...
