Combinatorial Multi-Armed Bandits with Filtered Feedback

05/26/2017
by James A. Grant, et al.

Motivated by problems in search and detection, we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem, an agent pulls a combination of arms from a set {1,...,k} in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback, it is assumed that the random outcomes generated are all observed. Filtered semibandit feedback allows the observed outcomes to be sampled from a second distribution conditioned on the initial random outcomes. This feedback mechanism is valuable because it allows CMAB methods to be applied to sequential search and detection problems where combinatorial actions are taken but the true rewards (the number of objects of interest appearing in the round) are not observed; instead, a filtered reward is observed (the number of objects the searcher successfully finds, which by definition can be no greater than the number that appear). We present an upper confidence bound type algorithm, Robust-F-CUCB, and an associated regret bound of order O(ln(n)), balancing exploration and exploitation in the face of both filtering of rewards and heavy-tailed reward distributions.
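The filtered-feedback setting described above can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's Robust-F-CUCB: it models true outcomes as Poisson counts, filtering as binomial thinning with known detection probabilities, and uses a plain CUCB-style empirical-mean index (the paper uses a robust estimator to handle heavy tails, which is omitted here). All rates, probabilities, and the bonus term are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (all numbers are illustrative assumptions):
# k arms, the agent pulls a combination of m arms each round.
k, m, horizon = 5, 2, 2000
true_rates = np.array([0.2, 0.5, 0.8, 1.1, 1.4])    # mean objects appearing per arm
detect_prob = np.array([0.9, 0.7, 0.8, 0.6, 0.75])  # known filtering probabilities

counts = np.zeros(k)  # pulls per arm
sums = np.zeros(k)    # sum of *filtered* observations per arm

for t in range(1, horizon + 1):
    # Estimate the unfiltered mean by dividing the filtered empirical mean
    # by the known detection probability; unpulled arms get an infinite index
    # so every arm is tried at least once.
    means = np.where(counts > 0,
                     sums / np.maximum(counts, 1) / detect_prob,
                     np.inf)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    index = means + bonus
    action = np.argsort(index)[-m:]  # pull the m arms with the highest index

    appeared = rng.poisson(true_rates[action])              # true outcomes (never observed)
    observed = rng.binomial(appeared, detect_prob[action])  # filtered outcomes (observed)

    counts[action] += 1
    sums[action] += observed

print("estimated means:", np.round(sums / np.maximum(counts, 1) / detect_prob, 2))
print("most-pulled arms:", sorted(np.argsort(counts)[-m:]))
```

Note how the learner never sees `appeared`, only `observed`; because the binomial thinning probability is known, dividing the filtered mean by it recovers an unbiased estimate of the true arrival rate, which is the quantity the index competes on.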


