Interactive Combinatorial Bandits: Balancing Competitivity and Complementarity

07/07/2022
by Adhyyan Narang, et al.

We study non-modular function maximization in the online interactive bandit setting. We are motivated by applications where there is a natural complementarity between certain elements: e.g., in a movie recommendation system, watching the first movie in a series complements the experience of watching the second (and the third, etc.). This cannot be expressed using submodular functions alone, which can represent only competitiveness between elements. We extend the purely submodular approach in two ways. First, we assume that the objective can be decomposed into the sum of a monotone suBmodular function and a monotone suPermodular function, known as a BP objective. Here, complementarity is naturally modeled by the supermodular component. We develop a UCB-style algorithm that balances refining beliefs about the unknown objectives (exploration) against choosing actions that appear promising (exploitation); at each round, a noisy gain is revealed after an action is taken. Defining regret in terms of submodular and supermodular curvature with respect to a full-knowledge greedy baseline, we show that this algorithm achieves at most O(√T) regret after T rounds of play. Second, for functions that do not admit a BP structure, we provide analogous regret guarantees in terms of their submodularity ratio; this applies to functions that are almost, but not quite, submodular. We numerically study two tasks: movie recommendation on the MovieLens dataset, and selection of training subsets for classification. Through these examples, we demonstrate the algorithm's performance as well as the shortcomings of viewing these problems as solely submodular.
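To make the BP decomposition concrete, the following is a minimal sketch, not the paper's algorithm: it runs a noiseless, full-knowledge greedy selection (the kind of baseline the regret is measured against) on a toy objective. The item names, weights, pairwise bonus, and the helper names `make_bp_objective` and `greedy` are all hypothetical and chosen only for illustration. The submodular part is the square root of a sum of non-negative weights, and the supermodular part is a sum of non-negative pairwise bonuses.

```python
import math
from itertools import combinations


def make_bp_objective(weights, bonus=None):
    """Build h(S) = f(S) + g(S) for a toy BP objective (illustrative only).

    f(S) = sqrt(sum of non-negative weights) is monotone submodular.
    g(S) = sum of non-negative pairwise bonuses is monotone supermodular.
    """
    bonus = bonus or {}

    def f(items):
        return math.sqrt(sum(weights[i] for i in items))

    def g(items):
        pairs = combinations(sorted(items), 2)
        return sum(bonus.get(frozenset(p), 0.0) for p in pairs)

    return lambda items: f(items) + g(items)


def greedy(h, ground_set, k):
    """Pick k items, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in chosen]
        best = max(remaining, key=lambda e: h(chosen + [e]) - h(chosen))
        chosen.append(best)
    return chosen


# Hypothetical catalogue: a two-part series and one unrelated movie.
weights = {"series_part1": 4.0, "series_part2": 1.0, "standalone": 3.0}
# Assumed complementarity: watching both parts of the series adds extra value.
bonus = {frozenset({"series_part1", "series_part2"}): 2.0}

h_bp = make_bp_objective(weights, bonus)   # submodular + supermodular
h_sub = make_bp_objective(weights)         # submodular part only

picks_bp = greedy(h_bp, list(weights), k=2)
picks_sub = greedy(h_sub, list(weights), k=2)
print(picks_bp)   # ['series_part1', 'series_part2']
print(picks_sub)  # ['series_part1', 'standalone']
```

With the supermodular bonus, greedy completes the series; with the submodular part alone, diminishing returns push it toward the unrelated movie. In the bandit setting described above the objective is unknown and only noisy gains are observed, so the exact marginal gain in `greedy` would be replaced by an optimistic (upper-confidence) estimate; that step is not shown here.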


research
10/28/2019

Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback

In this paper, we propose three online algorithms for submodular maximis...
research
05/21/2023

Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits

We investigate the online bandit learning of the monotone multi-linear D...
research
07/13/2018

No-regret algorithms for online k-submodular maximization

We present a polynomial time algorithm for online maximization of k-subm...
research
04/06/2021

The Power of Subsampling in Submodular Maximization

We propose subsampling as a unified algorithmic technique for submodular...
research
12/03/2021

On Submodular Contextual Bandits

We consider the problem of contextual bandits where actions are subsets ...
research
01/23/2018

Greed is Still Good: Maximizing Monotone Submodular+Supermodular Functions

We analyze the performance of the greedy algorithm, and also a discrete ...
