Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

by Marc Jourdan, et al.

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits: the agent chooses a set of arms and observes a noisy reward for each arm in the chosen set. The action set satisfies a given structure, such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting in which the structure of the answer set differs from that of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop CombGame, a meta-algorithm whose instances are asymptotically optimal algorithms with finite-time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle-efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
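
To make the feedback model in the abstract concrete, here is a minimal sketch of one round of semi-bandit feedback. It is an illustration only, not the paper's CombGame algorithm. The Gaussian noise, the example means, the choice of a uniform matroid (all size-2 subsets of four arms) as the action set, and the function names `semi_bandit_feedback` and `best_action` are all assumptions made for this example.

```python
import itertools
import random


def semi_bandit_feedback(means, action, sigma=1.0, rng=None):
    """Return one noisy reward per arm contained in the chosen set.

    Assumption: rewards are Gaussian with the given means and a shared
    standard deviation sigma; the abstract only says they are noisy.
    """
    rng = rng or random.Random()
    return {arm: rng.gauss(means[arm], sigma) for arm in action}


def best_action(means, actions):
    """Return the action with the largest sum of mean rewards.

    This brute-force search is only feasible for tiny action sets.
    """
    return max(actions, key=lambda action: sum(means[arm] for arm in action))


# Hypothetical instance: 4 arms, actions are the bases of a uniform
# matroid of rank 2, i.e. every subset containing exactly 2 arms.
means = [0.2, 0.5, 0.9, 0.4]
actions = [frozenset(s) for s in itertools.combinations(range(len(means)), 2)]

chosen = actions[0]
observation = semi_bandit_feedback(means, chosen, rng=random.Random(0))
target = best_action(means, actions)
```

The agent sees a separate observation for every arm in `chosen`, rather than only their sum. The pure-exploration goal is to identify `target` with fixed confidence from such observations, without access to `means`.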





Experimental Design for Regret Minimization in Linear Bandits

In this paper we propose a novel experimental design-based algorithm to ...

The Combinatorial Multi-Bandit Problem and its Application to Energy Management

We study a Combinatorial Multi-Bandit Problem motivated by applications ...

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

We consider the combinatorial bandits problem with semi-bandit feedback ...

Gradient Ascent for Active Exploration in Bandit Problems

We present a new algorithm based on a gradient ascent for a general Act...

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

We investigate stochastic combinatorial multi-armed bandit with semi-ban...

Non-Asymptotic Pure Exploration by Solving Games

Pure exploration (aka active testing) is the fundamental task of sequent...