Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

01/21/2021
by Marc Jourdan, et al.

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits: the agent chooses a set of arms and observes a noisy reward for each arm in the chosen set. The action set satisfies a given combinatorial structure, such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting in which the structure of the answer set differs from that of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite-time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle-efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm that is asymptotically optimal, and it has competitive empirical performance.
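To make the feedback model concrete, here is a minimal Python sketch of semi-bandit rounds, assuming Gaussian per-arm noise and simplifying the combinatorial structure to all subsets of size k. The names (SemiBanditEnv, arm_set, etc.) are illustrative, not the authors' code, and the round-robin sampler is a crude stand-in for the adaptive, game-based sampling that CombGame performs.

```python
import numpy as np

rng = np.random.default_rng(0)

class SemiBanditEnv:
    """Gaussian semi-bandit: playing a set of arms reveals one noisy
    sample per arm in the set (the feedback model from the abstract)."""

    def __init__(self, means, noise_std=1.0):
        self.means = np.asarray(means, dtype=float)
        self.noise_std = noise_std

    def play(self, arm_set):
        arm_set = np.asarray(arm_set)
        return self.means[arm_set] + self.noise_std * rng.standard_normal(arm_set.size)

# Non-adaptive round-robin over size-k sets, standing in for the
# combinatorial action structure (matroid bases, paths, ...). An
# asymptotically optimal algorithm would instead pick sets adaptively.
env = SemiBanditEnv(means=[0.9, 0.8, 0.5, 0.3, 0.1])
n_arms, k = 5, 2
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)
for t in range(500):
    arm_set = np.arange(t, t + k) % n_arms   # k distinct arms per round
    rewards = env.play(arm_set)              # one noisy sample per chosen arm
    counts[arm_set] += 1
    sums[arm_set] += rewards

print("empirical best arm:", int(np.argmax(sums / counts)))
```

In a fixed-confidence pure-exploration algorithm, the loop above would additionally track a stopping statistic and halt once the best arm is identified with the requested confidence; the per-arm observations are what make the semi-bandit setting more informative than full-bandit feedback.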


Related research

11/01/2020
Experimental Design for Regret Minimization in Linear Bandits
In this paper we propose a novel experimental design-based algorithm to ...

02/09/2022
Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
We consider the combinatorial bandits problem with semi-bandit feedback ...

05/20/2019
Gradient Ascent for Active Exploration in Bandit Problems
We present a new algorithm based on a gradient ascent for a general Act...

10/30/2020
The Combinatorial Multi-Bandit Problem and its Application to Energy Management
We study a Combinatorial Multi-Bandit Problem motivated by applications ...

06/01/2022
Incentivizing Combinatorial Bandit Exploration
Consider a bandit algorithm that recommends actions to self-interested u...

06/25/2019
Non-Asymptotic Pure Exploration by Solving Games
Pure exploration (aka active testing) is the fundamental task of sequent...
