Combinatorial Pure Exploration of Multi-Armed Bandit with a Real Number Action Class

06/15/2023
by Shintaro Nakamura, et al.

Combinatorial pure exploration (CPE) in the stochastic multi-armed bandit (MAB) setting is a well-studied online decision-making problem: a player wants to find the optimal action π^* in an action class 𝒜, which is a collection of subsets of arms with certain combinatorial structures. Although CPE can represent many combinatorial structures, such as paths, matchings, and spanning trees, most existing works focus only on binary action classes 𝒜⊆{0, 1}^d for some positive integer d. This binary formulation excludes important problems such as optimal transport, knapsack, and production planning. To overcome this limitation, we extend the binary formulation to real-valued action classes, 𝒜⊆ℝ^d, a setting we refer to as R-CPE-MAB, and propose a new algorithm for it. The only assumption we make is that the number of actions in 𝒜 is polynomial in d. We derive an upper bound on the sample complexity of our algorithm and an action-class-dependent lower bound for R-CPE-MAB by introducing a quantity that characterizes the problem's difficulty; this quantity generalizes the notion of width introduced in Chen et al. [2014].
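
To make the real-valued setting concrete, here is a minimal Python sketch under stated assumptions. It assumes the reward of an action π is the linear form ⟨μ, π⟩ in the unknown arm means μ (the usual CPE-MAB objective), that arm samples are Gaussian, and that the action class is a small integer knapsack in which each entry counts copies of an item. All helper names (`knapsack_actions`, `uniform_baseline`, etc.) are hypothetical, and the uniform-sampling baseline is a naive stand-in for illustration, not the algorithm proposed in the paper.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Action = Tuple[float, ...]


def knapsack_actions(weights: Sequence[int], capacity: int, max_copies: int) -> List[Action]:
    """Enumerate item-count vectors that fit within the knapsack capacity.

    Entry i is the number of copies of item i packed, so entries may exceed 1
    and the action class is a subset of R^d rather than of {0, 1}^d.
    """
    d = len(weights)
    actions = []
    for counts in itertools.product(range(max_copies + 1), repeat=d):
        if sum(c * w for c, w in zip(counts, weights)) <= capacity:
            actions.append(tuple(float(c) for c in counts))
    return actions


def value(mu: Sequence[float], action: Action) -> float:
    """Assumed linear objective <mu, action>."""
    return sum(m * a for m, a in zip(mu, action))


def best_action(mu: Sequence[float], actions: Sequence[Action]) -> Action:
    """Return an action maximizing <mu, action> over the finite class."""
    return max(actions, key=lambda a: value(mu, a))


def uniform_baseline(true_mu: Sequence[float], actions: Sequence[Action],
                     pulls_per_arm: int, noise_sd: float = 1.0, seed: int = 0) -> Action:
    """Naive baseline (hypothetical): sample every arm equally often with
    Gaussian noise, then plug the empirical means into the argmax."""
    rng = random.Random(seed)
    mu_hat = [
        sum(rng.gauss(m, noise_sd) for _ in range(pulls_per_arm)) / pulls_per_arm
        for m in true_mu
    ]
    return best_action(mu_hat, actions)


if __name__ == "__main__":
    actions = knapsack_actions([2, 3, 4], capacity=6, max_copies=3)
    mu = [1.0, 1.8, 2.5]
    print(len(actions), best_action(mu, actions))
    print(uniform_baseline(mu, actions, pulls_per_arm=20000, noise_sd=0.5))
```

With item weights (2, 3, 4), capacity 6, and at most three copies per item, there are 9 feasible actions. For μ = (1.0, 1.8, 2.5) the best one is (0, 2, 0) with value 3.6, which is not a 0/1 vector; restricted to binary vectors, the best action would be (1, 0, 1) with value 3.5. A gap of only 0.1 between these two candidates is the kind of instance where many samples are needed to tell the actions apart.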
