Combinatorial Pure Exploration with Partial or Full-Bandit Linear Feedback

06/14/2020
by Wei Chen, et al.

In this paper, we propose a novel model of combinatorial pure exploration with partial linear feedback (CPE-PL). In CPE-PL, given a combinatorial action space X ⊆ {0,1}^d, in each round a learner chooses one action x ∈ X to play, obtains a random (possibly nonlinear) reward related to x and an unknown latent vector θ ∈ R^d, and observes partial linear feedback M_x (θ + η), where η is a zero-mean noise vector and M_x is a transformation matrix for x. The objective is to identify the optimal action, i.e., the one with the maximum expected reward, using as few rounds as possible. We also study an important subproblem of CPE-PL, combinatorial pure exploration with full-bandit feedback (CPE-BL), in which the learner observes full-bandit feedback (i.e., M_x = x^⊤) and gains linear expected reward x^⊤θ after each play. We first propose a polynomial-time algorithmic framework for the general CPE-PL problem with a novel sample complexity analysis. We then propose an adaptive algorithm dedicated to the subproblem CPE-BL with better sample complexity. Our work provides a novel polynomial-time solution that simultaneously addresses limited feedback, general reward functions, and combinatorial action spaces including matroids, matchings, and s-t paths.
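
To make the feedback model concrete, the following is a minimal Python sketch of the observation M_x (θ + η) and its full-bandit special case M_x = x^⊤. It is not the algorithm proposed in the paper: the identification routine is a naive uniform-sampling baseline that plays every action equally often and enumerates the action set, which is only feasible for toy instances. The function names, the Gaussian noise model, and the values of θ and the action set are all illustrative assumptions.

```python
import random


def partial_linear_feedback(M_x, theta, noise_std, rng):
    """Return M_x (theta + eta), with eta drawn i.i.d. N(0, noise_std^2) per coordinate.

    The Gaussian noise is an assumption; the abstract only requires zero-mean noise.
    """
    noisy = [t + rng.gauss(0.0, noise_std) for t in theta]
    return [sum(m * v for m, v in zip(row, noisy)) for row in M_x]


def full_bandit_feedback(x, theta, noise_std, rng):
    """CPE-BL special case: M_x is the 1 x d row vector x^T, so the feedback is a scalar."""
    return partial_linear_feedback([list(x)], theta, noise_std, rng)[0]


def naive_uniform_identification(actions, theta, rounds_per_action, noise_std, seed=0):
    """Hypothetical baseline: play each action equally often, return the best empirical mean.

    Enumerating `actions` is exponential in general; it is used here only
    because the toy action set has three elements.
    """
    rng = random.Random(seed)
    means = {}
    for x in actions:
        total = sum(
            full_bandit_feedback(x, theta, noise_std, rng)
            for _ in range(rounds_per_action)
        )
        means[x] = total / rounds_per_action
    return max(means, key=means.get)


# Toy instance (assumed values): d = 3, actions are all subsets of size 2.
theta = [0.9, 0.5, 0.1]
actions = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
best = naive_uniform_identification(actions, theta, rounds_per_action=2000, noise_std=0.5)
print(best)
```

In this toy instance the expected rewards x^⊤θ are 1.4, 1.0, and 0.6, so the optimal action is (1, 1, 0). The learner never sees θ or the per-coordinate values, only the noisy scalar for the chosen action, which is what distinguishes full-bandit feedback from semi-bandit feedback.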


research
02/27/2019

Polynomial-time Algorithms for Combinatorial Pure Exploration with Full-bandit Feedback

We study the problem of stochastic combinatorial pure exploration (CPE),...
research
05/04/2018

Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications (Extended Version)

We study the Combinatorial Pure Exploration problem with Continuous and ...
research
06/23/2020

Combinatorial Pure Exploration of Dueling Bandit

In this paper, we study combinatorial pure exploration for dueling bandi...
research
07/27/2020

Fast active learning for pure exploration in reinforcement learning

Realistic environments often provide agents with very limited feedback. ...
research
05/08/2021

Pure Exploration Bandit Problem with General Reward Functions Depending on Full Distributions

In this paper, we study the pure exploration bandit model on general dis...
research
02/01/2022

Regret Minimization with Performative Feedback

In performative prediction, the deployment of a predictive model trigger...
