Gradient Ascent for Active Exploration in Bandit Problems

05/20/2019
by Pierre Ménard, et al.

We present a new algorithm based on gradient ascent for the general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems such as Best Arm Identification and Thresholding Bandits. The algorithm relies on a new sampling rule based on online lazy mirror ascent. We prove that it is asymptotically optimal and, most importantly, computationally efficient.
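The abstract describes the sampling rule only at a high level. For intuition, here is a minimal sketch, in Python, of a generic lazy mirror ascent (dual-averaging) update on the probability simplex with the negative-entropy mirror map; the objective, the gradient oracle `grad_fn`, and the step size `lr` are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def lazy_mirror_ascent(grad_fn, n_arms, n_steps, lr=0.1):
    """Lazy mirror ascent (dual averaging) over the probability simplex.

    Illustrative sketch only: grad_fn and lr are placeholder choices,
    not the objective or tuning used in the paper.
    """
    grad_sum = np.zeros(n_arms)           # accumulated gradients (dual variable)
    w = np.full(n_arms, 1.0 / n_arms)     # start from uniform proportions
    for t in range(n_steps):
        grad_sum += grad_fn(w, t)         # query a gradient at the current point
        # "Lazy" step: map the accumulated dual variable back onto the
        # simplex through the negative-entropy mirror map, i.e. a softmax.
        z = lr * grad_sum
        w = np.exp(z - z.max())           # subtract the max for numerical stability
        w /= w.sum()
    return w

# Toy usage: maximize the concave objective -||w - target||^2 on the simplex,
# whose gradient at w is -2 * (w - target); the iterates approach `target`.
target = np.array([0.5, 0.3, 0.2])
w = lazy_mirror_ascent(lambda w, t: -2.0 * (w - target), n_arms=3, n_steps=500)
```

The "lazy" variant updates only the dual accumulator and maps back to the simplex when a new point is needed, so each update is linear in the number of arms, which is consistent with the abstract's emphasis on computational efficiency.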

Related research

02/15/2016 · Optimal Best Arm Identification with Fixed Confidence
We give a complete characterization of the complexity of best-arm identi...

01/21/2021 · Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Combinatorial bandits with semi-bandit feedback generalize multi-armed b...

07/02/2020 · Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, that includes best-ar...

11/05/2019 · Towards Optimal and Efficient Best Arm Identification in Linear Bandits
We give a new algorithm for best arm identification in linearly paramete...

12/15/2020 · Generalized Chernoff Sampling for Active Learning and Structured Bandit Algorithms
Active learning and structured stochastic bandit problems are intimately...

02/03/2023 · An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
This work studies the pure-exploration setting for the convex hull feasi...

05/27/2021 · A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
We propose a new strategy for best-arm identification with fixed confide...