A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

05/27/2021
by Antoine Barrier, et al.

We propose a new strategy for fixed-confidence best-arm identification among Gaussian arms with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability. To the best of our knowledge, this is the first strategy with such guarantees. Its main advantage over algorithms such as Track-and-Stop, however, is an improved exploration behavior: Exploration-Biased Sampling is slightly biased in favor of exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are enabled by a new analysis of the sample-complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of high independent interest.
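To make the fixed-confidence setting concrete, here is a minimal illustrative sketch of a best-arm identification loop for unit-variance Gaussian arms. It is NOT the paper's Exploration-Biased Sampling algorithm: it combines a simplified GLR-style stopping rule with a crude forced-exploration rule (each arm is pulled at least roughly sqrt(t) times), and the threshold `log((1 + log t)/delta)` is a common heuristic choice, not the paper's calibration.

```python
import math
import random

def best_arm_id(means, delta=0.05, seed=1, max_steps=100_000):
    """Fixed-confidence best-arm identification sketch for unit-variance
    Gaussian arms. Illustrative only: simplified GLR-style stopping rule
    plus forced exploration, not the paper's Exploration-Biased Sampling."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K

    def pull(a):
        counts[a] += 1
        sums[a] += rng.gauss(means[a], 1.0)  # unit-variance Gaussian reward

    for a in range(K):          # initialization: a few samples per arm
        for _ in range(5):
            pull(a)

    for t in range(5 * K, max_steps):
        mu = [sums[a] / counts[a] for a in range(K)]
        best = max(range(K), key=lambda a: mu[a])
        # Generalized-likelihood-ratio statistic against the closest rival,
        # compared to a heuristic threshold (an assumption of this sketch).
        thr = math.log((1 + math.log(t)) / delta)
        glr = min(
            (mu[best] - mu[a]) ** 2 / (2 * (1 / counts[best] + 1 / counts[a]))
            for a in range(K) if a != best
        )
        if glr > thr:
            return best, t      # stop: confident at level 1 - delta
        # Forced exploration: keep every arm's count above ~sqrt(t),
        # otherwise exploit the empirical best arm.
        under = [a for a in range(K) if counts[a] < math.sqrt(t)]
        pull(min(under, key=lambda a: counts[a]) if under else best)
    return best, max_steps
```

For example, `best_arm_id([2.0, 0.0, 0.0])` identifies arm 0 after a modest number of samples; shrinking the gap between the means or the confidence parameter `delta` increases the sample complexity, which is the quantity the paper's analysis optimizes.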


research
02/15/2016

Optimal Best Arm Identification with Fixed Confidence

We give a complete characterization of the complexity of best-arm identi...
research
05/25/2023

An ε-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

We propose EB-TCε, a novel sampling rule for ε-best arm identification i...
research
10/24/2019

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

We investigate and provide new insights on the sampling rule called Top-...
research
02/16/2017

The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

We propose a novel technique for analyzing adaptive sampling called the ...
research
12/02/2019

Optimal Best Markovian Arm Identification with Fixed Confidence

We give a complete characterization of the sampling complexity of best M...
research
05/20/2019

Gradient Ascent for Active Exploration in Bandit Problems

We present a new algorithm based on gradient ascent for a general Act...
research
10/11/2022

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

A Top Two sampling rule for bandit identification is a method which sele...
