Gamification of Pure Exploration for Linear Bandits

07/02/2020
by Rémy Degenne, et al.

We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison of, and new insight into, different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that admits an efficient implementation.


