Fully adaptive algorithm for pure exploration in linear bandits

10/16/2017
by Liyuan Xu, et al.

We propose the first fully adaptive algorithm for pure exploration in linear bandits, i.e., the task of finding the arm with the largest expected reward, where the reward depends linearly on an unknown parameter. While existing methods partially or entirely fix the sequence of arm selections before observing any rewards, our method adaptively updates its arm selection strategy at each round based on past observations. We show that the sample complexity of our algorithm matches the achievable lower bound up to a constant factor in an extreme case. Furthermore, we evaluate the methods in simulations on both synthetic settings and real-world data, where our method shows a substantial improvement over existing methods.
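The abstract's key point is that the arm-selection rule is recomputed from past observations at every round rather than fixed in advance. As a rough illustration only, the following Python sketch shows one plausible shape of such a fully adaptive loop: a least-squares estimate of the unknown parameter, a gap-based stopping rule, and a greedy uncertainty-reducing pull. The function names, the confidence scaling `beta`, and the allocation rule are all assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def adaptive_pure_exploration(arms, pull, delta=0.05, max_rounds=10000):
    """Hypothetical sketch of an adaptive pure-exploration loop for
    linear bandits. Not the authors' exact algorithm; it only
    illustrates re-planning the arm selection at every round.

    arms : (K, d) array of arm feature vectors.
    pull : callable mapping an arm index to a noisy scalar reward.
    """
    K, d = arms.shape
    A = np.eye(d) * 1e-6          # regularized design matrix sum_t x_t x_t^T
    b = np.zeros(d)               # accumulated sum_t x_t * reward_t
    for t in range(1, max_rounds + 1):
        theta_hat = np.linalg.solve(A, b)        # least-squares estimate
        means = arms @ theta_hat
        best = int(np.argmax(means))
        # Directions whose sign decides the answer: best arm vs. the rest.
        gaps = np.delete(arms[best] - arms, best, axis=0)
        A_inv = np.linalg.inv(A)
        widths = np.einsum('id,de,ie->i', gaps, A_inv, gaps)
        beta = 2.0 * np.log(K * t**2 / delta)    # crude confidence scaling
        margins = np.delete(means[best] - means, best)
        if np.all(margins - np.sqrt(beta * widths) > 0):
            return best                          # empirical best is certified
        # Adaptive step: pull the arm that most shrinks the widest direction.
        y = gaps[int(np.argmax(widths))]
        scores = [y @ np.linalg.inv(A + np.outer(x, x)) @ y for x in arms]
        i = int(np.argmin(scores))
        r = pull(i)
        A += np.outer(arms[i], arms[i])
        b += arms[i] * r
    return int(np.argmax(arms @ np.linalg.solve(A, b)))
```

The adaptive step is what static-allocation methods lack: the direction whose sign is least certain is recomputed every round from the current estimate, so sampling effort follows the empirical gaps as they sharpen.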


Related research

05/31/2022 · Near-Optimal Collaborative Learning in Bandits
This paper introduces a general multi-agent bandit model in which each a...

07/02/2020 · Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, that includes best-ar...

06/22/2021 · Pure Exploration in Kernel and Neural Bandits
We study pure exploration in bandits, where the dimension of the feature...

02/27/2019 · Polynomial-time Algorithms for Combinatorial Pure Exploration with Full-bandit Feedback
We study the problem of stochastic combinatorial pure exploration (CPE),...

06/10/2022 · Interactively Learning Preference Constraints in Linear Bandits
We study sequential decision-making with known rewards and unknown const...

06/03/2023 · Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits
We study pure exploration with infinitely many bandit arms generated i.i...

06/22/2022 · Active Learning with Safety Constraints
Active learning methods have shown great promise in reducing the number ...
