On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits

10/16/2021
by   Zixin Zhong, et al.

We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical for achieving optimal performance in the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB(γ) algorithm, which achieves order-wise optimal performance for RM or BAI under different values of γ. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB(γ) achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that, on difficult instances, BoBW-lil'UCB outperforms its close competitor UCB_α (Degenne et al., 2019), which is designed for RM and BAI in the fixed-confidence setting.
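The trade-off described above can be made concrete with a small simulation: a UCB-style index whose confidence radius is scaled by a parameter γ, with the empirically best arm recommended at the horizon as the BAI output. The sketch below is only an illustrative lil'UCB-flavoured stand-in on Bernoulli arms, not the paper's actual BoBW-lil'UCB(γ) rule; the exact radius form, the function name ucb_gamma_run, and the parameter delta are assumptions made for this example.

```python
import numpy as np

def ucb_gamma_run(means, horizon, gamma, delta=0.01, rng=None):
    """Run a lil'UCB-style index policy on Bernoulli arms.

    The index adds a confidence radius scaled by `gamma`: a larger gamma
    means more exploration (helps BAI), a smaller gamma means more
    exploitation (helps regret). The radius below is an illustrative
    stand-in, not the exact BoBW-lil'UCB(gamma) index from the paper.
    """
    rng = np.random.default_rng(rng)
    K = len(means)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)

    # Pull each arm once to initialize the empirical means.
    for a in range(K):
        sums[a] += rng.random() < means[a]
        counts[a] += 1

    for _ in range(K, horizon):
        emp = sums / counts
        # lil'UCB-flavoured radius with a log-log correction term.
        radius = gamma * np.sqrt(
            np.log(np.log(np.e * counts) / delta) / counts
        )
        a = int(np.argmax(emp + radius))
        sums[a] += rng.random() < means[a]
        counts[a] += 1

    # Expected (pseudo-)regret and the recommended arm at the horizon.
    regret = horizon * max(means) - float(np.dot(counts, means))
    recommended = int(np.argmax(sums / counts))
    return regret, recommended
```

Sweeping γ (say over {0.5, 1, 2, 4}) on a hard instance such as means = [0.5, 0.45, 0.45] and averaging both outputs over many independent runs traces an empirical trade-off curve: as γ grows, the misidentification rate tends to fall while the cumulative regret grows, which is the kind of Pareto frontier the paper formalizes.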
