Pure Exploration in Bandits with Linear Constraints

06/22/2023
by Emil Carlsson, et al.

We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the standard best-arm identification problem, which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem, which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both algorithms aim to track an optimal allocation that is derived from the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
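To illustrate why a linear constraint can make the optimal policy a mixture rather than a single arm, here is a minimal sketch. It is not the paper's algorithm: it assumes the arm means are already known, that there is exactly one constraint of the form a . pi <= b, and that the policy pi is a probability vector over arms. The function name `best_constrained_policy` and the numbers are hypothetical. Under these assumptions the optimum sits at a vertex of the feasible polytope, so it is enough to check feasible pure arms and two-arm mixtures that make the constraint tight.

```python
from itertools import combinations


def best_constrained_policy(mu, a, b):
    """Maximise mu . pi over the probability simplex subject to a . pi <= b.

    Illustrative only: handles a single linear constraint by enumerating
    the vertices of the feasible set (hypothetical helper, not from the paper).
    """
    k = len(mu)
    candidates = []

    # Vertices of the simplex that satisfy the constraint: pure policies.
    for i in range(k):
        if a[i] <= b:
            pi = [0.0] * k
            pi[i] = 1.0
            candidates.append(pi)

    # Points where the constraint is tight on an edge of the simplex:
    # mixtures of one arm below the budget and one arm above it.
    for i, j in combinations(range(k), 2):
        lo, hi = (i, j) if a[i] < a[j] else (j, i)
        if a[lo] < b < a[hi]:
            w = (b - a[lo]) / (a[hi] - a[lo])  # weight placed on arm `hi`
            pi = [0.0] * k
            pi[hi] = w
            pi[lo] = 1.0 - w
            candidates.append(pi)

    if not candidates:
        raise ValueError("no feasible policy for this constraint")

    def value(pi):
        return sum(m * p for m, p in zip(mu, pi))

    return max(candidates, key=value)


# Assumed example: arm 0 has the highest mean but also the highest cost.
mu = [1.0, 0.4, 0.2]    # known arm means (assumption)
cost = [1.0, 0.5, 0.0]  # constraint coefficients (assumption)

constrained = best_constrained_policy(mu, cost, 0.5)
unconstrained = best_constrained_policy(mu, cost, 1.0)
```

With the budget set to 0.5 the best policy mixes arms 0 and 2 equally, while with the budget relaxed to 1.0 it collapses to the single best arm, which is the standard best-arm identification case.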

research
12/27/2013

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for id...
research
11/27/2022

Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget

We consider a constrained, pure exploration, stochastic multi-armed band...
research
05/10/2023

Best Arm Identification in Bandits with Limited Precision Sampling

We study best arm identification in a variant of the multi-armed bandit ...
research
07/16/2014

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

The stochastic multi-armed bandit model is a simple abstraction that has...
research
05/28/2021

Asymptotically Optimal Bandits under Weighted Information

We study the problem of regret minimization in a multi-armed bandit setu...
research
02/13/2022

On the complexity of All ε-Best Arms Identification

We consider the problem introduced by <cit.> of identifying all the ε-op...
research
03/15/2016

Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains

Sequential decision making under uncertainty is studied in a mixed obser...
