Optimal Best-Arm Identification in Bandits with Access to Offline Data

06/15/2023
by Shubhada Agrawal, et al.

Learning paradigms based purely on offline data as well as those based solely on sequential online learning have been well-studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic K-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence 1-δ. We conduct a lower bound analysis on policies that provide such 1-δ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when δ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of Õ(K), and rely on a careful characterization of the optimality conditions of the lower bound problem.
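To make the setting concrete, below is a minimal, hypothetical sketch of δ-correct best-arm identification that warm-starts empirical means with offline data. It uses a generic successive-elimination rule with anytime confidence intervals for σ-sub-Gaussian rewards; it is not the paper's lower-bound-matching procedure, and the function name `best_arm_with_offline_data`, the `pull` callback, and all constants are illustrative assumptions.

```python
import numpy as np

def best_arm_with_offline_data(pull, offline_counts, offline_sums, delta,
                               sigma=1.0, max_pulls=100_000):
    """Illustrative sketch only: successive elimination warm-started with
    offline data, for sigma-sub-Gaussian rewards. Not the paper's algorithm."""
    K = len(offline_counts)
    counts = np.asarray(offline_counts, dtype=float).copy()
    sums = np.asarray(offline_sums, dtype=float).copy()
    active = list(range(K))
    total_online = 0
    while len(active) > 1 and total_online < max_pulls:
        # Pull each surviving arm once per round; online samples simply
        # accumulate on top of the offline counts and sums.
        for a in active:
            sums[a] += pull(a)
            counts[a] += 1
            total_online += 1
        idx = np.array(active)
        means = sums[idx] / counts[idx]
        n = counts[idx]
        # Anytime confidence radius (union bound over arms and rounds);
        # a standard, conservative choice for a 1-delta guarantee.
        rad = sigma * np.sqrt(2.0 * np.log(4.0 * K * n**2 / delta) / n)
        best = int(np.argmax(means))
        lcb_best = means[best] - rad[best]
        # Eliminate arms whose upper confidence bound falls below the
        # empirically best arm's lower confidence bound.
        active = [a for i, a in enumerate(active)
                  if i == best or means[i] + rad[i] >= lcb_best]
    idx = np.array(active)
    return int(idx[np.argmax(sums[idx] / counts[idx])])

if __name__ == "__main__":
    # Toy usage with simulated Gaussian arms and idealized offline sums.
    rng = np.random.default_rng(0)
    true_means = [0.5, 0.4, 0.3]
    pull = lambda a: rng.normal(true_means[a], 1.0)
    off_counts = [50, 50, 50]
    off_sums = [50 * m for m in true_means]
    print(best_arm_with_offline_data(pull, off_counts, off_sums, delta=0.05))
```

The paper's actual algorithms go further: per the abstract, they match the sample-complexity lower bound as δ becomes small and keep the average per-sample acquisition cost at Õ(K) by exploiting the optimality conditions of the lower-bound problem, rather than relying on a uniform elimination schedule like the sketch above.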
