Good Arm Identification via Bandit Feedback

10/17/2017
by Hideaki Kano, et al.

In this paper, we consider a new stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is an arm whose expected reward is greater than or equal to a given threshold. GAI is a pure-exploration problem in which an agent repeatedly outputs an arm as soon as it is identified as good, without waiting to confirm that the remaining arms are not good. The objective of GAI is to minimize the number of samples needed for each identification. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which best arm identification does not. GAI is therefore not merely an extension of best arm identification, and an efficient algorithm design for GAI differs substantially from that for best arm identification. We derive a lower bound on the sample complexity of GAI and develop an algorithm whose sample complexity almost matches the lower bound. We also confirm experimentally that the proposed algorithm outperforms a naive algorithm and a thresholding-bandit-like algorithm in synthetic settings and in settings based on medical data.
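To make the problem concrete, below is a minimal Python sketch of a confidence-bound loop in the spirit of the approach the abstract alludes to: pull the arm with the highest upper confidence bound, output an arm as good as soon as its lower confidence bound clears the threshold, and discard it once its upper bound falls below. The function name good_arm_identification, the Hoeffding-style confidence radius, and the stopping rules here are illustrative assumptions, not the paper's exact algorithm or its analysis.

```python
import math
import random

def good_arm_identification(arms, xi, delta, max_pulls=100_000):
    """Sketch of threshold-based good arm identification (GAI).

    arms:  list of callables, each returning a stochastic reward in [0, 1]
    xi:    threshold defining a "good" arm (expected reward >= xi)
    delta: error probability used in the (assumed) confidence radius
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))
    outputs = []  # arms identified as good, in the order they are output

    def radius(i):
        # Hoeffding-style confidence radius; an illustrative choice,
        # not the confidence bound derived in the paper.
        return math.sqrt(math.log(4.0 * k * counts[i] ** 2 / delta)
                         / (2.0 * counts[i]))

    # Initialize by pulling each arm once.
    t = 0
    for i in range(k):
        sums[i] += arms[i]()
        counts[i] += 1
        t += 1

    while active and t < max_pulls:
        # Exploration step: sample the active arm with the highest UCB.
        i = max(active, key=lambda j: sums[j] / counts[j] + radius(j))
        sums[i] += arms[i]()
        counts[i] += 1
        t += 1

        # Identification step: classify arms as soon as the bounds allow.
        for j in list(active):
            mean, r = sums[j] / counts[j], radius(j)
            if mean - r >= xi:      # lower bound above threshold: good
                outputs.append(j)
                active.remove(j)
            elif mean + r < xi:     # upper bound below threshold: not good
                active.remove(j)

    return outputs

if __name__ == "__main__":
    # Toy Bernoulli instance: arms 0 and 1 are good for threshold xi = 0.5.
    means = [0.9, 0.7, 0.4, 0.2]
    arms = [lambda m=m: 1.0 if random.random() < m else 0.0 for m in means]
    print(good_arm_identification(arms, xi=0.5, delta=0.05))
```

Note the two uses of the confidence bounds: the same statistics drive both which arm to sample next (exploration) and when an arm can be output (identification), which is where the dilemma of confidence described in the abstract arises.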



Related research

03/13/2023 · Differential Good Arm Identification
This paper targets a variant of the stochastic multi-armed bandit proble...

03/13/2018 · Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
We consider the problem of near-optimal arm identification in the fixed ...

08/29/2023 · Pure Exploration under Mediators' Feedback
Stochastic multi-armed bandits are a sequential-decision-making framewor...

03/14/2023 · Best arm identification in rare events
We consider the best arm identification problem in the stochastic multi-...

06/15/2023 · Optimal Best-Arm Identification in Bandits with Access to Offline Data
Learning paradigms based purely on offline data as well as those based s...

02/09/2019 · Pure Exploration with Multiple Correct Answers
We determine the sample complexity of pure exploration bandit problems w...

02/14/2012 · Fractional Moments on Bandit Problems
Reinforcement learning addresses the dilemma between exploration to find...
