
A Bad Arm Existence Checking Problem

by   Koji Tabata, et al.
Hokkaido University
The University of Tokyo

We study a bad arm existence checking problem in which a player's task is to judge whether a positive arm exists among given K arms by drawing arms as few times as possible. Here, an arm is positive if the expected loss suffered by drawing it is at least a given threshold. This problem is a formalization of the diagnosis of disease or machine failure. An interesting structure of this problem is the asymmetry between the roles of positive and negative (non-positive) arms: finding one positive arm is enough to judge existence, while all the arms must be discriminated as negative to judge non-existence. We propose an algorithm with an arm selection policy (a policy to determine the next arm to draw) and a stopping condition (a condition to stop drawing arms) that utilize this asymmetric problem structure, and we prove its effectiveness theoretically and empirically.





1 Introduction

In the diagnosis of disease or machine failure, the test object is judged as “positive” if some anomaly is detected in at least one of many parts. When the purpose of the diagnosis is classification into the two classes “positive” and “negative”, the diagnosis can be terminated right after the first anomalous part has been detected. Thus, fast diagnosis is realized if one of the anomalous parts can be detected as early as possible in the positive case.

Fast anomaly detection is particularly important when the judgment is based on measurements using a costly or slow device. For example, a Raman spectral image is known to be useful for cancer diagnosis (Haka et al., 2009), but its acquisition time is 1–10 seconds per point (pixel), resulting in an order of hours or days per image (typically 10,000–40,000 pixels), so it is critical to measure only the points necessary for cancer diagnosis in order to achieve fast measurement. A Raman spectrum of each point is believed to be convertible to a cancer index, which indicates how likely the point is to be inside a cancer cell, and we can judge the existence of cancer cells from the existence of an area with a high cancer index.

The above cancer cell existence checking problem can be formulated as the problem of checking the existence of a grid with a high cancer index for a given area that is divided into grids. By regarding each grid as an arm, we formalize this problem as a loss version of a stochastic K-armed bandit problem in which the existence of positive arms is checked by drawing arms and suffering losses for the drawn arms. In our formulation, given an acceptable error rate and two thresholds (an upper one and a lower one), a player is required to answer, with probability at least one minus the error rate, “positive” if positive arms exist and “negative” if all the arms are negative. Here, an arm is defined to be positive if its loss mean is at least the upper threshold, and defined to be negative if its loss mean is less than the lower threshold. We call player algorithms for this problem -BAEC (Bad Arm Existence Checking) algorithms. The objective of this research is to design a -BAEC algorithm that minimizes the number of arm draws, that is, an algorithm with the lowest sample complexity. We call this problem the bad arm existence checking problem.

The bad arm existence checking problem is closely related to the thresholding bandit problem (Locatelli et al., 2016), which is a kind of pure-exploration problem like the best arm identification problem (Even-Dar et al., 2006; Audibert et al., 2010). In the thresholding bandit problem, provided a threshold and a required precision, the player’s task is to classify each arm as positive (its loss mean is at least the threshold plus the precision) or negative (its loss mean is less than the threshold minus the precision) by drawing a fixed number of samples, and his/her objective is to minimize the error probability, that is, the probability that positive (resp. negative) arms are wrongly classified as negative (resp. positive). Regardless of whether the setting is fixed confidence (a constraint on the error probability to achieve) or fixed budget (a constraint on the allowable number of draws), positive and negative arms are treated symmetrically in the thresholding bandit problem, while they are dealt with asymmetrically in our problem setting: judging that one positive arm exists is enough for a positive conclusion, whereas all the arms must be judged as negative for a negative conclusion. This asymmetry has also been considered in the good arm identification problem (Kano et al., 2017), and our problem can be seen as its specialized version. In their setting, the player’s task is to output all the arms with above-threshold means with probability at least one minus a given error rate, and his/her objective is to minimize the number of drawn samples until a given number of arms have been output as arms with above-threshold means. When that number is one, algorithms for their problem can be used to solve our existence checking problem. Their proposed algorithm, however, does not utilize the asymmetric problem structure. In this paper, we address the issue of how to utilize the structure.

We consider algorithms that are mainly composed of an arm-selection policy and a stopping condition. The arm-selection policy decides which arm is drawn at each time based on loss samples obtained so far. The stopping condition is used to judge whether the number of loss samples of each arm is enough to discriminate between positive and negative arms. If the currently drawn arm is judged as a positive arm, then the algorithms stop immediately by returning “positive”. In the case that the arm is judged as a negative arm, the arm is removed from the set of positive-arm candidates, which is composed of all the arms initially, and will not be drawn any more. If there remains no positive-arm candidate, then the algorithms stop by returning “negative”.

To utilize our asymmetric problem structure, we propose a stopping condition that uses asymmetric confidence bounds of estimated loss means that depend on the gray zone width. Here, asymmetric bounds mean that the width of the upper confidence interval is narrower than the width of the lower confidence interval, and the algorithm using our stopping condition stops drawing each arm if the lower confidence bound of its estimated loss mean is at least the lower threshold or its upper confidence bound is less than the upper threshold. As an arm selection policy, we propose policy that is derived by modifying policy APT (Locatelli et al., 2016) so as to favor arms with sample means larger than a single threshold (rather than arms with sample means closer to as the original APT does). Here, as the single threshold used by policy , we use not the center between and but the value closer to by utilizing the asymmetric structure of our problem.

By using -dependent asymmetric confidence bounds as the stopping condition, the worst-case bound on the number of samples for each arm is shown to be improved by compared to the case using the conventional stopping condition of the successive elimination algorithm (Even-Dar et al., 2006). Regarding the asymptotic behavior as , the upper bound on the expected number of samples for our algorithm with arm selection policy is proved to be almost optimal when all the positive arms have the same loss mean, which is the case in which HDoC (Kano et al., 2017) does not perform well. Note that HDoC is an algorithm for good arm identification that uses UCB (Auer and Cesa-Bianchi, 2002) as the arm selection policy. Our upper bound for does not depend on the existence of near-optimal arms unlike that for .

The effectiveness of our stopping condition using the -dependent asymmetric confidence bounds is demonstrated in simulation experiments. The algorithm using our stopping condition stopped drawing an arm about two times faster than the algorithm using the conventional stopping condition when its loss mean is around the center of the thresholds. Our algorithm with arm selection policy always stopped faster than the algorithm using arm selection policy UCB (Auer and Cesa-Bianchi, 2002) like HDoC (Kano et al., 2017), and our algorithm’s stopping time was faster or comparable to the stopping time of the algorithm using arm selection policy LUCB (Kalyanakrishnan et al., 2012) in our simulations using Bernoulli loss distribution with synthetically generated means and means generated from a real-world dataset.

2 Preliminary

For given thresholds , consider the following bandit problem. Let K be the number of arms, and at each time , a player draws arm . For , denotes the loss for the th draw of arm , where are a sequence of i.i.d. random variables generated according to a probability distribution with mean . We assume independence between and for any with . For a distribution set of arms, and denote the expectation and the probability under , respectively, and we omit the subscript if it is trivial from the context. Without loss of generality, we can assume that and the player does not know this ordering. Let denote the number of draws of arm right before the beginning of the round at time . After the player has observed the loss , he/she can choose to stop or to continue playing at time . Let denote the stopping time.

The player’s objective is to check the existence of some positive arm(s) with as small a stopping time as possible. Here, an arm is said to be positive if its loss mean is at least the upper threshold, negative if its loss mean is less than the lower threshold, and neutral otherwise. We consider a bad arm existence checking problem, which is the problem of developing algorithms that satisfy the following definition with as few arm draws as possible.

Definition 1

Given the two thresholds and an acceptable error rate, consider a game that repeats choosing one of K arms and observing its loss at each time . A player algorithm for this game is said to be a -BAEC (Bad Arm Existence Checking) algorithm if it stops in a finite time, outputting “positive” with probability at least one minus the error rate if at least one arm is positive, and “negative” with probability at least one minus the error rate if all the arms are negative.

(Footnote 2: The lower and upper thresholds correspond to the threshold minus the precision and the threshold plus the precision, respectively, in the thresholding bandit problem (Locatelli et al., 2016) with one threshold and a precision; we use the two thresholds for convenience given our asymmetric problem structure.)

Note that the definition of BAEC algorithms requires nothing when arm is neutral. Table 1 is the table of notations used throughout this paper.

: Number of arms.
, : Upper and lower thresholds. ()
: Gray zone width .
: Acceptable error rate. ()
: Loss distribution of arm .
: Set of loss distributions of arms.
: Loss mean (expected loss) of arm . () Arm is
: Expectation of some random variable w.r.t. .
: Probability of some event w.r.t. . ( is omitted when it is trivial from the context.)
: Drawn arm at time .
: Loss suffered by the th draw of arm .
: Number of draws of arm at the beginning of the round at time .
: Stopping time.
: Number of draws of arm until algorithm ’s stopping condition ( or ) is satisfied.
: First arm that is drawn times by algorithm
: Number of arms with .
: Event that arm is judged as positive.

Table 1: Notation List
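The positive, negative and neutral labels of this section, and the answer that Definition 1 requires, can be sketched in a few lines of Python. This is only an illustration: the names `arm_label`, `required_answer`, `theta_low` and `theta_up` are hypothetical and do not come from the paper, and the function works on true loss means, which a player never observes directly.

```python
from typing import Optional, Sequence


def arm_label(mean: float, theta_low: float, theta_up: float) -> str:
    """Label one arm by its true loss mean (illustrative names, not the paper's)."""
    if mean >= theta_up:
        return "positive"
    if mean < theta_low:
        return "negative"
    return "neutral"  # inside the gray zone between the two thresholds


def required_answer(means: Sequence[float], theta_low: float,
                    theta_up: float) -> Optional[str]:
    """Answer that the definition requires, or None when nothing is required."""
    labels = [arm_label(m, theta_low, theta_up) for m in means]
    if "positive" in labels:
        return "positive"
    if all(label == "negative" for label in labels):
        return "negative"
    return None  # some arm is neutral and no arm is positive
```

For example, with thresholds 0.3 and 0.5, the means (0.1, 0.4) contain a neutral arm and no positive arm, so the sketch returns `None`: either output is acceptable in that case.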

3 Sample Complexity Lower Bound

In this section, we derive a lower bound on the expected number of samples needed for a -BAEC algorithm. The derived lower bound is used to evaluate the sample complexity upper bounds of our algorithms in Sec. 5.2 and Sec. 5.3.

We let denote the Kullback-Leibler divergence from distribution to and define as

Note that holds if and are Bernoulli distributions with means and , respectively.
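For readers who want to evaluate such divergences numerically, the Kullback-Leibler divergence between two Bernoulli distributions has a standard closed form. The sketch below computes it; the function name `bernoulli_kl` and its argument names are assumptions made for illustration, not the paper's notation.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence from Bernoulli(p) to Bernoulli(q) in nats.

    Requires 0 <= p <= 1 and 0 < q < 1; the convention 0 * log(0) = 0 is used.
    """
    def term(a: float, b: float) -> float:
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return term(p, q) + term(1.0 - p, 1.0 - q)
```

The divergence is zero when the two means coincide and grows as they separate, which is why arms whose means lie close to a threshold dominate lower bounds of this kind.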

Theorem 3.1

Let be a set of Bernoulli distributions with means . Then, the stopping time of any -BAEC algorithm with and is bounded as


if some arm is positive, and


if all the arms are negative.


See Appendix A.

Remark 1

Identification is not needed for checking existence; however, in terms of asymptotic behavior as , the shown expected sample complexity lower bounds of both the tasks are the same; for both the tasks in the case with some positive arms. The bounds are tight considering the shown upper bounds, so bad arm existence checking is not more difficult than good arm identification (Kano et al., 2017) with respect to asymptotic behavior as .

4 Algorithm

Parameter Function:
: index value of arm at time for arm selection , : lower and upper confidence bounds of arm ’s estimated loss mean
Input: : the number of arms : thresholds with : acceptable error rate

2:for  do
3:      ,
4:end for
6:while  do
9:      Draw and suffer a loss .
11:      if  then
12:            return “positive” Arm is judged as pos.
13:      else if  then
14:             Arm is judged as neg.
15:      end if
17:end while
18:return “negative”
Algorithm 1

As -BAEC algorithms, we consider the algorithm shown in Algorithm 1 that, at each time , chooses an arm from the set of positive-candidate arms by an arm-selection policy using some index value of arm at time (Line 7), suffers a loss (Line 9) and then checks whether a stopping condition is satisfied (Lines 11 and 13). Here, and are lower and upper confidence bounds of an estimated loss mean of the currently drawn arm , and condition is the condition for stopping drawing any arm and outputting “positive”, and condition is the condition for stopping drawing arm , concluding its negativity and removing it from the set of positive-candidate arms of time . In addition to the case of outputting “positive”, algorithm also stops, outputting “negative”, when becomes empty.
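The loop structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the arm-selection policy is a round-robin placeholder, the confidence radius is a conventional anytime Hoeffding radius rather than the bounds proposed in this paper, and the stopping tests (lower bound at least the lower threshold, upper bound below the upper threshold) are one reading of the description. All names (`check_bad_arm`, `hoeffding_radius`, `theta_low`, `theta_up`, `max_draws`) are hypothetical.

```python
import math
from typing import Callable


def hoeffding_radius(n: int, num_arms: int, delta: float) -> float:
    """Conventional anytime Hoeffding radius (an assumption, not the paper's bounds)."""
    return math.sqrt(math.log(4.0 * num_arms * n * n / delta) / (2.0 * n))


def check_bad_arm(draw: Callable[[int], float], num_arms: int,
                  theta_low: float, theta_up: float, delta: float,
                  max_draws: int = 1_000_000) -> str:
    """Return "positive" or "negative"; `draw(i)` yields one loss in [0, 1] of arm i."""
    candidates = set(range(num_arms))  # positive-candidate arms
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for _ in range(max_draws):
        if not candidates:
            return "negative"  # every arm has been judged negative
        # Placeholder arm-selection policy: the least-drawn candidate.
        arm = min(candidates, key=lambda i: (counts[i], i))
        sums[arm] += draw(arm)
        counts[arm] += 1
        mean = sums[arm] / counts[arm]
        radius = hoeffding_radius(counts[arm], num_arms, delta)
        if mean - radius >= theta_low:  # assumed condition for "positive"
            return "positive"
        if mean + radius < theta_up:    # assumed condition for "negative"
            candidates.discard(arm)
    if not candidates:
        return "negative"
    raise RuntimeError("draw budget exhausted before a decision")
```

Because the two tests overlap across the gray zone, at least one of them fires for the drawn arm as soon as twice the radius is at most the gap between the thresholds, so each arm is drawn only a bounded number of times in this sketch.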

Define sample loss mean of arm with draws as

and we use as an estimated loss mean of the current drawn arm at time . Thus, and are determined by defining lower and upper bounds of a confidence interval of for and .

As lower and upper confidence bounds of ,


respectively, are generally used in successive elimination algorithms (Even-Dar et al., 2006). Define and as and for use as and .

(Footnote 3: Precisely speaking, is used in successive elimination algorithms for the best arm identification problem. A narrower confidence interval is enough to judge whether an expected loss is larger than a fixed threshold.)

In this paper, we propose asymmetric bounds and defined using a gray zone width as follows:



We also let and denote and using these bounds, that is, and .

The idea behind our bounds is as follows. By using lower bound , is upper bounded by . This can be proved using Hoeffding’s inequality and the union bound. The conventional bound uses a decreasing sequence while our bound uses a constant sequence . Even though for such a constant sequence , can be upper bounded by because stopping condition is satisfied for any arm and any , which is derived from Lemma 5.1 and Proposition 3. Note that depends on gray zone width , and the larger the is, the smaller the is. Our upper bound is closer to than , that is, the positions of and are not symmetric with respect to the position of . This is a reflection of our asymmetric problem setting. In the case with , no arm may be judged as positive ( for some ) for a correct conclusion, so the probability of being wrongly judged as positive for each arm must be at most for the union bound. On the other hand, in the case with , a correct judgment for arm is enough for a correct conclusion, so the probability of being wrongly judged as negative ( for some ) for each positive arm can be at most .
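The Hoeffding-plus-union-bound step mentioned above can be written out in generic form. The notation here is hypothetical and chosen only for this illustration (sample mean of n i.i.d. losses in [0, 1], a radius r_n, a per-arm error budget, and a cap N on the number of draws); the constants are not the ones used in the paper's bounds.

```latex
% Hypothetical notation: \hat\mu(n) is the sample mean of n i.i.d. losses in [0,1]
% with mean \mu, r_n is a confidence radius, \delta' is a per-arm error budget.
\Pr\bigl[\hat\mu(n) - \mu \ge r\bigr] \le e^{-2 n r^2}
\quad\text{(Hoeffding)},
\qquad
\Pr\bigl[\exists n \ge 1 : \hat\mu(n) - r_n \ge \mu\bigr]
  \le \sum_{n \ge 1} e^{-2 n r_n^2}
\quad\text{(union bound)}.

% Decreasing per-draw error budgets \delta'/(2n^2): the log term grows with n.
r_n = \sqrt{\tfrac{1}{2n}\ln\tfrac{2 n^2}{\delta'}}
\;\Longrightarrow\;
\sum_{n \ge 1} \frac{\delta'}{2 n^2} = \frac{\pi^2}{12}\,\delta' \le \delta'.

% Constant per-draw error budget \delta'/N, valid when at most N draws are ever needed.
r_n = \sqrt{\tfrac{1}{2n}\ln\tfrac{N}{\delta'}}
\;\Longrightarrow\;
\sum_{n=1}^{N} \frac{\delta'}{N} = \delta'.
```

The second choice only controls the error because the sum is finite, which is the role played by the guaranteed cap on the number of draws per arm.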

Note that holds for . Both and decrease as increases, and or is satisfied for and when they become at most for , where means that any index function can be assumed.

Remark 2

Condition essentially identifies non-negative arm . Is there a real-valued function that can check the existence of a non-negative arm without identifying it? The answer is yes. Consider a virtual arm at each time whose mean loss is a weighted average over the mean losses of all the arms () defined as . If , then at least one arm must be non-negative. Thus, we can check the existence of a non-negative arm by judging whether or not. Since defined as

can be considered to be a lower bound of the estimated value of , can be used as for checking the existence of a non-negative arm without identifying it.
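The step "a large weighted average implies a large individual mean" used in this remark follows from a one-line inequality. The symbols below (weights w_i, means, a threshold) are hypothetical placeholders, not the remark's own notation.

```latex
% Hypothetical notation: weights w_i \ge 0 with \sum_i w_i = 1, loss means \mu_i,
% and a threshold \theta.
\bar{\mu} = \sum_{i=1}^{K} w_i \mu_i
  \le \sum_{i=1}^{K} w_i \max_{j} \mu_j = \max_{j} \mu_j ,
\qquad\text{so}\qquad
\bar{\mu} \ge \theta \;\Longrightarrow\; \max_{j} \mu_j \ge \theta .
```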

The ratio of the width of our upper confidence interval to the width of our lower confidence interval is . Thus, we define as

This can be considered to be the balanced center between the thresholds and for our asymmetric confidence bounds.

As arm selection policy , we consider policy that uses index function


This arm-selection policy is a modification of the policy of APT (Anytime Parameter-free Thresholding algorithm) (Locatelli et al., 2016), in which an arm

is chosen for given threshold and accuracy . In the original APT, arm with the sample mean closest to is preferred to be chosen no matter whether is larger or smaller than . In , there is at most one arm whose sample mean is larger than at any time due to the initialization that for all arms , and such unique arm is always chosen as long as .
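The contrast between the two selection rules can be sketched in code. The first function follows the index of the original APT (square root of the draw count times the distance to the threshold plus the precision, with the smallest index drawn next). The second, `signed_index`, is only an assumed way to realize the preference described above for arms whose sample mean exceeds a single threshold; it is not claimed to be the paper's index function, and all names are hypothetical.

```python
import math
from typing import Sequence


def apt_index(count: int, mean: float, tau: float, eps: float) -> float:
    """Original APT index: small when the sample mean is close to the threshold."""
    return math.sqrt(count) * (abs(mean - tau) + eps)


def signed_index(count: int, mean: float, theta: float) -> float:
    """Assumed signed variant: negative (hence preferred) when mean exceeds theta."""
    return math.sqrt(count) * (theta - mean)


def select_arm(indices: Sequence[float]) -> int:
    """Draw the arm with the smallest index value."""
    return min(range(len(indices)), key=indices.__getitem__)
```

With two arms drawn four times each and sample means 0.9 and 0.45, the APT index with threshold 0.5 prefers the second arm (closest to the threshold), whereas the signed variant prefers the first arm (above the threshold), and keeps preferring it as long as its sample mean stays above the threshold.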

5 Sample Complexity Upper Bounds

In this section, we first analyze sample complexity of algorithm , then analyze sample complexity of algorithm .

We let denote the smallest number of draws of arm for which either or holds. We define as

and let denote for . We define event and as

Note that algorithm returns “positive” under the event and returns “negative” under the event . For any event , we let denote an indicator function of , that is, if occurs and otherwise.

5.1 Sample Complexity of Algorithm

In this subsection, we prove that algorithm is a -BAEC algorithm. We also show three upper bounds of the number of samples needed for algorithm : a worst-case bound, a high-probability bound and an average-case bound.

A worst-case upper bound on the number of samples is directly derived from the following theorem, which says that the number of draws for each arm can be upper bounded by a constant depending on and due to gray zone width .

Theorem 5.1

Inequality holds for .


See Appendix B.

How good is the worst-case bound on the number of samples for each arm compared to the case with and ? We know from the following theorem that, in , the number of arm draws for some arm can be larger than , which means if .

Theorem 5.2

Consider algorithm and define for . Then, event can happen for , where is defined as . Furthermore, the difference between the worst case stopping times is lower-bounded as


See Appendix C.

Remark 3

In the experimental setting of Sec. 6.1, in which parameters , and are used, the lower bounds of the difference between the worst case stopping times and calculated using the above inequality are and , respectively, which seem relatively large compared to corresponding and .

The following theorem states that algorithm is a -BAEC algorithm which needs at most samples in the worst case.

Theorem 5.3

Algorithm is a -BAEC algorithm that stops after at most arm draws.


See Appendix D.

A high-probability upper bound on the number of samples needed for algorithm is shown in the next theorem. Compared to the worst-case bound, can be improved to in the case with ; however, only one is guaranteed to be improved to the maximum among those of positive arms in the case with .

Theorem 5.4

In algorithm , inequality holds for at least one positive arm with probability at least when . Inequality holds for all the arms with probability at least when . As a result, with probability at least , the stopping time of algorithm is upper bounded as when and when .


See Appendix E.

The last sample complexity upper bound for algorithm is an upper bound on the expected number of samples. Compared to the high-probability bound, is improved to or .

Theorem 5.5

For algorithm , the expected value of of each arm is upper bounded as follows.

As a result, the expected stopping time of algorithm is upper bounded as


The above theorem can be easily derived from the following lemma by setting event to a certain event (an event that occurs with probability ).

Lemma 1

For any event , in algorithm , inequality


holds for any arm with and