1 Introduction
The stochastic multi-armed bandit is one of the most popular and well-studied models for capturing the exploration-exploitation tradeoffs in many application domains. There is a huge body of literature on numerous bandit models from several fields, including stochastic control, statistics, operations research, machine learning, and theoretical computer science. The basic stochastic multi-armed bandit model consists of
stochastic arms with unknown distributions. One can adaptively take samples from the arms and make decisions depending on the objective. Popular objectives include maximizing the cumulative sum of rewards, or minimizing the cumulative regret (see e.g., [Cesa-Bianchi and Lugosi(2006), Bubeck et al.(2012)Bubeck, Cesa-Bianchi, et al.]). In this paper, we study another classical multi-armed bandit model, called the pure exploration model, where the decision-maker first performs a pure-exploration phase by sampling from the arms, and then identifies an optimal (or nearly optimal) arm, which serves as the exploitation phase. The model is motivated by many application domains such as medical trials [Robbins(1985), Audibert and Bubeck(2010)], communication networks [Audibert and Bubeck(2010)], online advertisement [Chen et al.(2014)Chen, Lin, King, Lyu, and Chen], and crowdsourcing [Zhou et al.(2014)Zhou, Chen, and Li, Cao et al.(2015)Cao, Li, Tao, and Li]. The best arm identification problem (BestArm) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history (first formulated in [Bechhofer(1954)]) and has attracted significant attention over the last decade [Audibert and Bubeck(2010), Even-Dar et al.(2006)Even-Dar, Mannor, and Mansour, Mannor and Tsitsiklis(2004), Jamieson et al.(2014)Jamieson, Malloy, Nowak, and Bubeck, Karnin et al.(2013)Karnin, Koren, and Somekh, Chen and Li(2015), Carpentier and Locatelli(2016), Garivier and Kaufmann(2016)]. Now, we formally define the problem and set up some notation.
Definition 1.1
BestArm: We are given a set of $n$ arms $\{A_1, \dots, A_n\}$. Arm $A_i$ has a reward distribution $D_i$ with an unknown mean $\mu_i$. We assume that all reward distributions are Gaussian distributions with unit variance. Upon each play of $A_i$, we get a reward sampled i.i.d. from $D_i$. Our goal is to identify the arm with the largest mean using as few samples as possible. We assume here that the largest mean is strictly larger than the second largest (i.e., $\mu_{[1]} > \mu_{[2]}$) to ensure the uniqueness of the solution, where $\mu_{[i]}$ denotes the $i$th largest mean.
Remark 1.2
Some previous algorithms for BestArm take a sequence (instead of a set) of arms as input. In this case, we may simply assume that the algorithm randomly permutes the sequence at the beginning. Thus the algorithm will have the same behaviour on two different orderings of the same set of arms.
Remark 1.3
For the upper bound, everything proved in this paper also holds if the distributions are 1-sub-Gaussian, which is a standard assumption in the bandit literature. On the lower bound side, we need to assume that the distributions are from some family parametrized by the means and satisfy certain properties; see Remark D.4. Otherwise, it may be possible to distinguish two distributions using a single sample even if their means are very close, and we cannot hope for a nontrivial lower bound in such generality.
The BestArm problem for Gaussian arms was first formulated in [Bechhofer(1954)]. Most early works on BestArm did not analyze the sample complexity of the algorithms (though they did prove correctness). The early advances are summarized in the monograph [Bechhofer et al.(1968)Bechhofer, Kiefer, and Sobel].
For the past two decades, significant research efforts have been devoted to understanding the optimal sample complexity of the BestArm problem. On the lower bound side, [Mannor and Tsitsiklis(2004)] proved that any $\delta$-correct algorithm for BestArm takes $\Omega\big(\sum_{i=2}^{n} \Delta_{[i]}^{-2} \ln \delta^{-1}\big)$ samples in expectation. In fact, their result is an instance-wise lower bound (see Definition 1.6). [Kaufmann et al.(2015)Kaufmann, Cappé, and Garivier] also provided an $\Omega\big(\sum_{i=2}^{n} \Delta_{[i]}^{-2} \ln \delta^{-1}\big)$ lower bound for BestArm, which improved the constant factor in [Mannor and Tsitsiklis(2004)]. [Garivier and Kaufmann(2016)] focused on the asymptotic sample complexity of BestArm as the confidence level $\delta$ approaches zero (treating the gaps as fixed), and obtained a complete resolution of this case (even for the leading constant).^{1} (^{1} In contrast, our work focuses on the situation where both $\delta$ and all gaps are variables that tend to zero. In fact, if we let the gaps (i.e., the $\Delta_{[i]}$'s) tend to $0$ while keeping $\delta$ fixed, their lower bound is not tight.) [Chen and Li(2015)] showed that for each $n$ there exists a BestArm instance with $n$ arms that requires $\Omega(H \ln\ln n)$ samples, which further refines the lower bound.
The algorithms for BestArm have also been significantly improved in the last two decades [Even-Dar et al.(2002)Even-Dar, Mannor, and Mansour, Gabillon et al.(2012)Gabillon, Ghavamzadeh, and Lazaric, Kalyanakrishnan et al.(2012)Kalyanakrishnan, Tewari, Auer, and Stone, Karnin et al.(2013)Karnin, Koren, and Somekh, Jamieson et al.(2014)Jamieson, Malloy, Nowak, and Bubeck, Chen and Li(2015), Garivier and Kaufmann(2016)]. [Karnin et al.(2013)Karnin, Koren, and Somekh] obtained an upper bound of $O\big(\sum_{i=2}^{n} \Delta_{[i]}^{-2} \ln(\delta^{-1} \ln \Delta_{[i]}^{-1})\big)$. The same upper bound was obtained by [Jamieson et al.(2014)Jamieson, Malloy, Nowak, and Bubeck] using a UCB-type algorithm called lil'UCB. Recently, the upper bound was improved to $O\big(\Delta_{[2]}^{-2} \ln\ln \Delta_{[2]}^{-1} + \sum_{i=2}^{n} \Delta_{[i]}^{-2} \big(\ln \delta^{-1} + \ln\ln \min(n, \Delta_{[i]}^{-1})\big)\big)$ by [Chen and Li(2015)]. There is still a gap between the best known upper and lower bounds.
To understand the sample complexity of BestArm, it is important to study a special case, which we term SIGN. The problem can be viewed as a special case of BestArm in which there are only two arms and we know the mean of one of them. SIGN plays a very important role in our lower bound proof.
Definition 1.4
SIGN: $\xi$ is a fixed constant. We are given a single arm $A$ with unknown mean $\mu \neq \xi$. The goal is to decide whether $\mu > \xi$ or $\mu < \xi$. Here, the gap of the problem is defined to be $\Delta = |\mu - \xi|$. Again, we assume that the distribution of the arm is a Gaussian distribution with unit variance.
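To make the setup concrete, here is a minimal Python sketch of a naive anytime procedure for SIGN: it samples until the empirical mean separates from $\xi$ by a time-uniform confidence radius. The sampler callable, the radius, and all parameter choices are illustrative assumptions of ours, not the algorithms analyzed in this paper.

```python
import math
import random

def sign_test(sample_arm, xi=0.0, delta=1e-6, max_steps=1_000_000):
    """Decide whether the arm's mean is above or below xi.

    Keeps sampling until the empirical mean separates from xi by a
    time-uniform confidence radius. The radius uses a crude union
    bound over all time steps, so the sample count carries extra
    logarithmic overhead compared to the bounds discussed in the text.
    """
    total = 0.0
    for t in range(1, max_steps + 1):
        total += sample_arm()
        mean = total / t
        # P(|mean - mu| > radius) <= 2 exp(-t * radius^2 / 2) ~ delta / t^2,
        # so summing over t keeps the overall error probability O(delta).
        radius = math.sqrt(2.0 * math.log(t * t / delta + 2.0) / t)
        if abs(mean - xi) > radius:
            return '+' if mean > xi else '-'
    return '+' if mean > xi else '-'

rng = random.Random(0)
# Hypothetical arm: N(1, 1) with xi = 0, so the answer should be '+'.
ans = sign_test(lambda: rng.gauss(1.0, 1.0))
```

Because the gap is unknown, the stopping radius must shrink with time, which is exactly the source of the extra overhead (the $\ln\ln\Delta^{-1}$-type terms) discussed throughout this section.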
In this paper, we are interested in algorithms (either for BestArm or for SIGN) that identify the correct answer with probability at least $1 - \delta$. This is often called the fixed confidence setting in the bandit literature.
Definition 1.5
For any $\delta \in (0, 1)$, we say that an algorithm $\mathbb{A}$ for BestArm (or SIGN) is $\delta$-correct if, on any BestArm (or SIGN) instance, $\mathbb{A}$ returns the correct answer with probability at least $1 - \delta$.
1.1 Almost Instance-wise Optimality Conjecture
It is easy to see that no function of $n$ and $\delta$ alone can serve as an upper bound on the sample complexity of BestArm (with $n$ arms and confidence level $\delta$). Instead, the sample complexity depends on the gaps. Intuitively, the smaller the gaps are, the harder the instance is (i.e., more samples are required). Since the gaps completely determine an instance (for Gaussian arms with unit variance, up to shifting), we use the $\Delta_{[i]}$'s as the parameters to measure the sample complexity.
Now, we formally define the notion of instance-wise lower bounds and instance optimality. For an algorithm $\mathbb{A}$ and an instance $I$, we use $T_{\mathbb{A}}(I)$ to denote the expected number of samples taken by $\mathbb{A}$ on instance $I$.
Definition 1.6 (Instance-wise Lower Bound)
For a BestArm instance $I$ and a confidence level $\delta$, we define the instance-wise lower bound of $I$ as
$L(I, \delta) = \inf_{\mathbb{A} \text{ is } \delta\text{-correct}} T_{\mathbb{A}}(I).$
We say a BestArm algorithm $\mathbb{A}$ is instance optimal if it is $\delta$-correct and, for every instance $I$, $T_{\mathbb{A}}(I) = O(L(I, \delta))$.
Now, we consider the BestArm problem from the perspective of instance optimality. Unfortunately, even for the two-arm case, no instance optimal algorithm may exist. In fact, [Farrell(1964)] showed that for any $\delta$-correct algorithm $\mathbb{A}$ for SIGN, we must have
$\limsup_{\Delta \to 0} \frac{T_{\mathbb{A}}(\Delta)}{\Delta^{-2} \ln\ln \Delta^{-1}} \geq c$
for some universal constant $c > 0$. This implies that any $\delta$-correct algorithm requires $\Omega(\Delta^{-2} \ln\ln \Delta^{-1})$ samples in the worst case. Hence, the $\ln\ln\Delta^{-1}$ factor in the upper bound for SIGN is generally not improvable. However, for a particular SIGN instance $I$ with gap $\Delta$, there is a $\delta$-correct algorithm that needs only $O(\Delta^{-2} \ln \delta^{-1})$ samples on this instance, implying $L(I, \delta) = O(\Delta^{-2} \ln \delta^{-1})$. See [Chen and Li(2015)] for details.
Despite the above fact, [Chen and Li(2016)] conjectured that the two-arm case is the only obstruction toward an instance optimal algorithm. Moreover, based on evidence from previous work [Chen and Li(2015)], they provided an explicit formula and conjectured that the sample complexity can be expressed by it. Interestingly, the formula involves an entropy term (similar entropy terms also appear in [Afshani et al.(2009)Afshani, Barbay, and Chan] for completely different problems). In order to state Chen and Li's conjecture formally, we first define the entropy term.
Definition 1.7
Given a BestArm instance $I$, for each $k \geq 1$ let
$G_k = \{i : \Delta_{[i]} \in [2^{-k}, 2^{-k+1})\}, \qquad H_k = \sum_{i \in G_k} \Delta_{[i]}^{-2}, \qquad H = \sum_{k} H_k.$
We can view $\{p_k\}$ with $p_k = H_k / H$ as a discrete probability distribution. We define the following quantity as the gap entropy of instance $I$:
$\mathrm{Ent}(I) = \sum_{k : G_k \neq \emptyset} p_k \ln p_k^{-1}.$
Remark 1.8
We choose to partition the arms based on the powers of $2$. There is nothing special about the constant $2$, and replacing it by any other constant only changes $\mathrm{Ent}(I)$ by a constant factor.
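For concreteness, the quantities in Definition 1.7 can be computed in a few lines of Python; the grouping by powers of $2$ follows the definition above (the function name and representation are ours):

```python
import math

def gap_entropy(gaps):
    """Compute the complexity H and the gap entropy Ent of an
    instance from the gaps of its suboptimal arms (a sketch of
    Definition 1.7; group G_k holds the gaps in [2^{-k}, 2^{-k+1}))."""
    group_complexity = {}
    for d in gaps:
        k = math.ceil(math.log2(1.0 / d))  # index k with d in [2^{-k}, 2^{-k+1})
        group_complexity[k] = group_complexity.get(k, 0.0) + d ** -2
    H = sum(group_complexity.values())
    # Ent = sum_k p_k ln(1/p_k) over non-empty groups, with p_k = H_k / H.
    ent = sum((h / H) * math.log(H / h) for h in group_complexity.values())
    return H, ent

# Two arms of gap 1/2 and one arm of gap 1/8: H = 4 + 4 + 64 = 72.
H, ent = gap_entropy([0.5, 0.5, 0.125])
```

For this three-arm gap profile, almost all of the complexity sits in the small-gap group, so the distribution $\{p_k\}$ is skewed and the entropy is well below $\ln 2$.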
Conjecture 1.9 (Gap-Entropy Conjecture [Chen and Li(2016)])
There is a $\delta$-correct algorithm for BestArm with sample complexity
$O\big(H(I)\,(\ln\delta^{-1} + \mathrm{Ent}(I)) + \Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\big)$
for any instance $I$ and $\delta \in (0, 1)$, and we say such an algorithm is almost instance-wise optimal for BestArm. Moreover, any $\delta$-correct algorithm for BestArm must take $\Omega\big(H(I)\,(\ln\delta^{-1} + \mathrm{Ent}(I))\big)$ samples on every instance $I$.
Remark 1.10
As we mentioned before, the $\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}$ term is sufficient and necessary for distinguishing the best and the second best arm, even though it is not an instance-optimal bound. The gap-entropy conjecture states that, modulo this additive term, we can obtain an instance optimal algorithm. Hence, the resolution of the conjecture would provide a complete understanding of the sample complexity of BestArm (up to constant factors). All previous bounds for BestArm agree with Conjecture 1.9, i.e., existing upper (lower) bounds are no smaller (larger) than the conjectured bound. See [Chen and Li(2016)] for details.
1.2 Our Results
In this paper, we make significant progress toward the resolution of the gap-entropy conjecture. On the upper bound side, we provide an algorithm that almost matches the conjectured bound.
Theorem 1.11
There is a $\delta$-correct algorithm for BestArm with expected sample complexity
$O\big(H(I)\,(\ln\delta^{-1} + \mathrm{Ent}(I)) + \Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\ln\ln\ln\Delta_{[2]}^{-1}\big).$
Our algorithm matches the main term in Conjecture 1.9. For the additive term (which is typically small), we lose a $\ln\ln\ln\Delta_{[2]}^{-1}$ factor. In particular, for those instances where the additive term is a $\ln\ln\ln\Delta_{[2]}^{-1}$ factor smaller than the main term, our algorithm is optimal.
On the lower bound side, although we are not able to completely resolve the conjectured lower bound, we do obtain a rather strong bound. We need to introduce some notation first. We say an instance is discrete if the gaps of all the suboptimal arms are of the form $2^{-k}$ for some positive integer $k$. We say an instance $I'$ is a subinstance of an instance $I$ if $I'$ can be obtained by deleting some suboptimal arms from $I$. Formally, we have the following theorem.
Theorem 1.12
For any discrete instance $I$, confidence level $\delta$, and any $\delta$-correct algorithm $\mathbb{A}$ for BestArm, there exists a subinstance $I'$ of $I$ such that
$T_{\mathbb{A}}(I') \geq c \cdot H(I')\big(\ln \delta^{-1} + \mathrm{Ent}(I')\big),$
where $c$ is a universal constant.
We say an algorithm $\mathbb{A}$ is monotone if $T_{\mathbb{A}}(I') \leq T_{\mathbb{A}}(I)$ for every $I$ and $I'$ such that $I'$ is a subinstance of $I$. Then we immediately have the following corollary.
Corollary 1.13
For any discrete instance $I$ and confidence level $\delta$, and for any monotone $\delta$-correct algorithm $\mathbb{A}$ for BestArm, we have that
$T_{\mathbb{A}}(I) \geq c \cdot H(I)\big(\ln \delta^{-1} + \mathrm{Ent}(I)\big),$
where $c$ is a universal constant.
We remark that all previous algorithms for BestArm have monotone sample complexity bounds. The above corollary also implies that if an algorithm has a monotone sample complexity bound, then the bound must be $\Omega\big(H(\ln\delta^{-1} + \mathrm{Ent})\big)$ on all discrete instances.
2 Related Work
SIGN and A/B testing.
In the A/B testing problem, we are asked to decide which of the two given arms has the larger mean. A/B testing is in fact equivalent to the SIGN problem. It is easy to reduce SIGN to A/B testing by constructing a fictitious arm with mean $\xi$. For the other direction, given an instance of A/B testing, we may define an arm as the difference between the two given arms, and the problem reduces to SIGN with $\xi = 0$. In particular, our refined lower bound for SIGN stated in Lemma 4.1 also holds for A/B testing. [Kaufmann et al.(2015)Kaufmann, Cappé, and Garivier, Garivier and Kaufmann(2016)] studied the limiting behavior of the sample complexity of A/B testing as the confidence level $\delta$ approaches zero. In contrast, we focus on the case where both $\delta$ and the gap tend to zero, so that the complexity term due to not knowing the gap in advance is not dominated by the $\ln\delta^{-1}$ term.
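The second direction of the reduction is essentially a one-liner: pulling both arms once and taking the difference yields a single arm whose mean is the difference of the two means, so A/B testing becomes SIGN with a zero threshold. A sketch (the sampler callables and means are hypothetical):

```python
import random

def make_difference_arm(sample_a, sample_b):
    """Reduce A/B testing to SIGN with xi = 0: each pull draws once
    from both arms and returns the difference, i.e., an arm whose
    mean is mu_A - mu_B. For unit-variance Gaussian arms the
    difference has variance 2, which only affects constant factors."""
    return lambda: sample_a() - sample_b()

rng = random.Random(1)
diff = make_difference_arm(lambda: rng.gauss(0.9, 1.0),
                           lambda: rng.gauss(0.4, 1.0))
# Averaging many pulls estimates mu_A - mu_B = 0.5.
est = sum(diff() for _ in range(20000)) / 20000
```

Deciding the sign of the difference arm's mean then decides which of the two original arms is better.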
Best-$k$-Arm.
The Best-$k$-Arm problem, in which we are required to identify the $k$ arms with the largest means, is a natural extension of BestArm (the special case $k = 1$). Best-$k$-Arm has been extensively studied in the past few years [Kalyanakrishnan and Stone(2010), Gabillon et al.(2011)Gabillon, Ghavamzadeh, Lazaric, and Bubeck, Gabillon et al.(2012)Gabillon, Ghavamzadeh, and Lazaric, Kalyanakrishnan et al.(2012)Kalyanakrishnan, Tewari, Auer, and Stone, Bubeck et al.(2013)Bubeck, Wang, and Viswanathan, Kaufmann and Kalyanakrishnan(2013), Zhou et al.(2014)Zhou, Chen, and Li, Kaufmann et al.(2015)Kaufmann, Cappé, and Garivier, Chen et al.(2017)Chen, Li, and Qiao], and most results for Best-$k$-Arm are generalizations of those for BestArm. As in the case of BestArm, the sample complexity bounds of Best-$k$-Arm depend on the gap parameters of the arms, yet in the Best-$k$-Arm context the gap of an arm is typically defined as the distance from its mean to either $\mu_{[k]}$ or $\mu_{[k+1]}$ (depending on whether the arm is among the top $k$ arms or not). The Combinatorial Pure Exploration problem, which further generalizes the cardinality constraint in Best-$k$-Arm (i.e., choosing exactly $k$ arms) to general combinatorial constraints, has also been studied [Chen et al.(2014)Chen, Lin, King, Lyu, and Chen, Chen et al.(2016)Chen, Gupta, and Li, Gabillon et al.(2016)Gabillon, Lazaric, Ghavamzadeh, Ortner, and Bartlett].
PAC learning.
The sample complexity of BestArm and Best-$k$-Arm in the probably approximately correct (PAC) setting has also been well studied in the past two decades. For BestArm, the tight worst-case sample complexity bound of $\Theta(n\epsilon^{-2}\ln\delta^{-1})$ was obtained by [Even-Dar et al.(2002)Even-Dar, Mannor, and Mansour, Mannor and Tsitsiklis(2004), Even-Dar et al.(2006)Even-Dar, Mannor, and Mansour]. [Kalyanakrishnan and Stone(2010), Kalyanakrishnan et al.(2012)Kalyanakrishnan, Tewari, Auer, and Stone, Zhou et al.(2014)Zhou, Chen, and Li, Cao et al.(2015)Cao, Li, Tao, and Li] also studied the worst-case sample complexity of Best-$k$-Arm in the PAC setting.
3 Preliminaries
Throughout the paper, $I$ denotes an instance of BestArm (i.e., $I$ is a set of arms). The arm with the largest mean in $I$ is called the optimal arm, while all other arms are suboptimal. We assume that every instance has a unique optimal arm. $A_{[i]}$ denotes the arm in $I$ with the $i$th largest mean, unless stated otherwise. The mean of an arm $A$ is denoted by $\mu_A$, and we use $\mu_{[i]}$ as a shorthand notation for $\mu_{A_{[i]}}$ (i.e., the $i$th largest mean in an instance). Define $\Delta_A = \mu_{[1]} - \mu_A$ as the gap of arm $A$, and let $\Delta_{[i]}$ denote the gap of arm $A_{[i]}$. We assume that $\mu_{[1]} > \mu_{[2]}$ to ensure the optimal arm is unique.
We partition the suboptimal arms into different groups based on their gaps. For each $k \geq 1$, group $G_k$ is defined as $\{A \in I : \Delta_A \in [2^{-k}, 2^{-k+1})\}$. For brevity, let $G_{<k}$ and $G_{>k}$ denote $\bigcup_{j<k} G_j$ and $\bigcup_{j>k} G_j$, respectively. The complexity of arm $A$ is defined as $\Delta_A^{-2}$, while the complexity of the instance is denoted by $H(I) = \sum_{A \neq A_{[1]}} \Delta_A^{-2}$ (or simply $H$, if the instance is clear from the context). Moreover, $H_k = \sum_{A \in G_k} \Delta_A^{-2}$ denotes the total complexity of the arms in group $G_k$. $\{H_k\}$ naturally defines a probability distribution on the groups, where the probability of $G_k$ is given by $p_k = H_k / H$. The gap-entropy of the instance is then denoted by
$\mathrm{Ent}(I) = \sum_{k : G_k \neq \emptyset} p_k \ln p_k^{-1}.$
Here and in the following, we adopt the convention that $0 \cdot \ln 0^{-1} = 0$.
4 A Sketch of the Lower Bound
4.1 A Comparison with Previous Lower Bound Techniques
We briefly discuss the novelty of our new lower bound technique, and argue why the previous techniques are not sufficient to obtain our result. To obtain a lower bound on the sample complexity of BestArm, all the previous work [Mannor and Tsitsiklis(2004), Chen et al.(2014)Chen, Lin, King, Lyu, and Chen, Kaufmann et al.(2015)Kaufmann, Cappé, and Garivier, Garivier and Kaufmann(2016)] is based on creating two similar instances with different answers, and then applying the change of distribution method (originally developed in [Kaufmann et al.(2015)Kaufmann, Cappé, and Garivier]) to argue that a certain number of samples is necessary to distinguish the two instances. The idea was further refined by [Garivier and Kaufmann(2016)], who formulated a max-min game between the algorithm and some instances (with different answers than the given instance) created by an adversary. The value of the game at equilibrium is a lower bound on the number of samples one requires to distinguish the current instance from several worst-case adversary instances. However, we notice that even in the two-arm case, one cannot prove the $\ln\ln\Delta^{-1}$ lower bound by considering only one max-min game to distinguish the current instance from other instances. Roughly speaking, the $\ln\ln\Delta^{-1}$ factor is due to not knowing the actual gap $\Delta$, and any lower bound proof that can bring out this factor should reflect the union bound paid for the uncertainty of the instance. In fact, for the BestArm problem with $n$ arms, the gap entropy term $\mathrm{Ent}(I)$ exists for a similar reason (not knowing the gaps). Hence, any lower bound proof for BestArm that can bring out the $\mathrm{Ent}(I)$ term necessarily has to consider the uncertainty of the current instance as well (in fact, the random permutation of all arms is the kind of uncertainty we need for the new lower bound). In our actual lower bound proof, we first obtain a very tight understanding of the SIGN problem (Lemma 4.1).^{3} (^{3} Farrell's lower bound [Farrell(1964)] is not sufficient for our purpose.)
Then, we provide an elegant reduction from SIGN to BestArm, by embedding the SIGN problem to a collection of BestArm instances.
4.2 Proof of Theorem 1.12
Following the approach in [Chen and Li(2015)], we establish the lower bound by a reduction from SIGN to discrete BestArm instances, together with a more refined lower bound for SIGN stated in the following lemma.
Lemma 4.1
Suppose $\delta$ is at most a sufficiently small constant, and $\mathbb{A}$ is a $\delta$-correct algorithm for SIGN. Let $\mathcal{D}$ be a probability distribution on a finite set of gaps, and let $H(\mathcal{D})$ denote the Shannon entropy of distribution $\mathcal{D}$. Let $T_{\mathbb{A}}(\Delta)$ denote the expected number of samples taken by $\mathbb{A}$ when it runs on an arm with distribution $\mathcal{N}(\mu, 1)$ and gap $\Delta = |\mu - \xi|$. Define the loss $\ell(\Delta) = T_{\mathbb{A}}(\Delta) \cdot \Delta^{2}$. Then,
It is well known that $\Theta(\Delta^{-2})$ samples are necessary and sufficient to distinguish the normal distribution $\mathcal{N}(\Delta, 1)$ from $\mathcal{N}(-\Delta, 1)$ with constant confidence. Thus, the loss quantity in Lemma 4.1 is the ratio between the expected number of samples taken by $\mathbb{A}$ and the corresponding lower bound, which measures the "loss" due to not knowing the gap in advance. Lemma 4.1 can then be interpreted as follows: when the gap is drawn from a distribution $\mathcal{D}$, the expected loss is lower bounded (up to a constant factor) by the sum of the entropy of $\mathcal{D}$ and $\ln\delta^{-1}$. We defer the proof of Lemma 4.1 to Appendix D.

Now we prove Theorem 1.12 by applying Lemma 4.1 together with an elegant reduction from SIGN to BestArm.

Proof of Theorem 1.12. Let $c'$ denote the hidden constant in the big-$\Omega$ in Lemma 4.1. We claim that Theorem 1.12 holds for a constant $c$ depending only on $c'$.
Suppose towards a contradiction that $\mathbb{A}$ is a $\delta$-correct algorithm for BestArm (for some $\delta$) and $I$ is a discrete instance such that for all subinstances $I'$ of $I$,
$T_{\mathbb{A}}(I') < c \cdot H(I')\big(\ln\delta^{-1} + \mathrm{Ent}(I')\big).$
Recall that $H(I')$ and $\mathrm{Ent}(I')$ denote the complexity and gap entropy of instance $I'$, respectively.
Construct a distribution of SIGN instances.
Let $n_k$ be the number of arms in $I$ with gap $2^{-k}$, and let $K$ be the greatest integer such that $n_K > 0$. Since $I$ is discrete, the complexity of the instance is given by
$H(I) = \sum_{k=1}^{K} n_k \cdot 4^{k}.$
Let $p_k = n_k 4^k / H(I)$. Then $\mathcal{D} = \{p_k\}$ defines a distribution on the gaps $\{2^{-k}\}_{k=1}^{K}$. Moreover, the Shannon entropy of distribution $\mathcal{D}$ is exactly the gap entropy of instance $I$, i.e., $H(\mathcal{D}) = \mathrm{Ent}(I)$. Our goal is to construct an algorithm $\mathbb{B}$ for SIGN that violates Lemma 4.1 on distribution $\mathcal{D}$.
A family of subinstances of $I$.
Let $V = \{k : n_k > 0\}$ be the set of "types" of arms that are present in $I$. We consider the following family of instances obtained from $I$. For $S \subseteq V$, define $I_S$ as the instance obtained from $I$ by removing exactly one arm of gap $2^{-k}$ for each $k \in S$. Note that $I_S$ is a subinstance of $I$.
Let $\bar{S}$ denote $V \setminus S$, the complement of set $S$ relative to $V$. For $k \in V$ and $S \subseteq V$, let $T_k^S$ denote the expected number of samples taken on all the arms with gap $2^{-k}$ when $\mathbb{A}$ runs on $I_S$. Define $t_k^S$ as $T_k^S$ divided by the number of arms with gap $2^{-k}$ in $I_S$. We note that $t_k^S$ is the expected number of samples taken on every arm with gap $2^{-k}$ in instance $I_S$.^{4} (^{4} Recall that a BestArm algorithm is defined on a set of arms, so the arms with identical means in the instance cannot be distinguished by $\mathbb{A}$. See Remark 1.2 for details.)
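The family of subinstances can be enumerated directly from the multiset of gaps. A small sketch, assuming a discrete instance is represented by a map from the type $k$ (gap $2^{-k}$) to the number of suboptimal arms of that gap (the representation is ours, mirroring the construction in the text):

```python
from itertools import combinations

def subinstances(gap_counts):
    """Enumerate the family of subinstances I_S of a discrete instance.

    gap_counts maps a type k (gap 2^-k) to the number of suboptimal
    arms with that gap. For each subset S of the present types, I_S
    removes exactly one arm of gap 2^-k for every k in S."""
    types = sorted(k for k, c in gap_counts.items() if c > 0)
    for r in range(len(types) + 1):
        for S in combinations(types, r):
            reduced = dict(gap_counts)
            for k in S:
                reduced[k] -= 1  # remove one arm of this type
            yield set(S), reduced

# Gaps 1/2 (two arms) and 1/8 (one arm): 2^2 = 4 subinstances.
insts = list(subinstances({1: 2, 3: 1}))
```

With $|V|$ types present, the family has $2^{|V|}$ members, one per subset $S$.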
We have the following inequality:
(1) 
The second step holds because the left-hand side only counts part of the samples taken by $\mathbb{A}$. The last step follows from our assumption and the fact that $I_S$ is a subinstance of $I$.
Construct algorithm $\mathbb{B}$ from $\mathbb{A}$.
Now we define an algorithm $\mathbb{B}$ for SIGN with $\xi = 0$. Given an arm $A$ with mean $\mu$, we first choose a set $S$ uniformly at random from all subsets of $V$. Recall that $\mu_{[1]}$ denotes the mean of the optimal arm in $I$. Let $A^{+}$ denote the fictitious arm obtained by shifting each sample of $A$ by $\mu_{[1]}$, and let $A^{-}$ denote the arm obtained by negating each sample of $A$ and then shifting it by $\mu_{[1]}$. $\mathbb{B}$ runs the following four algorithms in parallel:

Algorithm $\mathbb{B}_1$ simulates $\mathbb{A}$ on $I_S \cup \{A^{+}\}$.

Algorithm $\mathbb{B}_2$ simulates $\mathbb{A}$ on $I_{\bar{S}} \cup \{A^{+}\}$.

Algorithm $\mathbb{B}_3$ simulates $\mathbb{A}$ on $I_S \cup \{A^{-}\}$.

Algorithm $\mathbb{B}_4$ simulates $\mathbb{A}$ on $I_{\bar{S}} \cup \{A^{-}\}$.

More precisely, when one of the four algorithms requires a new sample from $A^{+}$ (or $A^{-}$), we draw a sample $x$ from arm $A$, feed $x + \mu_{[1]}$ to $\mathbb{B}_1$ and $\mathbb{B}_2$, and then feed $-x + \mu_{[1]}$ to $\mathbb{B}_3$ and $\mathbb{B}_4$. Note that the samples taken by the four algorithms are the same up to negation and shifting.

$\mathbb{B}$ terminates as soon as one of the four algorithms terminates. If one of $\mathbb{B}_1$ and $\mathbb{B}_2$ identifies $A^{+}$ as the optimal arm, or one of $\mathbb{B}_3$ and $\mathbb{B}_4$ identifies an arm other than $A^{-}$ as the optimal arm, $\mathbb{B}$ outputs "$\mu > 0$"; otherwise it outputs "$\mu < 0$".
Clearly, $\mathbb{B}$ is correct if all of $\mathbb{B}_1$ through $\mathbb{B}_4$ are correct, which happens with probability at least $1 - 4\delta$. Note that since $\delta$ is small enough, the condition of Lemma 4.1 is satisfied.
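The "run several simulations in lockstep on one shared sample stream and stop at the first termination" pattern can be sketched generically with Python generators; the driver and the toy procedures below are hypothetical stand-ins for the four parallel simulations, with the transforms playing the role of the negation and shifting described above:

```python
import itertools

def run_in_parallel(draw_sample, procedures):
    """Run several sampling procedures in lockstep on one shared
    sample stream, stopping as soon as any procedure terminates.

    Each procedure is a pair (generator, transform): the generator
    yields whenever it wants a fresh sample and receives a
    transformed copy of the shared draw; its return value is its
    answer. A hypothetical driver, not the paper's construction."""
    for gen, _ in procedures:
        next(gen)  # advance each generator to its first sample request
    while True:
        x = draw_sample()
        for gen, transform in procedures:
            try:
                gen.send(transform(x))
            except StopIteration as stop:
                return stop.value  # first procedure to finish wins

def threshold_counter(n):
    """Toy stand-in procedure: terminates after seeing n positive samples."""
    seen = 0
    while seen < n:
        x = yield
        if x > 0:
            seen += 1
    return n

stream = itertools.count(1)  # deterministic "samples": 1, 2, 3, ...
ans = run_in_parallel(lambda: next(stream),
                      [(threshold_counter(3), lambda x: x),
                       (threshold_counter(5), lambda x: -x)])
```

Because every procedure consumes the same underlying draw, the total number of pulls of the shared arm equals the pulls of the fastest procedure, which is exactly what the sample complexity argument needs.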
Upper bound the sample complexity of $\mathbb{B}$.
The crucial observation is that when $k \in S$ and $\mu = -2^{-k}$, $\mathbb{B}_1$ effectively simulates the execution of $\mathbb{A}$ on $I_{S \setminus \{k\}}$. In fact, since all arms are Gaussian distributions with unit variance, the arm $A^{+}$ is the same as an arm with gap $2^{-k}$ in the original BestArm instance. Recall that the expected number of samples taken on each of the arms with gap $2^{-k}$ in instance $I_{S\setminus\{k\}}$ is $t_k^{S\setminus\{k\}}$. Therefore, the expected number of samples taken on $A$ is upper bounded by $t_k^{S\setminus\{k\}}$.^{5} (^{5} Recall that if $\mathbb{B}_1$ terminates after taking $t$ samples from $A^{+}$, the number of samples taken by $\mathbb{B}$ on $A$ is also $t$ (rather than $4t$).) Likewise, when $k \in S$ and $\mu = 2^{-k}$, $\mathbb{B}_3$ is equivalent to the execution of $\mathbb{A}$ on $I_{S\setminus\{k\}}$, and thus the expected number of samples on $A$ is again at most $t_k^{S\setminus\{k\}}$. Analogous claims hold for the case $k \in \bar{S}$ and algorithms $\mathbb{B}_2$ and $\mathbb{B}_4$ as well.
It remains to compute the expected loss of $\mathbb{B}$ on distribution $\mathcal{D}$ and derive a contradiction to Lemma 4.1. It follows from a simple calculation that
The first step follows from our discussion on algorithm $\mathbb{B}$. The third step renames the variables and rearranges the summation. The last line applies (1). This leads to a contradiction to Lemma 4.1 and thus finishes the proof.
5 Warmup: BestArm with Known Complexity
To illustrate the ideas behind our algorithm for BestArm, we consider the following simplified yet still nontrivial version of the problem: the complexity $H(I)$ of the instance is given, yet the means of the arms are still unknown.
5.1 Building Blocks
We introduce some subroutines that are used throughout our algorithm.
Uniform sampling.
The first building block is a uniform sampling procedure, $\mathrm{UnifSampl}(S, \epsilon, \delta)$, which takes samples from each arm in set $S$. Let $\hat{\mu}_A$ be the empirical mean of arm $A$ (i.e., the average of all sampled values from $A$). It obtains an $\epsilon$-approximation of the mean of each arm with probability $1 - \delta$. The following fact follows directly from the Chernoff bound.
Fact 5.1
$\mathrm{UnifSampl}(S, \epsilon, \delta)$ takes $O(|S|\,\epsilon^{-2}\ln\delta^{-1})$ samples. For each arm $A \in S$, we have
$\Pr\big[\,|\hat{\mu}_A - \mu_A| < \epsilon\,\big] \geq 1 - \delta.$
We say that a call to procedure UnifSampl returns correctly if $|\hat{\mu}_A - \mu_A| < \epsilon$ holds for every arm $A \in S$. By a union bound, Fact 5.1 implies that the probability of returning correctly is at least $1 - |S|\,\delta$.
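A sketch of such a uniform sampling procedure for unit-variance Gaussian arms, with the per-arm sample count chosen so that a Hoeffding-type bound gives the stated guarantee (the interface and constants are our assumptions, not the pseudocode from Appendix A):

```python
import math
import random

def unif_sampl(arms, eps, delta):
    """Sample each arm ceil(2 ln(2/delta) / eps^2) times and return
    the empirical means. For unit-variance Gaussian rewards, a
    sub-Gaussian tail bound makes each estimate eps-accurate with
    probability at least 1 - delta."""
    t = math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)
    return {name: sum(sampler() for _ in range(t)) / t
            for name, sampler in arms.items()}

rng = random.Random(2)
means = unif_sampl({'a': lambda: rng.gauss(1.0, 1.0),
                    'b': lambda: rng.gauss(0.0, 1.0)},
                   eps=0.1, delta=0.01)
```

A union bound over the arms then gives the "returns correctly" guarantee for the whole set, at the cost of replacing $\delta$ by $\delta/|S|$.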
Median elimination.
[Even-Dar et al.(2002)Even-Dar, Mannor, and Mansour] introduced the Median Elimination algorithm for the PAC version of BestArm. $\mathrm{MedElim}(S, \epsilon, \delta)$ returns an arm in $S$ with mean at most $\epsilon$ away from the largest mean. Let $\mu^*(S)$ denote the largest mean among all arms in $S$. The performance guarantee of MedElim is formally stated in the next fact.
Fact 5.2
$\mathrm{MedElim}(S, \epsilon, \delta)$ takes $O(|S|\,\epsilon^{-2}\ln\delta^{-1})$ samples. Let $A$ be the arm returned by MedElim. Then
$\Pr\big[\mu_A > \mu^*(S) - \epsilon\big] \geq 1 - \delta.$
We say that MedElim returns correctly if it holds that $\mu_A > \mu^*(S) - \epsilon$.
Fraction test.
Procedure FracTest decides whether a sufficiently large fraction (compared to two fraction thresholds) of the arms in the given set have small means (compared to two mean thresholds). The procedure randomly samples a certain number of arms from the set and estimates their means using UnifSampl. Then it compares the fraction of arms with small means to the thresholds and returns an answer accordingly. The detailed implementation of FracTest is relegated to Appendix A, where we also prove the following fact.
Fact 5.3
FracTest takes a number of samples governed by its threshold parameters (see Appendix A). With probability $1 - \delta$, the following two claims hold simultaneously:

If FracTest returns True, .

If FracTest returns False, .
We say that a call to procedure FracTest returns correctly if both claims above hold; otherwise the call fails.
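One possible reading of the FracTest interface, as a sketch: subsample the arms, estimate each mean crudely, and compare the observed fraction of small-mean arms against the midpoint of the two fraction thresholds. The parameterization and constants below are our assumptions, not the implementation from Appendix A:

```python
import math
import random

def frac_test(arms, mu_lo, mu_hi, t_lo, t_hi, delta, rng):
    """Decide whether a large fraction of the arms have small means.

    Subsamples m arms, estimates each mean to accuracy roughly
    (mu_hi - mu_lo) / 2, and compares the observed fraction of
    'small' arms against the midpoint of the fraction thresholds
    t_lo < t_hi."""
    m = math.ceil(8.0 * math.log(2.0 / delta) / (t_hi - t_lo) ** 2)
    eps = (mu_hi - mu_lo) / 2.0
    t = math.ceil(2.0 * math.log(2.0 * m / delta) / eps ** 2)
    mid_mu = (mu_lo + mu_hi) / 2.0
    small = 0
    for _ in range(m):
        arm = rng.choice(arms)  # sample an arm uniformly at random
        est = sum(arm() for _ in range(t)) / t
        if est < mid_mu:
            small += 1
    return small / m >= (t_lo + t_hi) / 2.0

rng = random.Random(3)
# Every arm has mean -1, far below the window [-0.2, 0.2]: expect True.
res = frac_test([lambda: rng.gauss(-1.0, 1.0)],
                mu_lo=-0.2, mu_hi=0.2, t_lo=0.3, t_hi=0.7,
                delta=0.01, rng=rng)
```

The two-threshold design leaves slack on both the mean side and the fraction side, so the test only needs coarse estimates and a modest subsample.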
Elimination.
Finally, procedure Elimination eliminates from the given set the arms with means smaller than a threshold. More precisely, the procedure guarantees that at most a small fraction of the arms in the result have means smaller than the lower threshold. On the other hand, each arm with mean greater than the upper threshold is, with high probability, not eliminated. We postpone the pseudocode of procedure Elimination and the proof of the following fact to Appendix A.
Fact 5.4
Elimination takes, in expectation, a number of samples governed by its parameters (see Appendix A). Let $S'$ denote the set returned by Elimination. Then with probability at least $1 - \delta$, at most the prescribed fraction of arms in $S'$ have means below the lower threshold; moreover, for each arm with mean greater than the upper threshold, with high probability it belongs to $S'$. We say that a call to Elimination returns correctly if both conditions hold; otherwise the call fails. Fact 5.4 directly implies that procedure Elimination returns correctly with probability at least $1 - \delta$.
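A sketch of an elimination step in the same spirit: sample each arm enough to resolve the gap between the two thresholds, then keep the arms whose estimates clear the midpoint (again our assumptions, not the pseudocode from Appendix A):

```python
import math
import random

def elimination(arms, theta_lo, theta_hi, delta):
    """Keep the arms whose empirical means clear the midpoint of the
    two thresholds. Each arm is sampled enough that its estimate
    resolves the gap theta_hi - theta_lo, so with high probability
    every arm with mean above theta_hi survives while arms with
    means far below theta_lo are removed."""
    eps = (theta_hi - theta_lo) / 2.0
    t = math.ceil(2.0 * math.log(2.0 * len(arms) / delta) / eps ** 2)
    mid = (theta_lo + theta_hi) / 2.0
    return {name: sampler for name, sampler in arms.items()
            if sum(sampler() for _ in range(t)) / t >= mid}

rng = random.Random(4)
kept = elimination({'good': lambda: rng.gauss(1.0, 1.0),
                    'bad': lambda: rng.gauss(-1.0, 1.0)},
                   theta_lo=-0.2, theta_hi=0.2, delta=0.01)
```

Separating the "keep" and "remove" guarantees by a threshold gap is what lets the per-arm sample count depend only on that gap rather than on the unknown means.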
5.2 Algorithm
Now we present our algorithm for the special case where the complexity of the instance is known in advance. The KnownComplexity algorithm takes as its input a BestArm instance $I$, the complexity $H(I)$ of the instance, as well as a confidence level $\delta$. The algorithm proceeds in rounds, and maintains a sequence of arm sets, each of which denotes the set of arms that are still considered as candidate answers at the beginning of the corresponding round.
Roughly speaking, at each round the algorithm eliminates the arms whose gaps are comparable to the accuracy parameter of that round, if they constitute a large fraction of the remaining arms. To this end, KnownComplexity first calls procedures MedElim and UnifSampl to obtain an estimate of the largest mean among the remaining arms, up to an error proportional to the current accuracy parameter. After that, FracTest is called to determine whether a large proportion of the remaining arms have such gaps. If so, FracTest returns True, and then KnownComplexity calls the Elimination procedure with carefully chosen parameters to remove suboptimal arms.
[Pseudocode of KnownComplexity: the listing did not survive extraction. Input: an instance with complexity $H$ and risk $\delta$; output: the best arm. The algorithm loops over rounds, invoking MedElim, UnifSampl, FracTest, and Elimination as described above, and returns the only arm remaining.]
The following two lemmas imply that there is a $\delta$-correct algorithm for BestArm that matches the instance-wise lower bound up to an additive term.^{6} (^{6} Lemma 5.6 only bounds the number of samples conditioning on an event that happens with probability $1 - O(\delta)$, so the algorithm may take arbitrarily many samples when the event does not occur. However, KnownComplexity can be transformed into a $\delta$-correct algorithm with the same (unconditional) sample complexity bound, using the "parallel simulation" technique in the proof of Theorem 1.11 in Appendix C.)
Lemma 5.5
For any BestArm instance $I$ and $\delta \in (0, 1)$, $\mathrm{KnownComplexity}(I, H(I), \delta)$ returns the optimal arm in $I$ with probability at least $1 - \delta$.
Lemma 5.6
For any BestArm instance $I$ and $\delta \in (0, 1)$, conditioning on an event that happens with probability $1 - O(\delta)$, $\mathrm{KnownComplexity}(I, H(I), \delta)$ takes
$O\big(H(I)\,(\ln\delta^{-1} + \mathrm{Ent}(I))\big)$
samples in expectation.
5.3 Observations
We state a few key observations on KnownComplexity, which will be used throughout the analysis. The proofs are identical to those of Observations A.3 through A.5 in Appendix A. The following observation bounds the estimate of the largest mean at each round, assuming the correctness of UnifSampl and MedElim.
Observation 5.7
If UnifSampl returns correctly at round , . Here denotes the largest mean of arms in . If both UnifSampl and MedElim return correctly, .
The following two observations bound the thresholds used in FracTest and Elimination by applying Observation 5.7.
Observation 5.8
At round , let and denote the two thresholds used in FracTest. If UnifSampl returns correctly, . If both MedElim and UnifSampl return correctly, .
Observation 5.9
Let and denote the two thresholds used in Elimination. If UnifSampl returns correctly, . If both MedElim and UnifSampl return correctly, .
5.4 Correctness
We define $\mathcal{E}$ as the event that all calls to procedures UnifSampl, FracTest, and Elimination return correctly. We will prove in the following that KnownComplexity returns the correct answer conditioning on $\mathcal{E}$, and that $\mathcal{E}$ happens with probability at least $1 - \delta$. Note that Lemma 5.5 directly follows from these two claims.
Event $\mathcal{E}$ implies correctness.
It suffices to show that conditioning on $\mathcal{E}$, KnownComplexity never removes the best arm, and the algorithm eventually terminates. Suppose that the best arm is still a candidate at the current round. Observation 5.9 guarantees that at this round, the upper threshold used by Elimination is smaller than or equal to the mean of the best arm. By Fact 5.4, the correctness of Elimination guarantees that the best arm survives the round.
It remains to prove that KnownComplexity terminates conditioning on $\mathcal{E}$. Define $r_{\Delta} = \lceil \log_2 \Delta_{[2]}^{-1} \rceil$. Suppose $r^*$ is the smallest integer greater than $r_{\Delta}$ such that MedElim returns correctly at round $r^*$.^{7} (^{7} MedElim returns correctly with constant probability in each round, so $r^*$ is well-defined with probability $1$.) By Observation 5.9, the lower threshold in Elimination at round $r^*$ is greater than or equal to $\mu_{[2]}$. The correctness of Elimination implies that the suboptimal arms are removed at round $r^*$, so that only the optimal arm remains. Therefore, the algorithm terminates either before or at round $r^*$.
$\mathcal{E}$ happens with high probability.
We first note that at round , the probability that either UnifSampl or FracTest fails (i.e., returns incorrectly) is at most . By a union bound, the probability that at least one call to UnifSampl or FracTest returns incorrectly is upper bounded by
It remains to bound the probability that Elimination fails at some round, yet procedures UnifSampl and FracTest are always correct. Define as the probability that, given the value of at the beginning of round , at least one call to Elimination returns incorrectly in round or later, yet UnifSampl and FracTest always return correctly. We prove by induction that for any that contains the optimal arm ,
(2) 
where and
The details of the induction are postponed to Appendix E.
Observe that and
Therefore we conclude that
which completes the proof of correctness. Here the first step applies a union bound. The second step follows from inequality (2), and the third step plugs in and .
5.5 Sample Complexity
As in the proof of Lemma 5.5, we define as the event that all calls to procedures UnifSampl, FracTest, and Elimination return correctly. We prove that KnownComplexity takes
samples in expectation conditioning on .
Samples taken by UnifSampl and FracTest.
In the proof of correctness, we showed that conditioning on , the algorithm does not terminate before or at round (for ) implies that MedElim fails between round and round , which happens with probability at most . Thus for , the expected number of samples taken by UnifSampl and FracTest at round is upper bounded by
Summing over all rounds yields the following upper bound: