1 Introduction
The goal of adaptive sampling is to estimate some unknown property
about the world, using as few measurements as possible from a set of possible measurement actions (we only work with finitely many measurements, but this may be generalized as in [1]). At each time step t, a learner chooses a measurement action based on past observations, and receives an observation. We assume that the observations from each action are drawn i.i.d. from a distribution which is unknown to the learner. In particular, the vector of distributions
ν = (ν_1, …, ν_K), called the instance, encodes the distributions of all possible measurement actions. The instance can be thought of as describing the state of the world, and our property of interest is a function of the instance. We focus on what is called the fixed-confidence pure-exploration setting, where the algorithm decides to stop at some (possibly random) time τ, and returns an output which is allowed to differ from the true property with probability at most δ
on any instance ν. Since τ is exactly equal to the number of measurements taken, the goal of adaptive pure-exploration problems is to design algorithms for which τ is as small as possible, either in expectation or with high probability. Crucially, we often expect the instance to lie in a known constraining set. This allows us to encode a broad range of problems of interest as pure-exploration multi-armed bandit problems [2, 3] with structural constraints. As an example, the adaptive linear prediction problem of [4, 5] (known in the literature as linear bandits) is equivalent to best-arm identification, subject to the constraint that the mean vector μ = (μ_1, …, μ_K) lies in the subspace spanned by the vector-valued features associated with arms 1 through K
. The noisy combinatorial optimization problems of
[6, 7, 8] can also be cast in this fashion. Moreover, by considering properties other than the top mean, one can use the above framework to model signal recovery and compressed sensing [1, 9], subset selection [10], and additional variants of combinatorial optimization [11, 12, 13]. The purpose of this paper is to present new machinery to better understand the consequences of structural constraints, and types of objectives, on the sample complexity of adaptive learning problems. This paper presents bounds for some structured adaptive sampling problems which characterize the sample complexity in the regime where the probability of error δ is a moderately small constant (or even inverse-polynomial in the number of measurements). In contrast, prior work has addressed the sample complexity of adaptive sampling problems in the asymptotic regime δ → 0, where such problems often admit algorithms whose asymptotic dependence on δ matches lower bounds for each ground-truth instance, even matching the exact instance-dependent leading constant [14, 15, 16]. Analogous asymptotically-sharp and instance-specific results (even for structured problems) also hold in the regret setting where the time horizon tends to infinity [17, 8, 18, 19, 20].
The upper and lower bounds in this paper demonstrate that the asymptotics can paint a highly misleading picture of the true sample complexity when δ is not too small. This occurs for two reasons:

Asymptotic characterizations of the sample complexity of adaptive estimation problems occur on a time horizon where the learner can learn an optimal measurement allocation tailored to the ground truth instance. In the short run, however, learning favorable measurement allocations is extremely costly, and the allocation requires considerably more samples to learn than it itself would prescribe.

Asymptotic characterizations are governed by the complexity of discriminating the ground truth
from any single alternative hypothesis. This neglects the sorts of multiple-hypothesis and suprema-of-empirical-process effects that are ubiquitous in high-dimensional statistics and learning theory (e.g., those reflected in Fano-style bounds).
To understand these effects, we introduce a new framework for analyzing adaptive sampling called the “Simulator”. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one, given the limited amount of data collected up to any given time. Our framework allows us to characterize granular, instance-dependent properties that any successful adaptive learning algorithm must have. In particular, these insights inspire a new, theoretically near-optimal, and practically state-of-the-art algorithm for the top-k subset selection problem. We emphasize that the Simulator framework is concerned with how an algorithm samples, rather than its final objective. Thus, we believe that the techniques in this paper can be applied more broadly to a wide class of problems in the active learning community.
2 Preliminaries
As alluded to in the introduction, the adaptive estimation problems in this paper can be formalized as multi-armed bandit problems, where the instances lie in an appropriate constraint set, called an instance class (e.g., instances whose mean vectors lie in some specified polytope). We use the term arms to refer both to the indices i ∈ {1, …, K} and the distributions ν_i they index. The stochastic multi-armed bandit formulation has been studied extensively in the pure-exploration setting considered in this work [2, 3, 10, 21, 22, 23, 14, 15]. At each time t, a learner plays an action I_t ∈ {1, …, K}, and observes an observation X_t drawn i.i.d. from ν_{I_t}. At some time τ, the learner decides to end the game and return some output. Formally, let F_t denote the sigma-algebra generated by (I_1, X_1, …, I_t, X_t), and some additional randomness independent of all the samples (this represents randomization internal to the algorithm). A sequential sampling algorithm consists of

A sampling rule I_t, which is F_{t−1}-measurable

A stopping time τ, which is measurable with respect to the filtration (F_t)

An output rule, which is F_τ-measurable.
We let T_i(t) denote the number of samples collected from arm i by time t. In particular, T_i(τ) is the number of times arm i is pulled by the algorithm before terminating, and τ = Σ_i T_i(τ). A best-arm identification algorithm corresponds to the case where the output rule returns a single arm, and, more generally, a top-k algorithm returns a subset of k arms. We will use Alg as a variable which describes a particular algorithm, and use the notation P_{ν,Alg} and E_{ν,Alg} to denote probabilities and expectations which are taken with respect to the samples drawn from ν, and the (possibly randomized) sampling, stopping, and output decisions made by Alg. Finally, we adopt the following notion of correctness, which corresponds to the “fixed-confidence” setting in the active learning literature:
Definition 1.
Fix a set of instances. We say that a best-arm algorithm Alg is δ-correct for a best-arm mapping over this class of instances if, for all instances ν in the class, the probability under P_{ν,Alg} that the output differs from the best arm of ν is at most δ. We say that a top-k algorithm is δ-correct for a top-k mapping analogously.
Typically, the best-arm mapping is defined as the arm with the highest mean, argmax_i μ_i, and the top-k mapping as the k arms with the largest means, which captures the notion of the arm (or set of arms) that yields the highest reward. When the best-arm mapping returns the highest-mean arm, and the observations are sub-Gaussian (formally, a mean-μ distribution is 1-sub-Gaussian if E[e^{λ(X−μ)}] ≤ e^{λ²/2} for all λ), the problem complexity for best-arm identification is typically parameterized in terms of the “gaps” Δ_i = μ_1 − μ_i between the means [24]. More generally, sample complexity is parametrized in terms of the KL-divergences between the arm distributions. For ease of exposition, we will present our high-level contributions in terms of gaps, but the body of the work will also present more general results in terms of KL-divergences. Finally, our theorem statements will use ≲ and ≳ to denote inequalities up to constant factors. In the text, we shall occasionally use ≈ more informally, hiding doubly-logarithmic factors in problem parameters.
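For concreteness, the gap parameterization can be computed as follows. This is a minimal sketch of our own: the helper names are ours, and the quantity H below is the standard sum-of-inverse-squared-gaps complexity measure from the best-arm literature, not notation introduced by this paper.

```python
# Hypothetical sketch: the "gaps" Delta_i = mu_(1) - mu_i, and the standard
# sub-Gaussian complexity measure H = sum over suboptimal arms of Delta_i^{-2}.

def gaps(means):
    """Delta_i = (largest mean) - mu_i, one entry per arm."""
    best = max(means)
    return [best - m for m in means]

def complexity(means):
    """Sum of inverse squared gaps over the suboptimal arms."""
    return sum(d ** -2 for d in gaps(means) if d > 0)

mu = [1.0, 0.5, 0.5, 0.0]
print(gaps(mu))        # [0.0, 0.5, 0.5, 1.0]
print(complexity(mu))  # 4 + 4 + 1 = 9.0
```

Up to logarithmic factors, H is the scale at which unstructured best-arm sample-complexity bounds are typically stated.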
3 Statements of Lower Bound Results
Typically, lower bounds in the bandit and adaptive sampling literature are obtained by the change-of-measure technique [24, 9, 14]. To contextualize our findings, we begin by stating the state-of-the-art change-of-measure lower bound, as it appears in [25]. For a class of instances, let Alt(ν) denote the set of instances in the class whose best arm differs from that of ν. Then:
Proposition 1 (Theorem 1 of [14]).
If Alg is δ-correct for all instances in the class, then the expected number of samples Alg collects under ν, E_ν[τ], is bounded below by the value of the following optimization problem
(1)  minimize Σ_i t_i  subject to  inf_{ν′ ∈ Alt(ν)} Σ_i t_i · KL(ν_i, ν′_i) ≥ kl(δ, 1 − δ),  t_i ≥ 0 for all i
where kl(δ, 1 − δ) denotes the binary relative entropy between Bernoulli(δ) and Bernoulli(1 − δ), which scales like log(1/δ) as δ → 0.
The above proposition says that the expected sample complexity is lower bounded by the following non-adaptive experiment design problem: minimize the total number of samples
subject to the constraint that these samples can distinguish between a null hypothesis
ν, and any alternative hypothesis ν′ ∈ Alt(ν), with Type-I and Type-II errors at most
δ. We will call the optimization problem in Equation 1 the Oracle Lower Bound, because it captures the best sample complexity that could be attained by a powerful “oracle” who knows how to optimally sample under ν. Unlike the oracle, a real learner would never have access to the true instance ν. Indeed, for instances with sufficient structure, Equation 1 gives a misleading view of the intrinsic difficulty of the problem. For example, consider the class of instances whose arms are unit-variance Gaussians and whose means lie in the simplex, i.e. μ_i ≥ 0 and Σ_i μ_i = 1. If the ground truth instance has μ_1 ≥ 1/2 + ε for some ε > 0, then any oracle which uses knowledge of the ground truth to construct a sampling allocation can simply put all of its samples on arm 1. Indeed, the simplex constraint implies that arm 1 is the best arm of ν, and that any instance ν′ whose best arm is some j ≠ 1 must have μ′_1 ≤ 1/2, since μ′_1 + μ′_j ≤ 1 and μ′_j ≥ μ′_1. Thus, for all ν′ ∈ Alt(ν), KL(ν_1, ν′_1) ≥ ε²/2. In other words, the sampling vector
(2)  t = (2 kl(δ, 1 − δ)/ε², 0, …, 0)
is feasible for Equation 1, which means that the optimal number of samples predicted by Equation 1 is no more than O(kl(δ, 1 − δ)/ε²). But this predicted sample complexity does not depend on the number of arms!
So how hard is the simplex really? To address this question, we prove the first lower bound in the literature which, to the authors' knowledge (while this manuscript was posted on one of its authors' websites, the same result was obtained independently by [26]), accurately characterizes the complexity of a strictly easier problem: best-arm identification when the means are known up to a permutation. Because the theorem holds when the measures are known up to a permutation, it also holds in the more general setting when the measures satisfy any permutation-invariant constraints, including when a) the means lie on the simplex, b) the means lie in an ℓ_p ball, or c) the vector of sorted means satisfies arbitrary constraints (e.g., weighted constraints on the sorted means [27]).
In what follows, let S_K denote the group of permutations on K elements, and π(i) denote the index to which i is mapped under π. For an instance ν, we let ν^π denote the instance whose π(i)-th arm is ν_i, and define the instance class of all permutations of ν. Moreover, we use the notation π ∼ S_K to denote that π is drawn uniformly at random. With this notation, T_{π(i)}(τ) is the number of times we pull the arm indexed by π(i), i.e. the samples from ν_i; its expectation is the expected number of samples from the distribution ν_i, not from the i-th arm of ν^π. The following theorem essentially says that if the instance is randomly permuted before the start of the game, no correct algorithm can avoid taking a substantial number of samples from ν_i, for any i.
Theorem 1 (Lower bounds on Permutations).
Let ν be an instance with unique best arm, and for each suboptimal arm i, define the associated divergence d_i from the best arm's distribution. If Alg is δ-correct over the class of permutations of ν, then
(3) 
for any t, and by Markov's inequality
(4) 
In particular, any δ-correct algorithm incurs the corresponding lower bound on its expected sample complexity, averaged over permutations.
When the reward distributions are unit-variance Gaussians, d_i is proportional to the squared gap (recall that KL(N(μ_i, 1), N(μ_j, 1)) = (μ_i − μ_j)²/2). In this setting, applying the oracle bound of Proposition 1 to permutations implies a corresponding lower bound. Combining this bound with Theorem 1 yields that
(5) 
For comparison, the bound of Proposition 1 only implies a lower bound that does not grow with the number of arms, since an oracle who knows how to sample could place all their samples on the relevant arms. Thus, for constant δ, our lower bound differs from the bound in Proposition 1 by up to a factor of K, the number of arms. In particular, when the gaps are all on the same order, the asymptotics only paint an accurate picture of the sample complexity once δ is exponentially small in K.
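For completeness, the unit-variance Gaussian identity invoked above (our addition, stated in standard notation) follows from a direct computation with the log-likelihood ratio:

```latex
\mathrm{KL}\bigl(\mathcal{N}(\mu_1, 1)\,\|\,\mathcal{N}(\mu_2, 1)\bigr)
  \;=\; \mathbb{E}_{X \sim \mathcal{N}(\mu_1, 1)}\!\left[
      \frac{(X-\mu_2)^2 - (X-\mu_1)^2}{2}\right]
  \;=\; \frac{(\mu_1 - \mu_2)^2}{2}.
```

The second equality uses E[(X − μ₂)²] = 1 + (μ₁ − μ₂)² and E[(X − μ₁)²] = 1, so squared gaps and KL-divergences are interchangeable up to a factor of two in the Gaussian case.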
In fact, our lower bound is essentially unimprovable: Section A.1 provides an upper bound for the setting where the top two means are known, whose expected sample complexity on any permutation matches the on-average complexity in Equation 5 up to constant and doubly-logarithmic factors. Together, these upper and lower bounds depict two very different regimes:

As δ → 0, an algorithm which knows the means up to a permutation can learn to optimistically and aggressively focus its samples on the top arm, yielding the asymptotic sample complexity predicted by Proposition 1, one which is potentially far smaller than that of the unconstrained problem. (In fact, using a track-and-stop strategy similar to [14], one could design an algorithm which matches the constant factor in Proposition 1.)
These two regimes show that the Simulator and oracle lower bounds are complementary, and go after two different aspects of problem difficulty: in the second regime, the oracle lower bound characterizes the samples sufficient to verify that a given arm is the best, whereas in the first regime, the Simulator characterizes the samples needed to learn a favorable sampling allocation. (The Simulator also provides a lower bound on the tail of the number of pulls from a suboptimal arm since, with some probability, that arm is pulled many times. This shows that even though one can learn an oracle allocation on average, there is always a small risk of oversampling. Such effects do not appear from Proposition 1, which only controls the number of samples taken in expectation.) We remark that [25] also explores the problem of learning-to-sample by establishing the implications of Proposition 1 for finite-time regret; however, their approach does not capture any effects which are not reflected in Proposition 1. Finally, we note that proving a lower bound for learning a favorable strategy in our setting must consider some sort of average or worst case over the instances. Indeed, one could imagine an algorithm that starts off by pulling the first arm until it has collected enough samples to test whether it is the best arm, then pulling the second arm to run the analogous test, and so on. If the first arm is the best, this algorithm can successfully identify it without pulling any of the others, thereby matching the oracle lower bound.
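The sequential scan just described can be sketched as follows. This is our illustrative reconstruction, not the paper's algorithm: the function names, the Hoeffding-style anytime radius, and the accept/reject thresholds are all our assumptions, for the case where the best mean mu_star and the gap to the rest are known.

```python
import math

# Hedged sketch of the "scan" strategy: test arms one at a time, accepting
# the first arm whose empirical mean confidently looks like the best mean,
# and rejecting arms that are confidently suboptimal. Names are illustrative.

def scan_for_best(sample, K, mu_star, gap, delta):
    """Return the first arm whose mean tests as mu_star, scanning in order."""
    for arm in range(K):
        n, total = 0, 0.0
        while True:
            n += 1
            total += sample(arm)
            # Anytime Hoeffding-style confidence radius (union-bounded over n).
            rad = math.sqrt(2 * math.log(4 * n * n / delta) / n)
            if total / n - rad > mu_star - gap:  # arm looks like the best arm
                return arm
            if total / n + rad < mu_star:        # arm is confidently suboptimal
                break
    return None

# Noiseless sanity check: arm 0 carries the best mean, so the scan stops there
# without ever pulling arms 1-3, mirroring the oracle allocation.
means = [1.0, 0.0, 0.0, 0.0]
print(scan_for_best(lambda a: means[a], 4, mu_star=1.0, gap=1.0, delta=0.05))  # 0
```

The point of the example is the one made in the text: on the instance where the first scanned arm is best, this strategy matches the oracle; the lower bounds therefore must average (or take a worst case) over permutations of the instance.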
3.1 Sharper Multiple-Hypothesis Lower Bounds
In contrast to the oracle lower bounds, the active PAC learning literature (e.g., binary classification) leverages classical tools like Fano's inequality with packing arguments [28, 29] and other measures of class complexity such as the disagreement coefficient [30]. Because these arguments consider multiple hypotheses simultaneously, they can capture effects which worst-case binary-hypothesis oracle lower bounds like Equation 1 can miss, and the considerable gap between two-way and multiple-hypothesis tests is well known in the passive setting [31]. Unfortunately, existing techniques which capture this multiple-hypothesis complexity lead to coarse, worst- or average-case lower bounds for adaptive problems, because they rely on constructions which are either artificially symmetric or highly pessimistic [28, 29, 10]. Moreover, the constructions rarely shed insight on why active learning algorithms seem to avoid paying the costs for multiple hypotheses that would occur in the passive setting, e.g. the folk theorem: “active learning removes log factors” [9].
As a first step towards understanding these effects, we prove the first instance-based lower bound which sheds light on why active learning is able to effectively reduce the number of hypotheses it needs to distinguish. To start, we prove a qualitative result for a simplified problem, using a novel reduction to Fano's inequality via the Simulator:
Theorem 2.
Let Alg be δ-correct, and consider a game with a best arm and K − 1 suboptimal arms sharing a common measure. Then
(6) 
For Gaussian rewards with unit variance,
where Δ is the gap between the means, the above proposition states that, for moderate δ, any correct algorithm must sample some arms, including the top arm, more than the oracle allocation would suggest. Thus, the number of samples allocated by the oracle of Proposition 1 is necessarily insufficient to identify the best arm for moderate δ. This is because, until sufficiently many samples have been taken, one cannot distinguish between the best arm and other arms exhibiting large statistical deviations. Looking at exponential-gap style upper bounds [23, 21], which halve the number of arms in consideration at each round, we see that our lower bound is qualitatively sharp for some algorithms (we believe that UCB-style algorithms exhibit this same qualitative behavior). Further, we emphasize that this set of arms which must be pulled many times may be random (in fact, for an algorithm which oversamples only a fixed subset of arms, this subset must be random, because for a fixed subset one could apply Theorem 2 to the remaining arms), may depend on the random fluctuations in the samples collected, and thus cannot be determined using knowledge of the instance alone. Stated otherwise, if one sampled according to the proportions prescribed by Proposition 1, then the total number of samples one would need to collect would be suboptimal. Thus, effective adaptive sampling should adapt its allocation to the statistical deviations in the collected data, not just the ground truth instance. We stress that the Simulator is indispensable for establishing this result, because it lets us characterize the stage-wise sampling allocation of adaptive algorithms. Guided by this intuition, a more sophisticated proof strategy establishes the following guarantee for best-arm identification with Gaussian rewards (a more general result for single-parameter exponential families is stated in Theorem 5):
Proposition 2 (Lower Bound for Gaussian Best-Arm Identification).
Suppose ν has measures ν_i = N(μ_i, 1), with μ_1 > μ_2 ≥ ⋯ ≥ μ_K. Then, if Alg is δ-correct over the corresponding instance class,
(7) 
In particular, when all the gaps are on the same order, the top arm must be pulled a number of times that grows logarithmically with the number of arms. When the gaps are different, the bound trades off a larger log factor against the inverse squared gap as the latter shrinks. As we explain in Section D.1, this tradeoff is best understood in the sense that the algorithm is conducting an instance-dependent union bound, where the union bound places more confidence on means closer to the top. The proof itself is quite involved, and constitutes the main technical contribution of this paper. We devote Section D.1 to explaining the intuition and proof roadmap. Our argument makes use of “tilted distributions”, which arise in the Herbst argument in log-Sobolev inequalities in the concentration-of-measure literature [32]. Tiltings translate the tendency of some empirical means to deviate far above their averages (i.e. to anti-concentrate) into a precise information-theoretic statement that they “look like” draws from the top arm. To the best of our knowledge, this constitutes the first use of tiltings to establish information-theoretic lower bounds, and we believe this technique may have broader use.
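As a reference point (this display is our addition, using the standard definition rather than this paper's notation), the exponential tilting of a measure P by a parameter λ, and the resulting KL-divergence back to P, are

```latex
\mathrm{d}P_\lambda(x) \;=\; \frac{e^{\lambda x}\,\mathrm{d}P(x)}{\mathbb{E}_P\!\left[e^{\lambda X}\right]},
\qquad
\mathrm{KL}\bigl(P_\lambda \,\|\, P\bigr)
  \;=\; \lambda\,\mathbb{E}_{P_\lambda}[X] \;-\; \log \mathbb{E}_P\!\left[e^{\lambda X}\right].
```

For P = N(μ, 1), tilting simply shifts the mean, P_λ = N(μ + λ, 1), which gives some intuition for how a suboptimal arm whose empirical mean deviates upward can be made to “look like” a draw from the top arm.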
3.2 Instance-Specific Lower Bound for Top-k
Proposition 2 readily implies the first instance-specific lower bound for the top-k problem. The idea is that, if one can identify an arm as one of the top k arms, then, in particular, one can identify that arm as the best arm among itself and the bottom K − k arms. Similarly, if one can reject an arm as not part of the top k, then one can identify it as the “worst” arm among itself and the top k arms. Section E formally proves the following lower bound using this reduction:
Proposition 3 (Lower Bound for Gaussian Top-k).
Suppose ν has measures ν_i = N(μ_i, 1), with μ_1 ≥ ⋯ ≥ μ_K. Then, if Alg is δ-correct,
(8) 
By taking appropriate choices of indices in the first and second lines of Equation 8, our result recovers the gap-dependent bounds of [10] and [16]. Moreover, when the gaps are all on the same order, we recover the worst-case lower bound from [10].
3.2.1 Comparison with [26]
After a manuscript of the present work was posted on one of its authors' websites, [26] presented an alternative proof of Proposition 3, also by a reduction to best-arm identification. Instead of tiltings, their argument handles different gaps by a series of careful reductions to a symmetric problem, to which they apply Proposition 1. As in this paper, their proof hinges on a “simulation” argument which compares the behavior of an algorithm on an instance to a run of an algorithm where the reward distributions change mid-game. This seems to suggest that our Simulator framework is in some sense a natural tool for these sorts of lower bounds.
While our works prove many of the same results, our papers differ considerably in emphasis. The goal of this work is to explain why algorithms must incur the sample complexities that they do, rather than just sharpen logarithmic factors. In this vein, we establish Theorem 2, which has no analogue in [26]. Moreover, we believe that the proof of Proposition 2 based on tiltings is a step towards novel lower bounds for more sophisticated problems, by translating intuitions about large deviations into precise information-theoretic statements. Further still, our Theorem 1 (and Proposition 7 in the appendix) imply lower bounds on the tail deviations of the number of times suboptimal arms need to be sampled in constrained problems (see footnote 5).
4 LUCB++
The previous section showed that, in the worst case, the bottom arms and the top arms must each be pulled in proportion to their own logarithmic factor. Inspired by these new insights, the original LUCB algorithm of [10], and the analysis of [22] for the best-arm setting, in this section we propose a novel algorithm for top-k identification: LUCB++. The LUCB++ algorithm proceeds exactly like that of [10], the only difference being the definition of the confidence bounds used in the algorithm.
At each round t, let μ̂_i(t) denote the empirical mean of all the samples from arm i collected so far, and let U(t, δ) be an anytime confidence bound based on the law of the iterated logarithm (see Kaufmann et al. [33, Theorem 8] for explicit constants). Finally, we consider the set of the k arms with the largest empirical means. The algorithm is outlined in Figure 1, and satisfies the following guarantee:
(9) 
Theorem 3.
Suppose that each arm distribution is sub-Gaussian. Then, for any δ, the LUCB++ algorithm is δ-correct, and the stopping time τ satisfies
with probability at least 1 − δ, where c is a universal constant.
By Proposition 3, we recognize that when the gaps are all the same, the sample complexity of the LUCB++ algorithm is unimprovable up to doubly-logarithmic factors. This is the first practical algorithm that removes extraneous log factors on the suboptimal arms [10, 12]. However, it is known that not all instances must incur a multiplicative log factor on the top arms [12, 26]. Indeed, when k = 1 this problem is just the best-arm identification problem, and the sample complexity of the above theorem, ignoring doubly-logarithmic factors, carries an extra log factor; there exist algorithms for this particular best-arm setting which avoid it, exposing a case where Theorem 3 is loose [21, 22, 23, 12]. In general, this additional factor is unnecessary on the top arms for small k, but for large k, this is a case unlikely to be encountered in practice.
While this manuscript was in preparation, [26] proposed an algorithm which satisfies stronger theoretical guarantees, essentially matching the lower bound in Theorem 3. However, their algorithm (and the matroid-bandit algorithm of [12]) relies on exponential-gap elimination, making it unsuitable for practical use (while exponential-gap elimination algorithms might have the correct dependence on problem parameters, their constant factors in the sample complexity are incredibly high, because they rely on median-elimination as a subroutine; see [22] for discussion). Furthermore, our improved LUCB++ confidence intervals can be reformulated for different KL-divergences, leading to tighter bounds for non-Gaussian rewards such as Bernoullis. Moreover, we can “plug in” our LUCB++ confidence intervals into other LUCB-style algorithms, sharpening their log factors. For example, one could amend the confidence intervals in the CLUCB algorithm of [11] for combinatorial bandits, which would yield slight improvements for arbitrary decision classes, and near-optimal bounds for the matroid classes considered in [12].
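The LUCB-style sampling loop underlying LUCB++ can be sketched as follows. This is a hedged illustration, not the paper's implementation: for simplicity it uses a single union-bounded confidence radius for every arm, whereas LUCB++'s defining feature is splitting the confidence budget asymmetrically between the empirical top-k and the remaining arms; all names and constants below are ours.

```python
import math

# Sketch of an LUCB-style loop for top-k identification: at each round, pull
# the most ambiguous arm inside the empirical top-k (smallest lower bound)
# and the most ambiguous arm outside it (largest upper bound); stop when the
# two groups' confidence intervals separate.

def conf_radius(n, delta):
    """A simple anytime (union-bounded over n) confidence radius."""
    return math.sqrt(2 * math.log(4 * n * n / delta) / n)

def lucb_topk(sample, K, k, delta, max_pulls=200000):
    counts = [0] * K
    sums = [0.0] * K
    for i in range(K):            # pull every arm once to initialize
        counts[i] += 1
        sums[i] += sample(i)
    while sum(counts) < max_pulls:
        means = [s / n for s, n in zip(sums, counts)]
        order = sorted(range(K), key=lambda i: -means[i])
        top, rest = order[:k], order[k:]
        h = min(top, key=lambda i: means[i] - conf_radius(counts[i], delta / K))
        l = max(rest, key=lambda i: means[i] + conf_radius(counts[i], delta / K))
        lcb = means[h] - conf_radius(counts[h], delta / K)
        ucb = means[l] + conf_radius(counts[l], delta / K)
        if lcb >= ucb:            # confidence intervals separated: stop
            return sorted(top)
        for i in (h, l):          # otherwise pull the two ambiguous arms
            counts[i] += 1
            sums[i] += sample(i)
    return sorted(top)
```

A noiseless sanity check such as `lucb_topk(lambda i: [1.0, 0.9, 0.1, 0.0][i], 4, 2, 0.05)` returns `[0, 1]`. The LUCB++ modification would replace the two `delta / K` arguments with different per-group confidence levels, which is what removes the extra log factor on the suboptimal arms.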
LUCB++   LUCB   Oracle   Uniform
1.0      0.99   1.60     1.67
1.0      1.17   2.00     3.4
1.0      1.50   2.51     5.32
1.0      1.89   2.90     7.12
1.0      2.09   3.32     8.49
To demonstrate the effectiveness of our new algorithm, we compare to a number of natural baselines: LUCB of [10], a version of the oracle strategy of [14], and uniform sampling; all three use the stopping condition of [10], which triggers when the confidence bounds of the empirical top k no longer overlap with those of the bottom arms, employing a union bound over all arms. (To avoid any effects due to the particular form of the anytime confidence bound used, we use the same finite-time law-of-the-iterated-logarithm confidence bound of [33, Theorem 8] for all of the algorithms.) Consider instances constructed with unit-variance Gaussian arms, with one common mean on the top k arms and a smaller common mean on the rest. Table 1 presents the average number of samples taken by the algorithms before reaching the stopping criterion, relative to the number of samples taken by LUCB++. For these means, the oracle strategy pulls each arm a number of times proportional to the optimal allocation of Equation 1. Note that the uniform strategy is identical to the oracle strategy, but with equal proportions on all arms.
5 Lower Bounds via The Simulator
As alluded to in the introduction, our lower bounds treat adaptive sampling decisions made by the algorithm as hypothesis tests between different instances. Using a type of gadget we call a Simulator, we reduce lower bounds on adaptive sampling strategies to a family of lower bounds on different, possibly data-dependent and time-specific, non-adaptive hypothesis testing problems.
The Simulator acts as an adversarial channel intermediating between the algorithm Alg and i.i.d. samples from the true instance ν. Given an instance ν, let T denote a random transcript: an infinite sequence of samples, with one i.i.d. stream drawn from each arm of ν. We can think of sequential sampling algorithms as operating by interacting with the transcript, where the sample obtained upon pulling arm i at round t is read off as the T_i(t)-th entry of arm i's stream (recall that T_i(t) is the number of times arm i has been pulled at the end of round t). With this notation, we define a simulator as follows:
Definition 2 (Simulator).
A simulator is a map which sends the transcript T to a modified transcript T̃, which Alg will interact with instead of T (Figure 1). We allow this mapping to depend on the ground truth ν and some internal randomness.
Equivalently, a simulator induces a measure on a random process which, unlike the original transcript, does not require the samples to be i.i.d. (or even independent). Hence, we use the shorthand Sim(ν) to refer to the measure corresponding to the modified transcript, and let P_{Sim(ν),Alg} denote probability taken with respect to the modified transcript and the internal randomness in Alg and Sim. With this notation, the divergences between Sim(ν) and Sim(ν′) are well defined as divergences between the laws of the random process under the two measures.
Note that, in general, the divergence between the unmodified transcripts is infinite whenever ν_i ≠ ν′_i for some i, since each measure governs an infinite i.i.d. sequence. However, in this paper we will always design our simulator so that the divergence between Sim(ν) and Sim(ν′) is finite, and in fact quite small. The hope is that if the modified transcript conveys too little information to distinguish between ν and ν′, then Alg will have to behave similarly on both simulated instances. Hence, we will show that if Alg behaves differently on two instances ν and ν′, yet the simulator limits information between them, then Alg's behavior must differ quite a bit between the true and simulated measures, for either ν or ν′. Formally, we will show that Alg will have to “break” the simulator, in the following sense:
Definition 3 (Breaking).
Given a measure ν, an algorithm Alg, and a simulator Sim, we say that W is a truthful event under ν if, for all events E,
(10)  P_{ν,Alg}(E ∩ W) = P_{Sim(ν),Alg}(E ∩ W)
Moreover, we will say that Alg breaks Sim on the complement of W under ν. Recall that F_t is the sigma-algebra generated by the algorithm's internal randomness, and the actions/samples collected by Alg up to time t.
The key insight is that, whenever Alg does not break the simulator (i.e. on a truthful event), a run of Alg on the true instance ν is indistinguishable from a run of Alg on the simulated measure Sim(ν). But if the simulator fudges the transcript in a way that drastically limits information about ν, this means that Alg can be simulated using little information about ν, which will contradict information-theoretic lower bounds. This suggests the following recipe for proving lower bounds:
1) State a claim you wish to falsify over a class of instances (e.g., “the best arm is not pulled more than t times, with some probability”).
2) Phrase your claims as candidate truthful events on each instance (e.g., the event that the best arm of ν is pulled at most t times).
3) Construct a simulator such that the event is truthful on each instance, but the divergence between simulated measures is small for alternative pairs. For example, if the truthful event is that a given arm is pulled at most t times, then the simulator should only modify that arm's samples beyond the t-th.
4) Apply an information-theoretic lower bound (e.g., Proposition 4 to come) to show that the simulator breaks: the probability of the truthful event's complement is large for at least one instance, or on average over instances.
6 Applying the Simulator to Permutations
In what follows, we show how to use the Simulator to prove Theorem 1. At a high level, our lower bound follows from considering pairs of instances where the best arm is swapped out for a suboptimal arm, and ultimately averaging over those pairs. On each such pair, we apply a version of Le Cam's method to the simulator setup (proof in Section B.1):
Proposition 4 (Simulator Le Cam).
Let ν and ν′ be two measures, let Sim be a simulator, and let W and W′ be two truthful events under Sim for ν and ν′ respectively. Then, for any algorithm Alg,
(11) 
where . The bound also holds with replaced by .
Note that Equation 11 decouples the behavior of the algorithm under the true instances from the information limited by the simulator. This proposition makes formal the intuition from Section 5 that an algorithm which behaves differently on two distinct instances must “break” a simulator which severely limits the information between them.
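For reference (our addition, in standard notation), the classical two-point inequalities underlying Le Cam's method state that for any measures P, Q and event A,

```latex
P(A) + Q(A^{c}) \;\ge\; \tfrac{1}{2}\,\exp\bigl(-\mathrm{KL}(P \,\|\, Q)\bigr),
\qquad
P(A) + Q(A^{c}) \;\ge\; 1 - \mathrm{TV}(P, Q).
```

Proposition 4 can be read as applying this type of inequality with P and Q replaced by the simulated measures, restricted to the truthful events, so that the small simulator divergence forces the break probabilities to be large.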
6.1 Lower Bounds on 1-Arm Swaps
The key step in proving Theorem 1 is to establish a simple lower bound that holds for pairs of instances obtained by “swapping” the best arm.
Proposition 5.
Let ν be an instance with unique best arm 1. For i ≠ 1, let ν^{(i)} be the instance obtained by swapping arms 1 and i, namely ν^{(i)}_1 = ν_i, ν^{(i)}_i = ν_1, and ν^{(i)}_j = ν_j for j ∉ {1, i}. Then, if Alg is δ-correct, one has that for any t
(12) 
where
This bound implies that, if an instance is drawn uniformly from the one-arm swaps of ν, then any δ-correct algorithm has to pull the suboptimal arm, namely the distribution ν_i, a substantial number of times on average. Proving this proposition requires choosing an appropriate simulator. To this end, fix a t, and let Sim map T to T̃ such that,
(13)  T̃_j(s) = T_j(s) for j ∉ {1, i} or s ≤ t, and otherwise T̃_j(s) is drawn from a fixed distribution, independently of everything else
where, for the swapped arms j ∈ {1, i} and sample indices s > t, the notation means that the samples are taken independently of everything else (in particular, independent of ν and Alg), using internal randomness. We emphasize that the simulator depends crucially on ν, i, and t.
Note that the only entries of the modified transcript whose distributions differ under ν and ν^{(i)} are the first t entries from arms 1 and i. Hence, by a data-processing inequality,
(14)  KL(Sim(ν), Sim(ν^{(i)})) ≤ t · (KL(ν_1, ν_i) + KL(ν_i, ν_1))
Using the notation of Proposition 4, let W and W′ be the events that the suboptimal arm is sampled no more than t times under ν and ν^{(i)}, respectively. Proposition 5 now follows immediately from Proposition 4, elementary manipulations, and the following claim:
Claim 1.
For the simulator and events defined above, W is truthful on ν under Sim (and analogously W′ on ν^{(i)}).
Proof of Claim 1.
The samples from arms j ∉ {1, i} are untouched by the simulator, by construction. Moreover, the first t samples from arms 1 and i are also left as i.i.d. draws from the underlying measures, so they have the same distribution under the simulated and unsimulated transcripts. Thus, the only samples whose distributions are changed by the simulator are the samples from arms 1 and i beyond the t-th, which Alg never accesses on the truthful events under ν and ν^{(i)}, respectively. ∎
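The one-arm-swap simulator can be illustrated concretely. The sketch below is our own finite-horizon rendering, with illustrative names: a transcript is a list of per-arm sample streams, and the simulator overwrites only the post-t samples on the two swapped arms, using draws that do not depend on the ground truth.

```python
import random

# Sketch of the one-arm-swap simulator: leave the first t samples of the
# swapped arms intact, replace every later sample with an independent,
# ground-truth-independent draw. Names and distributions are illustrative.

def make_transcript(means, horizon, rng):
    """Per-arm i.i.d. Gaussian streams, truncated to `horizon` samples each."""
    return [[rng.gauss(m, 1.0) for _ in range(horizon)] for m in means]

def one_arm_swap_simulator(transcript, swapped_arms, t, rng):
    """Modify only the samples beyond the t-th on the swapped arms."""
    modified = [list(stream) for stream in transcript]
    for arm in swapped_arms:
        for j in range(t, len(modified[arm])):
            modified[arm][j] = rng.gauss(0.0, 1.0)  # independent of the instance
    return modified

rng = random.Random(0)
T = make_transcript([1.0, 0.5, 0.0], horizon=10, rng=rng)
T_mod = one_arm_swap_simulator(T, swapped_arms=(0, 2), t=4, rng=rng)
# Only entries with arm in {0, 2} and index >= 4 can differ from T.
assert all(T[1][j] == T_mod[1][j] for j in range(10))
assert all(T[0][j] == T_mod[0][j] for j in range(4))
```

On the truthful event (the swapped arms are each pulled at most t times), an algorithm reading this modified transcript never touches an overwritten entry, which is exactly the content of Claim 1.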
6.2 Proving Theorem 1 from Proposition 5
Theorem 1 can be proven directly using the machinery established thus far. However, we will introduce a reduction to “symmetric algorithms” which will both expedite the proof of Theorem 1, and come in handy for additional bounds as well. For a transcript T and permutation π, let T^π denote the permuted transcript, and let probability be taken with respect to the randomness of Alg acting on the fixed (deterministic) transcript.
Definition 4 (Symmetric Algorithm).
We say that an algorithm is symmetric if the distribution of its sampling sequence and output commutes with permutations. That is, for any permutation , transcript , sequence of actions , and output ,