1 Introduction
Motivations and background
Randomization and probabilistic methods are among the most widely used techniques in modern science, with applications ranging from mathematical economics to medicine and particle physics. One of the most successful probabilistic approaches is the Monte Carlo Simulation method for algorithm design, which relies on repeated random sampling and statistical analysis to estimate parameters and functions of interest. From Buffon's needle experiment in the eighteenth century to the simulations of galaxy formation or nuclear processes, this method and its variations have become increasingly popular for tackling problems that are otherwise intractable. The Markov chain Monte Carlo method
[39] led, for instance, to significant advances in approximating parameters whose exact computation is #P-hard [43, 41, 24, 40].

The analysis of Monte Carlo Simulation methods is often based on concentration inequalities that characterize the deviation of a random variable from some parameter. In particular, the Chebyshev inequality is a key element in the design of randomized methods that estimate some target numerical value. Indeed, this inequality guarantees that the arithmetic mean of independent samples, from a random variable with variance and mean satisfying , is an approximation of under relative error
with high probability. This basic result is at the heart of many computational problems, such as counting via Markov chains
[39, 60], estimating graph parameters [20, 30, 33, 26], testing properties of classical [34, 10, 19, 16] or quantum [14, 9] distributions, and approximating the frequency moments in the data stream model [4, 51, 6].

Various quantum algorithms have been developed to speed up or generalize classical Monte Carlo methods (e.g. sampling the stationary distributions of Markov chains [61, 56, 23, 59, 21], estimating the expected values of observables or partition functions [45, 62, 56, 52]). The mean estimation problem (as addressed by Chebyshev's inequality) has also been studied in the quantum sampling model. In this model, a distribution is represented by a unitary transformation (called a quantum sampler) preparing a superposition over the elements of the distribution, with the amplitudes encoding the probability mass function. A quantum sample is defined as one execution of a quantum sampler or its inverse. The number of quantum samples needed to estimate the mean of a distribution on a bounded space , with additive error , was proved to be [36, 12], or [52] given an upper bound on the variance. On the other hand, the mean estimation problem with relative error can be solved with quantum samples [13, 62]. Interestingly, this is a quadratic improvement over if the sample space is (this case maximizes the variance). Montanaro [52] posed the problem of whether this speedup can be generalized to other distributions. He assumed that one knows an upper bound (more precisely, an upper bound on where is the second moment, which satisfies ) on , and gave an algorithm using (we use the notation to indicate ) quantum samples, thus improving the dependence on compared to the classical setting. This result was reformulated in [47] to show that, knowing bounds , it is possible to use quantum samples. Typically, the only upper bound known on is , so it is less efficient than [13, 62].
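For reference, the classical Chebyshev-based baseline discussed above can be sketched as follows. This is an illustrative Python sketch (the function names and the chosen test distribution are ours, not from the cited works): Chebyshev's inequality bounds the failure probability of the arithmetic mean of n i.i.d. samples, which yields a sufficient sample count for a given relative error.

```python
import numpy as np

def chebyshev_mean_estimate(sample, var_over_mean_sq, eps, delta, rng):
    # Chebyshev: P(|X_bar - mu| >= eps*mu) <= (sigma/mu)^2 / (n * eps^2),
    # so n = ceil((sigma/mu)^2 / (eps^2 * delta)) i.i.d. samples suffice
    # for relative error eps with probability >= 1 - delta.
    n = int(np.ceil(var_over_mean_sq / (eps ** 2 * delta)))
    return float(np.mean([sample(rng) for _ in range(n)])), n

rng = np.random.default_rng(0)
# Exponential distribution with mean 2.0, for which sigma^2/mu^2 = 1.
est, n = chebyshev_mean_estimate(lambda r: r.exponential(2.0), 1.0, 0.05, 0.05, rng)
```

The quadratic dependence on 1/eps discussed in the quantum setting should be contrasted with the 1/eps^2 dependence visible in the sample count here.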
Quantum Chebyshev Inequality
Our main contribution (Theorem 3.3 and Theorem A.2) is to show that the mean of any distribution with variance can be approximated with relative error using quantum samples, given an upper bound on and two bounds such that . This is an exponential improvement in compared to previous work [47]. Moreover, if is negligible, this is a quadratic improvement over the number of classical samples needed when using the Chebyshev inequality. If no bound is known, we also present an algorithm using quantum samples in expectation (Theorem 3.5). A corresponding lower bound is deduced from [55] (Theorem 4.1). We also show (Theorem 4.3) that no such speedup is possible if we only have access to copies of the quantum state representing the distribution.
Our algorithm is based on sequential analysis. Given a threshold , we consider the “truncated” mean defined by replacing the outcomes larger than with . Using standard techniques, this mean can be encoded in the amplitude of some quantum state (Corollary 2.4). We then run the Amplitude Estimation algorithm of Brassard et al. [13] on this state for steps (i.e. with quantum samples), only to see whether the estimate it returns is nonzero (this is our stopping rule). A property of this algorithm (Corollary 2.4 and Remark 2.7) guarantees that the estimate is zero with high probability if and only if the number of quantum samples is below the inverse of the estimated amplitude. The crucial observation (Lemma 3.2) is that is smaller than for large values of , and it becomes larger than when . Thus, by repeatedly running the amplitude estimation algorithm with quantum samples, and performing steps of a logarithmic search over decreasing values of , the first nonzero value is obtained when is approximately equal to . The precision of the result is then improved by using more precise “truncated” means.
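The search-with-stopping-rule idea can be illustrated with a toy classical mock, in which an idealized detector stands in for amplitude estimation (it fires exactly when the sample budget t exceeds the inverse square root of the encoded amplitude). All names and the halving schedule are ours, chosen only to make the mechanism concrete:

```python
import math

def truncated_mean(xs, b):
    # Empirical version of the truncated mean: outcomes >= b replaced by 0.
    return sum(x if x < b else 0.0 for x in xs) / len(xs)

def detector_fires(t, a):
    # Idealized stand-in for the stopping rule: amplitude estimation run
    # with t quantum samples returns a nonzero estimate (w.h.p.) iff
    # t is at least ~1/sqrt(a), where a is the encoded amplitude.
    return a > 0 and t >= 1.0 / math.sqrt(a)

def rough_estimate(xs, b_hi, b_lo, t):
    # Halve the threshold b until the detector fires. The amplitude
    # a(b) = mu_b / b grows as b decreases, so the first firing occurs
    # when b is roughly t^2 times the mean (toy schedule, not the paper's).
    b = b_hi
    while b >= b_lo:
        a = truncated_mean(xs, b) / b
        if detector_fires(t, a):
            return b
        b /= 2.0
    return 0.0
```

For the constant distribution with all outcomes equal to 1, the detector first fires at b = t^2, i.e. t^2 times the mean, matching the intended behaviour of the rough-estimation stage.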
This algorithm is extended (Theorem B.1) to cover the common situation where one knows a non-increasing function such that , instead of knowing explicitly. For this purpose, we exhibit another property (Corollary 2.4 and Remark 2.6) of the amplitude estimation algorithm, namely that it always outputs a number smaller than the estimated value (up to a constant factor) with high probability. This can be seen as a quantum analogue of the Markov inequality. Combined with the previous algorithm, it allows us to find a value , via a second logarithmic search on .
Next, we study the quantum analogue of the following standard fact: classical samples, each taking average time to be computed, can be obtained in total average time . The notion of average time is adapted to the quantum setting using the framework of variable-time algorithms introduced by Ambainis. We develop a variable-time amplitude estimation algorithm (Theorem C.2) that approximates the target value efficiently when some branches of the computation stop earlier than others. It can be used in place of standard amplitude estimation in all our results (Theorem C.3).
Applications
We describe two applications that illustrate the use of the above results. We first study the problem of approximating the frequency moments of order in the multi-pass streaming model with updates. Classically, the best pass algorithms with memory satisfy [51, 63]. We give a quantum algorithm for which (Theorem 5.3). This problem was studied before in [53], where the author obtained quantum speedups for , and , but no significant improvement for . Similar trade-off results are known for Disjointness ( in the quantum streaming model [46] vs. classically), and for Dyck(2) ( [54] vs. [50, 17, 38]).
Our construction starts with a classical one-pass linear sketch streaming algorithm [51, 6] with memory , that samples (approximately) from a distribution with mean and variance . We implement it with a quantum sampler, which needs two passes for one quantum sample. The crucial observation (Appendix D) is that the reverse computation of a linear sketch algorithm can be done efficiently in one pass (whereas, usually, this would require processing the same stream in the reverse direction).
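The key point behind this observation is that a linear sketch is additive in the stream updates, so a second pass over the same stream with negated updates uncomputes the sketch. A toy illustration (AMS-style signed counter; the names and the sign table are ours, chosen for illustration only):

```python
import random

def apply_update(sketch, signs, item, delta):
    # A linear sketch is additive in the stream updates: each update
    # (item, delta) adds signs[item] * delta to this toy AMS-style sketch.
    return sketch + signs[item] * delta

rng = random.Random(0)
signs = [rng.choice([-1, 1]) for _ in range(10)]   # hypothetical hash signs
stream = [(3, 2), (7, 1), (3, -1)]

s = 0
for item, delta in stream:
    s = apply_update(s, signs, item, delta)
for item, delta in stream:                 # second pass, negated updates:
    s = apply_update(s, signs, item, -delta)   # uncomputes the sketch
```

Because every update commutes and is invertible, the reversal does not need the stream in reverse order, which is the property exploited in Appendix D.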
As a second application, we study the approximation of graph parameters using neighbor, vertex-pair and degree queries. We show that the numbers of edges and of triangles, in an vertex graph, can be estimated with (Theorem 5.4) and (Theorem 5.6) quantum queries respectively. This is a quadratic speedup over the best classical algorithms [33, 26]. The lower bounds (Theorems 5.5 and 5.7) are obtained with a reduction method based on communication complexity.
The number of edges is approximated by translating a classical estimator [58] into a quantum sampler. The triangle counting algorithm is more involved. We need a classical estimator [26] approximating the number of triangles adjacent to any vertex . Since its average running time is small, we obtain a quadratic speedup for estimating (Proposition E.6) using our mean estimation algorithm for variable-time samplers. We then diverge from the classical triangle counting algorithm of [26], which requires setting up a data structure for sampling edges uniformly in the graph. This technique seems to be an obstacle to a quadratic speedup. We circumvent this problem by adapting instead a bucketing approach from [25] that partitions the graph's vertices according to the value of . The size of each bucket is estimated using a second quantum sampler.
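To make the "classical estimator turned into a sampler" idea concrete, here is a toy degree-sampling estimator of the edge count (illustration only; the actual estimator of [58] is more refined, with a much better variance-to-squared-mean ratio, which is what the quantum sampler exploits):

```python
import random

def sample_edge_estimate(adj, rng):
    # Toy unbiased estimator of m = (1/2) * sum_v deg(v): sample a uniform
    # vertex v and output n * deg(v) / 2. Its mean over v is exactly m.
    n = len(adj)
    v = rng.randrange(n)
    return n * len(adj[v]) / 2.0

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}   # 4 vertices, m = 3 edges
rng = random.Random(1)
est = sum(sample_edge_estimate(adj, rng) for _ in range(20000)) / 20000
```

A quantum sampler for this distribution prepares a superposition over vertices with the estimate encoded in an amplitude, and the mean estimation algorithm of Section 3 then replaces the classical averaging.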
2 Preliminaries
2.1 Computational model
In this paper we consider probability distributions on some finite sample spaces . We denote by the probability to sample in the distribution . We also make the assumption, satisfied in most applications, that is equipped with an efficient encoding of its elements . In particular, we can perform quantum computations on the Hilbert space defined by the basis . Moreover, given any two values , we assume the existence of a unitary that can perform the Bernoulli sampling (see below) in time polylogarithmic in . In the rest of the paper we neglect this complexity, including the required precision for implementing any of these unitary operators.

Definition 2.1.
Given a finite space and two reals , an Bernoulli sampler over is a unitary acting on and satisfying for all :
We say that is Bernoulli samplable if any Bernoulli sampler can be implemented in polylogarithmic time in , when have polylog-size encodings in .
The operation can be implemented with a controlled rotation, and is reminiscent of related works on mean estimation (e.g. [62, 12, 52]). In what follows, we always use or .
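The controlled rotation mentioned above can be modeled on a single qubit: rotating |0⟩ by the angle arcsin(√p) places amplitude √p on |1⟩, so that measurement yields 1 with probability exactly p. A minimal numerical sketch (our own naming; in the actual sampler the rotation is controlled on the sample register):

```python
import numpy as np

def bernoulli_rotation(p):
    # Single-qubit rotation mapping |0> to sqrt(1-p)|0> + sqrt(p)|1>,
    # a minimal model of the controlled rotation behind Definition 2.1.
    theta = np.arcsin(np.sqrt(p))
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

p = 0.3
state = bernoulli_rotation(p) @ np.array([1.0, 0.0])
prob_one = abs(state[1]) ** 2  # measuring yields 1 with probability p
```

In the sampler of Definition 2.2, applying such a rotation coherently over the sample register is what encodes the (truncated) mean into a single amplitude.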
We can now define what a quantum sample is.
Definition 2.2.
Given a finite Bernoulli samplable space and a distribution on , a (quantum) sampler for is a unitary operator acting on , for some Hilbert space , such that
where are arbitrary unit vectors. A quantum sample is one execution of or (including their controlled versions). The output of is the random variable obtained by measuring the register of . Its mean is denoted by , its variance by , and its second moment by .

Given a nonnegative random variable and two numbers , we define the random variable where when and otherwise. If , we let . Similarly, where when and otherwise.
We motivate the use of a Bernoulli sampler by the following observation: for any sampler and values , the modified sampler acting on , where and , generates the Bernoulli distribution , of mean (see the proof of Corollary 2.4). This central result will be used throughout this paper.

Other quantum sampling models
Instead of having access to the unitary , one could only have copies of the state (as in [7] for instance). However, as we show in Theorem 4.3, the speedup presented in this paper is impossible to achieve in this model. On another note, Aharonov and Ta-Shma [2] studied the Q-sampling problem, which is the ability to prepare given the description of a classical circuit with output distribution . This problem becomes straightforward if a garbage register can be added (using standard reversible-computation techniques). Bravyi, Harrow and Hassidim [14] considered an oracle-based model, which is provably weaker than Q-sampling, where a distribution on is represented by an oracle (for some ), such that equals the proportion of inputs with . It is extended to the quantum query framework with a unitary such that . It is not difficult to see that applying on a uniform superposition gives , as required by Definition 2.2 (where ). Finally, Montanaro [52] presented a model that is similar to ours, where he replaced the register of with a qubit register (for some ) combined with a mapping where is the sample associated to each .

2.2 Amplitude estimation
The essential building block of this paper is the amplitude estimation algorithm [13], combined with ideas from [62, 12, 52], to estimate the modified mean of a quantum sampler to which a Bernoulli sampler has been applied. We will need the following result about amplitude estimation.
Theorem 2.3.
There is a quantum algorithm AmplEst, called Amplitude Estimation, that takes as input a unitary operator , an orthogonal projector , and an integer . The algorithm outputs an estimate of , where , such that
and satisfies . It uses qubit quantum gates (independent of and ) and makes calls to (the controlled versions of) and , and calls to the reflection .
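Two standard features of the Brassard et al. algorithm are worth keeping in mind: the estimates it outputs lie on a discrete grid sin²(πk/M), and the error decreases linearly in the number M of Grover iterations. The following sketch (our naming, written against the standard statement of the guarantee) checks the textbook error bound against the grid:

```python
import math

def ampl_est_grid(M):
    # The estimates output by AmplEst with M applications of the Grover
    # operator lie on the grid sin^2(pi*k/M), k = 0..M (standard behaviour
    # of the Brassard et al. algorithm; naming is ours).
    return [math.sin(math.pi * k / M) ** 2 for k in range(M + 1)]

def bhmt_error_bound(a, M):
    # Standard amplitude estimation guarantee (Brassard et al.):
    # |a_hat - a| <= 2*pi*sqrt(a*(1-a))/M + pi^2/M^2 with constant prob.
    return 2 * math.pi * math.sqrt(a * (1 - a)) / M + math.pi ** 2 / M ** 2

a, M = 0.2, 64
best = min(ampl_est_grid(M), key=lambda g: abs(g - a))
```

The grid structure also explains inequality (3) of Corollary 2.4 below: when the true amplitude is too small for the budget M, the nearest grid point is 0, so the algorithm outputs zero.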
Corollary 2.4.
Consider a quantum sampler and two values . Denote . Given an integer and a real , (see Algorithm 1) uses quantum samples and outputs satisfying all of the following inequalities with probability :
(1) , for any ;
(2) , for any ;
(3) , when ;
(4) , when and .
Proof.
We show that each satisfies the inequalities stated in the corollary, with probability . Since is the median of such values, the probability is increased to using the Chernoff bound.
For each , denote if , and otherwise. Since , observe that
where and are unit vectors. Thus, the output of the AmplEst algorithm applied on and is an estimate of satisfying the output conditions of Theorem 2.3. Therefore with probability , for any . By plugging into this inequality we have . By plugging we also have , and thus . Finally, if , denote such that and observe that (since , for ). The probability to obtain is , since is decreasing for . Moreover, when , the first two inequalities are obviously satisfied if . ∎
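The median amplification step invoked at the start of the proof is the standard "powering" trick: take the median of independent runs to boost a constant success probability to an exponentially high one. A small sketch (the noisy estimator is hypothetical, used only to exercise the mechanism):

```python
import random
import statistics

def boost_by_median(run_once, k, rng):
    # Median-of-k boosting: if a single run lands in the desired range
    # with probability >= 3/4, the median of k independent runs fails
    # with probability exp(-Omega(k)) by a Chernoff bound.
    return statistics.median(run_once(rng) for _ in range(k))

# Hypothetical noisy estimator: correct value 1.0 w.p. 0.8, outlier otherwise.
noisy = lambda r: 1.0 if r.random() < 0.8 else 100.0
med = boost_by_median(noisy, 31, random.Random(0))
```

The median is preferred to the mean here because a few wild outliers (like the failed runs of AmplEst) cannot drag it away from the correct range.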
The four results on in Corollary 2.4 lie at the heart of this paper. We make a few comments on them.
Remark 2.5.
Consider a sampler over for the Bernoulli distribution of parameter . Using the Chebyshev inequality, we get that classical samples are enough for estimating with relative error . The inequality (4) of Corollary 2.4 shows that quantum samples are sufficient. Our main result (Section 3) generalizes this quadratic speedup to the non-Bernoulli case.
Remark 2.6.
The inequality (2) can be seen as an analogue of the Markov inequality (which states that, for a nonnegative random variable , for any ; here, although we do not need this result, it is possible to prove that , for some absolute constant ), namely that does not exceed by a large factor with large probability. This property will be used in Appendix B.
Remark 2.7.
If , inequalities (3) and (4) imply that, with large probability, when , and when . This phenomenon, at , is crucially used in the next section.
3 Quantum Chebyshev’s inequality
We describe our main algorithm for estimating the mean of any quantum sampler , given an upper bound (we recall that and ). The two main tools used in this section are the BasicEst algorithm of Corollary 2.4 and the following lemma on “truncated” means. We recall that (resp. ) is defined from a nonnegative random variable by substituting the outcomes greater than or equal to (resp. less than ) with . Note that for all .
Fact 3.1.
For any random variable and numbers , we have and .
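The decomposition behind Fact 3.1 is elementary: each outcome contributes to exactly one of the two truncations, so the two truncated means sum to the full mean. An empirical sketch (sample values are made up for illustration):

```python
def trunc_below(x, b):
    # X_{<b}: outcomes greater than or equal to b are replaced by 0.
    return x if x < b else 0.0

def trunc_above(x, b):
    # X_{>=b}: outcomes less than b are replaced by 0.
    return x if x >= b else 0.0

xs = [0.5, 1.2, 3.0, 7.5, 10.0]   # illustrative empirical outcomes
b = 3.0
mean = sum(xs) / len(xs)
mean_lo = sum(trunc_below(x, b) for x in xs) / len(xs)
mean_hi = sum(trunc_above(x, b) for x in xs) / len(xs)
```

The same decomposition, applied at two thresholds, gives the monotonicity statements of Fact 3.1 used in the proof of Lemma 3.2.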
Lemma 3.2.
Let be a nonnegative random variable and . Then, for all such that , we have
Proof.
The left-hand side is a consequence of and (using Fact 3.1). The right-hand side is a direct consequence of the left one and of the hypothesis . ∎
Our mean estimation algorithm works in two stages. We first compute a rough estimate with quantum samples (where are known bounds on ). Then, we improve the accuracy of the estimate to any value , at extra cost .
Theorem 3.3.
If and then the output of Algorithm 2 satisfies with probability . Moreover, for any it satisfies with probability . The number of quantum samples used by the algorithm is .
Proof.
Assume that and . We denote . By Lemma 3.2, if then , and if then . Therefore, by Corollary 2.4, with probability , the value computed at Step 2.(b) is equal to when , and is different from when . Thus, the first time Step 2.(b) of Algorithm 2 computes happens for , with probability at least .
Consequently, we can assume that Step 4 is executed with , and we let . According to Lemma 3.2 we have and , where . Thus, according to Corollary 2.4, the value satisfies with probability . Using the triangle inequality, it implies .
If , this may only increase the probability to stop at Step 3 and output . If Step 4 is executed, we still have with probability , as a consequence of Corollary 2.4. ∎
Remark 3.4.
If and , observe that the output of Algorithm 2 satisfies when and when , with probability .
We show in Appendix A (Algorithm 5) how to modify the last step of Algorithm 2 so that it uses only quantum samples (Theorem A.2). Using Remark 3.4, we also remove the input parameter while keeping the expected number of quantum samples small (Algorithm 6). Altogether, this leads to the following result.
Theorem 3.5.
There is an algorithm that, given a sampler , an integer , a value , and two reals , outputs an estimate . If and , it satisfies with probability , and the algorithm uses quantum samples in expectation.
In Section 4, we describe a lower bound for this mean estimation problem. Before that, we present three kinds of generalizations of the above algorithms.

Time complexity and variable-time samplers. The time complexity (number of quantum gates) of all the above algorithms is essentially the number of quantum samples multiplied by the time complexity of the considered sampler. Often, this last quantity is much larger than the more desirable average running time defined by Ambainis [5] in the context of variable-time amplitude amplification. In Appendix C, we develop a new variable-time amplitude estimation algorithm (Theorem C.2), and we use it in our algorithm above to show that can be estimated in time (Theorem C.3).
4 Optimality and separation results
Using a result due to Nayak and Wu [55] on approximate counting, we can show a lower bound corresponding to Theorem 3.5, already in the simple case of Bernoulli variables. For this purpose, we say that an algorithm solves the Mean Estimation problem for parameters if, for any sampler satisfying (the constant 4 is arbitrary), it outputs a value satisfying with probability .
Theorem 4.1.
Any algorithm solving the Mean Estimation problem for parameters and on the sample space must use quantum samples.
Proof.
Consider an algorithm solving the Mean Estimation problem for parameters , using quantum samples. Take two integers large enough such that and . For any oracle , define the quantum sampler and let . Observe that , and one quantum sample from can be implemented with one quantum query to .
According to [55, Corollary 1.2], any algorithm that can distinguish from makes quantum queries to . However, given the promise that or we can use with input , , to distinguish between the two cases using samples, that is queries to . Indeed, for such samplers (since ). Thus, must use quantum samples. ∎
One may wonder whether the quantum speedup presented in this paper holds if we only have access to copies of a quantum state (instead of access to a unitary preparing it). Below, we answer this question negatively. For this purpose, we say that an algorithm solves the state-based Mean Estimation problem for parameters if, using access to some copies of an unknown state satisfying (where and ), it outputs a value satisfying with probability .
Lemma 4.2.
Consider two distributions represented by the quantum states and . The smallest integer needed to discriminate and with success probability satisfies , where is the KL-divergence from to .
Proof.
According to Helstrom’s bound [37] the best success probability to discriminate two states and is . Consequently, must satisfy , which implies
where we used the concavity of the function. ∎
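For pure states, the Helstrom bound has a simple closed form: given n copies of states with overlap F = |⟨ψ₀|ψ₁⟩|, the overlap of the n-copy states is Fⁿ, and the optimal discrimination probability is (1 + √(1 − F²ⁿ))/2. A short sketch of this pure-state special case (our naming):

```python
import math

def helstrom_success(overlap, n):
    # Optimal probability of discriminating n copies of two pure states
    # with |<psi_0|psi_1>| = overlap: (1 + sqrt(1 - overlap^(2n))) / 2.
    return 0.5 * (1.0 + math.sqrt(1.0 - overlap ** (2 * n)))

# Orthogonal states are perfectly distinguishable from one copy;
# nearly identical states (overlap close to 1) need many copies.
p_orth = helstrom_success(0.0, 1)
p_few, p_many = helstrom_success(0.99, 1), helstrom_success(0.99, 200)
```

This quantifies the intuition behind Lemma 4.2: driving the success probability up to a constant forces n to grow as the states become closer, which is the source of the lower bound in Theorem 4.3.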
Theorem 4.3.
Any algorithm solving the state-based Mean Estimation problem for parameters and on the sample space must use copies of the input state.
Proof.
Consider an algorithm solving the state-based Mean Estimation problem for parameters , using copies of the input state. Given any with (notice that and ), we show how to construct a state such that
It is clear that can be used to discriminate two such states. On the other hand, according to Lemma 4.2, any such algorithm must use copies of the input state.
The construction of is adapted from [22, Section 7]. We set where and (so that ). We let (resp. ) denote the first (resp. second) derivative of with respect to . A simple calculation shows that and . Moreover,