 # Quantum Sub-Gaussian Mean Estimator

We present a new quantum algorithm for estimating the mean of a real-valued random variable obtained as the output of a quantum computation. Our estimator achieves a nearly-optimal quadratic speedup over the number of classical i.i.d. samples needed to estimate the mean of a heavy-tailed distribution with a sub-Gaussian error rate. This result subsumes (up to logarithmic factors) earlier works on the mean estimation problem that were not optimal for heavy-tailed distributions [BHMT02, BDGT11], or that require prior information on the variance [Hei02, Mon15, HM19]. As an application, we obtain new quantum algorithms for the (ϵ,δ)-approximation problem with an optimal dependence on the coefficient of variation of the input random variable.

## 1 Introduction

The problem of estimating the mean $\mu$ of a real-valued random variable $X$ given i.i.d. samples from it is one of the most basic tasks in statistics and in the Monte Carlo method. The properties of the various classical mean estimators are well understood. The standard non-asymptotic criterion used to assess the quality of an estimator is formulated as the following high-probability deviation bound: upon performing $n$ random experiments that return $n$ samples from $X$, and given a failure probability $\delta \in (0,1)$, what is the smallest error $\epsilon(n,\delta)$ such that the output $\tilde\mu$ of the estimator satisfies $|\tilde\mu-\mu| > \epsilon(n,\delta)$ with probability at most $\delta$? Under the standard assumption that the unknown random variable $X$ has a finite variance $\sigma^2$, the best possible performance is obtained by the so-called sub-Gaussian estimators [LM19] that achieve the following deviation bound

$$\Pr\left[|\tilde\mu-\mu| > L\sqrt{\frac{\sigma^2\log(1/\delta)}{n}}\right] \le \delta \tag{1}$$

for some constant $L$. The term “sub-Gaussian” reflects that these estimators have a Gaussian tail even for non-Gaussian distributions. The most well-known sub-Gaussian estimator is arguably the median-of-means [NY83, JVV86, AMS99], which consists of partitioning the $n$ samples into roughly $\log(1/\delta)$ groups of equal size, computing the empirical mean over each group, and returning the median of the obtained means.
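A minimal Python sketch of the median-of-means estimator just described. The function name `median_of_means` and its `num_groups` parameter are illustrative choices, not taken from the cited works; for the sub-Gaussian bound (1) one would pick `num_groups` on the order of $\log(1/\delta)$. Leftover samples that do not fill a complete group are simply dropped.

```python
import statistics


def median_of_means(samples, num_groups):
    """Median-of-means estimate of the mean of `samples` (illustrative helper).

    The samples are split into `num_groups` consecutive groups of equal size
    (leftover samples are dropped), the empirical mean of each group is
    computed, and the median of these group means is returned.
    """
    if not 1 <= num_groups <= len(samples):
        raise ValueError("need 1 <= num_groups <= len(samples)")
    size = len(samples) // num_groups
    means = [
        sum(samples[g * size:(g + 1) * size]) / size
        for g in range(num_groups)
    ]
    return statistics.median(means)


# A single huge outlier only corrupts one group mean, so the median ignores it.
data = [1, 1, 1, 1, 1, 1, 1, 1, 1000]
print(median_of_means(data, 3))  # 1.0
```

By contrast, the plain empirical mean of `data` is about 112, which illustrates why the median step gives robustness to heavy tails.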

The process of generating a random sample from $X$ is generalized in the quantum model by assuming the existence of a unitary operator $U$ where $U(|0\rangle)$ coherently encodes the distribution of $X$. A quantum experiment is then defined as one application of this operator or its inverse. The celebrated quantum amplitude estimation algorithm [BHMT02] provides a way to estimate the mean of any Bernoulli random variable by performing quadratically fewer experiments than any classical estimator. Yet, for general distributions, the existing quantum mean estimators either require additional information on the variance [Hei02, Mon15, HM19] or perform worse than the classical sub-Gaussian estimators when the distribution is heavy tailed [BHMT02, Ter99, BDGT11, Mon15]. These results leave open the existence of a general quantum speedup for the mean estimation problem. We address this question by introducing the concept of quantum sub-Gaussian estimators, defined through the following deviation bound

$$\Pr\left[|\tilde\mu-\mu| > \frac{L\sigma\log(1/\delta)}{n}\right] \le \delta \tag{2}$$

for some constant $L$. We give the first construction of a quantum estimator that achieves this bound up to a logarithmic factor in $n$. Additionally, we prove that it is impossible to go below that deviation level. This result provides a clear equivalent of the concept of sub-Gaussian estimator in the quantum setting.

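A second important family of mean estimators addresses the $(\epsilon,\delta)$-approximation problem, where given a fixed relative error $\epsilon \in (0,1)$ and a failure probability $\delta \in (0,1)$ the goal is to output a mean estimate $\tilde\mu$ such that

$$\Pr\big[|\tilde\mu-\mu| > \epsilon|\mu|\big] \le \delta. \tag{3}$$

The aforementioned sub-Gaussian estimators do not quite answer this question since the number of experiments they require (respectively on the order of $\frac{\sigma^2}{\epsilon^2\mu^2}\log(1/\delta)$ and $\frac{\sigma}{\epsilon|\mu|}\log(1/\delta)$) depends on the unknown quantities $\sigma$ and $\mu$. Sometimes a good upper bound is known on the coefficient of variation $|\sigma/\mu|$ and can be used to parametrize a sub-Gaussian estimator. Otherwise, the standard approach is based on sequential analysis techniques, where the number of experiments is chosen adaptively depending on the results of previous computations. Given a random variable distributed in $[0,B]$, the optimal classical estimators perform $O\big(\big(\frac{\sigma^2}{\epsilon^2\mu^2} + \frac{B}{\epsilon\mu}\big)\log(1/\delta)\big)$ random experiments in expectation [DKLR00] for computing an $(\epsilon,\delta)$-approximation of $\mu$. We construct a quantum estimator that reduces this number to $\tilde O\big(\big(\frac{\sigma}{\epsilon\mu} + \sqrt{\frac{B}{\epsilon\mu}}\big)\log(1/\delta)\big)$ and we prove that it is optimal.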
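To make the quadratic gap concrete, the sketch below inverts Equations (1) and (2) to find the number of experiments $n$ at which each deviation bound reaches $\epsilon|\mu|$. It assumes $L = 1$ purely for illustration (the text leaves the constant unspecified), and `experiments_needed` is a hypothetical helper.

```python
import math


def experiments_needed(sigma, mu, eps, delta, L=1.0):
    """Hypothetical helper: n at which bounds (1) and (2) equal eps * |mu|.

    Classical, Equation (1): L * sqrt(sigma**2 * log(1/delta) / n) = eps * |mu|
    Quantum,   Equation (2): L * sigma * log(1/delta) / n           = eps * |mu|
    L = 1 is an illustrative choice; the text leaves the constant unspecified.
    """
    log_term = math.log(1 / delta)
    ratio = L * sigma / (eps * abs(mu))  # L times the coefficient of variation over eps
    classical = ratio**2 * log_term
    quantum = ratio * log_term
    return classical, quantum


classical, quantum = experiments_needed(sigma=2.0, mu=1.0, eps=0.01, delta=0.05)
print(f"classical ~ {classical:.0f}, quantum ~ {quantum:.0f}")
```

Both counts depend on the unknown ratio $\sigma/|\mu|$, which is why the parameter-free setting needs the adaptive approach described next.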

### 1.1 Related work

There is an extensive literature on classical sub-Gaussian estimators, and we refer the reader to [LM19, Cat12, BCL13, DLLO16, LV20] for an overview of the main results and recent improvements. We point out that the empirical mean estimator is not sub-Gaussian, although it is optimal for Gaussian random variables [SV05, Cat12]. The non-asymptotic performance of the empirical mean estimator is captured by several standard concentration bounds such as the Chebyshev, Chernoff and Bernstein inequalities.

There is a series of quantum mean estimators [Gro98, AW99, BDGT11] that get close to the bound $\Pr\big[|\tilde\mu-\mu| > \frac{LB}{n}\log(1/\delta)\big] \le \delta$ for any random variable distributed in $[0,B]$ and some constant $L$. Similar results hold for numerical integration problems [AW99, Nov01, Hei02, TW02, Hei03]. The amplitude estimation algorithm [BHMT02, Ter99] leads to a sharper bound, whose leading term scales as $\frac{\sqrt{B\mu}}{n}\log(1/\delta)$ (see Proposition 4.1), when $X$ is distributed in $[0,B]$. Nevertheless, the quantity $B\mu$ is always larger than or equal to the variance $\sigma^2$ (since $\sigma^2 \le \mathbb{E}[X^2] \le B\,\mathbb{E}[X]$). The question of replacing this dependence with a dependence on $\sigma$ was considered in [Hei02, Mon15, HM19]. The estimators of [Hei02, Mon15] require knowing an upper bound $\Delta \ge \sigma$ on the standard deviation $\sigma$, whereas [HM19] needs an upper bound $\Delta \ge \sigma/\mu$ on the coefficient of variation (for non-negative random variables). The performance of these estimators is captured (up to logarithmic factors) by the deviation bound given in Equation (2) with $\sigma$ replaced by $\Delta$ and $\Delta\mu$ respectively.

The $(\epsilon,\delta)$-approximation problem has been addressed by several classical works such as [DKLR00, MSA08, GNP13, Hub19]. In the quantum setting, there is a variant [BHMT02, Theorem 15] of the amplitude estimation algorithm that performs $O\big(\frac{\sqrt{B}}{\epsilon\sqrt{\mu}}\log(1/\delta)\big)$ experiments in expectation to compute an $(\epsilon,\delta)$-approximation of the mean of a random variable distributed in $[0,B]$ (see Theorem A.4 and Proposition 5.2). However, the complexity of this estimator does not scale with $\sigma$. Given an upper bound $\Delta \ge \sigma/\mu$, the estimator of [HM19] can be used to compute an $(\epsilon,\delta)$-approximation with roughly $\tilde O\big(\frac{\Delta}{\epsilon}\log(1/\delta)\big)$ quantum experiments if the random variable is non-negative.

We note that the related problem of estimating the mean with additive error $\epsilon$, that is $\Pr[|\tilde\mu-\mu| > \epsilon] \le \delta$, has also been considered by several authors. For random variables distributed in $[0,1]$, the optimal number of experiments is $\Theta(\epsilon^{-2}\log(1/\delta))$ classically [CEG95] and $\Theta(\epsilon^{-1})$ quantumly [NW99] (with constant failure probability). These bounds do not depend on unknown parameters (as opposed to the relative error case), thus sequential analysis techniques are unnecessary here. Montanaro [Mon15] also described an estimator that performs $\tilde O(\sigma/\epsilon)$ quantum experiments given an upper bound on the standard deviation $\sigma$.

### 1.2 Contributions and organization

We first formally define the input model in Section 2. We introduce the concept of “q-random variable” (Definition 2.3) to describe a random variable that corresponds to the output of a quantum computation. We measure the complexity of an algorithm by counting the number of quantum experiments (Definition 2.4) it performs with respect to a q-random variable.

We construct a quantum algorithm for estimating the quantiles of a q-random variable in Section 3, and we use it in Section 4 to design the following quantum sub-Gaussian estimator.

###### Theorem 4.2 (Restated).

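There exists a quantum algorithm with the following properties. Let $X$ be a q-random variable with mean $\mu$ and variance $\sigma^2$, and set as input a time parameter $n$ and a real $\delta \in (0,1)$ such that $n > \log(1/\delta)$. Then, the algorithm outputs a mean estimate $\tilde\mu$ such that $\Pr\big[|\tilde\mu-\mu| > \frac{\sigma\log(1/\delta)}{n}\big] \le \delta$, and it performs $\tilde O(n)$ quantum experiments.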

Then we turn our attention to the $(\epsilon,\delta)$-approximation problem in Section 5. If we have an upper bound $\Delta \ge |\sigma/\mu|$ on the coefficient of variation, we directly use our sub-Gaussian estimator to obtain an algorithm that performs $\tilde O\big(\frac{\Delta}{\epsilon}\log(1/\delta)\big)$ quantum experiments (Corollary 5.1). Next, we consider the more subtle parameter-free setting where there is no prior information about the input random variable, except that it is distributed in $[0,B]$. In this case, the number of experiments is chosen adaptively, and the bound we get is stated in expectation.

###### Theorem 5.3 (Restated).

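There exists a quantum algorithm with the following properties. Let $X$ be a q-random variable distributed in $[0,B]$ with mean $\mu$ and variance $\sigma^2$, and set as input two reals $\epsilon, \delta \in (0,1)$. Then, the algorithm outputs a mean estimate $\tilde\mu$ such that $\Pr[|\tilde\mu-\mu| > \epsilon\mu] \le \delta$, and it performs $\tilde O\big(\big(\frac{\sigma}{\epsilon\mu} + \sqrt{\frac{B}{\epsilon\mu}}\big)\log(1/\delta)\big)$ quantum experiments in expectation.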

Finally, we prove several lower bounds in Section 6 that match the complexity of the above estimators. We also consider the weaker input model where one is given copies of a quantum state encoding the distribution of $X$. We prove that no quantum speedup is achievable in this setting (Theorem 6.6).

### 1.3 Proof overview

#### Sub-Gaussian estimator.

Our approach (Theorem 4.2) combines several ideas used in previous classical and quantum mean estimators. In this section, we simplify the exposition by assuming that the random variable $X$ is non-negative and by replacing the variance $\sigma^2$ with the second moment $\mathbb{E}[X^2]$. We also take the failure probability $\delta$ to be a small constant. Our starting point is a variant of the truncated mean estimators [Bic65, BCL13, LM19]. Truncation is a process that consists of replacing the samples larger than some threshold value with a smaller number. This has the effect of reducing the tail of the distribution, but also of changing its expectation. Here we study the effect of replacing the values larger than some threshold $b$ with $0$, which corresponds to the new random variable $X\mathbb{1}_{X \le b}$. We consider the following classical sub-Gaussian estimator that we were not able to find in the literature: set $b = \sqrt{n\,\mathbb{E}[X^2]}$ and compute the empirical mean of $n$ samples from $X\mathbb{1}_{X \le b}$. By a simple calculation, one can prove that the expectation of the removed part is at most $\mathbb{E}[X\mathbb{1}_{X>b}] \le \mathbb{E}[X^2]/b = \sqrt{\mathbb{E}[X^2]/n}$. Moreover, using Bernstein’s inequality and the boundedness of $X\mathbb{1}_{X \le b}$, the error between the output estimate and $\mathbb{E}[X\mathbb{1}_{X \le b}]$ is on the order of $\sqrt{\mathbb{E}[X^2]/n} + b/n = O\big(\sqrt{\mathbb{E}[X^2]/n}\big)$. These two facts together imply that the overall error for estimating $\mathbb{E}[X]$ is indeed of a sub-Gaussian type.

This approach can be carried out in the quantum model by performing the truncation in superposition. This is similar to what is done in previous quantum mean estimators [Hei02, Mon15, HM19]. In order to obtain a quantum speedup, one must balance the truncation level differently by taking $b = n\sqrt{\mathbb{E}[X^2]}$. Then, by a clever use of amplitude estimation discovered by Heinrich [Hei02] (see also [HM18, Proposition A.1]), the expectation of $X\mathbb{1}_{X \le b}$ can be estimated with an error on the order of $\sqrt{\mathbb{E}[X^2]}/n$. The main drawback of this estimator is that it requires the knowledge of $\mathbb{E}[X^2]$ to perform the truncation. In previous work [Hei02, Mon15, HM19], the authors made further assumptions on the variance to be able to approximate $\mathbb{E}[X^2]$. Here, we overcome this issue by choosing the truncation level $b$ differently. Borrowing ideas from classical estimators [LM19], we define $b$ as the quantile value that satisfies $\Pr[X \ge b] = 1/n^2$. By Markov’s inequality applied to $X^2$, this quantile is always at most the previous threshold value $n\sqrt{\mathbb{E}[X^2]}$. Moreover, it can be shown that the removed part $\mathbb{E}[X\mathbb{1}_{X>b}]$ is still on the order of $\sqrt{\mathbb{E}[X^2]}/n$. We give a new quantum algorithm for approximating this quantile with roughly $n$ quantum experiments (Theorem 3.4), whereas it would require $n^2$ random experiments classically. Our quantile estimation algorithm builds upon the quantum minimum finding algorithm of Dürr and Høyer [DH96, vAGGdW20] and the $k$-th smallest element finding algorithm of Nayak and Wu [NW99]. Importantly, it does not require any knowledge about $\mathbb{E}[X^2]$.
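The classical truncated estimator from the overview can be sketched as follows. This assumes non-negative samples and a known second moment $\mathbb{E}[X^2]$, passed as `second_moment` (a hypothetical interface); removing exactly that assumption is what the quantile-based threshold is for. The Pareto test distribution is an arbitrary heavy-tailed example, not one used in the paper.

```python
import math
import random


def truncated_mean(samples, second_moment):
    """Classical truncated mean from the proof overview, for non-negative samples.

    Values above b = sqrt(n * E[X^2]) are replaced by 0, i.e. it returns the
    empirical mean of X * 1{X <= b}. `second_moment` is assumed known (or an
    upper bound on E[X^2]); this is a hypothetical interface for illustration.
    """
    n = len(samples)
    b = math.sqrt(n * second_moment)
    return sum(x if x <= b else 0.0 for x in samples) / n


rng = random.Random(0)
# Pareto(alpha = 3) on [1, inf): mean 1.5, second moment 3, heavy right tail.
samples = [rng.paretovariate(3.0) for _ in range(10_000)]
estimate = truncated_mean(samples, second_moment=3.0)
print(estimate)  # close to the true mean 1.5
```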

#### (ϵ,δ)-Approximation without side information.

We follow an approach similar to that of a classical estimator described in [DKLR00]. Our algorithm (Theorem 5.3) uses the quantum sub-Gaussian estimator and the quantum sequential Bernoulli estimator described in Proposition 5.2. The latter estimator can estimate the mean $\mu$ of a random variable distributed in $[0,B]$ with constant relative error by performing $O(\sqrt{B/\mu})$ quantum experiments in expectation. The first step of the $(\epsilon,\delta)$-approximation algorithm is to compute a rough estimate $\tilde\mu$ of $\mu$ with the sequential Bernoulli estimator. Then, the variance $\sigma^2$ of $X$ is estimated by using again the sequential Bernoulli estimator on the random variable $(X-X')^2/2$ (where $X'$ is an independent copy of $X$), whose mean is $\sigma^2$. The latter estimation is stopped if it uses more than on the order of $\sqrt{B/(\epsilon\tilde\mu)}$ quantum experiments. We show that if $\sigma^2$ is at least on the order of $\epsilon\mu B$, then the computation is not stopped and the resulting estimate $\tilde\sigma^2$ is close to $\sigma^2$ with high probability. Otherwise, it is stopped with high probability and we set $\tilde\sigma = 0$. Finally, the quantum sub-Gaussian estimator is used with a parameter $n$ on the order of $\max\big(\frac{\tilde\sigma}{\epsilon\tilde\mu}, \sqrt{\frac{B}{\epsilon\tilde\mu}}\big)\log(1/\delta)$ to obtain a refined estimate of $\mu$. The choice of the first (resp. second) term in the maximum value implies that $|\tilde\mu-\mu| \le \epsilon\mu$ with high probability when the variance is larger (resp. smaller) than $\epsilon\mu B$. In order to upper bound the expected number of experiments performed by this estimator, we show in Proposition 5.2 that the estimates $\tilde\mu$ and $\tilde\sigma^2$ obtained with the sequential Bernoulli estimator satisfy suitable bounds in expectation.
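The variance step relies on the identity $\mathbb{E}[(X-X')^2/2] = \sigma^2$ for an independent copy $X'$ of $X$; moreover $(X-X')^2/2$ lies in $[0, B^2/2]$ when $X \in [0,B]$, so it is again a bounded random variable that a Bernoulli-type estimator can handle. A small exact check with Python fractions, on an arbitrary example distribution (helper names are illustrative):

```python
from fractions import Fraction
from itertools import product


def half_squared_difference_mean(dist):
    """E[(X - X')^2 / 2] for i.i.d. X, X' with distribution dist = {value: probability}."""
    return sum(
        px * py * Fraction((x - y) ** 2, 2)
        for (x, px), (y, py) in product(dist.items(), repeat=2)
    )


def variance(dist):
    """Var[X] = E[(X - E[X])^2] for dist = {value: probability}."""
    mean = sum(px * x for x, px in dist.items())
    return sum(px * (x - mean) ** 2 for x, px in dist.items())


example = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
print(half_squared_difference_mean(example), variance(example))  # 43/16 43/16
```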

#### Lower bounds.

We sketch the proof of optimality of the quantum sub-Gaussian estimator (Theorem 6.2). The lower bound is proved in the stronger quantum query model, which allows us to extend it to all the other models mentioned in Section 2. Our approach is inspired by the truncation level chosen in the algorithm. Given $\sigma$ and $n$, we consider two distributions $p_0$ and $p_1$ that output respectively two different large values (on the order of $\sigma n$) with probability on the order of $1/n^2$, and $0$ otherwise. The two distributions have variance at most on the order of $\sigma^2$, and the distance between their means is larger than the error allowed by the deviation bound of Equation (2) for a constant failure probability. Thus, any estimator that satisfies this bound can distinguish between $p_0$ and $p_1$ with constant success probability. However, we show by a reduction to Quantum Search that it requires at least $\Omega(n)$ quantum experiments to distinguish between two distributions that differ with probability at most $1/n^2$.
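A numeric illustration of the kind of hard pair used in this argument. The specific values below ($\pm\sigma n$ with probability $1/n^2$) are an assumption made for illustration and are not claimed to be the exact distributions of Theorem 6.2. They show that such a pair has variance at most $\sigma^2$ while the means are $2\sigma/n$ apart, even though the two distributions only differ on an event of probability $1/n^2$.

```python
from fractions import Fraction


def two_point_instance(sigma, n):
    """Illustrative pair (hypothetical parameters, not necessarily those of Theorem 6.2).

    p_plus outputs +v and p_minus outputs -v with probability q = 1/n^2, and 0
    otherwise, where v = sigma * n. Returns (common variance, gap between means).
    """
    q = Fraction(1, n * n)
    v = sigma * n
    mean_plus, mean_minus = q * v, -q * v
    variance = q * v * v - mean_plus**2  # same value for p_minus by symmetry
    return variance, mean_plus - mean_minus


var, gap = two_point_instance(sigma=3, n=10)
print(var, gap)  # 891/100 3/5: variance below sigma^2 = 9, gap 2*sigma/n > sigma/n
```

Classically, telling the two apart requires seeing the rare event, hence about $n^2$ samples; the quantum lower bound shows that about $n$ experiments remain necessary.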

## 2 Model of input

The input to the mean estimation problem is represented by a real-valued random variable $X$ defined on some probability space. A classical estimator accesses this input by obtaining $n$ i.i.d. samples of $X$. In this section, we describe the access model for quantum estimators and we compare it to previous models suggested in the literature. We only consider finite probability spaces, for finite encoding reasons. First, we recall the definition of a random variable, and we define a classical model of access called a random experiment.

###### Definition 2.1 (Random variable).

A finite random variable is a function $X : \Omega \to E$ for some probability space $(\Omega, p)$, where $\Omega$ is a finite sample set, $p : \Omega \to [0,1]$ is a probability mass function and $E \subset \mathbb{R}$ is the support of $X$. As is customary, we will often omit mentioning $(\Omega, p)$ when referring to the random variable $X$.

###### Definition 2.2 (Random experiment).

Given a random variable $X$ on a probability space $(\Omega, p)$, we define a random experiment as the process of drawing a sample $\omega \in \Omega$ according to $p$ and observing the value of $X(\omega)$.

We now introduce the concept of “q-random variable” to represent a quantum process that outputs a real number.

###### Definition 2.3 (q-random variable).

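A q-variable is a triple $(H, U, M)$ where $H$ is a finite-dimensional Hilbert space, $U$ is a unitary transformation on $H$, and $M = \{M_x\}_{x \in E}$ is a projective measurement on $H$ indexed by a finite set $E \subset \mathbb{R}$. Given a random variable $X$ on a probability space $(\Omega, p)$, we say that a q-variable $(H, U, M)$ generates $X$ when:

1. $H$ is a finite-dimensional Hilbert space with some basis $\{|\omega\rangle\}_{\omega\in\Omega}$ indexed by $\Omega$.
2. $U$ is a unitary transformation on $H$ such that $U(|0\rangle) = \sum_{\omega\in\Omega}\sqrt{p(\omega)}\,|\omega\rangle$.
3. $M = \{M_x\}_{x\in E}$ is the projective measurement on $H$ defined by $M_x = \sum_{\omega : X(\omega) = x}|\omega\rangle\langle\omega|$.

A random variable $X$ is a q-random variable if it is generated by some q-variable $(H, U, M)$.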
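A classical simulation of this definition may help intuition: the state $U(|0\rangle)$ is represented by its amplitudes $\sqrt{p(\omega)}$, and the measurement $\{M_x\}$ groups basis states by their value $X(\omega)$, which recovers the distribution of $X$. This is only a sketch; the dictionaries and the helper names `prepare_state` and `measure_distribution` are hypothetical and do not model a real quantum device.

```python
import math
from collections import defaultdict


def prepare_state(p):
    """Amplitudes of U|0> = sum_w sqrt(p(w)) |w>, stored as {w: amplitude}.

    Hypothetical classical stand-in for the unitary U of Definition 2.3.
    """
    return {w: math.sqrt(pw) for w, pw in p.items()}


def measure_distribution(state, X):
    """Outcome probabilities of M_x = sum_{w : X(w) = x} |w><w| on `state`."""
    probs = defaultdict(float)
    for w, amp in state.items():
        probs[X[w]] += abs(amp) ** 2  # Born rule, summed over the projector's support
    return dict(probs)


p = {"a": 0.5, "b": 0.25, "c": 0.25}   # probability mass function on Omega
X = {"a": 1.0, "b": 3.0, "c": 1.0}     # the random variable X : Omega -> R
print(measure_distribution(prepare_state(p), X))  # approximately {1.0: 0.75, 3.0: 0.25}
```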

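We stress that the sample space $\Omega$ may not be known explicitly, and we do not assume that it is easy to perform a measurement in the basis $\{|\omega\rangle\}_{\omega\in\Omega}$, for instance. Often, we are given a unitary $V$ such that $V(|0\rangle) = \sum_{x\in E}\sqrt{p_x}\,|\psi_x\rangle|x\rangle$ for some unknown garbage unit states $|\psi_x\rangle$, together with the measurement $\{I \otimes |x\rangle\langle x|\}_{x\in E}$. In this case, we can consider the q-random variable $X$ defined on the probability space $(\Omega, p)$ where $\Omega = E$, $p(x) = p_x$ and $X(x) = x$.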

We further assume that there exist two quantum oracles, defined below, for obtaining information on the function $X$. These two oracles can be efficiently implemented if we have access to a quantum evaluation oracle $|\omega\rangle|0\rangle \mapsto |\omega\rangle|X(\omega)\rangle$, for instance. The rotation oracle (Assumption B) has been extensively used in previous quantum mean estimators [Ter99, BDGT11, Mon15, HM19]. The comparison oracle (Assumption A) is needed in our work to implement the quantile estimation algorithm.

###### Assumption A (Comparison oracle).

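Given a q-random variable $X$ on a probability space $(\Omega, p)$, and any two values $a, b \in \mathbb{R} \cup \{-\infty, +\infty\}$ such that $a < b$, there is a unitary operator $C_{a,b}$ acting on $H \otimes \mathbb{C}^2$, where $(H, U, M)$ is a q-variable generating $X$, such that for all $\omega \in \Omega$,

$$C_{a,b}(|\omega\rangle|0\rangle) = \begin{cases} |\omega\rangle|1\rangle & \text{when } a < X(\omega) \le b,\\ |\omega\rangle|0\rangle & \text{otherwise.} \end{cases}$$
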
###### Assumption B (Rotation oracle).

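Given a q-random variable $X$ on a probability space $(\Omega, p)$, and any two values $a < b$, there is a unitary operator $R_{a,b}$ acting on $H \otimes \mathbb{C}^2$, where $(H, U, M)$ is a q-variable generating $X$, such that for all $\omega \in \Omega$,

$$R_{a,b}(|\omega\rangle|0\rangle) = \begin{cases} |\omega\rangle\left(\sqrt{1-\left|\frac{X(\omega)}{b}\right|}\,|0\rangle + \sqrt{\left|\frac{X(\omega)}{b}\right|}\,|1\rangle\right) & \text{when } a < X(\omega) \le b,\\ |\omega\rangle|0\rangle & \text{otherwise.} \end{cases}$$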
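Applied to $U(|0\rangle)|0\rangle$, each oracle marks a flag qubit, and the probability of observing $|1\rangle$ on it is the quantity that the quantile and Bernoulli subroutines exploit. The classical sketch below (hypothetical helper names, finite sample space given as dictionaries) computes these flag probabilities directly from the case definitions above: $\Pr[a < X \le b]$ for $C_{a,b}$, and $\mathbb{E}\big[|X/b|\,\mathbb{1}_{a<X\le b}\big]$ for $R_{a,b}$.

```python
def comparison_flag_probability(p, X, a, b):
    """Pr[flag = 1] after C_{a,b} acts on U|0>|0>, i.e. Pr[a < X <= b].

    Hypothetical classical helper: C_{a,b} maps |w>|0> to |w>|1> exactly when
    a < X(w) <= b, so the flag probability sums p(w) over those w.
    """
    return sum(pw for w, pw in p.items() if a < X[w] <= b)


def rotation_flag_probability(p, X, a, b):
    """Pr[flag = 1] after R_{a,b} acts on U|0>|0>, i.e. E[|X/b| 1{a < X <= b}].

    The |w>|1> amplitude is sqrt(p(w)) * sqrt(|X(w)/b|) when a < X(w) <= b.
    """
    return sum(pw * abs(X[w] / b) for w, pw in p.items() if a < X[w] <= b)


p = {0: 0.5, 1: 0.25, 2: 0.25}
X = {0: 0.0, 1: 2.0, 2: 4.0}
# With a below every value and b = max X, the flag probability is E[X]/b = 1.5/4.
print(rotation_flag_probability(p, X, -1.0, 4.0))
```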

We now define the measure of complexity used to count the number of accesses to a q-random variable, which are referred to as quantum experiments.

###### Definition 2.4 (Quantum experiment).

Let $X$ be a q-random variable that satisfies Assumptions A and B. Let $(H, U, M)$ be a q-variable that generates $X$. We define a quantum experiment as the process of applying any of the unitaries $U$, $C_{a,b}$, $R_{a,b}$ (for any values of $a < b$), their inverses or their controlled versions, or performing a measurement according to $M$.

Note that a random experiment (Definition 2.2) can be simulated with two quantum experiments by computing the state $U(|0\rangle)$ and measuring it according to $M$. We briefly mention two other possible input models. First, some authors [Gro98, NW99, Hei02, BHH11, CFMdW10, BDGT11, LW19] consider the stronger query model where $p$ is the uniform distribution and a quantum evaluation oracle is provided for the function $X$. A second model tackles the problem of learning from quantum states [BJ99, AdW18, ABC20], where the input consists of several copies of $U(|0\rangle)$ (we do not have access to a unitary preparing that state). We show in Theorem 6.6 that no quantum speedup is achievable for our problem in the latter setting.

## 3 Quantile estimation

In this section, we present a quantum algorithm for estimating the quantiles of a finite random variable $X$. This is a key ingredient for the sub-Gaussian estimator of Section 4. For ease of reading, we define a quantile in the following non-standard way (the cumulative distribution function is replaced with its complement).

###### Definition 3.1 (Quantile).

Given a discrete random variable $X$ and a real $p \in (0,1]$, the quantile of order $p$ is the number $Q(p) = \sup\{x \in \mathbb{R} : \Pr[X \ge x] \ge p\}$.
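For a finite distribution and $Q(p) = \sup\{x : \Pr[X \ge x] \ge p\}$, the supremum is attained at a support point: $\Pr[X \ge x]$ only changes at support points, so $Q(p)$ is the largest support value whose upper tail is still at least $p$. A short sketch, assuming the distribution is given as a dictionary of exact `Fraction` probabilities (the function name is illustrative):

```python
from fractions import Fraction


def quantile(dist, p):
    """Q(p) = sup{x : Pr[X >= x] >= p} for a finite distribution {value: prob}.

    Pr[X >= x] only changes at support points, so the supremum is the largest
    support point whose upper tail Pr[X >= x] is still at least p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    tail = Fraction(0)
    for x in sorted(dist, reverse=True):
        tail += dist[x]  # tail == Pr[X >= x]
        if tail >= p:
            return x
    raise ValueError("probabilities must sum to 1")


example = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
print(quantile(example, Fraction(1, 4)), quantile(example, Fraction(1, 2)))  # 3 2
```

Note that $Q$ is non-increasing in $p$, which is why a constant $c < 1$ gives $Q(p) \le Q(cp)$ in Theorem 3.4.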

Our result is inspired by the minimum finding algorithm of Dürr and Høyer [DH96] and its generalization in [vAGGdW20]. The problem of estimating the quantiles of a set of numbers under the uniform distribution was studied before by Nayak and Wu [NW99, Nay99]. We differ from that work by allowing arbitrary distributions, and by not using the amplitude estimation algorithm. On the other hand, we restrict ourselves to finding a constant-factor estimate, whereas [NW99, Nay99] can achieve any desired accuracy.

The idea behind our algorithm is rather simple: if we compute a sequence of values $Y_1 < Y_2 < \cdots$ where each $Y_i$ is sampled from the distribution of $X$ conditioned on $X > Y_{i-1}$, then when $i \approx \log(1/p)$ the value of $Y_i$ should be close to the quantile $Q(p)$. The complexity of sampling each $Y_i$ is on the order of $1/\Pr[X > Y_{i-1}]$ classically, but it can be done quadratically faster in the quantum setting. We analyze a slightly different algorithm, where the sequence of samples is strictly increasing and, instead of stopping after roughly $\log(1/p)$ iterations, we count the number of experiments performed by the algorithm and stop when it reaches a value close to $1/\sqrt{p}$. This requires showing that the time spent on sampling is neither too large nor too small with high probability, which is proved in the next lemma.
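Classically, the increasing sequence can be simulated by rejection sampling, and the sketch below does exactly that with a budget of $\lceil 1/p \rceil$ samples, the classical counterpart of the $1/\sqrt{p}$ quantum budget (the constant 1 is an assumption). Since a sample is accepted only when it beats the current value, the accepted values are the running records, so this classical analog reduces to the maximum of about $1/p$ samples. It is not Algorithm 1, only an illustration of why the last accepted value lands near $Q(p)$.

```python
import math
import random


def classical_quantile_sketch(values, weights, p, rng):
    """Classical analog of the increasing-sequence idea (hypothetical helper).

    Y_i is drawn from X conditioned on X > Y_{i-1} by rejection sampling: draw
    i.i.d. samples and accept one only if it beats the current value. After a
    budget of ceil(1/p) samples, the last accepted value is returned, which is
    the maximum of those samples.
    """
    current = -math.inf
    for _ in range(math.ceil(1 / p)):
        x = rng.choices(values, weights=weights)[0]
        if x > current:  # accepted: this is the next Y_i
            current = x
    return current


rng = random.Random(0)
# Uniform on 1..1000 has Q(0.01) = 991; the output typically lands near it.
print(classical_quantile_sketch(list(range(1, 1001)), [1] * 1000, 0.01, rng))
```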

###### Lemma 3.2.

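There is a quantum algorithm such that, given a q-random variable $X$ and a value $y$ with $\Pr[X > y] > 0$, it outputs a sample $x$ from the probability distribution of $X$ conditioned on $X > y$. If we let $T_y$ denote the number of quantum experiments performed by this algorithm, then there exist two universal constants $c_1, c_2 > 0$ such that $\mathbb{E}[T_y] \le c_1/\sqrt{\Pr[X > y]}$ and, with constant probability, $T_y \ge c_2/\sqrt{\Pr[X > y]}$.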

###### Proof.

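Let $(H, U, M)$ be a q-variable generating $X$. We use the comparison oracle from Assumption A to construct the unitary $V = C_{y,+\infty}(U \otimes I)$ acting on $H \otimes \mathbb{C}^2$. By definition of $U$ and $C_{y,+\infty}$ (Section 2), we have that $V(|0\rangle|0\rangle) = \sqrt{1-\Pr[X > y]}\,|\psi_0\rangle|0\rangle + \sqrt{\Pr[X > y]}\,|\psi_1\rangle|1\rangle$ for some unit states $|\psi_0\rangle, |\psi_1\rangle$, where $|\psi_1\rangle = \sum_{\omega : X(\omega) > y}\sqrt{p(\omega)/\Pr[X > y]}\,|\omega\rangle$. The algorithm for sampling $X$ conditioned on $X > y$ consists of two steps. First, we use the sequential amplitude amplification algorithm from Theorem A.2 on $V$ to obtain the state $|\psi_1\rangle$. Next, we measure $|\psi_1\rangle$ according to $M$. The claimed properties follow directly from Theorem A.2. ∎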

We use the following formula for the probability that a value $x$ occurs in the sequence defined before. This lemma is adapted from [DH96, Lemma 1].

###### Lemma 3.3 (Lemma 47 in [vAGGdW20]).

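Let $X$ be a discrete random variable. Consider the increasing sequence of random variables $Y_0 < Y_1 < Y_2 < \cdots$ where $Y_0 = y$ is a fixed value and $Y_i$ for $i \ge 1$ is a sample drawn from $X$ conditioned on $X > Y_{i-1}$. Then, for any $x \in \mathbb{R}$,

$$\Pr\big[x \in \{Y_1, Y_2, \dots\} \,\big|\, Y_0 = y\big] = \begin{cases} \Pr[X = x \mid X \ge x] & \text{when } x > y,\\ 0 & \text{otherwise.} \end{cases}$$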
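The formula in Lemma 3.3 can be checked exactly on small examples. The sketch below computes the hitting probability by the natural recursion (from the current value, the next draw either equals $x$, overshoots it, or lands strictly between and the process continues from there) and compares it with $\Pr[X = x]/\Pr[X \ge x]$. The helper names and the example distribution are arbitrary.

```python
from fractions import Fraction
from functools import lru_cache


def occurrence_probability(dist, x, y):
    """Exact Pr[x in {Y_1, Y_2, ...} | Y_0 = y] for the sequence of Lemma 3.3.

    dist = {value: Fraction probability}. From the current value, the next element
    is drawn from X conditioned on X > current: it hits x, overshoots x, or lands
    strictly between and the process continues from there.
    """
    if x not in dist:
        return Fraction(0)
    support = sorted(dist)

    @lru_cache(maxsize=None)
    def hit(current):
        if current >= x:
            return Fraction(0)
        above = sum(dist[z] for z in support if z > current)  # Pr[X > current] > 0
        total = dist[x] / above
        for z in support:
            if current < z < x:
                total += dist[z] / above * hit(z)
        return total

    return hit(y)


def lemma_formula(dist, x, y):
    """Right-hand side of Lemma 3.3: Pr[X = x | X >= x] if x > y, else 0."""
    if x <= y:
        return Fraction(0)
    return dist[x] / sum(p for z, p in dist.items() if z >= x)


example = {1: Fraction(1, 2), 2: Fraction(1, 6), 5: Fraction(1, 3)}
start = float("-inf")
print(occurrence_probability(example, 2, start), lemma_formula(example, 2, start))  # 1/3 1/3
```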

The quantile estimation algorithm is described in Algorithm 1 and its analysis is provided in the next theorem.

###### Theorem 3.4 (Quantile estimation).

Let $X$ be a q-random variable. Given two reals $p, \delta \in (0,1)$, the approximate quantile $\tilde Q$ produced by the quantile estimation algorithm (Algorithm 1) satisfies

$$Q(p) \le \tilde Q \le Q(cp)$$

with probability at least $1-\delta$, where $c \in (0,1)$ is a universal constant. The algorithm performs $O\big(\log(1/\delta)/\sqrt{p}\big)$ quantum experiments.

###### Proof.

Let , be the universal constants mentioned in Lemma 3.2, and set and . Fix and consider the sequence that would be computed during the -th execution of steps 1.a-1.c if the stopping condition on was removed. We prove that immediately after the -th quantum experiment is performed (which may occur during the computation of ), the current value of satisfies with probability at least . The analysis is done in two parts.

First, let and denote by the number of experiments performed until becomes larger than or equal to . According to Lemma 3.3, the probability that a given occurs in the sequence is equal to . Moreover, using Lemma 3.2, the expected number of experiments performed at step 1.b when is at most . Consequently, we have

$$\mathbb{E}[T_-] \le c_1\sum_{x < Q(p)} \frac{\Pr[X = x]}{\Pr[X \ge x]\sqrt{\Pr[X > x]}}.$$

Suppose that (otherwise ). We upper bound the above sum by splitting it into several parts as follows. Define for and let be the largest integer such that . For each such that , we have

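$$\begin{aligned}
\sum_{Q_{k-1} \le x < Q_k} \frac{\Pr[X = x]}{\Pr[X \ge x]\sqrt{\Pr[X > x]}}
&\le \frac{1}{\sqrt{\Pr[X > Q_{k-1}]}} + \sum_{Q_{k-1} < x < Q_k} \frac{\Pr[X = x]}{\Pr[X > x]^{3/2}}\\
&\le \frac{1}{\sqrt{\Pr[X \ge Q_k]}} + \frac{\Pr[X > Q_{k-1}]}{\Pr[X \ge Q_k]^{3/2}}\\
&\le \frac{1}{\sqrt{2^{-k}}} + \frac{2^{-(k-1)}}{2^{-3k/2}}\\
&\le 2^{k/2+2}.
\end{aligned}$$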

Similarly, . Thus, where we used that since . By Markov’s inequality, .

Secondly, let and denote by the number of experiments performed at step 1.b to sample when . According to Lemma 3.2, we have . Moreover, by definition of . Thus, .

We conclude that step 1.b is interrupted when the value satisfies with probability at least . Thus, by the Chernoff bound, the output satisfies with probability at least . The total number of experiments is guaranteed to be by our use of the counter . ∎

## 4 Sub-Gaussian estimator

In this section, we present the main quantum algorithm for estimating the mean of a random variable with a near-quadratic speedup over the classical sub-Gaussian estimators. Our result uses the following Bernoulli estimator, which is a well-known adaptation of the amplitude estimation algorithm to the mean estimation problem [BHMT02, Ter99, Mon15]. The Bernoulli estimator allows us to estimate the mean of the truncated random variable $X\mathbb{1}_{a < X \le b}$ for any $a < b$.

###### Proposition 4.1 (Bernoulli estimator).

There exists a quantum algorithm, called the Bernoulli estimator, with the following properties. Let be a q-random variable and set as input a time parameter , two range values , and a real such that . Then, the Bernoulli estimator outputs a mean estimate of such that . It performs  quantum experiments.

###### Proof.

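Let $(H, U, M)$ be a q-variable generating $X$. Using the rotation oracle from Assumption B, we define the unitary algorithm $V = R_{a,b}(U \otimes I)$ acting on $H \otimes \mathbb{C}^2$. In order to simplify notations, let us first assume that $a < X(\omega) \le b$ and $X(\omega) \ge 0$ for all $\omega \in \Omega$, so that $X$ is distributed in the interval $[0,b]$. Then, $\mu = \mathbb{E}[X]$ and by definition of $U$ and $R_{a,b}$ (Section 2) we have,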

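$$\begin{aligned}
V|0\rangle &= \sum_{\omega\in\Omega}\sqrt{p(\omega)}\,|\omega\rangle\left(\sqrt{1-\frac{X(\omega)}{b}}\,|0\rangle + \sqrt{\frac{X(\omega)}{b}}\,|1\rangle\right)\\
&= \sqrt{1-\frac{\mu}{b}}\left(\sum_{\omega\in\Omega}\sqrt{\frac{p(\omega)\,(b-X(\omega))}{b-\mu}}\,|\omega\rangle\right)|0\rangle + \sqrt{\frac{\mu}{b}}\left(\sum_{\omega\in\Omega}\sqrt{\frac{p(\omega)\,X(\omega)}{\mu}}\,|\omega\rangle\right)|1\rangle.
\end{aligned}$$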

Thus, there exist some unit states such that and . If takes values outside the interval then the same result holds with in place of and a different definition of .

Consider the output of the amplitude estimation algorithm (Theorem A.3) where . Then, the estimate satisfies the statement of the proposition with probability by Theorem A.3. The Bernoulli estimator consists of running copies of and outputting the median of the results. The success probability is at least by the Chernoff bound. ∎
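The final boosting step is the usual median trick. Assuming each copy is accurate independently with probability $s > 1/2$ (for a single run of amplitude estimation the standard guarantee is $s \ge 8/\pi^2$), the median is accurate whenever more than half of the copies are, so its failure probability is at most a binomial tail, which decays like $e^{-2r(s-1/2)^2}$ in the number $r$ of copies. The helper below is hypothetical and computes that tail exactly.

```python
import math
from math import comb


def median_failure_probability(success_prob, runs):
    """Exact binomial-tail bound on the failure probability of a median of `runs` estimates.

    Hypothetical helper. Each independent run is accurate with probability
    `success_prob` > 1/2; the median can only be inaccurate if at least
    ceil(runs / 2) runs are inaccurate, so we sum that binomial tail.
    """
    fail = 1 - success_prob
    threshold = (runs + 1) // 2  # ceil(runs / 2)
    return sum(
        comb(runs, j) * fail**j * success_prob ** (runs - j)
        for j in range(threshold, runs + 1)
    )


s = 8 / math.pi**2  # standard success probability of one amplitude estimation run
for r in (1, 11, 21):
    print(r, median_failure_probability(s, r))
```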

The Bernoulli estimator can estimate the mean of a non-negative q-random variable $X$ by setting $a = 0$ and $b = \max_{\omega\in\Omega} X(\omega)$. However, its performance is worse than that of the classical sub-Gaussian estimators when the maximum of $X$ is large compared to its variance. Our quantum sub-Gaussian estimator (Algorithm 2) uses the Bernoulli estimator in a more subtle way, and in combination with the quantile estimation algorithm.

###### Theorem 4.2 (Sub-Gaussian estimator).

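Let $X$ be a q-random variable with mean $\mu$ and variance $\sigma^2$. Given a time parameter $n$ and a real $\delta \in (0,1)$ such that $n > \log(1/\delta)$, the sub-Gaussian estimator (Algorithm 2) outputs a mean estimate $\tilde\mu$ such that,

$$\Pr\left[|\tilde\mu-\mu| \le \frac{\sigma\log(1/\delta)}{n}\right] \ge 1-\delta.$$

The algorithm performs $\tilde O(n)$ quantum experiments.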

###### Proof.

First, by standard concentration inequalities, the median computed at step 2 satisfies with probability at least . Moreover, if then , by using the triangle inequality. Below we prove that for any non-negative random variable the estimate of computed at step 3 satisfies

$$|\tilde\mu_Y-\mu_Y| \le \frac{\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{5n} \tag{4}$$

with probability at least . Using the fact that and , we can conclude that

with probability at least . The algorithm performs classical experiments during step 2, quantum experiments during step 3.a, and quantum experiments during step 3.b.

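We now turn to the proof of Equation (4). We make the assumption that all the subroutines used in step 3 are successful, which holds with the probability claimed for Equation (4). First, according to Theorem 3.4, we have $Q(p) \le \tilde Q \le Q(cp)$ for some universal constant $c$, where $Q$ is the quantile function of $Y$. It implies that $cp \le \Pr[Y \ge Q(cp)] \le \Pr[Y \ge \tilde Q] \le \mathbb{E}[Y^2]/\tilde Q^2$, where the first two inequalities are by definition of the quantile function $Q$, and the last inequality is a standard fact (Markov’s inequality applied to $Y^2$). Consequently, by our choice of $p = \big(\frac{\log(1/\delta)}{6n}\big)^2$,

$$\tilde Q \le \frac{6n\sqrt{\mathbb{E}[Y^2]}}{\sqrt{c}\,\log(1/\delta)}. \tag{5}$$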

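Next, we upper bound the expectation of the part of $Y$ that is above the largest threshold $a_k = \tilde Q$ considered in step 3.b. By the Cauchy–Schwarz inequality, we have $\mathbb{E}[Y\mathbb{1}_{Y>\tilde Q}] \le \sqrt{\mathbb{E}[Y^2]\Pr[Y > \tilde Q]}$. Moreover, by definition of $Q$, $\Pr[Y > \tilde Q] \le \Pr[Y > Q(p)] \le p$. Thus,

$$\mathbb{E}[Y\mathbb{1}_{Y>\tilde Q}] \le \frac{\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{6n}. \tag{6}$$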
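Both estimates above rely only on the quantile definition, Markov’s inequality for $Y^2$, and the Cauchy–Schwarz inequality. The sketch below checks their analogues at the exact quantile $Q(p)$ (rather than at the estimate $\tilde Q$) on an arbitrary finite distribution: $Q(p)^2 \le \mathbb{E}[Y^2]/p$, which has the shape of Equation (5), and $\mathbb{E}[Y\mathbb{1}_{Y>Q(p)}] \le \sqrt{\mathbb{E}[Y^2]\,p}$, which has the shape of Equation (6). The quantile convention $Q(p) = \sup\{x : \Pr[Y \ge x] \ge p\}$ and the helper names are assumptions of this sketch.

```python
import math
from fractions import Fraction


def quantile(dist, p):
    """Q(p) = sup{x : Pr[Y >= x] >= p} for a finite distribution {value: prob}."""
    tail = 0
    for x in sorted(dist, reverse=True):
        tail += dist[x]  # tail == Pr[Y >= x]
        if tail >= p:
            return x
    raise ValueError("p must be in (0, 1]")


def check_tail_bounds(dist, p):
    """Check the Eq. (5)- and Eq. (6)-style bounds at Q(p) for non-negative Y."""
    second_moment = sum(q * x * x for x, q in dist.items())
    b = quantile(dist, p)
    tail_mean = sum(q * x for x, q in dist.items() if x > b)  # E[Y 1{Y > Q(p)}]
    # Markov on Y^2: p <= Pr[Y >= Q(p)] <= E[Y^2] / Q(p)^2.
    assert b * b <= second_moment / p
    # Cauchy-Schwarz together with Pr[Y > Q(p)] <= p.
    assert tail_mean <= math.sqrt(second_moment * p)
    return b, tail_mean


example = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 8),
           8: Fraction(1, 16), 16: Fraction(1, 16)}
print(check_tail_bounds(example, Fraction(1, 4)))  # (4, Fraction(3, 2))
```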

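The expectation of $Y$ is decomposed into the sum $\mu_Y = \sum_{\ell=0}^k \mu_\ell + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]$, where $\mu_\ell = \mathbb{E}[Y\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}]$ is estimated at step 3.b. We have $|\tilde\mu_\ell-\mu_\ell| \le \frac{\sqrt{a_\ell\mu_\ell}\log(1/\delta)}{dn\sqrt{\log n}} + \frac{a_\ell\log(1/\delta)^2}{d^2n^2\log n}$ for all $\ell$ according to Proposition 4.1. Thus, by the triangle inequality,

$$\begin{aligned}
|\tilde\mu_Y-\mu_Y|
&\le \sum_{\ell=0}^k|\tilde\mu_\ell-\mu_\ell| + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]\\
&\le \sum_{\ell=0}^k\frac{\sqrt{a_\ell\mu_\ell}\log(1/\delta)}{dn\sqrt{\log n}} + \sum_{\ell=0}^k\frac{a_\ell\log(1/\delta)^2}{d^2n^2\log n} + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]\\
&\le \frac{\tilde Q\log(1/\delta)}{dn^2\sqrt{\log n}} + \sum_{\ell=1}^k\frac{\sqrt{2\,\mathbb{E}[Y^2\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}]}\,\log(1/\delta)}{dn\sqrt{\log n}} + \frac{2\tilde Q\log(1/\delta)^2}{d^2n^2\log n} + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]\\
&\le \frac{\sqrt{2k}\sqrt{\sum_{\ell=1}^k\mathbb{E}[Y^2\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}]}\,\log(1/\delta)}{dn\sqrt{\log n}} + \frac{3\tilde Q\log(1/\delta)^2}{dn^2\sqrt{\log n}} + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]\\
&\le \frac{\sqrt{2k}\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{dn\sqrt{\log n}} + \frac{3\tilde Q\log(1/\delta)^2}{dn^2\sqrt{\log n}} + \mathbb{E}[Y\mathbb{1}_{Y>a_k}]\\
&\le \frac{\sqrt{2}\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{dn} + \frac{18\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{\sqrt{c}\,dn\sqrt{\log n}} + \frac{\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{6n}\\
&\le \frac{\sqrt{\mathbb{E}[Y^2]}\,\log(1/\delta)}{5n}
\end{aligned}$$

where the third step bounds the $\ell = 0$ term and the sum $\sum_\ell a_\ell$ using the thresholds chosen in step 3.b and uses $a_\ell\mu_\ell \le 2\,\mathbb{E}[Y^2\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}]$ when $\ell \ge 1$, the fourth step uses the Cauchy–Schwarz inequality (and combines the two terms in $\tilde Q$), the fifth step uses $\sum_{\ell=1}^k\mathbb{E}[Y^2\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}] \le \mathbb{E}[Y^2]$, the sixth step uses Equations (5) and (6), and in the last step we choose the constant $d$ large enough (for instance, $d \ge 30(\sqrt{2} + 18/\sqrt{c})$ suffices when $\log n \ge 1$). ∎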
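The decomposition of $\mu_Y$ into levels can be checked on a toy example. The sketch assumes dyadic thresholds $a_\ell = \tilde Q\,2^{\ell-k}$ with $a_{-1} = 0$; this is a hypothetical choice (giving $a_k = \tilde Q$ and $a_\ell = 2a_{\ell-1}$), and the actual thresholds are those of step 3.b of Algorithm 2. It verifies that the levels and the tail add up to $\mathbb{E}[Y]$, and the bound $a_\ell\mu_\ell \le 2\,\mathbb{E}[Y^2\mathbb{1}_{a_{\ell-1}<Y\le a_\ell}]$ for $\ell \ge 1$ used in the third step.

```python
from fractions import Fraction


def level_decomposition(dist, q_tilde, k):
    """Split E[Y] (Y >= 0) into levels (a_{l-1}, a_l] for l = 0..k plus a tail above a_k.

    Hypothetical thresholds: a_{-1} = 0 and a_l = q_tilde * 2**(l - k), so a_k = q_tilde
    and a_l = 2 * a_{l-1} for l >= 1. Returns (thresholds, [mu_0..mu_k], tail).
    """
    thresholds = [Fraction(0)] + [q_tilde * Fraction(2) ** (l - k) for l in range(k + 1)]
    mus = [
        sum(p * y for y, p in dist.items() if lo < y <= hi)
        for lo, hi in zip(thresholds, thresholds[1:])
    ]
    tail = sum(p * y for y, p in dist.items() if y > thresholds[-1])
    return thresholds, mus, tail


example = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 4), 20: Fraction(1, 4)}
thresholds, mus, tail = level_decomposition(example, Fraction(8), 3)
print(sum(mus) + tail)  # 6, which equals E[Y] = (0 + 1 + 3 + 20) / 4
```

For $\ell \ge 1$, every $y$ in $(a_{\ell-1}, a_\ell]$ satisfies $a_\ell = 2a_{\ell-1} < 2y$, hence $a_\ell y \le 2y^2$, which is the per-level inequality the test checks.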

## 5 (ϵ,δ)-Estimators

We study the $(\epsilon,\delta)$-approximation problem under two different scenarios. First, we consider the case where we know an upper bound $\Delta$ on the coefficient of variation $|\sigma/\mu|$. As a direct consequence of Theorem 4.2, we obtain the following estimator that subsumes a similar result shown in [HM19] for non-negative random variables.

###### Corollary 5.1 (Relative estimator).

There exists a quantum algorithm with the following properties. Let $X$ be a q-random variable with mean $\mu$ and variance $\sigma^2$, and set as input a value $\Delta \ge |\sigma/\mu|$ and two reals $\epsilon, \delta \in (0,1)$. Then, the algorithm outputs a mean estimate $\tilde\mu$ such that