1 Introduction
We consider how to estimate the mixedness or noisiness of a quantum state using measurements of independent copies of the state. Mixed quantum states can arise in practice in various ways. Classical stochasticity can be intentionally introduced when the state is originally prepared. Pure states can become mixed by a quantum measurement. And the states of the subsystems of bipartite states can be mixed even when the overall bipartite state is pure, which forms the basis for purification.
In the third case, the level of mixedness of the subsystems indicates the level of entanglement in the pure, bipartite system. The possibility of entanglement of two separated systems is arguably the most curious, and the most powerful, way in which quantum systems differ from classical ones. Indeed, entanglement has been fruitfully exploited as a resource in a number of quantum information processing protocols [BW92, BBC93, BSST02, DHW04, HW10]. The subsystems of a pure bipartite state are pure if and only if the bipartite state itself is unentangled, and likewise they are maximally mixed if and only if the bipartite state is maximally entangled. Thus the mixedness of the subsystems’ states can be used as a measure of entanglement of the bipartite system.
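Mixedness can be measured in multiple ways. We shall use the von Neumann and (the family of) Rényi entropies, which correspond to the classical Shannon and (the family of) Rényi entropies of the eigenvalues of the density operator of the state, respectively. A density matrix (or operator) $\rho$ is a complex positive semidefinite matrix with unit trace; thus its eigenvalues $\eta_1, \ldots, \eta_d$ are nonnegative and sum to one. The von Neumann entropy of a density matrix $\rho$ is
$$S(\rho) = -\mathrm{tr}(\rho \log \rho) = -\sum_{i=1}^{d} \eta_i \log \eta_i.$$
For $\alpha > 0$, $\alpha \neq 1$, the Rényi entropy of order $\alpha$ of $\rho$ is
$$S_\alpha(\rho) = \frac{1}{1-\alpha} \log \mathrm{tr}(\rho^\alpha) = \frac{1}{1-\alpha} \log \sum_{i=1}^{d} \eta_i^\alpha.$$
In the limit of $\alpha \to 1$, $S_\alpha(\rho) \to S(\rho)$.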
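As a minimal numerical sketch of these definitions, the following Python snippet evaluates both entropies from the eigenvalues of a density matrix. The function names, the natural logarithm, and the example spectra are assumptions made for illustration only.

```python
import math

def von_neumann_entropy(eigenvalues):
    """-sum_i eta_i log eta_i, with the convention 0 log 0 = 0 (natural log)."""
    return -sum(x * math.log(x) for x in eigenvalues if x > 0)

def renyi_entropy(eigenvalues, alpha):
    """log(sum_i eta_i^alpha) / (1 - alpha), for alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    power_sum = sum(x ** alpha for x in eigenvalues if x > 0)
    return math.log(power_sum) / (1.0 - alpha)

# Spectrum of the maximally mixed state in d = 4: every entropy equals log 4.
uniform = [0.25] * 4
# Spectrum of a pure state: every entropy equals 0.
pure = [1.0, 0.0, 0.0, 0.0]
# A skewed spectrum, used to illustrate the limit alpha -> 1.
skewed = [0.7, 0.1, 0.1, 0.1]
```

Evaluating `renyi_entropy(skewed, 1 + 1e-6)` gives a value close to `von_neumann_entropy(skewed)`, which illustrates the limiting statement numerically.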
The classical Shannon and Rényi entropies are well-accepted measures of randomness, and can be derived axiomatically [CK81, pp. 25–27]. Both the classical and quantum versions can be justified operationally as a measure of compressibility [CK81, Sch95, JS94, Lo95]. The quantum versions have been explicitly proposed for quantifying entanglement [Car12].
In principle, both the von Neumann and Rényi entropies for a quantum state can be computed if the state is known. We consider how to estimate these quantities for an unknown state given independent copies of the state, to which arbitrary quantum measurements followed by arbitrary classical computation can be applied. This problem arises when characterizing a completely unknown system and when one seeks to experimentally verify that a system is behaving as desired. Since generating independent copies of a state can be quite costly in the quantum setting [HHR05, MHS12], it is desirable to minimize the number of independent copies of the state that are required to estimate the von Neumann and Rényi entropies to a desired precision and confidence. We thus adopt this copy complexity as our figure of merit.
Using standard results in quantum state estimation, we reduce our problem to one that is fully classical. We first describe this fully classical problem, which is potentially of interest in its own right.
1.1 Quantum-Free Formulation
Let be a distribution over . A property is a mapping of distributions to real numbers. A property is said to be symmetric (or label-invariant) if it is a function of only the multiset of probability values, and not the ordering. For example, the Shannon entropy is symmetric, since it is only a function of the probability values.
Classical symmetric property estimation.
We are given independent samples from an unknown distribution , and the goal is to estimate a symmetric property up to a factor, with probability at least 2/3.
Quantum state property estimation.
The problem of estimating von Neumann and Rényi entropies of a quantum state can be shown to be equivalent to estimating a symmetric property of some distribution. However, instead of being given independent samples from the distribution as in the classical case, we are given access to a function of . Here are integers satisfying the following property.

For any , is equal to the largest possible sum of the lengths of disjoint nondecreasing subsequences of .
Equivalently, we may view the observations as the output of the Robinson–Schensted–Knuth (RSK) algorithm applied to the sequence , instead of being itself. The reader is referred to [OW17b] for more details on the procedure. The copy complexity of estimating quantum entropy turns out to be equivalent to the problem of estimating classical entropy when given access to . A simple data processing of the form shows that the complexity of estimating a quantum state property is at least as hard as estimating the same property in the classical setting.
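The map from a sequence to the integers described above can be sketched in a few lines of Python using RSK row insertion. By Greene's theorem, the sum of the first k row lengths of the resulting shape equals the largest total length of k disjoint nondecreasing subsequences. The function name `rsk_shape` and the example sequences are illustrative choices, not taken from [OW17b].

```python
from bisect import bisect_right

def rsk_shape(sequence):
    """Row lengths of the RSK insertion tableau of `sequence`."""
    rows = []  # rows[i] is the i-th tableau row, always kept nondecreasing
    for x in sequence:
        for row in rows:
            pos = bisect_right(row, x)      # index of first entry strictly greater than x
            if pos == len(row):             # nothing larger: x goes at the end of this row
                row.append(x)
                break
            row[pos], x = x, row[pos]       # x replaces that entry, which is bumped downward
        else:
            rows.append([x])                # the bumped entry (or x itself) opens a new row
    return [len(row) for row in rows]

# The longest nondecreasing subsequence of 2,1,3,1,2,3 is 1,1,2,3 (length 4),
# and two disjoint nondecreasing subsequences can cover all 6 symbols.
shape = rsk_shape([2, 1, 3, 1, 2, 3])
```

A strictly decreasing sequence produces a single column, while a nondecreasing sequence produces a single row, which matches the subsequence description.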
1.2 Organization
The paper is organized as follows. In Section 1.3, we state our results, followed by a brief description of our tools in Section 1.4, and related work in Section 1.5. Section 2 gives a summary of the quantum setup. Section 3 provides the preliminary results needed for setting up the paper. In particular, Section 3.1.1 describes the optimal quantum measurement for the class of properties we are interested in. Section 4 proves our bounds for integral order Rényi entropy. Section 6 proves the upper bounds for nonintegral orders, and Section 7 shows the lower bounds on the performance of the empirical estimator.
1.3 Our Results
We consider the following problem.
: Given a property , and access to independent copies of a dimensional mixed state (e.g. output of some quantum experiment), how many copies are needed to estimate to within ? (We seek success with probability at least , which can be boosted to by repeating the algorithm times and taking the median.)
We study the copy complexity of estimating the entropy of a mixed state of dimension . The copy complexity, denoted by , is the minimum number of copies required for an algorithm that solves . Copy complexity is defined precisely in Section 2.2.
We will use the standard asymptotic notations. We will be interested in characterizing the dependence of , and , as a function of and . We assume the parameter to be a constant, and focus on only the growth rate as a function of and .
We will now discuss our results, which are summarized in Table 1 and Table 2. For comparison purposes, it is useful to recall the copy complexity of quantum tomography, in which the goal is to learn the entire density matrix . The problem has been studied in various works using various distance measures; and up to polylogarithmic factors, for the standard distance measures, the copy complexity depends quadratically on the dimension . Namely, it is . (We discuss the copy complexity of some other problems in related work, Section 1.5.) Similar to the sample complexity of estimating Rényi entropies of classical distributions from samples, our bounds are also dependent on whether is less than one, and whether it is an integer. (See Table I of [AOST17], and Section 1.5.1 for the sample complexity in classical settings.) We organize our results as a function of as follows.
Integral .
We obtain our most optimistic and conclusive results in this case. In Theorem 1, we show that . We note that the lower bounds here hold for all estimators, not just for the estimators used in the upper bound. Furthermore, these bounds are subquadratic in $d$, namely we can estimate the Rényi entropy of integral orders even before we have enough copies to perform full tomography. The upper bounds are established by analyzing certain polynomials from representation theory that are related to the central characters of the symmetric group. The main contribution is to analyze the variance of these estimators, for which we draw upon various results from Kerov’s algebra. For the lower bound, we design the spectra of two mixed states such that their Rényi entropies differ by at least , but which require a large copy complexity to distinguish between them. We use various properties of Schur polynomials and other properties of integer partitions [Mac98, HR18].
Remark 1.
The first term in the complexity dominates when , and is identical to the sample complexity of estimating Rényi entropy in the classical setting.
.
We analyze the Empirical Young Diagram (EYD) algorithm [ARS88, KW01] for estimating for . The EYD algorithm is similar to using a plug-in estimate of the empirical distribution to estimate properties in classical distribution property estimation. We show that . Since , this growth is faster than quadratic, namely the EYD algorithm requires more copies than is required for tomography. We complement this result by showing that in fact the EYD algorithm requires copies, showing that the superquadratic dependence on is necessary for the EYD algorithm. The upper bound is proved in Theorem 4, and the lower bound in Theorem 8. In comparison, in the classical setting the exponent of is almost .
von Neumann entropy, .
Again using the EYD algorithm, in Theorem 2 we show that . We formulate an optimization problem whose solutions are an upper bound on the bias of the empirical estimate, and we bound the variance by proving that the estimator has a small bounded difference constant. In Theorem 7 we show a lower bound of for the EYD estimator to estimate the entropy of the maximally mixed state. This complexity is still similar to that of full quantum tomography.
Non-integral .
Again using the EYD algorithm, in Theorem 3, we show that . We also provide a lower bound of for the EYD estimator in Theorem 7.
In addition to these results, we improve the error probability of the lower bounds on the convergence of the EYD algorithm to the true spectrum. In particular, for the uniform distribution, [OW15] have shown that unless the number of copies is at least the EYD has a total variation distance of at least with probability at least 0.01. We show that in fact unless the number of copies is at least the trace distance is at least with probability at least for some constant .
1.4 Our Techniques
In this section, we provide a high level overview of the technical contributions of our paper.
The entropy functions that we consider are unitarily invariant properties (Section 2.3), namely they depend only on the multiset of eigenvalues of the density matrix. For example, for a density matrix with eigenvalues , we have , and , meaning that the von Neumann and Rényi entropies are unitarily invariant properties. For such properties, it is known that an optimal measurement scheme over the set of all measurements is weak Schur sampling (WSS) (Section 3.1.1). The output of this measurement is a partition of , the number of independent copies of used, usually depicted as a Young diagram (Section 3.1). The goal is then to estimate the entropy from the output Young diagram supplied by WSS.
Estimating Rényi entropy is equivalent to obtaining multiplicative estimates of the power sum . In the classical setting, it turns out that for integral , there are simple unbiased estimators of . In the quantum setting, for integral , there are unbiased estimators for . These estimators are now polynomials (called ) over Young tableaux obtained from Kerov’s algebra. While the estimator itself is simple to state in terms of polynomials, bounding its variance requires a number of intricate arguments. Using results from representation theory about , we first write the variance of the estimator as a linear combination of polynomials. We use combinatorial arguments about the cycle structure of compositions of permutations to show that only a certain subset of ’s can appear in the variance expression. Moreover, the number of ’s can be bounded using the Hardy–Ramanujan bounds on the partition numbers. We also provide bounds on the coefficients to finally obtain the upper bound for integral (Theorem 1).
For the lower bound for integral , one of the terms follows from the classical lower bounds, and the fact that estimation is easier in the classical setting than in the quantum setting. To prove a lower bound equal to the second term, we invoke the classical Le Cam’s method combined with results on Schur polynomials and partition numbers.
Our upper bounds for von Neumann entropy and for non-integral use the Empirical Young Diagram (EYD) algorithm (Section 3.1.2). This is akin to the empirical plug-in estimators for distribution property estimation. Our upper bounds require various bias and concentration results on Young tableaux. Fortunately, in the recent works of O’Donnell and Wright, a number of such bounds were proved. We build upon their results, and prove some additional results to show the copy complexity bounds for the EYD algorithm.
To prove the lower bounds for the EYD algorithm, we design eigenvalues such that unless the number of copies is large enough, the EYD algorithm cannot concentrate around the true entropy.
One of our contributions pertains to the convergence of the empirical Young diagram to the true distribution. A lower bound of was shown by [OW15]. However, their result only holds with a constant probability (with probability 0.01 to be precise). We show very sharp concentration by invoking McDiarmid’s inequality. We show that unless the number of samples is more than the empirical Young diagram’s lower bound holds with probability for some constant . This exponential concentration result could be of independent interest.
1.5 Related Work
Our work is related to symmetric distribution property estimation in the classical setting, property estimation of classical distributions using quantum queries, and the property estimation of quantum states (as in the setup of this paper). We briefly mention some closely related works. The reader is encouraged to read the survey by Montanaro and de Wolf [MdW13], and the thesis by Wright [Wri16] for more details on the recent literature.
1.5.1 Symmetric Property Estimation of Discrete Distributions
A property of a distribution is symmetric if it is a function of only the probability multiset. A number of properties, such as the Shannon entropy, Rényi entropy, KL divergence, support size, distance to uniformity, are all symmetric. While there is a long literature on some of these problems, the optimal sample complexity for these problems was established only over the last decade [VV11, WY16, JVHW15, AOST17, JHW16, WY15, OSVZ04, BZLV16, HJW14, ADOS17]. We mention the state of the art results, and the reader can consult the related papers and references therein to learn more about the landscape of symmetric distribution property estimation problems. Similar to the quantum setting, let be the minimum number of samples needed from a discrete distribution over elements to estimate a property up to , again with probability at least .
The problem of estimating Rényi entropy , was studied in [AOST15, AOST17, OS17]. The sample complexity dependence in the classical setting seems to suggest the same qualitative behavior as our results. They show that for , , and for , . Moreover, their information theoretic lower bounds show that the exponent of cannot be improved by any algorithm. For the case of integral , larger than one, they characterize the sample complexity up to constant factors by showing that . We note that this complexity is indeed one of the terms in our copy complexity for integral , which happens for large . [OS17] provide bounds that improve the sample complexity of Rényi entropy estimation, for distributions with small Rényi entropy.
1.5.2 Quantum Property Estimation of Mixed States
While we are not aware of a lot of literature on property estimation of mixed states, there are now many works on the related problem of quantum property testing, where the goal is to find the copy complexity of deciding whether a mixed state has a certain property of interest, and on the problem of quantum tomography, where the goal is to learn the entire density matrix .
The copy complexity of quantum tomography is quadratic in , and the complexity of tomography in various distance measures has been studied in [HHJ17, OW16, OW17a].
Testing whether has a particular unitarily invariant property of interest was studied in [OW15] for a number of properties. They show that testing whether is maximally mixed, namely whether all elements of are , requires copies. They also studied the problem of testing the rank of , and provide bounds on the performance of the EYD algorithm for estimating the spectrum. Recently, [BOW17] obtained tight bounds on the copy complexity of testing whether an unknown density matrix is equal to a known density matrix. The optimal measurement schemes for some of these problems can be computationally expensive. Testing properties under simpler local measurements was studied recently in [PML17].
In a personal communication, Bavarian, Mehraban, and Wright [BMW16] claim an algorithm with copy complexity for the von Neumann entropy estimation, which is an factor improvement over our bound.
1.5.3 Quantum Algorithms for Classical Distribution Properties
Testing and estimating distribution properties using quantum queries has been considered by various authors. Problems of testing properties such as uniformity, identity, closeness under the regular quantum query model, and conditional quantum query models have been studied in [BHH11, CFMDW10, SSJ17].
Recently Li and Wu [LW17] studied the quantum query complexity of estimating entropy of discrete distributions. They provide bounds on the query complexity for estimating von Neumann entropy, and Rényi entropy. For certain values of , the bounds on query complexity can in fact be at times quadratically better than the corresponding sample complexity bounds.
2 Quantum Measurements and Property Estimation
2.1 Density Matrix and Quantum Measurement
A quantum state is described by a density matrix $\rho$, which is a $d \times d$ positive semidefinite matrix with unit trace. The joint state of $n$ independent copies is given by the tensor product $\rho^{\otimes n}$, which is a density matrix of dimension $d^n \times d^n$.
Quantum measurements are described by a set of matrices $\{M_m\}$ called measurement operators, where index $m$ denotes the measurement outcome. Measurement operators satisfy the completeness condition, $\sum_m M_m^\dagger M_m = I$. If the pre-measurement state is $\rho$ then the probability of measurement outcome $m$ is $\mathrm{tr}(M_m^\dagger M_m \rho)$, and the post-measurement state is $M_m \rho M_m^\dagger / \mathrm{tr}(M_m^\dagger M_m \rho)$. The measurement operators are also allowed to have an infinite outcome set, in which case a suitable $\sigma$-algebra on the set of outcomes and a probability measure on this space are defined. For a detailed discussion of these concepts see [NC10].
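A small worked instance of the measurement rule may help. The sketch below assumes the textbook form of the postulate from [NC10]: outcome m occurs with probability tr(M_m† M_m ρ) and leaves the state M_m ρ M_m† divided by that probability. The helper names and the example qubit state are hypothetical, and plain nested lists are used instead of a linear algebra library to keep the example dependency-free.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def measure(rho, operators):
    """Return a list of (probability, post-measurement state), one entry per outcome."""
    results = []
    for m_op in operators:
        unnormalized = matmul(matmul(m_op, rho), dagger(m_op))  # M rho M^dagger
        prob = trace(unnormalized).real                         # equals tr(M^dagger M rho)
        post = [[x / prob for x in row] for row in unnormalized] if prob > 0 else None
        results.append((prob, post))
    return results

# Projective measurement of a qubit in the computational basis.
M0 = [[1, 0], [0, 0]]
M1 = [[0, 0], [0, 1]]
rho = [[0.75, 0.25], [0.25, 0.25]]  # positive semidefinite with unit trace
outcomes = measure(rho, [M0, M1])
```

Here the two outcome probabilities are the diagonal entries of `rho`, and each post-measurement state is the corresponding basis projector.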
2.2 Property Estimation
A property maps a mixed state to . Given and , an estimator is a set of measurement matrices for the state space and a “classical processor” , which maps the natural numbers to . Given copies of a state , the estimator proceeds by applying the measurement to the state and then applying to the resulting outcome. Given a property , accuracy parameter , error parameter , and access to independent copies of a mixed state , we seek an estimator such that with probability at least
The copy complexity of is
the minimum number of copies required to solve the problem. Throughout this paper we will consider to be a constant, say 1/3. We can boost the error to any by repeating the estimation task times, and taking the median of the outcomes. This causes an additional multiplicative cost in the copy complexity. We denote
(1) 
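The median-based boosting step mentioned above can be sketched as follows. The function name, the callable `estimate_once` (standing in for one full run of a constant-confidence estimator on fresh copies), and the constant 18 in the repetition count are hypothetical choices for illustration; the paper only specifies the logarithmic order of the number of repetitions.

```python
import math
import statistics

def boosted_estimate(estimate_once, delta, repetitions_constant=18):
    """Median of O(log(1/delta)) independent runs of `estimate_once`.

    If each run is accurate with probability at least 2/3, the median is
    inaccurate only when at least half of the runs fail, which happens with
    probability exponentially small in the number of runs.
    """
    runs = max(1, math.ceil(repetitions_constant * math.log(1.0 / delta)))
    if runs % 2 == 0:
        runs += 1  # an odd count makes the median one of the observed values
    return statistics.median(estimate_once() for _ in range(runs))
```

Each call to `estimate_once` consumes its own batch of copies, which is the source of the extra multiplicative factor in the copy complexity.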
2.3 Unitarily Invariant Properties
Suppose is the set of all unitary matrices.
Definition 1.
A property is called unitarily invariant, if for all .
Let be the multiset of the eigenvalues (also called the spectrum) of . Two density matrices , and have the same spectrum if and only if there is a unitary matrix such that . Therefore, unitarily invariant properties are functions of only the spectrum of the density matrix. Since density matrices are positive semidefinite with unit trace, we have , and we can view as a distribution over some set. Unitarily invariant properties are analogous to properties in classical distributions that are a function of only the multiset of probability elements, called symmetric properties.
For a density matrix with eigenvalues , we have , and . Quantum entropy can be viewed as the classical entropy of the distributions defined by , and in particular they are unitarily invariant.
Working with unitarily invariant properties is greatly simplified by the following powerful result [KW01, CHW07, Har05, Chr06] (See [MdW13, Section 4.2.2] for details).
Lemma 1.
A quantum measurement called weak Schur sampling is optimal for estimating unitarily invariant properties.
Weak Schur sampling is discussed in Section 3.1.1.
3 Preliminaries
We list some of the definitions and results we use in the paper.
Definition 2.
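The total variation distance, KL divergence, and $\chi^2$ distance between distributions $p$ and $q$ over a common alphabet are
(2) $d_{TV}(p, q) = \frac{1}{2} \sum_i |p_i - q_i|$,
(3) $D(p \| q) = \sum_i p_i \log \frac{p_i}{q_i}$,
(4) $\chi^2(p, q) = \sum_i \frac{(p_i - q_i)^2}{q_i}$.
The distance measures satisfy the following bound.
Lemma 2.
$2\, d_{TV}(p, q)^2 \le D(p \| q) \le \chi^2(p, q)$.
The first inequality is Pinsker’s Inequality, and the second follows from concavity of logarithms.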
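The chain of inequalities can be checked numerically on a small example. The sketch assumes the standard definitions of the three quantities (half the L1 distance, the relative entropy with natural logarithm, and the sum of squared differences normalized by the second distribution) and the standard chain 2·d_TV² ≤ D ≤ χ²; the function names and the two example distributions are illustrative.

```python
import math

def total_variation(p, q):
    """Half of the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """Relative entropy D(p || q); assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi_squared(p, q):
    """Chi-squared distance; assumes every q_i is positive."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
tv = total_variation(p, q)
kl = kl_divergence(p, q)
chi2 = chi_squared(p, q)
```

For these two distributions the three quantities are ordered as the chain predicts, with a visible gap at each step.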
We now state some concentration results that we use.
Let be a function, such that
(5) 
for some .
The next two results show concentration results of functions that satisfy (5). The following lemma is [BLM13, Corollary 3.2].
Lemma 3.
For independent variables ,
The next result is McDiarmid’s inequality [BLM13, Theorem 6.2].
Lemma 4.
For independent variables ,
3.1 Schur Polynomials and Power-Sum Polynomials
A partition of is a collection of nonnegative integers that sum to . We write and we write for the set of all partitions of . We denote the number of positive integers in by , which we call its length. A partition can be depicted with an English Young diagram, which consists of a row of boxes above a row of boxes, etc., as shown in Fig. 1. The partition associated with a Young diagram is called its shape. Note that the number of rows in the Young diagram of is and the total number of boxes is . A Young tableau over alphabet is a Young diagram in which each box has been filled with an element of . A Young tableau is called standard if it is strictly increasing left-to-right across each row and top-to-bottom down each column. A Young tableau is semistandard if it is strictly increasing top-to-bottom down each column and nondecreasing left-to-right across each row. Given and , the Schur polynomial is the polynomial in the variables defined by
(6) 
where the sum is over the set of all semistandard Young Tableaus over alphabet corresponding to the partition and is the number of times appears in . Schur polynomials turn out to be symmetric, meaning that they are invariant to the ordering of the variables [Mac98, Sta99].
We shall also consider polynomials obtained from power sums. (Power sums are usually defined for general vectors; we will consider them only for distributions in this paper.) Given and a distribution on , define
Given , we define the power sum polynomial by
The following is Lemma 1 in [AOST17], which describes a number of inequalities that hold for the power sums of distributions.
Lemma 5.
Suppose is a distribution over elements, then


For ,
and for ,

For every and ,

For every ,

For and ,

For and ,
and
Schur polynomials and power-sum polynomials are related through a change of basis. There exists a function such that [Sta99, Theorem 7.17.3]
(7) 
The function in fact comprises the characters of the irreducible representations of the symmetric group on [Sta99, Sec. 7.18], although this fact is not needed. The function can also be defined combinatorially [Sta99]. The quantity is difficult to compute in general [Hep94], although we shall only be interested in particular , as follows. Let denote the number of standard Young tableaus over alphabet with shape . For and define
where is the falling power, i.e., and denotes the partition of consisting of followed by ones.
3.1.1 Weak Schur Sampling (WSS)
We describe some of the key results about weak Schur sampling (WSS) that we will use in this paper. The readers can refer to [MdW13, Section 4.2.2], [Wri16, Chapter 3], and references therein for further details.
Weak Schur Sampling is a measurement that takes independent copies of a mixed state (denoted ), and outputs a . The output distribution over partitions is called the Schur–Weyl distribution, denoted , and the probability of is given by
(8) 
where, recall from the previous section that is the number of Standard Young Tableaux of shape , and is the Schur polynomial with variables , and shape . Since Schur polynomials are symmetric, this probability is only a function of the multiset of eigenvalues, namely a function of the eigenvalue spectrum.
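For small parameters the Schur–Weyl distribution can be tabulated exactly. The sketch below assumes the standard form of the distribution, in which the probability of a partition is the number of standard Young tableaux of that shape times the Schur polynomial evaluated at the eigenvalues. Two classical identities that are not part of this paper are used as implementation shortcuts: the hook length formula for the tableau count and the Jacobi–Trudi determinant for the Schur polynomial. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield every partition of n as a nonincreasing tuple."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def num_standard_tableaux(shape):
    """Hook length formula: n! divided by the product of all hook lengths."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])] if shape else []
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            hooks *= (row - j - 1) + (cols[j] - i - 1) + 1  # arm + leg + 1
    return factorial(sum(shape)) // hooks

def complete_homogeneous(xs, max_degree):
    """h[k] is the complete homogeneous symmetric polynomial of degree k at xs."""
    h = [Fraction(1)] + [Fraction(0)] * max_degree
    for x in xs:
        for k in range(1, max_degree + 1):
            h[k] += x * h[k - 1]
    return h

def determinant(m):
    """Laplace expansion along the first row (adequate for tiny matrices)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * determinant([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def schur(shape, xs):
    """Jacobi-Trudi: s_lambda = det[h_{lambda_i - i + j}], with h_k = 0 for k < 0."""
    h = complete_homogeneous(xs, sum(shape))
    size = len(shape)
    def entry(i, j):
        k = shape[i] - i + j
        return h[k] if k >= 0 else Fraction(0)
    return determinant([[entry(i, j) for j in range(size)] for i in range(size)])

def schur_weyl_distribution(n, eta):
    """Map each partition of n to its probability under the assumed formula."""
    return {lam: num_standard_tableaux(lam) * schur(lam, eta) for lam in partitions(n)}

eta = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
dist = schur_weyl_distribution(5, eta)
```

Exact rational arithmetic makes it possible to confirm that the probabilities sum to one and that partitions with more rows than eigenvalues receive probability zero.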
An alternate combinatorial characterization of the output of WSS is given next. Some of the intermediate steps involving the Robinson–Schensted–Knuth (RSK) correspondence and Greene’s theorem are not invoked later in the paper, and are omitted. We simply describe the method by which the final diagram is obtained. The reader can refer to the short survey [OW17b] for details on the combinatorial procedure.
Suppose is a mixed state with the multiset of eigenvalues .

Consider a distribution over , where has probability .

Draw independently from this distribution.

Let , be such that for any , is equal to the largest sum of lengths of disjoint nondecreasing subsequences of .
The output distribution of this process is the same as that of weak Schur Sampling [Wri16]. Furthermore, one of the results proved in [Wri16] is that the output distribution of the procedure above is independent of the ordering of ’s, and only depends on the multiset of the eigenvalues. For example, when , the distributions , and the distribution have the same output distributions over Young tableaux generated by the procedure above.
Since the Young tableau is a function of the sequence generated by the spectrum distribution, we obtain the following.
Lemma 6.
The copy complexity of estimating a unitarily invariant property of a mixed state is at least the sample complexity of estimating the same symmetric property of the spectrum distribution.
The polynomial defined in the last section is useful to us due to the following lemma, which states that the (normalized) polynomial is an unbiased estimator of the th moment of . The lemma follows from the definitions and results already mentioned, and is implicit in [Mél10, IO02], and explicit in [Wri16, Proposition 3.8.3].
Lemma 7.
Fix a distribution , a natural number , and any partition of . If is randomly generated according to the distribution in (8) then
(9) 
In the special case when , a partition with only one part, we have
(10) 
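The unbiasedness statement can be verified exactly in the smallest nontrivial case, order 2, by brute force. The sketch below rests on two assumptions that go beyond the text: that the lemma takes the standard form in which the expectation equals n(n-1) times the sum of squared eigenvalues, and that for order 2 the polynomial has the classical closed form Σ_i λ_i(λ_i − 2i + 1), which is twice the sum of the contents of the boxes of the diagram. The diagram is generated through the sequence-based description of weak Schur sampling, and all names are illustrative.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import product

def rsk_shape(sequence):
    """Row lengths of the RSK insertion tableau of `sequence`."""
    rows = []
    for x in sequence:
        for row in rows:
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]
        else:
            rows.append([x])
    return [len(row) for row in rows]

def p_sharp_2(shape):
    """Assumed closed form for order 2, with rows indexed from 1."""
    return sum(row * (row - 2 * i + 1) for i, row in enumerate(shape, start=1))

def exact_expectation(eta, n):
    """Expectation of p_sharp_2 over all d^n sequences, weighted by their probabilities."""
    total = Fraction(0)
    for seq in product(range(len(eta)), repeat=n):
        prob = Fraction(1)
        for symbol in seq:
            prob *= eta[symbol]
        total += prob * p_sharp_2(rsk_shape(seq))
    return total

eta = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
n = 5
lhs = exact_expectation(eta, n)
rhs = n * (n - 1) * sum(x ** 2 for x in eta)
```

With exact fractions the two sides agree identically, so dividing the statistic by n(n-1) gives an unbiased estimate of the second power sum in this example.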
3.1.2 The EYD algorithm, and classical plug-in estimation
The EYD algorithm is a simple algorithm for estimating . The algorithm works in two steps.

Compute the empirical distribution, which assigns probability to the symbol .

Output the property of a mixed state with eigenvalues equal to .
The EYD algorithm is a quantum analogue of the classical empirical/plug-in estimator, which works as follows. Consider step 2 of the weak Schur sampling procedure explained in Section 3.1.1, which generates , i.i.d. samples from the distribution over . Let be the empirical distribution of , which assigns a probability to a symbol , where is the number of times symbol appears in . The plug-in estimator, upon observing , outputs . The plug-in estimator has been widely studied in the statistics literature.
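The two steps of the EYD algorithm translate directly into code once a Young diagram is available. In the sketch below the diagram is passed in as a list of row lengths, standing in for the outcome of weak Schur sampling; the function names, the `alpha=None` convention for the von Neumann case, and the example diagram are assumptions for illustration.

```python
import math

def eyd_spectrum(young_diagram):
    """Step 1: normalize the row lengths by the number of boxes."""
    n = sum(young_diagram)
    return [row / n for row in young_diagram]

def eyd_entropy(young_diagram, alpha=None):
    """Step 2: evaluate the entropy of the estimated spectrum.

    alpha=None gives the von Neumann entropy; any other positive value
    different from 1 gives the Renyi entropy of that order.
    """
    est = [x for x in eyd_spectrum(young_diagram) if x > 0]
    if alpha is None:
        return -sum(x * math.log(x) for x in est)
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in est)) / (1.0 - alpha)

# Hypothetical outcome of weak Schur sampling on n = 10 copies.
diagram = [5, 3, 2]
estimated_spectrum = eyd_spectrum(diagram)
```

The same two functions applied to sorted symbol counts instead of a Young diagram give the classical plug-in estimator, which is why the two are natural to compare.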
An observation from the nondecreasing subsequence interpretation of weak Schur sampling is that for any sequence , the distribution majorizes the corresponding empirical distribution. This follows from the fact that the total length of the longest disjoint nondecreasing subsequences is always at least the sum of the largest ’s. In particular, we can state the following result.
Lemma 8.
Consider the sorted plug-in distribution of , and the distribution obtained from by the WSS procedure. majorizes , namely, for all , .
3.2 Proving Upper Bounds on Copy Complexity
Consider and . Suppose satisfies
Then
(11) 
Therefore, to obtain a estimate of , it suffices to derive a multiplicative estimate of . Note that since for . Moreover, in the regime in which does not grow with , . Therefore, in the remainder of the paper, we will be interested in multiplicative estimators.
Finally note that for any ,
by Markov’s inequality. Since (by Lemma 7), we get an multiplicative estimator of with probability at least 8/9 if
(12) 
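The step from a variance bound to a multiplicative guarantee can be spelled out as follows. The symbols $X$ for an estimator, $\mu = \mathbb{E}[X] > 0$ for its mean, and $\epsilon$ for the multiplicative accuracy are generic names introduced here for illustration; the argument is Markov's inequality applied to the squared deviation.

```latex
\begin{align*}
\Pr\bigl(|X - \mu| \ge \epsilon \mu\bigr)
  &= \Pr\bigl((X - \mu)^2 \ge \epsilon^2 \mu^2\bigr) \\
  &\le \frac{\mathbb{E}\bigl[(X - \mu)^2\bigr]}{\epsilon^2 \mu^2}
   = \frac{\mathrm{Var}(X)}{\epsilon^2 \mu^2}.
\end{align*}
```

In particular, a variance of at most $\epsilon^2 \mu^2 / 9$ makes the failure probability at most $1/9$, which is the source of the success probability $8/9$.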
4 Measuring for integral
Our main result for integral is the following tight bound (up to constant factors) on the copy complexity of estimating .
Theorem 1.
For ,
where the hidden constants depend only on .
4.1 Achievability
Our Rényi entropy estimator is simple, and is described in Algorithm 1.
Note that we could have simply removed the terms from the algorithm’s description, but these polynomials have a number of applications in representation theory to study the symmetric group, and we simply keep the notation and definitions intact.
To prove the theorem, we bound the expectation and concentration of .
Lemma 9.
There is a constant depending only on such that
(13)  
(14) 
4.1.1 Proof of Theorem 1 using Lemma 9
4.1.2 Proof of Lemma 9
Equation (13) has already been established (cf. Lemma 7). It remains to bound the variance of the estimator.
The second term is evaluated from the means of the polynomials, which we know. For the first term, we need to bound the expectation of the products of such polynomials. In fact, there is a general result [IO02, Proposition 4.5][Wri16, Corollary 3.8.8] that states that for any ,
In our case, both the partitions are . So we can write
where is at most , and is the set of all partitions that can be obtained through the following procedure:

Let be an integer in the set .

Let be a permutation over that has a cycle over the elements , and all the remaining elements are fixed points (the set for ).

Let be a permutation over that has a cycle over the elements , and all the remaining elements are fixed points (the set for ).

Let be the cycle structure of .
The set of partitions that can be obtained through the above procedure for a fixed will be denoted by . Now consider,
where we have used that . To bound for , we use the following two lemmas. Lemma 10 is proved in Appendix A and Lemma 11 is proved in Appendix B. Recall that for a partition , denotes the length of the partition.
Lemma 10.
For all and , .
Definition 3.
Let and be partitions of the same integer . Then is said to majorize , denoted , if for all ,
Lemma 11.
Let . Then for any distribution , .
Noting that , we obtain