1 Introduction
Symmetric property estimation is a fundamental and well-studied problem in machine learning and statistics. In this problem, we are given $n$ i.i.d. samples from an unknown distribution $p$ (throughout the paper, distribution refers to discrete distribution) and asked to estimate $f(p)$, where f is a symmetric property (i.e. it does not depend on the labels of the symbols). Over the past few years, the computational and sample complexities for estimating many symmetric properties have been extensively studied. Estimators with optimal sample complexities have been obtained for several properties including entropy [VV11, WY16, JVH+15], distance to uniformity [VV11, JHW16], and support [VV11, WY15].
All aforementioned estimators were property specific and therefore, a natural question is to design a universal estimator. In [ADO+16], the authors showed that the distribution that maximizes the profile likelihood, i.e. the likelihood of the multiset of frequencies of elements in the sample, referred to as the profile maximum likelihood (PML) distribution, can be used as a universal plugin estimator. [ADO+16] showed that computing the symmetric property on the PML distribution is sample complexity optimal in estimating support, support coverage, entropy and distance to uniformity within accuracy $\epsilon$, for a restricted range of $\epsilon$. Further, this also holds for distributions that approximately optimize the PML objective, where the approximation factor affects the desired accuracy.
Acharya et al. [ADO+16] posed two important and natural open questions. The first was to give an efficient algorithm for finding an approximate PML distribution, which was recently resolved in [CSS19]. The second open question is whether PML is sample competitive in all regimes of the accuracy parameter $\epsilon$. In this work, we make progress towards resolving this open question.
First, we show that the PML distribution based plugin estimator achieves optimal sample complexity for all $\epsilon$ for the problem of estimating support size. Next, we introduce a variation of the PML distribution that we call the pseudo PML distribution. Using this, we give a general framework for estimating a symmetric property. For entropy and distance to uniformity, this pseudo PML based framework achieves optimal sample complexity for a broader regime of the accuracy parameter than was known for the vanilla PML distribution.
We provide a general framework that could, in principle, be applied to estimate any separable symmetric property f, meaning $f(p)$ can be written in the form $\sum_{x \in D} g(p_x)$. The motivation behind this framework is that for any symmetric property f that is separable, the estimate for $f(p)$ can be split into two parts: $f(p) = \sum_{x \in B} g(p_x) + \sum_{x \in G} g(p_x)$, where $B$ and $G$ are a (property dependent) disjoint partition of the domain $D$. We refer to $G$ as the good set and $B$ as the bad set. Intuitively, $G$ is the subset of domain elements whose contribution to $f(p)$ is easy to estimate, i.e. a simple estimator such as the empirical estimate (with correction bias) works. For many symmetric properties, finding an appropriate partition of the domain is often easy. Many estimators in the literature [JVH+15, JHW16, WY16] make such a distinction between domain elements. The more interesting and difficult case is estimating the contribution of the bad set: $\sum_{x \in B} g(p_x)$. Much of the work in these estimators is dedicated towards estimating this contribution using sophisticated techniques such as polynomial approximation. Our work gives a unified approach to estimating the contribution of the bad set. We propose a PML based estimator for estimating $\sum_{x \in B} g(p_x)$. We show that computing the PML distribution only on the set $B$ is sample competitive for entropy and distance to uniformity for almost all interesting parameter regimes, thus (partially) handling the open problem proposed in [ADO+16]. Additionally, requiring that the PML distribution be computed on a subset $B \subseteq D$ reduces the input size for the PML subroutine and results in practical algorithms (see Section 6).
To summarize, the main contributions of our work are:

We make progress on an open problem of [ADO+16] on broadening the range of the error parameter $\epsilon$ that one can obtain for universal symmetric property estimation via PML.

We give a general framework for applying PML to new symmetric properties.

As a byproduct of our framework, we obtain more practical algorithms that invoke PML on smaller inputs (See Section 6).
1.1 Related Work
For many natural properties, there has been extensive work on designing efficient estimators both with respect to computational time and sample complexity [HJW+17, HJM17, AOS+14, RVZ17, ZVV+16, WY16, RRS+07, WY15, OSW16, VV11, WY16, JVH+15, JHW16, VV11]. We define and state the optimal sample complexity for estimating support, entropy and distance to uniformity. For entropy, we also discuss the regime in which the empirical distribution is sample optimal.
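Entropy: For any distribution $p$, the entropy $H(p) = -\sum_{x \in D} p_x \log p_x$. In the interesting regime of the accuracy parameter $\epsilon$, the optimal sample complexity for estimating $H(p)$ within additive accuracy $\epsilon$ is $O\left(\frac{N}{\log N}\frac{1}{\epsilon}\right)$, where $N = |D|$ [WY16]. Further, [WY16] showed that outside this regime the empirical distribution is optimal.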
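As a concrete illustration of the quantity being estimated, the following minimal Python sketch computes the Shannon entropy of a known distribution and the naive plug-in estimate obtained from samples. The function names are ours, natural logarithms are assumed, and no correction bias is applied; it is not the estimator analyzed in this paper.

```python
import math
from collections import Counter

def entropy(p):
    """Shannon entropy (natural log) of a probability vector; zero entries contribute 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def empirical_entropy(samples):
    """Plug-in estimate: the entropy of the empirical distribution of the samples."""
    n = len(samples)
    counts = Counter(samples)  # symbol -> number of occurrences
    return entropy([c / n for c in counts.values()])
```

For example, `empirical_entropy("aabb")` coincides with `entropy([0.5, 0.5])`, since the empirical distribution of the four samples is uniform over two symbols.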
Distance to uniformity: For any distribution $p$, the distance to uniformity $\|p - u\|_1 = \sum_{x \in D} |p_x - \frac{1}{N}|$, where $u$ is the uniform distribution over $D$. The optimal sample complexity for estimating $\|p - u\|_1$ within additive accuracy $\epsilon$ is $O\left(\frac{N}{\log N}\frac{1}{\epsilon^2}\right)$ [VV11, JHW16].
Support: For any distribution $p$, the support of distribution $S(p) = |\{x \in D \mid p_x > 0\}|$. Estimating support is difficult in general because we need a sufficiently large number of samples to observe elements with small probability values. Suppose that for all $x \in D$, $p_x \in \{0\} \cup [\frac{1}{k}, 1]$; then [WY15] showed that the optimal sample complexity for estimating support within additive accuracy $\epsilon k$ is $O\left(\frac{k}{\log k}\log^2\frac{1}{\epsilon}\right)$.
PML was introduced by Orlitsky et al. [OSS+04] in 2004. The connection between PML and universal estimators was first studied in [ADO+16]. As discussed in the introduction, the PML based plugin estimator applies to a restricted regime of the error parameter $\epsilon$. There have been several other approaches for designing universal estimators for symmetric properties. Valiant and Valiant [VV11] adopted and rigorously analyzed a linear programming based approach for universal estimators proposed by [ET76] and showed that it is sample complexity optimal in the constant error regime for estimating certain symmetric properties (namely, entropy and support size). Recent work of Han et al. [HJW18] applied a local moment matching based approach in designing efficient universal symmetric property estimators for a single distribution.
[HJW18] achieves the optimal sample complexity in restricted error regimes for estimating the power sum function, support and entropy.
Recently, [YOS+18] gave a different unified approach to property estimation. They devised an estimator that uses $n$ samples and achieves the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples for a wide class of properties and for all underlying distributions. This result is further strengthened to $n \log n$ samples for Shannon entropy and a broad class of other properties including $\ell_1$ distance in [HO19a].
Independently of our work, the authors of [HO19b] propose truncated PML, which is slightly different from but similar in spirit to our idea of pseudo PML. They use the truncated PML approach and study its application to symmetric properties such as entropy, support and coverage; we refer the reader to [HO19b] for further details.
1.2 Organization of the Paper
In Section 2 we provide basic notation and definitions. We present our general framework in Section 3 and state all our main results. In Section 4, we provide proofs of the main results of our general framework. In Section 5, we use these results to establish the sample complexity of our estimator in the case of entropy (see Section 5.1) and distance to uniformity (see Section 5.2). Due to space constraints, many proofs are deferred to the appendix. In Section 6, we provide experimental results for estimating entropy using pseudo PML and other state-of-the-art estimators. Here we also demonstrate the practicality of our approach.
2 Preliminaries
Let $[a]$ denote all integers in the interval $[1, a]$. Let $\Delta^D$ be the set of all distributions supported on domain $D$ and let $N$ be the size of the domain. Throughout this paper we restrict our attention to discrete distributions and assume that we receive a sequence of $n$ independent samples from an underlying distribution $p \in \Delta^D$. Let $D^n$ be the set of all length $n$ sequences and $x^n \in D^n$ be one such sequence with $x^n_i$ denoting its $i$th element. The probability of observing sequence $x^n$ is:
$$P(p, x^n) = \prod_{x \in D} p_x^{f(x^n, x)},$$
where $f(x^n, x) = |\{i \in [n] \mid x^n_i = x\}|$ is the frequency/multiplicity of symbol $x$ in sequence $x^n$ and $p_x$ is the probability of domain element $x \in D$. We next formally define profile, PML distribution and approximate PML distribution.
Definition 2.1 (Profile).
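For a sequence $x^n \in D^n$, its profile denoted $\phi = \Phi(x^n)$ is $\phi = (\phi(j))_{j \in [n]}$, where $\phi(j) = |\{x \in D \mid f(x^n, x) = j\}|$ is the number of domain elements with frequency $j$ in $x^n$. We call $n$ the length of profile $\phi$ and use $\Phi^n$ to denote the set of all profiles of length $n$. (The profile does not contain $\phi(0)$, the number of unseen domain elements.)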
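The definition above can be made concrete with a short Python sketch. The function name `profile` and the dictionary representation (frequency mapped to the number of symbols with that frequency, zero entries omitted) are our own choices for illustration.

```python
from collections import Counter

def profile(seq):
    """Map each frequency j >= 1 to phi(j), the number of distinct symbols seen exactly j times."""
    multiplicity = Counter(seq)                   # symbol -> frequency in seq
    return dict(Counter(multiplicity.values()))   # frequency -> number of symbols
```

Relabeling the symbols leaves the profile unchanged: `profile("abaac")` and `profile("xyxxz")` both describe one symbol seen three times and two symbols seen once, which is why the profile is a natural statistic for symmetric properties.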
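For any distribution $p \in \Delta^D$, the probability of a profile $\phi \in \Phi^n$ is defined as:
$$P(p, \phi) = \sum_{\{x^n \in D^n \,\mid\, \Phi(x^n) = \phi\}} P(p, x^n). \qquad (1)$$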
The distribution that maximizes the probability of a profile is the profile maximum likelihood distribution and we formally define it next.
Definition 2.2 (Profile maximum likelihood distribution).
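For any profile $\phi \in \Phi^n$, a Profile Maximum Likelihood (PML) distribution is $p_{pml,\phi} \in \arg\max_{p \in \Delta^D} P(p, \phi)$ and $P(p_{pml,\phi}, \phi)$ is the maximum PML objective value. Further, a distribution $p^{\beta}_{pml,\phi} \in \Delta^D$ is a $\beta$-approximate PML distribution if $P(p^{\beta}_{pml,\phi}, \phi) \geq \beta \cdot P(p_{pml,\phi}, \phi)$.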
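To illustrate the profile probability and the (approximate) PML definitions on a toy instance, the sketch below evaluates the profile probability by brute-force enumeration and maximizes it over a grid of distributions on a two-symbol domain. This is only an illustration under our own assumptions (a domain of size two, a grid search, hypothetical function names); the enumeration is exponential in the sequence length and is not one of the PML algorithms cited in this paper.

```python
import math
from collections import Counter
from itertools import product

def profile(seq):
    """Frequency j -> number of distinct symbols seen exactly j times."""
    return dict(Counter(Counter(seq).values()))

def profile_prob(p, phi, n):
    """Sum the probabilities of all length-n sequences over the symbols of p whose profile is phi."""
    symbols = list(p)
    return sum(
        math.prod(p[x] for x in seq)
        for seq in product(symbols, repeat=n)
        if profile(seq) == phi
    )

def grid_pml_two_symbols(phi, n, steps=100):
    """Grid search over p = (q, 1 - q); returns (best q, best profile probability)."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        ((q, profile_prob({"a": q, "b": 1 - q}, phi, n)) for q in candidates),
        key=lambda pair: pair[1],
    )

def is_beta_approximate(q, phi, n, beta, pml_value):
    """Check the beta-approximation condition against a given maximum objective value."""
    return profile_prob({"a": q, "b": 1 - q}, phi, n) >= beta * pml_value
```

For the profile of a sequence such as "aab" (one symbol seen twice, one seen once), the profile probability under $(q, 1-q)$ is $3q(1-q)$, so the grid search returns $q = 0.5$ with objective value $0.75$; the distribution $(0.4, 0.6)$ has objective $0.72$ and is therefore a $0.9$-approximate maximizer on this toy domain.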
We next provide formal definitions for separable symmetric property and an estimator.
Definition 2.3 (Separable Symmetric Property).
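A symmetric property $f: \Delta^D \to \mathbb{R}$ is separable if for any $p \in \Delta^D$, $f(p) = \sum_{x \in D} g(p_x)$, for some function $g: \mathbb{R} \to \mathbb{R}$. Further for any subset $S \subseteq D$, we define $f_S(p) = \sum_{x \in S} g(p_x)$.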
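A short Python sketch of a separable property restricted to a subset of the domain may help fix ideas. The helper names are hypothetical, and the per-symbol terms shown for entropy and distance to uniformity are the standard ones, stated here as our assumption about the function $g$.

```python
import math

def restricted_property(p, g, subset):
    """f_S(p): the sum of g(p_x) over the symbols x in the subset S."""
    return sum(g(p[x]) for x in subset)

def entropy_term(px):
    """g for entropy: -p_x log p_x, with the convention 0 log 0 = 0."""
    return -px * math.log(px) if px > 0 else 0.0

def uniformity_term(domain_size):
    """g for distance to uniformity: |p_x - 1/N| for a domain of size N."""
    return lambda px: abs(px - 1.0 / domain_size)
```

Because the property is a sum over domain elements, the contributions of any subset and its complement add up to the full property value, which is the decomposition the framework in this paper relies on.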
Definition 2.4.
A property estimator is a function $\hat{f}: D^n \to \mathbb{R}$ that takes as input $n$ samples and returns the estimated property value. The sample complexity of $\hat{f}$ for estimating a symmetric property $f(p)$ is the number of samples needed to estimate f up to accuracy $\epsilon$ and with constant probability. The optimal sample complexity of a property f is the minimum number of samples of any estimator.
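The notion of estimating a property up to a given accuracy with constant probability can be probed empirically. The following sketch, which is our own illustration and not part of the paper's analysis, estimates by simulation how often the plug-in entropy estimator lands within a chosen accuracy of the true entropy for a given number of samples; the function names, the default number of trials and the fixed seed are assumptions.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Entropy (natural log) of the empirical distribution of the samples."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def success_rate(p, n, eps, trials=200, seed=0):
    """Fraction of trials in which the plug-in estimate from n samples is within eps of H(p)."""
    rng = random.Random(seed)
    symbols, weights = zip(*p.items())
    truth = -sum(w * math.log(w) for w in weights if w > 0)
    hits = 0
    for _ in range(trials):
        samples = rng.choices(symbols, weights=weights, k=n)
        hits += abs(plugin_entropy(samples) - truth) <= eps
    return hits / trials
```

Increasing `n` until `success_rate` exceeds a fixed constant gives a rough empirical proxy for the sample complexity of this particular estimator on this particular distribution.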
3 Main Results
As discussed in the introduction, one of our motivations was to provide a better analysis for the PML distribution based plugin estimator. In this direction, we first show that the PML distribution is sample complexity optimal in estimating support in all parameter regimes. Estimating support is difficult in general and all previous works make the assumption that the minimum nonzero probability value of the distribution is at least $\frac{1}{k}$. In our next result, we show that the PML distribution under this constraint is sample complexity optimal for estimating support.
Theorem 3.1.
The PML distribution based plugin estimator is sample complexity optimal in estimating support for all regimes of error parameter $\epsilon$. (Here the PML distribution is computed under the constraint that its minimum nonzero probability value is at least $\frac{1}{k}$. This assumption is also necessary for the results in [ADO+16] to hold.)
For support, we show that an approximate PML distribution is sample complexity optimal as well.
Theorem 3.2.
For any constant , an approximate PML distribution based plugin estimator is sample complexity optimal in estimating support for all regimes of error .
We defer the proof of both these theorems to Appendix A.
For entropy and distance to uniformity, we study a variation of the PML distribution we call the pseudo PML distribution and present a general framework for symmetric property estimation based on this. We show that this pseudo PML based general approach gives an estimator that is sample complexity optimal for estimating entropy and distance to uniformity in broader parameter regimes. To motivate and understand this general framework we first define new generalizations of the profile, PML and approximate PML distributions.
Definition 3.3 (pseudo Profile).
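For any sequence $x^n \in D^n$ and $S \subseteq D$, its pseudo profile denoted $\phi_S = \Phi_S(x^n)$ is $\phi_S = (\phi_S(j))_{j \in [n]}$, where $\phi_S(j) = |\{x \in S \mid f(x^n, x) = j\}|$ is the number of domain elements in $S$ with frequency $j$ in $x^n$. We call $n$ the length of $\phi_S$ as it represents the length of the sequence $x^n$ from which this pseudo profile was constructed. Let $\Phi^n_S$ denote the set of all pseudo profiles of length $n$.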
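For any distribution $p \in \Delta^D$, the probability of a pseudo profile $\phi_S \in \Phi^n_S$ is defined as:
$$P(p, \phi_S) = \sum_{\{x^n \in D^n \,\mid\, \Phi_S(x^n) = \phi_S\}} P(p, x^n). \qquad (2)$$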
We next define the pseudo PML and approximate pseudo PML distributions that are analogous to the PML and approximate PML distributions.
Definition 3.4 (pseudo PML distribution).
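For any pseudo profile $\phi_S \in \Phi^n_S$, a distribution $p_{\phi_S} \in \Delta^D$ is a pseudo PML distribution if $p_{\phi_S} \in \arg\max_{p \in \Delta^D} P(p, \phi_S)$.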
Definition 3.5 (approximate pseudo PML distribution).
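For any pseudo profile $\phi_S \in \Phi^n_S$, a distribution $p^{\beta}_{\phi_S} \in \Delta^D$ is a $\beta$-approximate pseudo PML distribution if $P(p^{\beta}_{\phi_S}, \phi_S) \geq \beta \cdot P(p_{\phi_S}, \phi_S)$.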
For notational convenience, we also define the following function.
Definition 3.6.
For any subset $S \subseteq D$, the function $\mathrm{Freq}: \Phi^n_S \to 2^{\mathbb{Z}_+}$ takes as input a pseudo profile $\phi_S$ and returns the set of all distinct frequencies in $\phi_S$.
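The pseudo profile and the set of distinct frequencies it contains can be computed as in the following Python sketch. The names `pseudo_profile` and `freq` and the dictionary representation are our own; symbols of the subset that never occur in the sequence are simply not recorded.

```python
from collections import Counter

def pseudo_profile(seq, subset):
    """Frequency j -> number of symbols of `subset` seen exactly j times in seq."""
    multiplicity = Counter(x for x in seq if x in subset)
    return dict(Counter(multiplicity.values()))

def freq(phi_s):
    """Set of distinct frequencies that occur in a pseudo profile."""
    return {j for j, count in phi_s.items() if count > 0}
```

For the sequence "aabbbc" and the subset {a, c, d}, only the symbols a (frequency 2) and c (frequency 1) contribute, so the pseudo profile is `{2: 1, 1: 1}` and its frequency set is `{1, 2}`. Taking the subset to be the whole domain recovers the ordinary profile.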
Using the definitions above, we next give an interesting generalization of Theorem 3 in [ADO+16].
Theorem 3.7.
For a symmetric property f and , suppose there is an estimator , such that for any p and the following holds,
then for any , a approximate pseudo PML distribution satisfies:
Note that in the theorem above, the error probability with respect to a pseudo PML distribution based estimator has dependency on and . However Theorem 3 in [ADO+16] has error probability . This is the bottleneck in showing that PML works for all parameter regimes and the place where pseudo PML wins over the vanilla PML based estimator, getting nontrivial results for entropy and distance to uniformity. We next state our general framework for estimating symmetric properties. We use the idea of sample splitting which is now standard in the literature [WY16, JVH+15, JHW16, CL11, NEM03].
In the above general framework, the choice of depends on the symmetric property of interest. Later, in the case of entropy and distance to uniformity, we will choose to be the region where the empirical estimate fails; it is also the region that is difficult to estimate. One of the important properties of the above general framework is that (recall is a approximate pseudo PML distribution and is the property value of distribution on subset of domain elements ) is close to with high probability. Below we state this result formally.
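The sample-splitting idea described above can be sketched in Python as follows. This is a hedged illustration under several assumptions of ours: the samples are split into two equal halves, the random set consists of the symbols whose frequency in the first half lies in a given frequency set `F`, the correction bias is omitted, and the pseudo PML step is replaced by a clearly labeled stand-in (the empirical plug-in) because no PML solver is included here. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy_term(px):
    """Per-symbol entropy contribution -p_x log p_x (0 for p_x = 0)."""
    return -px * math.log(px) if px > 0 else 0.0

def empirical_on_set(counts, n, in_set, g):
    """Plug-in estimate of the sum of g(p_x) over symbols satisfying in_set."""
    return sum(g(c / n) for x, c in counts.items() if in_set(x))

def split_estimate(samples, F, g, bad_set_estimator=None):
    """Two-part estimate: one estimator on the random set S, empirical plug-in on the rest."""
    half = len(samples) // 2
    x1, x2 = samples[:half], samples[half:]
    freq1 = Counter(x1)
    in_S = lambda x: freq1[x] in F   # S = symbols whose frequency in x1 lies in F
    counts2, n2 = Counter(x2), len(x2)
    if bad_set_estimator is None:
        # Stand-in only: a real implementation would use an approximate pseudo PML
        # distribution computed from the pseudo profile of x2 on S.
        bad_set_estimator = lambda c, n, member: empirical_on_set(c, n, member, g)
    bad_part = bad_set_estimator(counts2, n2, in_S)
    good_part = empirical_on_set(counts2, n2, lambda x: not in_S(x), g)
    return bad_part + good_part
```

With the stand-in, the result equals the plug-in estimate on the second half; the point of the sketch is the control flow, in which `bad_set_estimator` is the slot where a black-box pseudo PML routine would be supplied.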
Theorem 3.8.
For any symmetric property f, let and . If for all , there exists an estimator , such that for any p and satisfies,
(3) 
Then for any sequence ,
where is a random set and .
Using the theorem above, we already have a good estimate for for appropriately chosen frequency subsets and . Further, we choose these subsets and carefully so that the empirical estimate plus the correction bias with respect to is close to . Combining these together, we get the following results for entropy and distance to uniformity.
Theorem 3.9.
If error parameter for any constant , then for estimating entropy, the estimator 1 for is sample complexity optimal.
For entropy, we already know from [WY16] that the empirical distribution is sample complexity optimal if for some constant . Therefore the interesting regime for entropy estimation is when and our estimator works for almost all such .
Theorem 3.10.
Let and error parameter , then for estimating distance from uniformity, the estimator 1 for is sample complexity optimal.
Note that the estimator in [JHW17] also requires that the error parameter , where is some constant.
4 Analysis of General Framework for Symmetric Property Estimation
Here we provide proofs of the main results for our general framework (Theorems 3.7 and 3.8). These results depend only weakly on the property and generalize results in [ADO+16]. The PML based estimator in [ADO+16] is sample competitive only for a restricted error parameter regime and this stems from the large number of possible profiles of length $n$. Our next lemma will be useful to address this issue and later we show how to use this result to prove Theorems 3.7 and 3.8.
Lemma 4.1.
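For any subset $S \subseteq D$ and $F \in 2^{\mathbb{Z}_+}$, if set $\mathcal{B}$ is defined as $\mathcal{B} = \{\phi_S \in \Phi^n_S \mid \mathrm{Freq}(\phi_S) \subseteq F\}$, then the cardinality of set $\mathcal{B}$ is upper bounded by $(n+1)^{|F|}$.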
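The counting argument behind a bound of this kind is simple: a pseudo profile whose frequencies all lie in a set $F$ is determined by $|F|$ counts, each between $0$ and $n$, so there are at most $(n+1)^{|F|}$ of them. The Python sketch below checks this by brute-force enumeration on a tiny instance. The function names are ours, the enumeration is exponential in $n$, and the exact form of the bound stated in the lemma may differ from the one assumed here.

```python
from collections import Counter
from itertools import product

def pseudo_profile(seq, subset):
    """Hashable pseudo profile: set of (frequency, number of symbols of subset) pairs."""
    multiplicity = Counter(x for x in seq if x in subset)
    return frozenset(Counter(multiplicity.values()).items())

def count_pseudo_profiles(domain, subset, n, F):
    """Number of distinct pseudo profiles of length n whose frequencies all lie in F."""
    seen = set()
    for seq in product(domain, repeat=n):
        phi_s = pseudo_profile(seq, subset)
        if all(j in F for j, _ in phi_s):
            seen.add(phi_s)
    return len(seen)
```

For a three-symbol domain, a two-symbol subset, $n = 4$ and $F = \{1, 2\}$, the enumeration finds 6 distinct pseudo profiles, well below $(n+1)^{|F|} = 25$.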
Proof of Theorem 3.7.
Using the law of total probability we have,
Consider any . If , then we know that . For , we have that implies . Further implies . Using triangle inequality we get, . Note we wish to upper bound the probability of set: . From the previous discussion, we get for all . Therefore,
In the final inequality, we use and invoke Lemma 4.1. ∎
Proof of Theorem 3.8.
Using Bayes rule we have:
(4) 
In the second inequality, we use . Consider the first term on the right side of the above expression and note that it is upper bounded by, . In the first upper bound, we removed randomness associated with the random set and used . In the first inequality above, we invoke Theorem 3.7 using conditions from Equation 3. In the second inequality, we use and . The theorem follows by combining all the analysis together. ∎
5 Applications of the General Framework
Here we provide applications of our general framework (defined in Section 3) using results from the previous section. We apply our general framework to estimate entropy and distance to uniformity. In Section 5.1 and Section 5.2 we analyze the performance of our estimator for entropy and distance to uniformity estimation respectively.
5.1 Entropy estimation
In order to prove our main result for entropy (Theorem 3.9), we first need the existence of an estimator for entropy with some desired properties. The existence of such an estimator will be crucial to bound the failure probability of our estimator. A result analogous to this is already known in [ADO+16] (Lemma 2) and the proof of our result follows from a careful observation of [ADO+16, WY16]. We state this result here but defer the proof to the appendix.
Lemma 5.1.
Let , and , then for entropy on subset () there exists a pseudo profile based estimator that uses the optimal number of samples, has bias less than and if we change any sample, changes by at most , where is a constant.
Combining the above lemma with Theorem 3.8, we next prove that our estimator defined in Algorithm 1 is sample complexity optimal for estimating entropy in a broader regime of error $\epsilon$.
Proof of Theorem 3.9.
Let represent the entropy of distribution p and be the estimator in Lemma 5.1. Define for constant . Given the sequence , the random set is defined as . Let , then by the derivation in Lemma 6 of [ADO+16] (or by a simple application of the Chernoff bound; the probabilities of many events in this proof can be bounded in this way, these bounds are also shown in [ADO+16, WY16], and we use them while omitting the details) we have,
Further let , then by Equation 48 in [WY16] we have, . Further for all we have,
Note for all , and the above inequality also follows from Chernoff. All that remains now is to upper bound . Using the estimator constructed in Lemma 5.1 and further combined with McDiarmid’s inequality, we have,
Substituting all these parameters together in Theorem 3.8 we have,
(5) 
In the first inequality, we use Theorem 3.8. In the second inequality, we substituted the values for and . In the final inequality we used and .
Our final goal is to estimate , and to complete the proof we need to argue that + the correction bias with respect to is close to , where recall is the empirical distribution on sequence . The proof for this follows immediately from [WY16] (Case 2 in the proof of Proposition 4). [WY16] bound the bias and variance of the empirical estimator with a correction bias and applying Markov inequality on their result we get , where is the correction bias in [WY16]. Using triangle inequality, our estimator fails if either or . Further by union bound the failure probability is at most , which is a constant. ∎
5.2 Distance to Uniformity estimation
Here we prove our main result for distance to uniformity estimation (Theorem 3.10). First, we show the existence of an estimator for distance to uniformity with certain desired properties. Similar to entropy, a result analogous to this is shown in [ADO+16] (Lemma 2) and the proof of our result follows from a careful observation of [ADO+16, JHW17]. We state this result here but defer the proof to Appendix C.
Lemma 5.2.
Let and , then for distance to uniformity on () there exists a pseudo profile based estimator that uses the optimal number of samples, has bias at most and if we change any sample, changes by at most , where is a constant.
Combining the above lemma with Theorem 3.8 we provide the proof of Theorem 3.10.
Proof of Theorem 3.10.
Let represent the distance to uniformity for distribution p and be the estimator in Lemma 5.2. Define for some constant . Given the sequence , the random set is defined as . Let , then by the derivation in Lemma 7 of [ADO+16] (also shown in [JHW17]; similar to entropy, the probabilities of many events can be bounded by a simple application of the Chernoff bound and have already been shown in [ADO+16, JHW17], so we omit the details for these inequalities) we have,
Further let , then using Lemma 2 in [JHW17] we get,
Further for all we have,
Note for all , and the above result follows from [JHW17] (Lemma 1). All that remains now is to upper bound . Using the estimator constructed in Lemma 5.2 and further combined with McDiarmid’s inequality, we have,
Substituting all these parameters in Theorem 3.8 we get,
(6) 
In the first inequality, we use Theorem 3.8. In the second inequality, we substituted values for and . In the final inequality we used and .
Our final goal is to estimate , and to complete the proof we argue that + correction bias with respect to is close to , where recall is the empirical distribution on sequence . The proof for this case follows immediately from [JHW17] (proof of Theorem 2). [JHW17] define three kinds of events and , the proof for our empirical case follows from the analysis of bias and variance of events and . Further combining results in [JHW17] with Markov inequality we get , and the correction bias here is zero. Using triangle inequality, our estimator fails if either or . Further by union bound the failure probability is upper bounded by , which is a constant. ∎
6 Experiments
We performed two different sets of experiments for entropy estimation: one to compare performance guarantees and the other to compare running times. In our pseudo PML approach, we divide the samples into two parts. We run the empirical estimate on one (this is easy) and the PML estimate on the other. For the PML estimate, any algorithm to compute an approximate PML distribution can be used in a black box fashion; this is an advantage of the pseudo PML approach, as it provides both competitive performance and running time efficiency. In our experiments, we use the heuristic algorithm in [PJW17] to compute an approximate PML distribution. In the first set of experiments detailed below, we compare the performance of the pseudo PML approach with raw [PJW17] and other state-of-the-art estimators for estimating entropy. Our code is available at https://github.com/shiragur/CodeForPseudoPML.git
Each plot depicts the performance of various algorithms for estimating entropy of different distributions with domain size . Each data point represents 50 random trials. “Mix 2 Uniforms” is a mixture of two uniform distributions, with half the probability mass on the first symbols, and with . MLE is the naive approach of using the empirical distribution with correction bias; all the remaining algorithms are denoted using bibliographic citations. In our algorithm we pick (same as [WY16]) and our set (input of Algorithm 1), i.e. we use the PML estimate on frequencies and empirical estimate on the rest. Unlike Algorithm 1, we do not perform sample splitting in the experiments; we believe this requirement is an artifact of our analysis. For estimating entropy, the error achieved by our estimator is competitive with [PJW17] and other state-of-the-art entropy estimators. Note that our results match [PJW17] for small sample sizes because not many domain elements cross the threshold and for a large fraction of the samples, we simply run the [PJW17] algorithm.
In the second set of experiments we demonstrate the running time efficiency of our approach. In these experiments, we compare the running time of our algorithm using [PJW17] as a subroutine to the raw [PJW17] algorithm on the distribution. In the table below, the second row (EmpFrac) is the fraction of samples on which our algorithm uses the empirical estimate (plus correction bias). The third row (Speedup) is the ratio of the running time of [PJW17] to our algorithm. For large sample sizes, the entries in the EmpFrac row are high, i.e. our algorithm applies the simple empirical estimate on a large fraction of samples, which enables speedups of roughly 10x in the running times.
Sample size
EmpFrac  0.184  0.317  0.372  0.505  0.562  0.695  0.752  0.886
Speedup  0.824  1.205  1.669  3.561  4.852  9.552  13.337  12.196
Acknowledgments
We thank the reviewers for the helpful comments, great suggestions, and positive feedback. Moses Charikar was supported by a Simons Investigator Award, a Google Faculty Research Award and an Amazon Research Award. Aaron Sidford was partially supported by NSF CAREER Award CCF-1844855.
References
[ADO+16] (2016) A unified maximum likelihood approach for optimal distribution property estimation. CoRR abs/1611.02960. arXiv:1611.02960.
[AOS+14] (2014) The complexity of estimating Rényi entropy. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1855–1869. https://epubs.siam.org/doi/pdf/10.1137/1.9781611973730.124
[CL11] (2011-04) Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. Ann. Statist. 39 (2), pp. 1012–1041.
[CSS19] (2019-05) Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation. arXiv e-prints, arXiv:1905.08448.
[ET76] (1976) Estimating the number of unseen species: how many words did Shakespeare know?. Biometrika 63 (3), pp. 435–447. ISSN 0006-3444.
[HJM17] (2017-10) On Estimation of $L_{r}$-Norms in Gaussian White Noise Models. arXiv e-prints, arXiv:1710.03863.
[HJW+17] (2017-11) Optimal rates of entropy estimation over Lipschitz balls. arXiv e-prints, arXiv:1711.02141.
[HJW18] (2018) Local moment matching: a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. arXiv preprint arXiv:1802.08405.
[HO19a] (2019) Data amplification: instance-optimal property estimation. arXiv:1903.01432.
[HO19b] (2019) The broad optimality of profile maximum likelihood. arXiv:1906.03794.
[JHW16] (2016-07) Minimax estimation of the l1 distance. In 2016 IEEE International Symposium on Information Theory (ISIT), pp. 750–754.
[JVH+15] (2015-05) Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61 (5), pp. 2835–2885. ISSN 0018-9448.
[JHW17] (2017-05) Minimax Estimation of the $L_1$ Distance. arXiv e-prints, arXiv:1705.00807.
[NEM03] (2003) On tractable approximations of randomly perturbed convex constraints. In 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), Vol. 3, pp. 2419–2422.
[OSS+04] (2004) Algorithms for modeling distributions over large alphabets. In International Symposium on Information Theory, 2004. ISIT 2004. Proceedings., pp. 304–304.
[OSW16] (2016) Optimal prediction of the number of unseen species. Proceedings of the National Academy of Sciences 113 (47), pp. 13283–13288. ISSN 0027-8424. http://www.pnas.org/content/113/47/13283.full.pdf
[PJW17] (2017-12) Approximate Profile Maximum Likelihood. arXiv e-prints, arXiv:1712.07177.
[RVZ17] (2017) Estimating the unseen from multiple populations. CoRR abs/1707.03854. arXiv:1707.03854.
[RRS+07] (2007-10) Strong lower bounds for approximating distribution support size and the distinct elements problem. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pp. 559–569. ISSN 0272-5428.
[TIM14] (2014) Theory of approximation of functions of a real variable. Vol. 34, Elsevier.
[VV11] (2011-10) The power of linear estimators. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pp. 403–412. ISSN 0272-5428.
[VV11] (2011) Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC ’11, New York, NY, USA, pp. 685–694. ISBN 9781450306911.
[WY15] (2015-04) Chebyshev polynomials, moment matching, and optimal estimation of the unseen. arXiv e-prints, arXiv:1504.01227.
[WY16] (2016-06) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62 (6), pp. 3702–3720. ISSN 0018-9448.
[WY16] (2016-12) Sample complexity of the distinct elements problem. arXiv e-prints, arXiv:1612.03375.
[YOS+18] (2018) Data amplification: a unified and competitive approach to property estimation. In Advances in Neural Information Processing Systems, pp. 8834–8843.
[ZVV+16] (2016-10-31) Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nature Communications 7, pp. 13293.