1 Introduction
Let us consider the statistical modeling of random sampling that produces data containing $mn$ records by drawing $m$ sets of $n$ elements with replacement, where $m$ and $n$ are positive integers. Let $X_{ij}$ ($i=1,\ldots,m$; $j=1,\ldots,n$) be random variables corresponding to the $j$-th element in the $i$-th sample. For example, when we are interested in the population mean $\mu$ and the normal distribution $N(\mu,1)$ with unit variance is used as a population model, the sample mean $\bar X$ of the $mn$ records, which is a complete sufficient statistic, coincides with the sample mean of the sample means $\bar X_1,\ldots,\bar X_m$, each calculated from a sample set of size $n$. Indeed, $\bar X=m^{-1}\sum_{i=1}^{m}\bar X_i$, so both samples have the same information for $\mu$. Under the same setting, suppose that we use the sample median $\tilde X$ of the $mn$ records to estimate the population mean (which is identical to the population median in the case of $N(\mu,1)$). Since $\tilde X$ is not a sufficient statistic, it has different information from the sample mean of the sample medians $\tilde X_1,\ldots,\tilde X_m$, each calculated from a sample set of size $n$. This implies that when to pool affects the efficiency of inference. However, as $n\to\infty$, both $\tilde X$ and the mean of $\tilde X_1,\ldots,\tilde X_m$ are asymptotically distributed as $N(\mu,\pi/(2mn))$, which indicates that they asymptotically have the same Fisher information, and so the efficiencies of inference asymptotically coincide.
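The pooling question in the normal example can be made concrete with a small Monte Carlo sketch. It assumes $N(\mu,1)$ records arranged as $m$ sets of $n$ (the defaults $m=10$, $n=3$, $\mu=0$, the replication count and the seed are arbitrary illustrative choices), confirms that the pooled mean equals the mean of the set means up to rounding, and estimates the sampling variances of the pooled median and of the mean of set medians next to the common large-$n$ approximation $\pi/(2mn)$. It is an illustration only and is not part of the analysis of this paper.

```python
import math
import random
import statistics


def simulate(m, n, mu=0.0, reps=5000, seed=1):
    """Monte Carlo comparison of pooled and per-set estimators for N(mu, 1)
    data arranged as m sets of n records."""
    rng = random.Random(seed)
    pooled_medians, mean_of_medians = [], []
    max_gap = 0.0  # largest |pooled mean - mean of set means|; only rounding error
    for _ in range(reps):
        sets = [[rng.gauss(mu, 1.0) for _ in range(n)] for _ in range(m)]
        records = [x for s in sets for x in s]
        # With equal set sizes, the pooled mean equals the mean of the set means.
        pooled_mean = statistics.fmean(records)
        mean_of_means = statistics.fmean([statistics.fmean(s) for s in sets])
        max_gap = max(max_gap, abs(pooled_mean - mean_of_means))
        # The pooled median and the mean of the set medians are different statistics.
        pooled_medians.append(statistics.median(records))
        mean_of_medians.append(statistics.fmean([statistics.median(s) for s in sets]))
    return {
        "max_mean_gap": max_gap,
        "var_pooled_median": statistics.variance(pooled_medians),
        "var_mean_of_medians": statistics.variance(mean_of_medians),
        "asymptotic_var": math.pi / (2 * m * n),  # common large-n approximation
    }


if __name__ == "__main__":
    print(simulate(m=10, n=3))
```

For small $n$ the two estimated variances need not be close to each other or to $\pi/(2mn)$; the asymptotic agreement says nothing about finite samples.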
In this paper, we report a more curious phenomenon: the magnitude relation of the Fisher information reverses depending on the values of the parameters. For the purpose of inferring population diversity, the Poisson–Dirichlet distribution (Kingman, 1975) with parameter $\theta$ is assumed as a statistical model of the population frequencies arranged in descending order. If $\theta$ is known, the expected homozygosity ($1/(1+\theta)$), the probability of discovering a new species ($\theta/(\theta+n)$, where $n$ is the sample size), the expected number of singletons (population uniques) in a finite population ($N\theta/(N-1+\theta)$, where $N$ is the population size), and so on can be determined. Moreover, the Poisson–Dirichlet distribution prior (or the Dirichlet process prior) plays an important role in nonparametric Bayesian estimation when there are possibly infinitely many labels; see, for example, Crane (2016) and its discussions, Ewens (1972), Hoshino and Takemura (1998), and Samuels (1998). Therefore, estimating $\theta$ efficiently is an important problem. We will show that the sampling procedure influences the magnitude relation of the Fisher information, so our result can be used when choosing a sampling procedure for estimating $\theta$ efficiently.

Remark 1.
Professor Masaaki Sibuya pointed out that our result represents an aspect of the differences between independently and identically distributed (iid) sequences and non-iid sequences (in this paper, exchangeable sequences are considered).
2 Sampling from the Poisson–Dirichlet distribution
2.1 Sampling procedures
Assume that the population frequencies of the infinite population, arranged in descending order, follow the Poisson–Dirichlet distribution with parameter $\theta$. For details about the properties of the Poisson–Dirichlet distribution and related distributions, we refer the reader to Feng (2010). In this paper, the following two types of samples are considered:
- Sample (i): Drawing $m$ samples of $n$ elements with replacement. The numbers of distinct components in the respective sample partitions are denoted by $K_{n,1},\ldots,K_{n,m}$.
- Sample (ii): Drawing a single sample of $mn$ elements. The number of distinct components in the sample partition is denoted by $K_{mn}$.
If a sample of $n$ elements is drawn from a population that follows the Poisson–Dirichlet distribution with parameter $\theta$, then we obtain a sample frequency. When the population diversity is of interest, we convert the sample frequency into a sample partition, which follows the Ewens sampling formula with parameter $\theta$. Then, the number of distinct components in the sample partition, which is the number of labels in the sample, follows the falling factorial distribution with parameter $(n,\theta)$ (this distribution is described in the next subsection). Note that the number of distinct components in the sample is a complete sufficient statistic for $\theta$. Hence, the counts $K_{n,1},\ldots,K_{n,m}$ of Sample (i) independently and identically follow the falling factorial distribution with parameter $(n,\theta)$, and the count $K_{mn}$ of Sample (ii) follows the falling factorial distribution with parameter $(mn,\theta)$.
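The two sampling schemes can be simulated with the Hoppe urn (Chinese restaurant process), a standard sequential description of the Ewens sampling formula: when $i$ elements have been drawn, the next one carries a new label with probability $\theta/(\theta+i)$ and otherwise copies the label of a uniformly chosen earlier element. Because the population is integrated out, each fresh urn corresponds to a fresh, independent population realization. The sketch below (the function names `draw_block_sizes`, `sample_i`, `sample_ii` are hypothetical) draws Sample (i) as $m$ independent urns of size $n$ and Sample (ii) as one urn of size $mn$, keeping only the numbers of distinct components; it is an illustration, not the authors' simulation code.

```python
import random


def draw_block_sizes(n, theta, rng):
    """Hoppe-urn draw of a random partition of n elements under the Ewens
    sampling formula with parameter theta; returns the block sizes."""
    blocks = []
    for i in range(n):  # i elements have already been drawn
        if rng.random() < theta / (theta + i):
            blocks.append(1)  # a new label (component) appears
        else:
            # copy the label of one of the i earlier elements, i.e. join an
            # existing block with probability proportional to its size
            r = rng.randrange(i)
            for b, size in enumerate(blocks):
                if r < size:
                    blocks[b] += 1
                    break
                r -= size
    return blocks


def sample_i(m, n, theta, seed=0):
    """Sample (i): m independent populations, n elements each -> [K_{n,1}, ..., K_{n,m}]."""
    rng = random.Random(seed)
    return [len(draw_block_sizes(n, theta, rng)) for _ in range(m)]


def sample_ii(m, n, theta, seed=0):
    """Sample (ii): one population, mn elements -> K_{mn}."""
    rng = random.Random(seed)
    return len(draw_block_sizes(m * n, theta, rng))


if __name__ == "__main__":
    print(sample_i(5, 20, 2.0), sample_ii(5, 20, 2.0))
```

Only the counts of distinct components are returned because they are complete sufficient for $\theta$.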
Even though the total number of elements is $mn$ in both sampling processes, Samples (i) and (ii) contain different information. In this paper, we calculate the Fisher information of the samples obtained by these two sampling processes and demonstrate that the magnitude relation of the Fisher information can be reversed depending on the values of the parameters $m$, $n$, and $\theta$.
In the literature, there is another model called the multi-observer ESF (Barbour and Tavaré, 2010); its 2-observer version is considered in Ewens et al. (2007). In this model, a sample of size $n$ consists of $r$ subsamples whose sizes are $n_1,\ldots,n_r$, where $n_1+\cdots+n_r=n$. The motivation of Barbour and Tavaré (2010) is testing the spatial homogeneity of tumors, that is, whether the parameters are common to the left and right sides of a tumor. In the tumor example, when a common underlying evolution process is considered, Sample (i) fits the situation because the population frequencies are considered to be different even on the same side (see the data in Table 2.1 of Barbour and Tavaré (2010), or the original methylation pattern data in Fig. 2 of Siegmund et al. (2009)).
Remark 2.
In the multi-observer ESF, the marginal distribution of each subsample is also given by the Ewens sampling formula. Even if subsamples in the whole subsamples are considered with , subsamples are not regarded as independent. However, they are exchangeable, and they are conditionally iid given a single realization of the population frequency. In contrast, our Sample (i) corresponds to the case where there are $m$ different population frequencies.
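The contrast between subsamples from one population (exchangeable, conditionally iid) and subsamples from independent populations can be illustrated by simulation. The sketch below assumes the GEM (stick-breaking) representation of the population frequencies, truncated once the unassigned mass falls below `tol` (an approximation); GEM weights are a size-biased reordering of the Poisson–Dirichlet frequencies, and sampling does not depend on the order of the labels. It compares the empirical correlation of the label counts of two subsamples of size $n$ when they share one population realization and when they come from two independent realizations. All function names are hypothetical.

```python
import random


def gem_weights(theta, rng, tol=1e-12):
    """Stick-breaking weights of one population realization (GEM(theta)),
    truncated once the unassigned mass falls below tol."""
    weights, remaining = [], 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, theta)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def count_labels(weights, n, rng):
    """Number of distinct labels among n draws with replacement from the weights."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return len(set(labels))


def paired_counts(n, theta, shared, reps=2000, seed=0):
    """Pairs (K, K') for two subsamples of size n drawn from one shared
    population realization (shared=True) or from two independent ones."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        w1 = gem_weights(theta, rng)
        w2 = w1 if shared else gem_weights(theta, rng)
        pairs.append((count_labels(w1, n, rng), count_labels(w2, n, rng)))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation of the two coordinates."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    for shared in (True, False):
        label = "shared population" if shared else "independent populations"
        print(label, round(correlation(paired_counts(20, 1.0, shared)), 3))
```

Given the population, the two counts are independent, so by the law of total covariance their covariance in the shared case equals the variance of their common conditional mean and cannot be negative; in the independent case it is zero.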
Remark 3.
In the context of microdata disclosure risk assessment, the Ewens sampling formula has been used as a model for the frequency of frequencies, which is sometimes called the size index; see, for example, Hoshino and Takemura (1998) and Samuels (1998). The problem considered here fits the situation where an investigation is conducted in $m$ areas and $n$ individuals are surveyed in each area under a Poisson–Dirichlet hyperpopulation assumption. Sample (i) corresponds to the case where there are $m$ different population frequencies, one for each area, and Sample (ii) corresponds to the case where there is only one population frequency. When collecting survey data, it is natural to expect sample sizes to differ from area to area. However, to observe the reversal phenomenon of the Fisher information clearly, we assume that all areas have the same sample size.
Remark 4.
Suppose that there is a possibly infinite number of labels and that we observe a sample frequency with a finite number of labels. In such a situation, the Poisson–Dirichlet distribution can be used as a prior distribution, and its parameter must be estimated in order to conduct empirical Bayes estimation. Suppose there are $m$ samples of size $n$. If the samples are drawn from $m$ different populations which independently follow the Poisson–Dirichlet distribution, then Sample (i) is suitable; on the other hand, if the samples are drawn from one population which follows the Poisson–Dirichlet distribution, then Sample (ii) is suitable.
2.2 The falling factorial distribution
If the probability distribution of a positive-integer-valued random variable $K_n$ is given by
$$P(K_n=k)=\frac{\bar s(n,k)\,\theta^k}{(\theta)_n},\qquad k=1,\ldots,n,$$
then the distribution of $K_n$ is sometimes called the falling factorial distribution with parameter $(n,\theta)$ (Watterson, 1974), where $(\theta)_n=\theta(\theta+1)\cdots(\theta+n-1)$ and $\bar s(n,k)$ is the coefficient of $\theta^k$ in $(\theta)_n$. The falling factorial distribution is also called STR1F in Sibuya (1988). In a random partition of $n$ following the Ewens sampling formula with parameter $\theta$, the number of distinct components follows the falling factorial distribution with parameter $(n,\theta)$ (Ewens, 1972). Since the number of distinct components is a complete sufficient statistic for $\theta$, it suffices to consider the falling factorial distribution in order to estimate $\theta$. In this subsection, we describe known properties of a random variable $K_n$ that follows the falling factorial distribution with parameter $(n,\theta)$.
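For moderate $n$ the probability function can be evaluated exactly by building $\bar s(n,k)$, the coefficients of $\theta^k$ in $(\theta)_n=\theta(\theta+1)\cdots(\theta+n-1)$, one factor $(\theta+j)$ at a time. The sketch below (hypothetical function names) also compares the mean and variance computed from the probability function with $\sum_{i=0}^{n-1}\theta/(\theta+i)$ and $\sum_{i=1}^{n-1}\theta i/(\theta+i)^2$, the standard expressions for the mean and variance of the number of distinct components under the Ewens sampling formula. Floating-point overflow limits it to moderate $n$ and $\theta$; a log-scale version would be needed otherwise.

```python
from math import prod


def unsigned_stirling_row(n):
    """Coefficients of theta^0..theta^n in (theta)_n = theta(theta+1)...(theta+n-1)."""
    row = [1]  # (theta)_0 = 1
    for j in range(n):
        # multiply the current polynomial by (theta + j)
        new = [0] * (len(row) + 1)
        for k, c in enumerate(row):
            new[k] += j * c   # contribution of the constant j
            new[k + 1] += c   # contribution of theta
        row = new
    return row


def falling_factorial_pmf(n, theta):
    """List p with p[k] = P(K_n = k) = s(n, k) theta^k / (theta)_n (p[0] = 0)."""
    s = unsigned_stirling_row(n)
    rising = prod(theta + i for i in range(n))  # (theta)_n
    return [s[k] * theta ** k / rising for k in range(n + 1)]


def moments_from_pmf(n, theta):
    """Mean and variance of K_n obtained by summing over the probability function."""
    p = falling_factorial_pmf(n, theta)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
    return mean, var


def moments_closed_form(n, theta):
    """sum_{i=0}^{n-1} theta/(theta+i) and sum_{i=1}^{n-1} theta*i/(theta+i)^2."""
    mean = sum(theta / (theta + i) for i in range(n))
    var = sum(theta * i / (theta + i) ** 2 for i in range(1, n))
    return mean, var


if __name__ == "__main__":
    print(moments_from_pmf(30, 4.0), moments_closed_form(30, 4.0))
```

For $n=2$ the probability function reduces to $P(K_2=1)=1/(\theta+1)$ and $P(K_2=2)=\theta/(\theta+1)$.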
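The moment generating function of $K_n$ is given as
$$\mathrm{E}\left[e^{tK_n}\right]=\frac{(\theta e^t)_n}{(\theta)_n}=\frac{\Gamma(\theta)\,\Gamma(\theta e^t+n)}{\Gamma(\theta e^t)\,\Gamma(\theta+n)},$$
where $\Gamma(\cdot)$ is the gamma function. Letting
$$\mu_n(\theta)=\sum_{i=0}^{n-1}\frac{\theta}{\theta+i},\qquad \sigma_n^2(\theta)=\sum_{i=1}^{n-1}\frac{\theta i}{(\theta+i)^2},$$
the mean and variance of $K_n$ can be written as $\mathrm{E}[K_n]=\mu_n(\theta)$ and $\mathrm{Var}(K_n)=\sigma_n^2(\theta)$.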
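Let $k$ be an observed value of $K_n$. Then, the log-likelihood is
$$\ell_n(\theta)=k\log\theta-\sum_{i=0}^{n-1}\log(\theta+i)+\text{const},$$
where the constant does not depend on $\theta$, and its derivative with respect to $\theta$ is
$$\frac{\partial}{\partial\theta}\ell_n(\theta)=\frac{k}{\theta}-\sum_{i=0}^{n-1}\frac{1}{\theta+i}=\frac{1}{\theta}\left(k-\sum_{i=0}^{n-1}\frac{\theta}{\theta+i}\right).$$
Since $\mathrm{Var}(K_n)=\sum_{i=1}^{n-1}\theta i/(\theta+i)^2$, the Fisher information of $K_n$ is
$$I_n(\theta)=\frac{\mathrm{Var}(K_n)}{\theta^2}=\frac{1}{\theta}\sum_{i=1}^{n-1}\frac{i}{(\theta+i)^2}.$$
The maximum likelihood estimator $\hat\theta_n$ of $\theta$ is the root of
$$K_n=\sum_{i=0}^{n-1}\frac{\theta}{\theta+i}.$$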
It is well known that if $\theta$ is fixed, then the maximum likelihood estimator $\hat\theta_n$ is asymptotically normal:
$$\sqrt{\log n}\,(\hat\theta_n-\theta)\xrightarrow{d}N(0,\theta)$$
as $n\to\infty$, where $\xrightarrow{d}$ denotes convergence in distribution. Moreover, in some situations it may be useful to consider the case where $\theta$ grows with $n$ (Feng, 2007; Griffiths and Tavaré, 2017; Tsukuda, 2017). In particular, the asymptotic normality of $\hat\theta_n$ has been extended to the case where both $n$ and $\theta$ tend to infinity (Tsukuda, 2017):
as with , or as leaving fixed. The asymptotic variance of generally does not become . Indeed if then and if then , where and grow simultaneously. Therefore, it holds that if then ; further, if then .
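For computation, the Fisher information of a single count is the finite sum $\theta^{-1}\sum_{i=1}^{n-1}i/(\theta+i)^2$, and the likelihood equation $k=\mu_n(\theta)$ with $\mu_n(\theta)=\sum_{i=0}^{n-1}\theta/(\theta+i)$ has a unique root when $1<k<n$, because $\mu_n(\theta)$ increases from 1 to $n$ as $\theta$ runs over $(0,\infty)$. The sketch below uses bisection on $\log\theta$ (the bracket `lo`, `hi` and the iteration count are arbitrary choices; any one-dimensional root finder would do) and also evaluates $\theta I_n(\theta)/\log n$, illustrating that for fixed $\theta$ the information grows only like $\log n$, which is the source of the $\sqrt{\log n}$ rate. Function names are hypothetical.

```python
import math


def mu(n, theta):
    """mu_n(theta) = sum_{i=0}^{n-1} theta / (theta + i), increasing in theta."""
    return sum(theta / (theta + i) for i in range(n))


def fisher_info(n, theta):
    """I_n(theta) = (1/theta) * sum_{i=1}^{n-1} i / (theta + i)^2."""
    return sum(i / (theta + i) ** 2 for i in range(1, n)) / theta


def mle_theta(n, k, lo=1e-12, hi=1e12, iters=200):
    """Solve mu_n(theta) = k by bisection on log(theta).

    A finite positive root exists only for 1 < k < n: the score
    (k - mu_n(theta)) / theta is negative for all theta when k = 1
    and positive for all theta when k = n.
    """
    if not 1 < k < n:
        raise ValueError("need 1 < k < n for a finite positive MLE")
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if mu(n, math.exp(mid)) < k:
            a = mid  # mu_n is increasing, so the root lies to the right
        else:
            b = mid
    return math.exp(0.5 * (a + b))


def scaled_info(n, theta):
    """theta * I_n(theta) / log(n); tends to 1 (slowly) as n grows with theta fixed."""
    return theta * fisher_info(n, theta) / math.log(n)


if __name__ == "__main__":
    print(mle_theta(3, 2))
    for n in (10**2, 10**3, 10**4, 10**5):
        print(n, round(scaled_info(n, 1.0), 4))
```

For example, with $n=3$ and an observed $k=2$ the equation $1+\theta/(\theta+1)+\theta/(\theta+2)=2$ reduces to $\theta^2=2$, so the estimate is $\sqrt2$.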
3 The Fisher information of two samples
In this section, the Fisher information of Samples (i) and (ii) is calculated, and the maximum likelihood estimators are presented.
3.1 Sample (i)
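Since the counts $K_{n,1},\ldots,K_{n,m}$ of Sample (i) independently and identically follow the falling factorial distribution with parameter $(n,\theta)$, the likelihood, as a function of $\theta$, is given by
$$L_{(i)}(\theta)\propto\prod_{j=1}^{m}\frac{\theta^{K_{n,j}}}{\prod_{i=0}^{n-1}(\theta+i)}.$$
Thus, the log-likelihood is
$$\ell_{(i)}(\theta)=\log\theta\sum_{j=1}^{m}K_{n,j}-m\sum_{i=0}^{n-1}\log(\theta+i)+\text{const},$$
so its derivative with respect to $\theta$ is
$$\frac{\partial}{\partial\theta}\ell_{(i)}(\theta)=\frac{1}{\theta}\sum_{j=1}^{m}K_{n,j}-m\sum_{i=0}^{n-1}\frac{1}{\theta+i}.$$
It follows from $\mathrm{Var}(K_{n,j})=\sum_{i=1}^{n-1}\theta i/(\theta+i)^2$ and from the independence of $K_{n,1},\ldots,K_{n,m}$ that the Fisher information with respect to $\theta$ is
$$I_{(i)}(\theta)=\frac{m}{\theta}\sum_{i=1}^{n-1}\frac{i}{(\theta+i)^2}.$$
The maximum likelihood estimator $\hat\theta_{(i)}$ is the root of
$$\bar K=\sum_{i=0}^{n-1}\frac{\theta}{\theta+i},$$
where $\bar K$ denotes the sample mean of $K_{n,1},\ldots,K_{n,m}$, that is,
$$\bar K=\frac{1}{m}\sum_{j=1}^{m}K_{n,j}.\qquad(3.1)$$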
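For Sample (i) the counts enter the score only through their total, so the estimating equation involves only the sample mean $\bar K$ of $K_{n,1},\ldots,K_{n,m}$, which, unlike a single count, need not be an integer; by independence the Fisher information is $m$ times that of one count. A minimal sketch with hypothetical names, using the same bisection idea on $\log\theta$ (bracket and iteration count are arbitrary choices):

```python
import math


def mu(n, theta):
    """mu_n(theta) = sum_{i=0}^{n-1} theta / (theta + i), increasing in theta."""
    return sum(theta / (theta + i) for i in range(n))


def fisher_info(n, theta):
    """Fisher information of one count: (1/theta) sum_{i=1}^{n-1} i / (theta + i)^2."""
    return sum(i / (theta + i) ** 2 for i in range(1, n)) / theta


def fisher_sample_i(m, n, theta):
    """Sample (i): m independent counts, so the information is additive."""
    return m * fisher_info(n, theta)


def mle_sample_i(ks, n, lo=1e-12, hi=1e12, iters=200):
    """Solve mu_n(theta) = mean(ks) by bisection on log(theta).

    The counts enter only through their mean, which may be a non-integer.
    """
    kbar = sum(ks) / len(ks)
    if not 1 < kbar < n:
        raise ValueError("need 1 < mean(ks) < n for a finite positive MLE")
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if mu(n, math.exp(mid)) < kbar:
            a = mid  # root lies to the right
        else:
            b = mid
    return math.exp(0.5 * (a + b))


if __name__ == "__main__":
    ks = [3, 5, 4, 6, 2]  # hypothetical counts with n = 10
    print(mle_sample_i(ks, 10), fisher_sample_i(len(ks), 10, 1.0))
```

With $n=2$ each count is 1 or 2, and an observed mean of 1.5 gives $1+\theta/(\theta+1)=1.5$, that is, $\hat\theta=1$.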
Remark 5.
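The moment generating function of $\sum_{j=1}^{m}K_{n,j}$ is
$$\mathrm{E}\left[\exp\Bigl(t\sum_{j=1}^{m}K_{n,j}\Bigr)\right]=\prod_{i=0}^{n-1}\left(\frac{\theta e^t+i}{\theta+i}\right)^{m}.\qquad(3.2)$$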
Therefore, when , does not follow the falling factorial distribution. Moreover, when , if and only if ,
(see Proposition 5 in Appendix).
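By independence, the moment generating function of $K_{n,1}+\cdots+K_{n,m}$ is the $m$-th power of $(\theta e^t)_n/(\theta)_n=\prod_{i=0}^{n-1}(\theta e^t+i)/(\theta+i)$, and each factor equals $1-p_i+p_ie^t$ with $p_i=\theta/(\theta+i)$, the moment generating function of a Bernoulli($p_i$) variable. The sum therefore has the law of a sum of $mn$ independent Bernoulli variables, which appears to be the triangular-array representation used in the Appendix. A short numerical check of the identity at arbitrarily chosen $t$, $m$, $n$ and $\theta$ (hypothetical function names):

```python
import math


def mgf_sum(t, m, n, theta):
    """MGF of K_{n,1} + ... + K_{n,m}: the m-th power of (theta e^t)_n / (theta)_n."""
    a = theta * math.exp(t)
    single = math.prod((a + i) / (theta + i) for i in range(n))
    return single ** m


def mgf_bernoulli_array(t, m, n, theta):
    """MGF of a sum of m*n independent Bernoulli variables:
    m copies of Bernoulli(theta / (theta + i)) for each i = 0, ..., n - 1."""
    out = 1.0
    for _ in range(m):
        for i in range(n):
            p = theta / (theta + i)
            out *= 1 - p + p * math.exp(t)
    return out


if __name__ == "__main__":
    for t in (-1.0, 0.3, 1.2):
        print(t, mgf_sum(t, 3, 7, 0.8), mgf_bernoulli_array(t, 3, 7, 0.8))
```

Since the Bernoulli variables with $i=0$ equal 1, the sum is at least $m$, whereas a falling factorial variable takes the value 1 with positive probability; this is one quick way to see that for $m\ge2$ the sum cannot follow a falling factorial distribution.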
3.2 Sample (ii)
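Since the count $K_{mn}$ of Sample (ii) follows the falling factorial distribution with parameter $(mn,\theta)$, the Fisher information with respect to $\theta$ is given as
$$I_{(ii)}(\theta)=\frac{1}{\theta}\sum_{i=1}^{mn-1}\frac{i}{(\theta+i)^2}.$$
The maximum likelihood estimator $\hat\theta_{(ii)}$ is the root of
$$K_{mn}=\sum_{i=0}^{mn-1}\frac{\theta}{\theta+i}.$$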
4 Comparing the two samples
As stated before, the Fisher information of Samples (i) and (ii), $I_{(i)}(\theta)$ and $I_{(ii)}(\theta)$ respectively, differ even though the total numbers of elements in the two samples are the same. Considering simple asymptotic settings, the asymptotic magnitude relation between $I_{(i)}(\theta)$ and $I_{(ii)}(\theta)$ can be observed as follows: If leaving and fixed, then
so asymptotically; if leaving and fixed, then
so asymptotically. On the contrary, if and , , so . These observations indicate that the magnitude relation can be reversed depending on the values of the parameters and . To demonstrate this phenomenon more precisely, non-asymptotic sufficient conditions that guarantee the magnitude relation between and are provided in Subsection 4.1. Moreover, asymptotic conditions that guarantee the asymptotic magnitude relation when one of the pairs $(m,n)$, $(m,\theta)$, and $(n,\theta)$ grows simultaneously while the remaining parameter is fixed are provided in Subsection 4.2.
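The reversal can be seen numerically already for very small samples. Writing $I_n(\theta)=\theta^{-1}\sum_{i=1}^{n-1}i/(\theta+i)^2$ for the Fisher information of one count, Sample (i) carries $mI_n(\theta)$ and Sample (ii) carries $I_{mn}(\theta)$. For $m=n=2$, $\theta\{2I_2(\theta)-I_4(\theta)\}=(\theta+1)^{-2}-2(\theta+2)^{-2}-3(\theta+3)^{-2}$, which is about $0.061$ at $\theta=0.1$ and about $-0.160$ at $\theta=1$, so the ordering flips in between. The sketch below (hypothetical names; the search interval is an arbitrary choice) evaluates the difference and locates one sign change by bisection on $\log\theta$; it is an illustration, not a replacement for the conditions in Theorems 1, 3 and 4.

```python
import math


def fisher_info(n, theta):
    """I_n(theta) = (1/theta) * sum_{i=1}^{n-1} i / (theta + i)^2; zero when n = 1."""
    return sum(i / (theta + i) ** 2 for i in range(1, n)) / theta


def info_gap(m, n, theta):
    """m I_n(theta) - I_{mn}(theta): positive when Sample (i) is more informative."""
    return m * fisher_info(n, theta) - fisher_info(m * n, theta)


def find_sign_change(m, n, lo=1e-6, hi=1e6, iters=200):
    """Bisection on log(theta) for a theta where info_gap changes sign.

    Requires info_gap > 0 at lo and < 0 at hi; returns one sign change,
    with no claim that it is the only one.
    """
    if not info_gap(m, n, lo) > 0 > info_gap(m, n, hi):
        raise ValueError("info_gap does not go from positive to negative on [lo, hi]")
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if info_gap(m, n, math.exp(mid)) > 0:
            a = mid
        else:
            b = mid
    return math.exp(0.5 * (a + b))


if __name__ == "__main__":
    for theta in (0.05, 0.1, 0.2, 1.0, 10.0):
        print(theta, info_gap(2, 2, theta))
    print("sign change near theta =", find_sign_change(2, 2))
```

With $n=1$ every count equals 1, so Sample (i) carries no information at all and Sample (ii) is always more informative when $m\ge2$.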
4.1 Non-asymptotic results
In this subsection, let us provide sufficient conditions under which and a sufficient condition under which . The results indicate that, roughly speaking, the former inequality holds when is relatively small in comparison to and , and the latter inequality holds when is relatively large in comparison to that relative to and .
Firstly, we provide a sufficient condition under which .
Theorem 1.
(1) When , if integers and satisfy and
(4.1)
then
.
(2) When , if integers and satisfy and
(4.2)
then .
Proof.
First, note that is decreasing as increases in the range because its derivative is . Moreover, the hypothesis of either (1) or (2) implies , since
where
are used in the first inequality and
Therefore,
(4.4)
Hereafter, the assertions (1) and (2) are considered separately.
(1) When , the right-hand side of (4.4) is larger than
Further, if the hypothesis is true then this display is positive, because the terms enclosed in the brackets are
(2) When , the right-hand side of (4.4) is larger than
Further, if the hypothesis is true then this display is positive, because the terms enclosed in the brackets are
This completes the proof. ∎
Remark 6.
13.50 | 17.54 | 22.32 | 27.30 | 52.83 | 518.53 | ||
12.00 | 16.37 | 21.26 | 26.29 | 51.90 | 517.66 | ||
9.90 | 14.74 | 19.77 | 24.87 | 50.61 | 516.44 |
If then (4.2) is satisfied for and . Therefore, we have the following corollary.
Corollary 2.
If , then for any integers and .
Next, we provide another sufficient condition under which .
Theorem 3.
If
(4.6)
then for any integers and .
Proof.
From
and
(see the proof of Proposition 1 in Tsukuda (2017) for derivations of these inequalities), it follows that
When , this display implies that
(4.7)
(4.8)
On the other hand, when , it holds that
(4.9) |
Hence, (4.8) and (4.9) imply that
where the last inequality follows from the assumption (4.6). This completes the proof. ∎
Remark 8.
By assuming a more direct condition, we may obtain sharper sufficient conditions. For example, if
(4.10)
then . This is because
where we have used
for positive integers . However, (4.10) may be difficult to interpret in its current form.
Finally, we provide a sufficient condition under which .
Theorem 4.
If
(4.11)
then for any integers and .
Proof.
Define the function as
First, when , it holds that for . Next, consider . Since , it is sufficient to show that when (4.11) is satisfied. It holds that
where the numerator of the fraction contained in the brackets on the right-hand side is negative since
Therefore it holds that , which implies for . This completes the proof. ∎
4.2 Asymptotic results
In this subsection, we consider situations where two of $m$, $n$, and $\theta$ tend to infinity simultaneously while the remaining one is fixed. The results presented in this subsection are summarized in Table 2, where and are denoted by and , respectively.
asymptotic setting | magnitude relation | ||
---|---|---|---|
(I) , :fixed | |||
(II) , :fixed | |||
(III) , :fixed | case-by-case | ||
(IV) , :fixed | |||
(V) , :fixed | |||
(VI) , :fixed |
Six different cases with various asymptotic settings are considered as follows:
(III) , leaving fixed
Since
there are three possible cases that differ in terms of the asymptotic magnitude relation between and : (a) If then asymptotically, which corresponds to Theorem 3 (see Remark 7). (b) If then asymptotically, which partly corresponds to Theorem 4. (c) If then
where ; therefore, in this case, the asymptotic magnitude relation between and is determined by the magnitude relation between and .
(IV) , leaving fixed
Since
and
which follows from
it holds that asymptotically.
This corresponds to Theorem 4.
5 Concluding remarks
Under the assumption of a Poisson–Dirichlet population, we have shown that the two sampling procedures (i) and (ii) carry different information; that is, they can lead to quite different results. The reason for this phenomenon is that the Ewens sampling formula represents the law of samples from the Poisson–Dirichlet population, a typical random discrete distribution. By de Finetti's theorem, for an exchangeable sequence there exists a directing measure such that the sequence is conditionally iid. Our result indicates that, when data analyses using random distributions are conducted, it is crucial to decide whether or not the data of interest are sampled from a single unobservable directing measure.
Appendix A The asymptotic normality of
In this Appendix, the following proposition, which was mentioned in Remark 5, is proven.
Proposition 5.
Proof.
For , let
Consider a triangular sequence of independent Bernoulli variables, where for all . Then, from (3.2), the distribution of is the same as the distribution of
First, we prove that (A.1) implies
(A.3)
From the central limit theorem for bounded random variables, it is sufficient to show that
. When or , it follows that . When , it follows from (A.1) that
Next, we prove that when (A.3) implies (A.1). Assume that (A.1) does not hold. Consider such that . Since and
for all , which follows from the fact that is increasing for , it holds that
Hence, in order for (A.3) to hold, the Lindeberg condition
(A.4)
for any is necessary, where is the indicator function; we now show that (A.4) does not hold. Since
we can take
Then, we have
for all , which yields that