A reversal phenomenon in estimation based on multiple samples from the Poisson--Dirichlet distribution

Consider two forms of sampling from a population: (i) drawing s samples of n elements with replacement and (ii) drawing a single sample of ns elements. In this paper, under the setting where the descending order population frequency follows the Poisson--Dirichlet distribution with parameter θ, we report that the magnitude relation of the Fisher information possessed by the sample partitions converted from samples (i) and (ii) can change depending on the parameters n, s, and θ. Roughly speaking, if θ is small relative to n and s, the Fisher information of (i) is larger than that of (ii); on the contrary, if θ is large relative to n and s, the Fisher information of (ii) is larger than that of (i). The result represents one aspect of random distributions.


1 Introduction

Let us consider the statistical modeling of a random sampling that produces data containing $ns$ records by drawing $s$ sets of $n$ elements with replacement, where $n$ and $s$ are positive integers. Let $Z_{ij}$ $(i=1,\ldots,s;\ j=1,\ldots,n)$ be random variables corresponding to the $j$-th element in the $i$-th sample. For example, when we are interested in the population mean $\mu$ and the normal distribution $N(\mu,1)$ with unit variance is used as a population model, the sample mean $\bar{Z}_{\cdot\cdot}$ of all $ns$ records, which is a complete sufficient statistic, coincides with the sample mean of the sample means $\bar{Z}_{1\cdot},\ldots,\bar{Z}_{s\cdot}$, which are calculated from each sample set of size $n$. There exists the relation

  $ns\,\bar{Z}_{\cdot\cdot} = n\sum_{i=1}^{s}\bar{Z}_{i\cdot},$

and both samples have the same information for $\mu$. Under the same setting, if we use the sample median of the $ns$ records to estimate the population mean (which is identical to the population median in the case of the normal distribution), which is not a sufficient statistic, it has information different from that of the sample mean of the $s$ sample medians, which are calculated from each sample set of size $n$. This implies that when to pool affects the efficiency of inference. However, both of the asymptotic distributions of these two estimators as $n\to\infty$ are $N(\mu,\pi/(2ns))$, which indicates that they asymptotically have the same Fisher information, and so the efficiencies of inference asymptotically coincide.

In this paper, we report a more curious phenomenon: the magnitude relation of the Fisher information reverses depending on the values of the parameters. For the purpose of inferring the population diversity, the Poisson–Dirichlet distribution (Kingman, 1975) with parameter $\theta$ is assumed as a statistical model of the descending order population frequency. If $\theta$ is known, the homozygosity ($1/(1+\theta)$), the probability of discovering a new species ($\theta/(\theta+n)$, where $n$ is the sample size), the expected number of singletons (population uniques) in a finite population ($\theta N/(\theta+N-1)$, where $N$ is the population size), and so on can be determined. Moreover, the Poisson–Dirichlet distribution prior (or the Dirichlet process prior) plays an important role in nonparametric Bayesian estimation when there exist possibly infinitely many labels. For example, see Crane (2016) and its discussions; Ewens (1972); Hoshino and Takemura (1998); Samuels (1998). Therefore, estimating $\theta$ efficiently is an important problem, and we will show that the sampling procedure influences the magnitude relation of the Fisher information, so our result can be applied when choosing sampling procedures for estimating $\theta$ efficiently.

Remark 1.

Professor Masaaki Sibuya pointed out that our result represents one aspect of the differences between independently and identically distributed (iid) sequences and non-iid sequences (in this paper, exchangeable sequences are considered).

2 Sampling from the Poisson–Dirichlet distribution

2.1 Sampling procedures

Assume that the descending order frequency of the infinite population follows the Poisson–Dirichlet distribution with parameter $\theta$. For details about the properties of the Poisson–Dirichlet distribution and related distributions, we refer the reader to Feng (2010). In this paper, the following two types of samples are considered:

• Sample (i)
Drawing $s$ samples of $n$ elements with replacement. The numbers of distinct components in the respective sample partitions are denoted by $X_1,\ldots,X_s$.

• Sample (ii)
Drawing a single sample of $ns$ elements. The number of distinct components in the sample partition is denoted by $Y$.

If a sample of $n$ elements is drawn from a population that follows the Poisson–Dirichlet distribution with parameter $\theta$, then we have a sample frequency. When the population diversity is of interest, we convert the sample frequency to a sample partition, and the sample partition follows the Ewens sampling formula with parameter $\theta$. Then, the number of distinct components in the sample partition, which is the number of labels in the sample, follows the falling factorial distribution with parameter $(n,\theta)$ (this distribution is described in the next subsection). Note that the number of distinct components in the sample is a complete sufficient statistic for $\theta$. Hence, $X_1,\ldots,X_s$ independently and identically follow the falling factorial distribution with parameter $(n,\theta)$, and $Y$ follows the falling factorial distribution with parameter $(ns,\theta)$.

Even though the total number of elements is $ns$ in both sampling processes, Samples (i) and (ii) contain different information. In this paper, we calculate the Fisher information acquired from the samples obtained using these two sampling processes and demonstrate that the magnitude relation of the Fisher information can be reversed based on the values of the parameters $n$, $s$, and $\theta$.
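The number of distinct components can be simulated through the sequential (Chinese restaurant process) construction of the Ewens sampling formula, in which the $(m+1)$-th element is a new species with probability $\theta/(\theta+m)$. The following Python sketch uses illustrative values of our choosing ($n=50$, $\theta=2$) and checks the Monte Carlo mean against the exact mean $\theta\sum_{i=1}^{n}1/(\theta+i-1)$.

```python
import random

# Sequential construction of the Ewens sampling formula: the (m+1)-th
# draw founds a new species with probability theta/(theta+m), so the
# number of species is a sum of independent Bernoulli variables.
def sample_num_species(n, theta, rng):
    k = 0
    for m in range(n):
        if rng.random() < theta / (theta + m):
            k += 1
    return k

n, theta = 50, 2.0
rng = random.Random(1)
reps = 2000
avg_k = sum(sample_num_species(n, theta, rng) for _ in range(reps)) / reps

# Exact mean of the number of distinct components.
exact = theta * sum(1.0 / (theta + i - 1) for i in range(1, n + 1))
assert abs(avg_k - exact) < 0.3
```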

In the literature, there is another model called the multi-observer ESF (Barbour and Tavaré, 2010), where the 2-observer version is considered in Ewens et al. (2007). In that model, a sample of size $n$ consists of $r$ subsamples whose sizes are $n_1,\ldots,n_r$, where $\sum_{i=1}^{r}n_i = n$. The motivation of Barbour and Tavaré (2010) is testing the spatial homogeneity of tumors, that is, whether the parameters are common to the left side and the right side of a tumor. In the tumor example, when a common underlying evolutionary process is considered, Sample (i) fits the situation, because the population frequencies are considered to be different even within the same side (see the data in Table 2.1 of Barbour and Tavaré (2010), or the original methylation pattern data in Fig 2 of Siegmund et al. (2009)).

Remark 2.

As for the multi-observer ESF, the marginal distributions of the subsamples are also given by the Ewens sampling formula. Even if only some of the subsamples are considered, the subsamples are not regarded as independent. However, they are exchangeable, and conditionally iid when the population frequency is fixed to one realization. On the other hand, our Sample (i) corresponds to the case where there are $s$ different population frequencies.

Remark 3.

In the context of microdata disclosure risk assessments, the Ewens sampling formula has been used as a model for the frequency of frequencies, which is sometimes called the size index. See, for example, Hoshino and Takemura (1998) and Samuels (1998). The problem considered here fits the situation where an investigation is conducted in $s$ areas and $n$ individuals are surveyed in each area, with an assumption of the Poisson–Dirichlet hyperpopulation. Sample (i) corresponds to the case where there are $s$ different population frequencies, one for each area, and Sample (ii) corresponds to the case where there is only one population frequency. When collecting survey data, it is natural to consider that the sample sizes differ area by area. However, to clearly observe the reversal phenomenon of the Fisher information, we assume that the sample sizes of all areas are the same.

Remark 4.

Suppose there exist possibly infinitely many labels and we observe a sample frequency with a finite number of labels. In such a situation, the Poisson–Dirichlet distribution can be used as a prior distribution, and in order to conduct empirical Bayes estimation its parameter $\theta$ should be estimated. Suppose there are $s$ samples of size $n$. If the samples are drawn from $s$ different populations whose frequencies independently follow the Poisson–Dirichlet distribution, then Sample (i) is suitable; on the other hand, if the samples are drawn from one population which follows the Poisson–Dirichlet distribution, then Sample (ii) is suitable.

2.2 The falling factorial distribution

If the probability distribution of a positive-integer-valued random variable $X$ is given by

  $\mathrm{pr}(X=x) = f(x,\theta) = \frac{\bar{s}(n,x)\,\theta^x}{(\theta)_n} \quad (x=1,2,\ldots,n),$

then the distribution of $X$ is sometimes called the falling factorial distribution with parameter $(n,\theta)$ (Watterson, 1974), where $(\theta)_n = \theta(\theta+1)\cdots(\theta+n-1)$ and $\bar{s}(n,x)$ is the coefficient of $\theta^x$ in $(\theta)_n$. The falling factorial distribution is also called STR1F in Sibuya (1988). In a random partition following the Ewens sampling formula with parameter $\theta$, the number of distinct components follows the falling factorial distribution with parameter $(n,\theta)$ (Ewens, 1972). Since the number of distinct components is a complete sufficient statistic for $\theta$, considering the falling factorial distribution is enough in order to estimate $\theta$. In this subsection, we describe the known properties of a random variable $X$ which follows the falling factorial distribution with parameter $(n,\theta)$.
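The pmf can be computed directly by expanding $(\theta)_n$ as a polynomial in $\theta$; the following Python sketch (illustrative values $n=10$, $\theta=1.5$, chosen by us) checks that the probabilities sum to one and that the mean equals $\theta\sum_{i=1}^{n}1/(\theta+i-1)$.

```python
from math import prod

# coeffs[x] = coefficient of theta^x in (theta)_n = theta(theta+1)...(theta+n-1),
# i.e. the coefficients \bar{s}(n, x) of the falling factorial distribution.
def rising_factorial_coeffs(n):
    coeffs = [0, 1]  # the polynomial "theta"
    for j in range(1, n):
        new = [0] * (len(coeffs) + 1)  # multiply by (theta + j)
        for x, c in enumerate(coeffs):
            new[x] += j * c
            new[x + 1] += c
        coeffs = new
    return coeffs

n, theta = 10, 1.5
coeffs = rising_factorial_coeffs(n)
denom = prod(theta + i for i in range(n))  # (theta)_n
pmf = [coeffs[x] * theta**x / denom for x in range(1, n + 1)]

assert abs(sum(pmf) - 1.0) < 1e-12
mean = sum(x * p for x, p in zip(range(1, n + 1), pmf))
L_n = sum(1.0 / (theta + i - 1) for i in range(1, n + 1))
assert abs(mean - theta * L_n) < 1e-10
```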

The moment generating function of $X$ is given as

  $E[e^{Xt}] = \sum_{i=1}^{n}\frac{\bar{s}(n,i)(\theta e^t)^i}{(\theta)_n} = \frac{(\theta e^t)_n}{(\theta)_n} = \frac{\Gamma(\theta e^t+n)\,\Gamma(\theta)}{\Gamma(\theta+n)\,\Gamma(\theta e^t)},$

where $\Gamma(\cdot)$ is the gamma function. Letting

  $L_n(\theta) = \sum_{i=1}^{n}\frac{1}{\theta+i-1}, \qquad \ell_n(\theta) = \sum_{i=1}^{n}\frac{i-1}{(\theta+i-1)^2},$

the mean and variance of $X$ can be written as

  $E[X] = \theta L_n(\theta), \qquad \mathrm{var}(X) = \theta\ell_n(\theta).$

Let

  $l_\theta(x) = \log f(x,\theta) = \log\frac{\bar{s}(n,x)\,\theta^x}{(\theta)_n}.$

Then, the log-likelihood is $l_\theta(X)$ and its derivative with respect to $\theta$ is

  $\dot{l}_\theta(X) = \frac{\partial}{\partial\theta}l_\theta(X) = \frac{X}{\theta} - \sum_{i=1}^{n}\frac{1}{\theta+i-1}.$

Since $E[\dot{l}_\theta(X)] = 0$, the Fisher information of $X$ is

  $E[(\dot{l}_\theta(X))^2] = \frac{1}{\theta^2}\,\mathrm{var}(X) = \frac{\ell_n(\theta)}{\theta}.$

The maximum likelihood estimator $\hat{\theta}$ of $\theta$ is the root of

  $X - \sum_{i=1}^{n}\frac{\hat{\theta}}{\hat{\theta}+i-1} = 0.$

It is well known that if $\theta$ is fixed then $\hat{\theta}$ enjoys the asymptotic normality as $n\to\infty$:

  $\sqrt{\log n}\,(\hat{\theta}-\theta) \Rightarrow N(0,\theta)$

as $n\to\infty$, where $\Rightarrow$ denotes convergence in distribution. Moreover, it may be useful for some situations to consider the case where $\theta$ grows with $n$ (Feng, 2007; Griffiths and Tavaré, 2017; Tsukuda, 2017). In particular, the asymptotic normality of $\hat{\theta}$ has been extended to the asymptotic case where both $n$ and $\theta$ tend to infinity (Tsukuda, 2017):

  $\sqrt{\frac{\ell_n(\theta)}{\theta}}\,(\hat{\theta}-\theta) \Rightarrow N(0,1)$

as $n,\theta\to\infty$ simultaneously, or as $\theta\to\infty$ leaving $n$ fixed. The asymptotic variance of $\hat{\theta}$ generally does not become $\theta/\log n$. Indeed, if $\theta=o(n)$ then $\ell_n(\theta)\sim\log(n/\theta)$, and if $n=o(\theta)$ then $\ell_n(\theta)\sim n^2/(2\theta^2)$, where $n$ and $\theta$ grow simultaneously. Therefore, the asymptotic variance is $\theta/\log(n/\theta)$ in the former case, and $2\theta^3/n^2$ in the latter case.
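The likelihood equation $X = \sum_{i=1}^{n}\hat{\theta}/(\hat{\theta}+i-1)$ can be solved numerically, since its right-hand side is strictly increasing in $\hat{\theta}$. The following Python sketch is one way to compute the estimate; the function name, bracket, and tolerance are our own illustrative choices.

```python
def mle_theta(x, n, lo=1e-8, hi=1e8, iters=200):
    """Solve x = sum_{i=1}^n t/(t+i-1) for t by bisection.

    The left-hand side increases strictly in t (for n >= 2) from about 1
    up to n, so the root is unique whenever 1 < x < n.
    """
    def h(t):
        return sum(t / (t + i - 1) for i in range(1, n + 1))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative values: a sample of size n = 50 with X = 10 distinct components.
est = mle_theta(10, 50)
check = sum(est / (est + i - 1) for i in range(1, 51))
assert abs(check - 10) < 1e-6
```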

3 The Fisher information of two samples

In this section, the Fisher information of Samples (i) and (ii) is calculated, and the maximum likelihood estimators are presented.

3.1 Sample (i)

Since $X_1,\ldots,X_s$ independently and identically follow the falling factorial distribution with parameter $(n,\theta)$, the likelihood is given by

  $\prod_{k=1}^{s} f(X_k,\theta).$

Thus, the log-likelihood is

  $M_\theta = \sum_{k=1}^{s} l_\theta(X_k),$

so its derivative with respect to $\theta$ is

  $\dot{M}_\theta = \sum_{k=1}^{s}\dot{l}_\theta(X_k) = \sum_{k=1}^{s}\left(\frac{X_k}{\theta} - \sum_{i=1}^{n}\frac{1}{\theta+i-1}\right) = \sum_{k=1}^{s}\frac{X_k}{\theta} - \sum_{i=1}^{n}\frac{s}{\theta+i-1}.$

It follows from $E[\dot{M}_\theta] = 0$ and from the independence of $X_1,\ldots,X_s$ that the Fisher information with respect to $\theta$ is

  $I_1(\theta;n,s) = E[(\dot{M}_\theta)^2] = \mathrm{var}(\dot{M}_\theta) = \sum_{k=1}^{s}\mathrm{var}(\dot{l}_\theta(X_k)) = \frac{s\ell_n(\theta)}{\theta}.$

The maximum likelihood estimator $\hat{\theta}^{(1)}$ is the root of

  $\bar{X} - \sum_{i=1}^{n}\frac{\hat{\theta}^{(1)}}{\hat{\theta}^{(1)}+i-1} = 0,$

where $\bar{X} = T/s$ denotes the sample mean of $X_1,\ldots,X_s$ with

  $T = \sum_{i=1}^{s} X_i.$ (3.1)
Remark 5.

The moment generating function of $T$ is

  $E[e^{Tt}] = \left(\frac{(\theta e^t)_n}{(\theta)_n}\right)^s = \prod_{j=1}^{n}\left(\frac{\theta}{\theta+j-1}e^t + \frac{j-1}{\theta+j-1}\right)^s.$ (3.2)

Therefore, when $s\ge 2$, $T$ does not follow the falling factorial distribution. Moreover, if and only if $sn^2/\theta\to\infty$,

  $\frac{T - s\theta L_n(\theta)}{\sqrt{s\theta\ell_n(\theta)}} \Rightarrow N(0,1)$

(see Proposition 5 in the Appendix).
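The Bernoulli-product form of (3.2) can be checked numerically: the mean and variance of a sum of independent Bernoulli variables with success probabilities $\theta/(\theta+j-1)$, $j=1,\ldots,n$, each repeated $s$ times, must equal $s\theta L_n(\theta)$ and $s\theta\ell_n(\theta)$. A Python sketch with illustrative values:

```python
# T is distributed as a sum of n*s independent Bernoulli variables whose
# success probabilities follow the product form of (3.2).
n, s, theta = 30, 7, 2.5
ps = [theta / (theta + j - 1) for j in range(1, n + 1)] * s

mean = sum(ps)                      # E[T]
var = sum(p * (1 - p) for p in ps)  # var(T), by independence

L_n = sum(1.0 / (theta + i - 1) for i in range(1, n + 1))
ell_n = sum((i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))

assert abs(mean - s * theta * L_n) < 1e-9
assert abs(var - s * theta * ell_n) < 1e-9
```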

3.2 Sample (ii)

Since $Y$ follows the falling factorial distribution with parameter $(ns,\theta)$, the Fisher information with respect to $\theta$ is given as

  $I_2(\theta;n,s) = \frac{1}{\theta}\sum_{i=1}^{ns}\frac{i-1}{(\theta+i-1)^2} = \frac{\ell_{ns}(\theta)}{\theta}.$

The maximum likelihood estimator $\hat{\theta}^{(2)}$ is the root of

  $Y - \sum_{i=1}^{ns}\frac{\hat{\theta}^{(2)}}{\hat{\theta}^{(2)}+i-1} = 0.$
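Both information quantities are elementary finite sums, so the reversal can be observed directly; the following Python sketch uses illustrative parameter values of our own choosing (small $\theta$ versus large $\theta$ for $n=100$, $s=10$).

```python
def ell(n, theta):
    # ell_n(theta) = sum_{i=1}^n (i-1)/(theta+i-1)^2
    return sum((i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))

def I1(theta, n, s):
    return s * ell(n, theta) / theta      # Fisher information of Sample (i)

def I2(theta, n, s):
    return ell(n * s, theta) / theta      # Fisher information of Sample (ii)

# Small theta relative to n and s: Sample (i) carries more information.
assert I1(1.0, 100, 10) > I2(1.0, 100, 10)
# Large theta relative to n and s: Sample (ii) carries more information.
assert I1(5000.0, 100, 10) < I2(5000.0, 100, 10)
```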

4 Comparing the two samples

As stated before, the Fisher information of Samples (i) and (ii), namely $I_1(\theta;n,s)$ and $I_2(\theta;n,s)$, respectively, are different even though the total numbers of elements in the two samples are the same. Considering simple asymptotic settings, the asymptotic magnitude relation between $I_1$ and $I_2$ can be observed as follows. If $n\to\infty$ leaving $s$ and $\theta$ fixed, then

  $I_1(\theta;n,s) \sim \frac{s\log n}{\theta}, \qquad I_2(\theta;n,s) \sim \frac{\log n}{\theta},$

so $I_1 > I_2$ asymptotically; if $s\to\infty$ leaving $n$ and $\theta$ fixed, then

  $I_1(\theta;n,s) = \frac{s\ell_n(\theta)}{\theta}, \qquad I_2(\theta;n,s) \sim \frac{\log s}{\theta},$

so $I_1 > I_2$ asymptotically. On the contrary, if $\theta\to\infty$ leaving $n$ and $s$ fixed, then $I_1/I_2 \to 1/s$, so $I_1 < I_2$ asymptotically. These observations indicate that the magnitude relation can be reversed based on the values of the parameters $n$, $s$, and $\theta$. To demonstrate this phenomenon more precisely, non-asymptotic sufficient conditions that guarantee the magnitude relation between $I_1$ and $I_2$ are provided in Subsection 4.1. Moreover, asymptotic conditions that guarantee the asymptotic magnitude relation, where a pair of parameters among $(n,s)$, $(n,\theta)$, and $(s,\theta)$ grows simultaneously leaving the remaining parameter fixed, are provided in Subsection 4.2.
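For fixed $n$ and $s$, a value of $\theta$ at which the reversal occurs can be located numerically by a sign-change search on $s\ell_n(\theta)-\ell_{ns}(\theta)$; the following Python sketch uses illustrative values $n=100$, $s=10$ of our own choosing.

```python
def ell(n, theta):
    return sum((i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))

def gap(theta, n, s):
    # gap > 0 if and only if I1 > I2, since I1 - I2 = (s*ell_n - ell_ns)/theta
    return s * ell(n, theta) - ell(n * s, theta)

n, s = 100, 10
lo, hi = 1.0, 10000.0
assert gap(lo, n, s) > 0 > gap(hi, n, s)  # the sign changes in between

# Geometric bisection, since the crossing may lie anywhere over several
# orders of magnitude of theta.
for _ in range(60):
    mid = (lo * hi) ** 0.5
    if gap(mid, n, s) > 0:
        lo = mid
    else:
        hi = mid
theta_star = (lo * hi) ** 0.5
assert 10 < theta_star < 2000
```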

4.1 Non-asymptotic results

In this subsection, we provide sufficient conditions under which $I_1(\theta;n,s) > I_2(\theta;n,s)$ and a sufficient condition under which $I_1(\theta;n,s) < I_2(\theta;n,s)$. Since $I_1 - I_2 = (s\ell_n(\theta) - \ell_{ns}(\theta))/\theta$, it suffices to compare $s\ell_n(\theta)$ and $\ell_{ns}(\theta)$. The results indicate that, roughly speaking, the former inequality holds when $\theta$ is relatively small in comparison to $n$ and $s$, and the latter inequality holds when $\theta$ is relatively large in comparison to $n$ and $s$.

Firstly, we provide a sufficient condition under which $I_1(\theta;n,s) > I_2(\theta;n,s)$.

Theorem 1.

(1) When $\theta\ge 2$, if integers $n$ and $s$ satisfy $s\ge 2$ and

  $n > 1 + \frac{\lfloor\theta\rfloor - 1 + 1/s}{\ell_{\lfloor\theta\rfloor}(\theta)},$ (4.1)

then $I_1(\theta;n,s) > I_2(\theta;n,s)$.
(2) For any $\theta>0$, if integers $n$ and $s$ satisfy $s\ge 2$ and

  $n > 1 + (\theta+1)^2\left(\frac{1}{s}+1\right),$ (4.2)

then $I_1(\theta;n,s) > I_2(\theta;n,s)$.

Proof.

First, note that $(i-1)/(\theta+i-1)^2$ is decreasing as $i$ increases in the range $i-1\ge\theta$, because the derivative of $x/(\theta+x)^2$ with respect to $x$ is $(\theta-x)/(\theta+x)^3$. Moreover, either the hypothesis of (1) or that of (2) implies $n>\theta$, since

  $1 + \frac{\lfloor\theta\rfloor-1+1/s}{\ell_{\lfloor\theta\rfloor}(\theta)} - \theta > 1 + \frac{\lfloor\theta\rfloor-1+1/s}{1/2} - \theta = 2\lfloor\theta\rfloor - 1 + \frac{2}{s} - \theta \ge \lfloor\theta\rfloor - 2 + \frac{2}{s},$

where

  $\ell_{\lfloor\theta\rfloor}(\theta) < \frac{\lfloor\theta\rfloor(\lfloor\theta\rfloor-1)}{2\theta^2} \le \frac{\theta(\theta-1)}{2\theta^2} < \frac{1}{2}$

is used in the first inequality, and

  $1 + (\theta+1)^2\left(\frac{1}{s}+1\right) > \theta.$

Therefore,

  $s\ell_n(\theta) - \ell_{ns}(\theta) = (s-1)\sum_{i=2}^{n}\frac{i-1}{(\theta+i-1)^2} - \sum_{i=n+1}^{ns-s+1}\frac{i-1}{(\theta+i-1)^2} - \sum_{i=ns-s+2}^{ns}\frac{i-1}{(\theta+i-1)^2}$
  $> (s-1)\sum_{i=2}^{n}\frac{i-1}{(\theta+i-1)^2} - (ns-n-s+1)\frac{n}{(\theta+n)^2} - (s-1)\frac{ns-s+1}{(\theta+ns-s+1)^2}$
  $= (s-1)\left[\sum_{i=2}^{n}\left\{\frac{i-1}{(\theta+i-1)^2} - \frac{n}{(\theta+n)^2}\right\} - \frac{ns-s+1}{(\theta+ns-s+1)^2}\right].$ (4.4)

Hereafter, the assertions (1) and (2) are considered separately.

(1) When $\theta\ge 2$, the right-hand side of (4.4) is larger than

  $(s-1)\left[\sum_{i=2}^{\lfloor\theta\rfloor}\left\{\frac{i-1}{(\theta+i-1)^2} - \frac{n}{(\theta+n)^2}\right\} - \frac{ns-s+1}{(\theta+ns-s+1)^2}\right].$

Further, if the hypothesis is true then this display is positive, because the terms enclosed in the brackets are

  $\sum_{i=2}^{\lfloor\theta\rfloor}\left\{\frac{i-1}{(\theta+i-1)^2} - \frac{n}{(\theta+n)^2}\right\} - \frac{ns-s+1}{(\theta+ns-s+1)^2} = \ell_{\lfloor\theta\rfloor}(\theta) - (\lfloor\theta\rfloor-1)\frac{n}{(\theta+n)^2} - \frac{ns-s+1}{(\theta+ns-s+1)^2}$
  $> \ell_{\lfloor\theta\rfloor}(\theta) - (\lfloor\theta\rfloor-1)\frac{1}{n-1} - \frac{1}{s(n-1)} = \ell_{\lfloor\theta\rfloor}(\theta) - \left(\lfloor\theta\rfloor-1+\frac{1}{s}\right)\frac{1}{n-1}.$

(2) For any $\theta>0$, the right-hand side of (4.4) is larger than

  $(s-1)\left\{\frac{1}{(\theta+1)^2} - \frac{n}{(\theta+n)^2} - \frac{ns-s+1}{(\theta+ns-s+1)^2}\right\}.$

Further, if the hypothesis is true then this display is positive, because the terms enclosed in the braces are

  $\frac{1}{(\theta+1)^2} - \frac{n}{(\theta+n)^2} - \frac{ns-s+1}{(\theta+ns-s+1)^2} > \frac{1}{(\theta+1)^2} - \frac{1}{n-1} - \frac{1}{s(n-1)} = \frac{1}{(\theta+1)^2} - \left(1+\frac{1}{s}\right)\frac{1}{n-1}.$

This completes the proof. ∎

Remark 6.

To evaluate (4.1), some values of

  $1 + \frac{\lfloor\theta\rfloor-1+1/s}{\ell_{\lfloor\theta\rfloor}(\theta)}$ (4.5)

are shown in Table 1. The condition given in Theorem 1 is only a sufficient condition; $I_1(\theta;n,s) > I_2(\theta;n,s)$ can hold even when (4.1) fails.

If $n > 1 + 2(\theta+1)^2$ then (4.2) is satisfied for any $s\ge 2$, since $1/s+1<2$. Therefore, we have the following corollary.

Corollary 2.

If $n > 1 + 2(\theta+1)^2$, then $I_1(\theta;n,s) > I_2(\theta;n,s)$ for any integer $s\ge 2$.

Next, we provide another sufficient condition under which .

Theorem 3.

If

 1≤θ≤(slog(1+ns))1/2−1, (4.6)

then for any integers and .
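Theorem 3 can be checked numerically for particular parameter values; in the following Python sketch, the values $n=10$, $s=200$, $\theta=4$ (which satisfy (4.6)) are our own illustrative choices.

```python
from math import log, sqrt

def ell(n, theta):
    return sum((i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))

# Illustrative parameter values satisfying the sufficient condition (4.6).
n, s = 10, 200
theta = 4.0
theta_max = sqrt(s / log(1 + n * s)) - 1
assert 1 <= theta <= theta_max                   # (4.6) holds
assert s * ell(n, theta) > ell(n * s, theta)     # hence I1 > I2
```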

Proof.

From

  $\ell_n(\theta) = L_n(\theta) - \theta\sum_{i=1}^{n}\frac{1}{(\theta+i-1)^2},$
  $\sum_{i=1}^{n}\frac{1}{(\theta+i-1)^2} \ge \frac{n}{\theta(n+\theta)},$

and

  $L_n(\theta) \le \frac{1}{\theta} + \log\left(1+\frac{n-1}{\theta}\right)$

(see the proof of Proposition 1 in Tsukuda (2017) for derivations of these inequalities), it follows that

  $\ell_n(\theta) \le \frac{1}{\theta} - \frac{n}{n+\theta} + \log\left(1+\frac{n-1}{\theta}\right).$

When $\theta\ge 1$, this display implies that

  $\ell_{ns}(\theta) < \log\left(1+\frac{ns}{\theta}\right)$ (4.7)
  $\le \log(1+ns).$ (4.8)

On the other hand, when $n\ge 2$, it holds that

  $s\ell_n(\theta) \ge \frac{s}{(\theta+1)^2}.$ (4.9)

Hence, (4.8) and (4.9) imply that

  $s\ell_n(\theta) - \ell_{ns}(\theta) > \frac{s}{(\theta+1)^2} - \log(1+ns) \ge 0,$

where the last inequality follows from the assumption (4.6). This completes the proof. ∎

Remark 7.

By using (4.7) instead of (4.8), we can loose (4.6) to

 1≤θ≤(slog(1+ns/θ))1/2−1.
Remark 8.

By assuming a more direct condition, we may obtain sharper sufficient conditions. For example, if

  $\ell_n(\theta) > \frac{\log s}{s-1} - \frac{n\theta}{(n+1+\theta)(ns+1+\theta)},$ (4.10)

then $I_1(\theta;n,s) > I_2(\theta;n,s)$. That is because

  $s\ell_n(\theta) - \ell_{ns}(\theta) > (s-1)\ell_n(\theta) + \log\left(\frac{n+\theta}{ns+\theta}\right) + (s-1)\frac{n\theta}{(n+1+\theta)(ns+1+\theta)}$
  $> (s-1)\left\{\ell_n(\theta) + \frac{n\theta}{(n+1+\theta)(ns+1+\theta)} - \frac{\log s}{s-1}\right\},$

where we have used

  $\sum_{i=a}^{b}\frac{i}{(i+\theta)^2} = \sum_{i=a}^{b}\left\{\frac{1}{i+\theta} - \frac{\theta}{(i+\theta)^2}\right\} < \log\left(\frac{b+\theta}{a-1+\theta}\right) - \frac{\theta(b-a+1)}{(a+\theta)(b+1+\theta)}$

for positive integers $a\le b$. However, interpreting (4.10) in its current form may be difficult.

Finally, we provide a sufficient condition under which .

Theorem 4.

If

  $\theta > \sqrt{(n-1)(ns+n-1)},$ (4.11)

then $I_1(\theta;n,s) < I_2(\theta;n,s)$ for any integers $n\ge 2$ and $s\ge 2$.
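Theorem 4 can likewise be checked numerically; in the following Python sketch, the values $n=20$, $s=5$, and a value of $\theta$ just above the threshold (4.11) are our own illustrative choices.

```python
from math import sqrt

def ell(n, theta):
    return sum((i - 1) / (theta + i - 1) ** 2 for i in range(1, n + 1))

# Illustrative parameter values: theta just above the threshold (4.11).
n, s = 20, 5
threshold = sqrt((n - 1) * (n * s + n - 1))   # sqrt(19 * 119)
theta = threshold + 1.0
assert s * ell(n, theta) < ell(n * s, theta)  # hence I1 < I2
```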

Proof.

Define the function $g$ as

  $g(s) = s\ell_n(\theta) - \ell_{ns}(\theta).$

First, note that $g(1) = 0$. Hence, it is sufficient to show that $g(s+1) < g(s)$ when (4.11) is satisfied; since the right-hand side of (4.11) is increasing in $s$, this also covers the increments at smaller values of $s$. It holds that

  $g(s+1) - g(s) = \ell_n(\theta) - \ell_{n(s+1)}(\theta) + \ell_{ns}(\theta) = \sum_{i=1}^{n}\frac{i-1}{(\theta+i-1)^2} - \sum_{i=ns+1}^{ns+n}\frac{i-1}{(\theta+i-1)^2}$
  $= \sum_{i=1}^{n}\left\{\frac{i-1}{(\theta+i-1)^2} - \frac{ns+i-1}{(\theta+ns+i-1)^2}\right\} = \sum_{i=1}^{n}\frac{(i-1)(\theta+ns+i-1)^2 - (ns+i-1)(\theta+i-1)^2}{(\theta+i-1)^2(\theta+ns+i-1)^2},$

where the numerator of the fraction contained in the summand on the right-hand side equals

  $ns\{-\theta^2 + (i-1)(ns+i-1)\} \le ns\{-\theta^2 + (n-1)(ns+n-1)\} < 0.$

Therefore it holds that $g(s+1) < g(s)$, which implies $g(s) < g(1) = 0$ for $s\ge 2$. This completes the proof. ∎

4.2 Asymptotic results

In this subsection, we consider situations where two of $n$, $s$, and $\theta$ tend to infinity simultaneously, leaving the remaining one fixed. The results presented in this subsection are summarized in Table 2, where $I_1(\theta;n,s)$ and $I_2(\theta;n,s)$ are denoted by $I_1$ and $I_2$, respectively.

Six different cases with various asymptotic settings are considered as follows:

(I) $n,\theta\to\infty$ with $\theta=o(n)$, leaving $s$ fixed
Since

  $s\ell_n(\theta) \sim s\log\left(\frac{n}{\theta}\right), \qquad \ell_{ns}(\theta) \sim \log\left(\frac{n}{\theta}\right),$

it follows that $I_1 > I_2$ asymptotically. This corresponds to Theorem 1, since the right-hand side of (4.1) is $O(\theta) = o(n)$.

(II) $n,\theta\to\infty$ with $n=o(\theta)$, leaving $s$ fixed
Since

  $s\ell_n(\theta) \sim \frac{sn^2}{2\theta^2}, \qquad \ell_{ns}(\theta) \sim \frac{n^2s^2}{2\theta^2},$

it follows that $I_1 < I_2$ asymptotically. This corresponds to Theorem 4.

(III) $s,\theta\to\infty$ with $\theta=o(s)$, leaving $n$ fixed
Since

  $s\ell_n(\theta) \sim \frac{sn^2}{2\theta^2}, \qquad \ell_{ns}(\theta) \sim \log\left(\frac{s}{\theta}\right),$

there are three possible cases that differ in terms of the asymptotic magnitude relation between $\theta$ and $(s/\log s)^{1/2}$: (a) If $\theta = o((s/\log s)^{1/2})$ then $I_1 > I_2$ asymptotically, which corresponds to Theorem 3 (see Remark 7). (b) If $(s/\log s)^{1/2} = o(\theta)$ then $I_1 < I_2$ asymptotically, which partly corresponds to Theorem 4. (c) If $\theta/(s/\log s)^{1/2} \to K \in (0,\infty)$ then

  $s\ell_n(\theta) \sim \frac{n^2}{K^2}\log\theta, \qquad \ell_{ns}(\theta) \sim \log\theta;$

therefore, in this case, the asymptotic magnitude relation between $I_1$ and $I_2$ is determined by the magnitude relation between $n$ and $K$.

(IV) $s,\theta\to\infty$ with $s/\theta$ bounded away from $0$ and $\infty$, leaving $n$ fixed
Since

  $s\ell_n(\theta) \sim \frac{sn^2}{2\theta^2} \to 0$

and

  $\liminf_{s,\theta}\ell_{ns}(\theta) \ge \liminf_{s,\theta}\left\{\frac{ns(ns-1)}{2\theta^2}\frac{1}{(1+ns/\theta)^2}\right\} > 0,$

which follows from

  $\ell_{ns}(\theta) > \frac{1}{(\theta+ns)^2}\sum_{i=1}^{ns}(i-1) = \frac{ns(ns-1)}{2\theta^2}\frac{1}{(1+ns/\theta)^2},$

it holds that $I_1 < I_2$ asymptotically. This corresponds to Theorem 4.

(V) $s,\theta\to\infty$ with $s=o(\theta)$, leaving $n$ fixed
Since

  $s\ell_n(\theta) \sim \frac{sn^2}{2\theta^2}, \qquad \ell_{ns}(\theta) \sim \frac{n^2s^2}{2\theta^2},$

it follows that $I_1 < I_2$ asymptotically. This corresponds to Theorem 4.

(VI) $n,s\to\infty$, leaving $\theta$ fixed
Since

  $s\ell_n(\theta) \sim s\log n, \qquad \ell_{ns}(\theta) \sim \log(ns),$

it follows that $I_1 > I_2$ asymptotically. This corresponds to Theorem 1.

5 Concluding remarks

Under the assumption of the Poisson–Dirichlet population, we have shown that the two sampling procedures (i) and (ii) carry different information, that is to say, they can lead to totally different results. The reason for this phenomenon is that the Ewens sampling formula represents the law of samples from the Poisson–Dirichlet population, a typical random discrete distribution. By virtue of the de Finetti theorem, for an exchangeable sequence there exists a directing measure such that the sequence is conditionally iid given it. Our result indicates that when data analyses using random distributions are conducted, it is crucial to decide whether or not the data of interest are samples from a single unobservable realization of the directing measure.

Appendix A Appendix: The asymptotic normality of T

In this Appendix, the following proposition, which was mentioned in Remark 5, is proven.

Proposition 5.

If

  $\frac{sn^2}{\theta} \to \infty,$ (A.1)

then

  $\frac{T - s\theta L_n(\theta)}{\sigma} \Rightarrow N(0,1),$ (A.2)

where $T$ is as defined in (3.1) and $\sigma^2 = s\theta\ell_n(\theta)$. Moreover, when $\theta\to\infty$, (A.1) is also necessary for (A.2).

Proof.

For $i=1,\ldots,ns$, let

  $p_i = \frac{\theta}{\theta+j}, \qquad (i-1 \equiv j \ (\mathrm{mod}\ n),\ 0\le j\le n-1).$

Consider a triangular array $\{\xi_i\}$ of independent Bernoulli variables, where $\mathrm{pr}(\xi_i=1)=p_i$ for all $i$. Then, from (3.2), the distribution of $T$ is the same as the distribution of

  $\sum_{i=1}^{ns}\xi_i.$

First, we prove that (A.1) implies

  $\frac{\sum_{i=1}^{ns}\xi_i - s\theta L_n(\theta)}{\sigma} \Rightarrow N(0,1).$ (A.3)

From the central limit theorem for bounded random variables, it is sufficient to show that $\sigma^2\to\infty$. When $\theta$ stays bounded, or when $n/\theta$ stays bounded away from zero, it follows from (A.1) that $\sigma^2\to\infty$. When $n/\theta\to 0$, it follows from (A.1) that

  $\sigma^2 = s\theta\ell_n(\theta) \sim \frac{sn^2}{2\theta} \to \infty.$

Next, we prove that when $\theta\to\infty$, (A.3) implies (A.1). Assume that (A.1) does not hold, and consider a subsequence along which $sn^2/\theta$ is bounded. Since $\sigma^2 = s\theta\ell_n(\theta)$ is then bounded, and

  $p_i(1-p_i) \le \frac{\theta(n-1)}{(\theta+n-1)^2} \sim \frac{n}{\theta}$

for all $i$, which follows from the fact that $x\mapsto\theta x/(\theta+x)^2$ is increasing for $0\le x<\theta$, it holds that

  $\frac{1}{\sigma^2}\max_{1\le i\le ns}\{p_i(1-p_i)\} \to 0.$

Hence, in order for (A.3) to hold, the Lindeberg condition

  $\lim_{n,s,\theta}\frac{1}{\sigma^2}\sum_{i=1}^{ns}E\left[|\xi_i-p_i|^2\,1\{|\xi_i-p_i|>\varepsilon\sigma\}\right] = 0$ (A.4)

for any $\varepsilon>0$ is necessary, where $1\{\cdot\}$ is the indicator function, so it suffices to see that (A.4) does not hold. Since

  $\sup_{n,s,\theta}(s\ell_n(\theta)) < \infty,$

we can take

  $\varepsilon = \inf_{n,s,\theta}\left(\frac{\theta}{\sigma(\theta+n)}\right) > 0.$

Then, we have

  $\frac{p_i}{\sigma} > \frac{\theta}{\sigma(\theta+n)} > \varepsilon$

for all $i$, which yields that

 E[|ξi−pi|21