An urn contains a known number of balls of two different colors. We describe the random variable counting the smallest number of draws needed in order to observe at least c balls of both colors when sampling without replacement, for a prespecified value of c. This distribution is the finite sample analogue of the maximum negative binomial distribution described by Zhang, Burtness, and Zelterman (2000). We describe the modes, approximating distributions, and estimation of the contents of the urn.
Keywords: discrete distributions; negative binomial distribution; riff-shuffle distribution; hypergeometric distribution; negative hypergeometric distribution; maximum negative binomial distribution
And the LORD said unto Noah, Come thou and all thy house into the ark;
for thee have I seen righteous before me in this generation.
Of every clean beast thou shalt take to thee by sevens, the male and his female:
and of beasts that are not clean by two, the male and his female.
Genesis 7:1–2. King James translation
This charge to Noah required seven pairs of clean animals. How many animals did Noah plan on catching in order to be reasonably sure of achieving male and female pairs? He didn’t want to handle more dangerous, wild creatures than necessary. In the case of a rare or endangered species, the finite population size could be small. A “clean” animal meant it was suitable for consumption or sacrifice.
In a sequence of independent and identically distributed Bernoulli random variables, the negative binomial distribution describes the behavior of the number X of failures observed before observing c successes, for integer valued parameter c = 1, 2, … . This well-known distribution has probability mass function

Pr[X = k] = C(c + k − 1, k) p^c (1 − p)^k    (1)

defined for k = 0, 1, 2, … , where p (0 < p < 1) is the Bernoulli probability of success and C(a, b) denotes the binomial coefficient.
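As a minimal numerical sketch of (1), the mass function can be evaluated with Python's math.comb; the function name and parameter values below are illustrative only. The total mass should be one and the mean should be c(1 − p)/p.

```python
from math import comb

def neg_binomial_pmf(k: int, c: int, p: float) -> float:
    """Pr[X = k]: k failures occur before the c-th success in
    i.i.d. Bernoulli(p) trials -- the mass function (1)."""
    return comb(c + k - 1, k) * p**c * (1 - p) ** k

c, p = 7, 0.4
ks = range(2000)  # truncation of the infinite support
total = sum(neg_binomial_pmf(k, c, p) for k in ks)
mean = sum(k * neg_binomial_pmf(k, c, p) for k in ks)
print(round(total, 6), round(mean, 4))  # 1.0 10.5, since c(1-p)/p = 10.5
```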
The negative binomial distribution (1) is discussed in detail by Johnson, Kotz, and Kemp (1992, Ch. 5). In this introductory section we will describe several sampling schemes closely related to the negative binomial. Table 1 may be useful in illustrating the various relations between these distributions.
The maximum negative binomial distribution is the distribution of the smallest number of trials needed in order to observe at least c successes and c failures, for integer valued parameter c = 1, 2, … . This distribution is motivated by the design of a medical trial in which we want to draw inference on the Bernoulli parameter p in an infinitely large population. If the prevalence p of a binary valued genetic trait in cancer patients is very close to either zero or one then there is little to be gained in screening them for it. The statistical test of interest, then, is whether p is moderate or whether it is extremely close to either 0 or 1.
In order to test this hypothesis we have decided to sequentially test patients until we have observed at least c of both the wildtype (normal) and abnormal genotypes. A small number of observations necessary to obtain at least c of both genotypes is statistical evidence that p is not far from 1/2. Similarly, a large number of samples needed to observe at least c of both genotypes is statistical evidence that the Bernoulli parameter p is extreme.
Let Y denote the ‘excess’ number of trials needed beyond the minimum of 2c. The probability mass function of the maximum negative binomial distribution is

Pr[Y = i] = C(2c + i − 1, c − 1) p^c (1 − p)^c { p^i + (1 − p)^i }    (2)

for i = 0, 1, 2, … and 0 < p < 1.
The maximum negative binomial distribution is so-named because it represents the larger of two negative binomial distributions: the number of failures before the cth success is observed and the number of successes until the cth failure is observed. This distribution is also a mixture of two negative binomial distributions (1) that are left-truncated at c.
An intuitive description of the terms in (2) is as follows. There are c successes and c failures that occur with probability p^c (1 − p)^c. All of the i extra trials beyond 2c must all be either successes or failures, hence the p^i + (1 − p)^i term. Finally, the last Bernoulli trial must be the one that completes the experiment, ending with either the cth success or the cth failure.
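This description can be checked by simulation. The sketch below (function names are ours; the mass function is written as derived from the description above and should agree with (2)) simulates Bernoulli trials until at least c successes and c failures appear and compares the empirical frequency of Y = 0 with the formula.

```python
import random
from math import comb

def max_neg_binomial_pmf(i: int, c: int, p: float) -> float:
    """Pr[Y = i] as in (2): arrangements of the first 2c+i-1 trials,
    the c successes and c failures, and the i extra trials."""
    return comb(2*c + i - 1, c - 1) * p**c * (1 - p)**c * (p**i + (1 - p)**i)

def sample_excess(c: int, p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until >= c successes and >= c failures."""
    s = f = 0
    while s < c or f < c:
        if rng.random() < p:
            s += 1
        else:
            f += 1
    return s + f - 2*c  # the excess Y beyond the minimum 2c trials

rng = random.Random(1)
c, p, reps = 3, 0.35, 200_000
empirical = sum(sample_excess(c, p, rng) == 0 for _ in range(reps)) / reps
print(abs(empirical - max_neg_binomial_pmf(0, c, p)) < 0.01)  # True
```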
In Zhang et al. (2000) we describe properties of the distribution (2). The maximum negative hypergeometric distribution, given at (7) below and developed in the following sections, is the finite sample analogue of the maximum negative binomial distribution (2).
The parameters p and 1 − p are not identifiable in (2). Specifically, the same distribution in (2) results when p and 1 − p are interchanged. Similarly, it is impossible to distinguish between inference on p and on 1 − p without additional information. In words, we can’t tell if we are estimating p or 1 − p unless we also know how many successes and failures were observed at the point at which we obtained at least c of each. A similar identifiability problem is presented for the maximum negative hypergeometric distribution described in Section 4.
Table 1. Sampling schemes and the corresponding distributions.

| Sampling scheme | Infinite population, or with replacement | Finite population, without replacement |
| Predetermined number of items | Binomial distribution | Hypergeometric distribution |
| Until c successes | Negative binomial distribution (1) | Negative hypergeometric distribution (4) |
| Until either c successes or c failures | Riff shuffle or minimum negative binomial distribution (3) | Minimum negative hypergeometric distribution (5) |
| Until c successes and c failures | Maximum negative binomial distribution (2) | Maximum negative hypergeometric distribution (6), (7) |
The minimum negative binomial or riff-shuffle distribution is the distribution of the smallest number of Bernoulli trials needed in order to observe either c successes or c failures. Clearly, at least c and fewer than 2c Bernoulli trials are necessary. The random variable X counts the total number of trials needed until either c successes or c failures are observed, for c = 1, 2, … . The experiment ends with sample numbered X from the Bernoulli population.
The mass function of the minimum negative binomial distribution is

Pr[X = n] = C(n − 1, c − 1) { p^c (1 − p)^(n−c) + p^(n−c) (1 − p)^c }    (3)

for n = c, c + 1, … , 2c − 1.
The naming of (3) as the minimum negative binomial refers to the smaller of two dependent negative binomial distributions: the number of failures before the cth success, and the number of successes before the cth failure. In words, distribution (3) says that there will be either c Bernoulli successes and n − c failures or else c failures and n − c Bernoulli successes. This distribution is introduced by Uppuluri and Blot (1970) and described in Johnson, Kotz, and Kemp (1992, pp 234–5). Lingappaiah (1987) discusses parameter estimation for distribution (3).
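Since the riff-shuffle experiment must stop between c and 2c − 1 trials, the mass function sums to one exactly over that range, which gives a direct numerical check of (3). A sketch, with names that are ours:

```python
from math import comb

def riff_shuffle_pmf(n: int, c: int, p: float) -> float:
    """Pr[X = n] as in (3): trial n is either the c-th success
    (with n-c failures so far) or the c-th failure (with n-c successes)."""
    return comb(n - 1, c - 1) * (p**c * (1 - p)**(n - c) + p**(n - c) * (1 - p)**c)

c, p = 7, 0.3
total = sum(riff_shuffle_pmf(n, c, p) for n in range(c, 2*c))
print(round(total, 10))  # 1.0 -- the support c..2c-1 is exhaustive
```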
The three discrete distributions described up to this point are based on sampling from an infinitely large Bernoulli parent population. Each of these distributions also has a finite sample analogue. These are described next.
The negative hypergeometric distribution (Johnson, Kotz, and Kemp, 1992, pp 239–42) is the distribution of the number of unsuccessful draws from an urn with two different colored balls until a specified number of successful draws have been obtained. If m out of the N balls are of the ‘successful’ type then the number X of unsuccessful draws observed before c of the successful types are obtained has mass function

Pr[X = x] = C(c + x − 1, x) C(N − c − x, m − c) / C(N, m)    (4)

with parameters satisfying 1 ≤ c ≤ m ≤ N and range x = 0, 1, … , N − m. The expected value of X in (4) is c(N − m)/(m + 1).
The negative hypergeometric distribution (4) is the finite sample analogue of the negative binomial distribution (1). Unlike the negative binomial distribution, the negative hypergeometric distribution has a finite range. The maximum negative hypergeometric distribution described in the following sections is the larger of two dependent negative hypergeometric distributions.
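A quick numerical check of (4) and the stated mean, with illustrative parameter values and names of our choosing:

```python
from math import comb

def neg_hypergeom_pmf(x: int, N: int, m: int, c: int) -> float:
    """Pr[X = x]: x unsuccessful draws before the c-th success when
    m of the N balls are of the successful type -- distribution (4)."""
    return comb(c + x - 1, x) * comb(N - c - x, m - c) / comb(N, m)

N, m, c = 20, 8, 3
support = range(N - m + 1)
total = sum(neg_hypergeom_pmf(x, N, m, c) for x in support)
mean = sum(x * neg_hypergeom_pmf(x, N, m, c) for x in support)
print(round(total, 10), round(mean, 10))  # 1.0 4.0, since c(N-m)/(m+1) = 4
```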
The minimum negative hypergeometric distribution describes the smallest number of urn draws needed in order to observe either c successes or c failures. This distribution is the finite sample analogue of the riff-shuffle distribution (3). The probability mass function of the minimum negative hypergeometric distribution is

Pr[X = n] = C(n − 1, c − 1) { m^(c) (N − m)^(n−c) + m^(n−c) (N − m)^(c) } / N^(n)    (5)

for n = c, … , 2c − 1, where a^(k) = a(a − 1) ⋯ (a − k + 1) denotes the factorial polynomial.
In the example of the charge to Noah, we have c = 7 male/female pairs of animals captured from a finite population of m males and N − m females.
In Section 2 we give the probability mass function of the maximum negative hypergeometric distribution. Section 3 details some approximations to this distribution. In Section 4 we discuss estimation of the parameter m that describes the contents of the urn.
2 The distribution
An urn contains N balls: m of one color; and the remaining N − m of another color. We continue sampling from the urn without replacement until we have observed c balls of both colors, for integer parameter c ≥ 1. Sampling with replacement is the same as sampling from the maximum negative binomial distribution (2) with parameter p = m/N.
Let Y denote the random variable counting the number of extra draws needed beyond the minimum 2c. That is, on draw numbered 2c + Y we will have first observed at least c of both colors. All of the Y extra draws from the urn must be of the same color, so there will be c of one color and c + Y of the other color at the end of the experiment. We will describe the distribution and properties of this random variable.
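The random variable Y is easy to simulate directly, which we use below as a sanity check on its support; the helper name and parameter values are ours.

```python
import random

def draw_excess(N: int, m: int, c: int, rng: random.Random) -> int:
    """Shuffle the urn, draw without replacement until c balls of both
    colors are seen, and return the excess Y beyond the minimum 2c draws."""
    balls = [1] * m + [0] * (N - m)
    rng.shuffle(balls)
    ones = zeros = 0
    for draws, ball in enumerate(balls, start=1):
        ones += ball
        zeros += 1 - ball
        if ones >= c and zeros >= c:
            return draws - 2 * c
    raise ValueError("the urn must hold at least c balls of each color")

rng = random.Random(7)
N, m, c = 30, 12, 4
sample = [draw_excess(N, m, c, rng) for _ in range(50_000)]
print(min(sample) >= 0, max(sample) <= max(m, N - m) - c)  # True True
```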
For k = 1, 2, … define the factorial polynomial

a^(k) = a(a − 1)(a − 2) ⋯ (a − k + 1).

We also define a^(0) = 1.
The maximum negative hypergeometric distribution probability mass function can be written as

Pr[Y = i] = C(2c + i − 1, c − 1) { m^(c) (N − m)^(c+i) + m^(c+i) (N − m)^(c) } / N^(2c+i)    (6)

defined for the range of Y:

i = 0, 1, … , max(m, N − m) − c.

The integer valued parameters are constrained to

c ≥ 1 and c ≤ m ≤ N − c.
The equivalent form

Pr[Y = i] = c { C(m, c + i) C(N − m, c) + C(m, c) C(N − m, c + i) } / [ (2c + i) C(N, 2c + i) ]    (7)

expresses the maximum negative hypergeometric distribution (6) in terms of binomial coefficients.
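As a check, the sketch below evaluates a binomial-coefficient expression for Pr[Y = i] derived directly from the sampling scheme (it should agree with (7)); it confirms the total mass is one and that interchanging m with N − m leaves the distribution unchanged. Names and parameter values are ours.

```python
from math import comb

def max_neg_hypergeom_pmf(i: int, N: int, m: int, c: int) -> float:
    """Pr[Y = i] in binomial-coefficient form; math.comb returns 0
    when c + i exceeds m or N - m, which handles the support."""
    return (c * (comb(m, c + i) * comb(N - m, c) + comb(m, c) * comb(N - m, c + i))
            / ((2*c + i) * comb(N, 2*c + i)))

N, m, c = 30, 12, 4
support = range(max(m, N - m) - c + 1)
total = sum(max_neg_hypergeom_pmf(i, N, m, c) for i in support)
swapped = all(max_neg_hypergeom_pmf(i, N, m, c) == max_neg_hypergeom_pmf(i, N, N - m, c)
              for i in support)
print(round(total, 10), swapped)  # 1.0 True
```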
The same distribution in (6) and (7) results when the parameter m is interchanged with N − m. This remark illustrates the identifiability problem with the parameters in the maximum negative hypergeometric distribution. A similar identifiability problem occurs in the maximum negative binomial distribution given at (2). We will describe the estimation of the parameter m in Section 4.
Special cases of this distribution are as follows. For general parameter values,
If N = 2c then the maximum negative hypergeometric distribution is degenerate and all of its probability is a point mass at Y = 0. In words, if N = 2c then there can be only one possible outcome. In this case, all of the balls in the urn must be drawn before we can observe c balls of both colors.
The special case of with has the form
and zero otherwise. This is also the form of the distribution for and .
The special case for and has mass function
and zero otherwise. This is also the distribution of for and . In words, this represents the distribution of the color of the last ball remaining after all but one have been drawn from the urn.
3 Properties and Approximations
Table 2. Examples of parameter values corresponding to unimodal distributions in (7).

| c | Unimodal for |
| 1 | all m |
There are five basic shapes that the maximum negative hypergeometric distribution will assume. These are illustrated in Figs. 1 through 5. In each figure, the limiting maximum negative binomial distribution (2) is also presented. This limit can be expressed, more formally, as follows.
Lemma 1. For fixed values of c, let m and N both grow large such that m/N → p for p bounded between zero and one. Then the behavior of the maximum negative hypergeometric random variable (6) approaches the maximum negative binomial distribution (2) with parameters c and p.
Proof. Values of Y remain bounded with high probability under these conditions. In (6) we write

Pr[Y = i] = C(2c + i − 1, c − 1) { m^(c) (N − m)^(c+i) + m^(c+i) (N − m)^(c) } / N^(2c+i),

where N^(2c+i) = N^(c) (N − c)^(c+i) and m/N → p. We can also write

m^(c) (N − m)^(c+i) / N^(2c+i) = [ m^(c) / N^(c) ] [ (N − m)^(c+i) / (N − c)^(c+i) ] → p^c (1 − p)^(c+i).

A similar argument shows

m^(c+i) (N − m)^(c) / N^(2c+i) → p^(c+i) (1 − p)^c,

so that Pr[Y = i] converges to the maximum negative binomial probability (2), completing the proof.
In words, if N and m are both large then sampling from the urn without replacement is almost the same as sampling with replacement. Sampling with replacement is the same as sampling from a Bernoulli parent population, yielding the maximum negative binomial distribution (2).
We next describe the modes for this distribution. The maximum negative hypergeometric distribution can have either one or two modes. Write

Pr[Y = 1] / Pr[Y = 0] = c / (c + 1) < 1

to show that this distribution always has at least one local mode at i = 0.
The maximum negative binomial distribution (2) also has at least one local mode at i = 0 for all values of the parameter p. The local mode of the maximum negative hypergeometric distribution at i = 0 is clearly visible in Figs. 1, 2, 4, and 5. The local mode at i = 0 in Fig. 3 is also present but it is very small.
Table 2 presents examples of parameter values corresponding to unimodal distributions in (7). In general, there will be only one mode, at i = 0, when m/N is not too far from 1/2. The range of m with unimodal distributions becomes narrower as N becomes larger when c is fixed. If c = 1 then the distribution is always unimodal.
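These mode claims can be explored numerically. The sketch below evaluates a binomial-coefficient expression for the pmf (it should agree with (7); helper names are ours), counts local modes, and confirms the local mode at i = 0.

```python
from math import comb

def pmf(i: int, N: int, m: int, c: int) -> float:
    """Maximum negative hypergeometric pmf in binomial-coefficient form."""
    return (c * (comb(m, c + i) * comb(N - m, c) + comb(m, c) * comb(N - m, c + i))
            / ((2*c + i) * comb(N, 2*c + i)))

def local_modes(N, m, c):
    """Indices i where the pmf is larger than both of its neighbors."""
    f = [pmf(i, N, m, c) for i in range(max(m, N - m) - c + 1)]
    return [i for i, v in enumerate(f)
            if (i == 0 or v > f[i - 1]) and (i == len(f) - 1 or v > f[i + 1])]

print(0 in local_modes(40, 20, 5))  # True: balanced urn, mode at i = 0
print(0 in local_modes(40, 5, 3))   # True: i = 0 is a local mode even when unbalanced
```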
3.1 A gamma approximation
An approximating gamma distribution is illustrated in Fig. 4. Under the conditions of the following lemma, the local mode at i = 0 becomes negligible.
Lemma 2. For fixed c, if m grows as N^δ for large N and some 0 < δ < 1, then mY/N behaves approximately as the sum of c independent standard exponential random variables.
Proof. Begin at (6) and write
Under the conditions of this lemma, the term m^(c+i) (N − m)^(c)
will be much smaller than m^(c) (N − m)^(c+i) and can be ignored.
For near zero, write
The transformation has Jacobian so
ignoring terms that tend to zero for large values of N. This is the density function of the sum of c independent, standard exponential random variables.
3.2 A half-normal approximation
If X has a standard normal distribution then the distribution of |X| is said to be standard half-normal or folded normal. The density function of the random variable Y = |X| is

f(y) = (2/π)^{1/2} exp(−y²/2)

for y > 0 (Stuart and Ord, 1987, p 117). The approximate half-normal behavior of the maximum negative hypergeometric distribution is illustrated in Fig. 5.
Lemma 3. When N becomes large, if m/N remains near 1/2 and c also grows large with N, then a suitably normalized Y behaves approximately as a standard half-normal random variable.
The proof involves expanding all factorials in (7) using Stirling’s approximation. The details are provided in Appendix A.
3.3 A normal approximation
The normal approximation to the maximum negative hypergeometric distribution can be seen in Fig. 3. This is proved more formally in Lemma 4, below. No generality is lost by requiring m ≤ N − m because m and N − m can be interchanged to yield the same distribution.
Lemma 4. For large values of , suppose grows as and for Then behaves approximately as standard normal where
The proof of this lemma is given in Appendix B. The details involve applying Stirling’s approximation to all of the factorials in (7) and expanding these in a two-term Taylor series.
4 Parameter estimation

The most practical situation concerning parameter estimation involves estimating the parameter m when N and c are both known. In terms of the original motivating example of drawing inference on the genetic markers in cancer patients, the finite population size N will be known, and the parameter c is chosen by the investigators in order to achieve specified power and significance levels. The parameter m describes the composition of the individuals in the finite-sized population. The value of m is known without error if all subjects are observed.
The estimation of m in this section is made on the basis of a single observation of the random variable Y. We will treat the unknown parameter m as continuous valued, rather than as a discrete integer as it has been used in previous sections.
The log-likelihood kernel function of m in (6), given an observed value Y = i, is

λ(m) = log { m^(c) (N − m)^(c+i) + m^(c+i) (N − m)^(c) },

omitting additive terms that do not depend on m.
As a numerical illustration, the function λ(m) is plotted in Fig. 6 for fixed values of N and c. Observed values of Y are indicated in this figure. The range of valid values of the parameter m is determined by the values of N and c in this example. Smaller observed values of Y in this example exhibit log-likelihood functions with a single mode corresponding to maximum likelihood estimates of m = N/2. For larger values of Y the likelihood has two modes, symmetric about m = N/2.
Intuitively, if the observed value of Y is small then we are inclined to believe that the urn is composed of an equal number of balls of both colors. That is, if we quickly observe c of both colored balls then this is good statistical evidence of an even balance of the two colors in the urn. Conversely, if the observed Y is relatively large then we will estimate an imbalance in the composition of the urn. Without the additional knowledge of the number of successes and failures observed, we are unable to tell if we are estimating m or N − m.
More generally, there will be either one mode of λ at m = N/2 or else two modes, symmetric about N/2, depending on the sign of the second derivative λ″(m)

evaluated at m = N/2. If λ″(N/2) is negative then there will be one mode of λ at m = N/2.
Useful rules for differentiating factorial polynomials are as follows. For m > c − 1,

d log m^(c) / dm = Σ_{j=0}^{c−1} (m − j)^{−1},

and for m < N − c + 1,

d log (N − m)^(c) / dm = − Σ_{j=0}^{c−1} (N − m − j)^{−1}.
Use these rules to write out the derivative λ′(m) and show that the likelihood always has a critical value at m = N/2, where λ′(N/2) = 0.
The critical point of λ at m = N/2 may either be a global maximum or else a local minimum, as seen in the example of Fig. 6. This distinction depends on the sign of the second derivative of λ.
The second derivative of λ can be found by differentiating λ′(m) with these same rules. The sign of λ″ evaluated at m = N/2 is the same as that of an expression whose first summation vanishes when Y = 0 or 1. That expression is then negative for small values of Y, demonstrating that the maximum likelihood estimate of m is N/2 in these cases. It is also an increasing function of Y and may eventually become positive for larger values of Y, so that λ will have two modes. These modes are symmetric about N/2 because

λ(m) = λ(N − m)

for all c ≤ m ≤ N − c.
In other words, a small observed value of Y leads us to believe that there are an equal number of balls of both colors in the urn, and we estimate m by N/2. Similarly, a large observed value of Y relative to N leads us to estimate an imbalance in the composition of the urn.
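This discussion can be illustrated by maximizing the likelihood over integer m directly, using a binomial-coefficient expression for the pmf that should agree with (7); all names and the example values N = 20, c = 3 are ours. A small observed y yields the single balanced estimate, a large y two symmetric estimates.

```python
from math import comb

def pmf(i: int, N: int, m: int, c: int) -> float:
    """Maximum negative hypergeometric pmf in binomial-coefficient form."""
    return (c * (comb(m, c + i) * comb(N - m, c) + comb(m, c) * comb(N - m, c + i))
            / ((2*c + i) * comb(N, 2*c + i)))

def mle_m(y: int, N: int, c: int):
    """All integer m in [c, N - c] maximizing the likelihood of Y = y;
    the m <-> N - m ambiguity shows up as symmetric ties."""
    lik = {m: pmf(y, N, m, c) for m in range(c, N - c + 1)}
    best = max(lik.values())
    return [m for m, v in lik.items() if v == best]

N, c = 20, 3
print(mle_m(0, N, c))   # [10]: a small Y points to a balanced urn
print(mle_m(10, N, c))  # [3, 17]: a large Y points to an imbalance, either way
```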
Johnson, N.L., S. Kotz, and A.W. Kemp (1992). Univariate Discrete Distributions, 2nd Edition. New York: John Wiley & Sons.
Lingappaiah, G.S. (1987). Some variants of the binomial distribution. Bulletin of the Malaysian Mathematical Society 10: 82–94.
Stuart, A. and J.K. Ord (1987). Kendall’s Advanced Theory of Statistics Vol 1, 5th Edition: Distribution Theory, New York: Oxford University Press.
Uppuluri, V.R.R., and W.J. Blot (1970). A probability distribution arising in a riff-shuffle. In Random Counts in Scientific Work, 1: Random Counts in Models and Structures, G.P. Patil (editor), University Park: Pennsylvania State University Press, pp 23–46.
Zhang, Z., B.A. Burtness, and D. Zelterman (2000). The maximum negative binomial distribution. Journal of Statistical Planning and Inference 87: 1–19.
Appendix A: Proof of the half-normal approximation
The proof of Lemma 3 is provided here. We work under the conditions stated in Lemma 3. The normalized random variable of interest is
Expand all of the factorials in (7) using Stirling’s approximation giving
contains terms in and
contains terms that are .
In all of the following expansions it is useful to keep in mind that is approximately equal to and . Write as
Expand every appearance of for near zero to show
Similarly, we can write
Then write for near zero, giving
These expressions for and in give
Finally, we note that is the Jacobian of the transformation . Then
is the density of the folded normal distribution, except for terms that tend to zero with high probability.
Appendix B: Standard normal approximate distribution
The details of the proof of Lemma 4 are given here. Define as
in (7) is much smaller than and can be ignored under the conditions of this lemma.
Expand all of the factorials in using Stirling’s formula giving
corresponds to and ;
corresponds to and ; and
corresponds to and .
Write out all of the terms in to show
We can write as
we can expand
for near zero to show
where is a random variable, giving
Substitute to show
Next write as
Expand the argument of the first logarithm in here in a two-term Taylor series showing
Substitute values of and giving