1 Abstract
The Glivenko–Cantelli theorem states that the empirical distribution function converges uniformly almost surely to the theoretical distribution function $F_X$ of a random variable $X$. This is an important result because it establishes that sampling does capture the dispersion that the distribution function imposes. In essence, sampling permits one to learn and infer the behavior of $X$ by looking only at observations of $X$. The probabilities inferred from samples become more precise as the sample size increases and more data become available. Therefore, it is valid to study distributions via samples. The proof presented here is constructive, meaning that the result is derived directly from the fact that the empirical distribution function converges pointwise almost surely to the theoretical distribution. The work includes a proof of this preliminary statement and attempts to motivate the intuition one gains from sampling techniques when studying the regions in which a model concentrates probability. The sets on which dispersion is described precisely by the empirical distribution function eventually cover the entire sample space.
2 Preliminaries
The following definitions will be used throughout the proof.
Definition.
Empirical Distribution Function [mood]
Let $X$ be a random variable, and let $X_1, \dots, X_n$ be a random sample from $X$. The empirical distribution function $F_n$ assigns probabilities by counting how many observations are less than or equal to $x$; in essence:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{(-\infty, x]}(X_i).$$
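The counting definition above can be sketched numerically. The following is a minimal illustration; the sample values are arbitrary:

```python
import numpy as np

def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(sample <= x))

# Small fixed sample chosen for illustration only.
data = [2.0, 0.5, 3.1, 1.7, 0.9]
print(ecdf(data, 1.7))  # 3 of 5 observations are <= 1.7, i.e. 0.6
```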
As a preliminary result, the work includes a proof of the following theorem:
Theorem 1.
Let $X$ be a random variable, assume that $X \sim F_X$ and that $X_1, \dots, X_n$ is a random sample from $X$. Then the empirical distribution function converges almost surely to the theoretical distribution at each point $x \in \mathbb{R}$:
$$F_n(x) \xrightarrow{\text{a.s.}} F_X(x) \quad \text{for each fixed } x \in \mathbb{R}.$$
Proof.
For $x \in \mathbb{R}$, it is possible to index a stochastic process as follows:
$$Y_i(x) = \mathbb{1}_{(-\infty, x]}(X_i), \qquad i = 1, \dots, n.$$
Therefore, $\mathbb{E}[Y_i(x)] = P(X_i \le x) = F_X(x)$ and $\operatorname{Var}\big(Y_i(x)\big) = F_X(x)\big(1 - F_X(x)\big)$.
Then for each fixed $x$ the random variable $Y_i(x)$ is a Bernoulli random variable that arises from indexing the stochastic process by points in the sample space of $X$. Moreover, for all $i$ the $Y_i(x)$
have finite variances and means, since $0 \le F_X(x) \le 1$.
Thus, by the Strong Law of Large Numbers
[rigProb, rossMod], for fixed $x$ the sample mean of the $Y_i(x)$ converges almost surely to their mean,
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i(x) \xrightarrow{\text{a.s.}} F_X(x).$$
Since the choice of $x$ is arbitrary, for all fixed $x \in \mathbb{R}$ it is true that
$$P\Big(\lim_{n \to \infty} F_n(x) = F_X(x)\Big) = 1.$$
∎
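Theorem 1 can be checked by simulation. The sketch below assumes an exponential(1) model, so that $F_X(x) = 1 - e^{-x}$, and tracks the running value of $F_n(x)$ at a single fixed point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fix a point x and track F_n(x) for an exponential(1) model,
# where the true value is F_X(x) = 1 - exp(-x).
x = 1.0
true_value = 1.0 - np.exp(-x)  # about 0.632

sample = rng.exponential(scale=1.0, size=100_000)
indicators = sample <= x  # the Bernoulli variables Y_i(x)
running = np.cumsum(indicators) / np.arange(1, sample.size + 1)

for n in (100, 10_000, 100_000):
    print(n, running[n - 1])  # the running mean settles near true_value
```

The running mean is exactly the sample mean of the indicators, so its convergence is the Strong Law of Large Numbers in action.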
It is worth noting that the speed of convergence depends on how heavy the tails of $X$ are, as well as on its kurtosis and asymmetry. The more probability is concentrated far from the center, the larger the samples the process will require to cover those regions empirically.
3 Constructive proof
Theorem 2 (Glivenko–Cantelli Theorem).
Let $X$ be a random variable, assume $X \sim F_X$, and let $X_1, \dots, X_n$ be a random sample from $X$. Then, as the sample size approaches infinity, the empirical distribution function converges uniformly almost surely to the theoretical distribution. In essence,
$$P\Big(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} \big|F_n(x) - F_X(x)\big| = 0\Big) = 1.$$
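The uniform distance in Theorem 2 can be computed exactly for a finite sample: when $F_X$ is continuous, the supremum is attained at the order statistics, just before or after each jump of $F_n$. A minimal sketch, using a uniform(0, 1) model so that $F_X(t) = t$:

```python
import numpy as np

def sup_distance(sample, cdf):
    """sup_x |F_n(x) - F_X(x)| for a continuous F_X.

    F_n only jumps at the order statistics, so it suffices to compare
    the CDF against i/n (after a jump) and (i-1)/n (before a jump)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    f = cdf(xs)
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - f), np.max(f - (i - 1) / n)))

rng = np.random.default_rng(1)
for n in (10, 100, 10_000):
    u = rng.uniform(size=n)
    print(n, sup_distance(u, lambda t: t))  # distance shrinks as n grows
```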
The following proof is motivated by ideas used in the proof of Egorov’s theorem in [grabs].
Proof.
Assume that $X \sim F_X$ and let $X_1, X_2, \dots$ be a random sample from $X$, defined on a probability space with outcomes $\omega \in \Omega$. Define $A_x$ as:
$$A_x = \Big\{\omega \in \Omega : \lim_{n \to \infty} F_n(x, \omega) = F_X(x)\Big\}.$$
Then by Theorem (1),
$$P(A_x) = 1 \quad \text{for each } x \in \mathbb{R}.$$
Now for a fixed $x$ and $k \in \mathbb{N}$ define $E^k_n(x)$ as:
$$E^k_n(x) = \bigcap_{m \ge n} \Big\{\omega \in \Omega : \big|F_m(x, \omega) - F_X(x)\big| < \tfrac{1}{k}\Big\}.$$
Thus, it is possible to verify that
$$E^k_n(x) \subseteq E^k_{n+1}(x).$$
In essence, once the points fall into the ball of radius $1/k$, no further empirical distribution functions in the sequence escape from the convergence radius. Also, for all $\omega \in A_x$ and $k \in \mathbb{N}$ there exists an $N \in \mathbb{N}$ such that if $m \ge N$, then
$$\big|F_m(x, \omega) - F_X(x)\big| < \tfrac{1}{k}.$$
Thus,
$$\omega \in E^k_N(x).$$
And,
$$A_x \subseteq \bigcup_{n=1}^{\infty} E^k_n(x).$$
Furthermore, for all $\omega \in A_x$, by Theorem (1), as the sample size increases it is possible to find the required $N$ among the integers. Therefore, the union over all the integers of the points where the sequence does not escape the convergence radius will include $A_x$,
$$P\Big(\bigcup_{n=1}^{\infty} E^k_n(x)\Big) \ge P(A_x) = 1.$$
Hence, $\omega \in A_x$ implies that $\omega \in \bigcup_{n} E^k_n(x)$, so $P\big(\bigcup_{n} E^k_n(x)\big) = 1$. Now, consider the following countable grid of points: for each $k \in \mathbb{N}$ let
$$x_{j,k} = \inf\Big\{x \in \mathbb{R} : F_X(x) \ge \tfrac{j}{k}\Big\}, \qquad j = 1, \dots, k - 1.$$
So that between consecutive grid points the theoretical distribution increases by at most $1/k$:
$$F_X(x_{j+1,k}^-) - F_X(x_{j,k}) \le \tfrac{1}{k}.$$
And, since the argument of Theorem (1) applies verbatim to the indicators $\mathbb{1}_{(-\infty, x)}(X_i)$, it is possible to find, at every grid point, probability-one sets $A_{x_{j,k}}$ for $F_n(x_{j,k})$ and $A^-_{x_{j,k}}$ for the left limits $F_n(x_{j,k}^-)$, to get the countable intersection
$$B = \bigcap_{k=1}^{\infty} \bigcap_{j=1}^{k-1} \big(A_{x_{j,k}} \cap A^-_{x_{j,k}}\big).$$
Then, on $B$ all of the pointwise limits above hold simultaneously.
Now fix $k$ and $\omega \in B$, and note that, given the previous limits, it is possible to find $N = N(k, \omega)$ such that for all $n \ge N$ and every $j$,
$$\big|F_n(x_{j,k}, \omega) - F_X(x_{j,k})\big| < \tfrac{1}{k} \quad \text{and} \quad \big|F_n(x_{j,k}^-, \omega) - F_X(x_{j,k}^-)\big| < \tfrac{1}{k}.$$
Then, for all possible values of $x$, sandwiching $x$ between consecutive grid points and using the monotonicity of $F_n$ and $F_X$, it is true that for all $n \ge N$,
$$\sup_{x \in \mathbb{R}} \big|F_n(x, \omega) - F_X(x)\big| \le \tfrac{2}{k}.$$
Since $k$ is arbitrary, as $k \to \infty$, the limit results in:
$$\lim_{n \to \infty} \sup_{x \in \mathbb{R}} \big|F_n(x, \omega) - F_X(x)\big| = 0.$$
However, if $\omega \in B$, then for all $x \in \mathbb{R}$ and $\varepsilon > 0$, it is true that $\big|F_n(x, \omega) - F_X(x)\big| < \varepsilon$ for all sufficiently large $n$, with a threshold that does not depend on $x$. In essence, if $\omega \in B$, then $F_n(\cdot, \omega) \to F_X$ uniformly. And, note that $B$ has the following probability, being a countable intersection of events of probability one:
$$P(B) = 1.$$
Therefore, the probability that $F_n \to F_X$ uniformly is equal to one; in essence, $F_n \to F_X$ uniformly almost surely.
∎
4 Discussion
The proof exhibits how the variation observed in a sample represents the dispersion that is characterized by the model $F_X$. Thus, as the sample size increases, the empirical process becomes a more faithful representation of the model. Nonetheless, the theorem requires the use of well-behaved intervals (and, in higher dimensions, suitable classes of sets), as well as strong assumptions about independence between different observations. However, it also motivates further study of how empirical processes and nonparametric methods can be used to study the concentration of measure that characterizes the data under analysis.
Appendix
The sampling process eventually covers all areas with positive probability. If $X$ is discrete, then for any point $x$ with $p = P(X = x) > 0$,
$$P\big(x \notin \{X_1, \dots, X_n\}\big) = (1 - p)^n.$$
Thus if $p > 0$, then as $n \to \infty$,
$$(1 - p)^n \to 0.$$
Similarly, if $[X_{(1)}, X_{(n)}]$
denotes the range a sample from an absolutely continuous random variable covers, then the probability that the sampling process will not cover a particular point $x$ is:
$$P\big(x \notin [X_{(1)}, X_{(n)}]\big) = P\big(X_{(1)} > x\big) + P\big(X_{(n)} < x\big) = \big(1 - F_X(x)\big)^n + F_X(x)^n.$$
Note that the intersection is an absurdity ($X_{(1)} > x$ and $X_{(n)} < x$); thus, it is the empty set and it has zero probability. Moreover, as $n \to \infty$ the sampling process will eventually sample from less concentrated regions. Suppose that $0 < F_X(x) < 1$; then
$$\big(1 - F_X(x)\big)^n \to 0.$$
And,
$$F_X(x)^n \to 0.$$
Therefore, as the sample size increases, the range of coverage grows to capture both the left and right tails. In essence, if $0 < F_X(x) < 1$, then as $n \to \infty$,
$$P\big(x \in [X_{(1)}, X_{(n)}]\big) \to 1.$$
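The coverage probability above can be checked by Monte Carlo. A minimal sketch with an assumed uniform(0, 1) model and the point $x = 0.3$, so that $F_X(x) = 0.3$:

```python
import numpy as np

rng = np.random.default_rng(2)

def miss_probability(F_x, n):
    """P(x not in [X_(1), X_(n)]) = (1 - F_X(x))^n + F_X(x)^n."""
    return (1.0 - F_x) ** n + F_x ** n

# Monte Carlo check: simulate many samples of size n and count how
# often the sample range [X_(1), X_(n)] fails to contain x.
x, n, reps = 0.3, 10, 200_000
samples = rng.uniform(size=(reps, n))
missed = np.mean((samples.min(axis=1) > x) | (samples.max(axis=1) < x))

print(missed, miss_probability(0.3, 10))  # the two values should be close
```

Increasing `n` in `miss_probability` shows the miss probability vanishing geometrically, which is the coverage claim of the appendix.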
This motivates the study of how samples from the empirical distribution and a possible guess/initial distribution could report the regions in which $X$ appears to concentrate more probability. Suppose that $\{B_1, \dots, B_m\}$ is a partition of the sample space of $X$; then the observed clusters should resemble the concentration that a Dirichlet distribution defined over $(B_1, \dots, B_m)$ establishes for each interval $B_j$. This connection between order statistics, the empirical distribution function, and the measure concentrated in each observed and/or theorized interval underpins and motivates the construction of the Dirichlet process and other nonparametric techniques. Moreover, it suggests that boundaries at the tails might require specific techniques to observe such data in a sample.
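The Dirichlet connection can be illustrated with a hypothetical setup (the partition, the standard normal base/guess measure $P_0$, and the concentration parameter $\alpha$ are all assumed for the sketch, not taken from the text): the empirical mass that $F_n$ assigns each cell is compared with the mean of a Dirichlet distribution with parameters $\alpha P_0(B_j)$.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(3)

def normal_cdf(t):
    """CDF of the standard normal, used here as the base/guess measure P_0."""
    return 0.5 * (1.0 + erf(t / np.sqrt(2.0)))

# Partition of the real line into four cells; cut points are arbitrary.
cuts = [-1.0, 0.0, 1.0]
edges = [-1e9] + cuts + [1e9]  # large finite stand-ins for +/- infinity

# Empirical mass that F_n assigns each cell B_j.
sample = rng.standard_normal(5_000)
counts = np.histogram(sample, bins=edges)[0]
empirical = counts / counts.sum()

# Cell masses under the base measure, and the mean of Dirichlet(alpha * P0);
# alpha is an assumed concentration parameter for illustration only.
P0 = np.diff([normal_cdf(t) for t in edges])
alpha = 10.0
dirichlet_mean = (alpha * P0) / np.sum(alpha * P0)

print(np.round(empirical, 3))
print(np.round(dirichlet_mean, 3))
```

When the guess measure matches the data-generating model, the empirical cell proportions and the Dirichlet mean agree up to sampling noise, which is the intuition behind using the partition counts as evidence about where the model concentrates probability.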
Notation
$\delta_x(A)$: Dirac measure with respect to the set $A$ and point $x$;

$F_X$: distribution function of the random variable $X$;

$P(A)$: probability measure of the event $A$;

$P_{F_X}(A)$: probability measure of the event $A$ under the model's assumption;

$X_{(1)}$: smallest order statistic;

$X_{(n)}$: largest order statistic;

$\mathcal{X}$: sample space of the random variable $X$; all possible observations belong to this set;

$X_1, \dots, X_n$: random sample assumed to arise from the model $F_X$. A finite collection of independent and identically distributed random variables that are described by the dispersion model $F_X$;

$\mathbb{1}_{A}(x)$: indicator function for $x$ with respect to the set $A$.